In Silico Analysis for the Study of Botulinum Toxin Structure
NASA Astrophysics Data System (ADS)
Suzuki, Tomonori; Miyazaki, Satoru
2010-01-01
Protein-protein interactions play many important roles in biological function. Knowledge of protein-protein complex structure is required for understanding the function. The determination of protein-protein complex structure by experimental studies remains difficult, therefore computational prediction of protein structures by structure modeling and docking studies is valuable method. In addition, MD simulation is also one of the most popular methods for protein structure modeling and characteristics. Here, we attempt to predict protein-protein complex structure and property using some of bioinformatic methods, and we focus botulinum toxin complex as target structure.
Tuncbag, Nurcan; Gursoy, Attila; Nussinov, Ruth; Keskin, Ozlem
2011-08-11
Prediction of protein-protein interactions at the structural level on the proteome scale is important because it allows prediction of protein function, helps drug discovery and takes steps toward genome-wide structural systems biology. We provide a protocol (termed PRISM, protein interactions by structural matching) for large-scale prediction of protein-protein interactions and assembly of protein complex structures. The method consists of two components: rigid-body structural comparisons of target proteins to known template protein-protein interfaces and flexible refinement using a docking energy function. The PRISM rationale follows our observation that globally different protein structures can interact via similar architectural motifs. PRISM predicts binding residues by using structural similarity and evolutionary conservation of putative binding residue 'hot spots'. Ultimately, PRISM could help to construct cellular pathways and functional, proteome-scale annotation. PRISM is implemented in Python and runs in a UNIX environment. The program accepts Protein Data Bank-formatted protein structures and is available at http://prism.ccbb.ku.edu.tr/prism_protocol/.
Utilization of protein intrinsic disorder knowledge in structural proteomics
Oldfield, Christopher J.; Xue, Bin; Van, Ya-Yue; Ulrich, Eldon L.; Markley, John L.; Dunker, A. Keith; Uversky, Vladimir N.
2014-01-01
Intrinsically disordered proteins (IDPs) and proteins with long disordered regions are highly abundant in various proteomes. Despite their lack of well-defined ordered structure, these proteins and regions are frequently involved in crucial biological processes. Although in recent years these proteins have attracted the attention of many researchers, IDPs represent a significant challenge for structural characterization since these proteins can impact many of the processes in the structure determination pipeline. Here we investigate the effects of IDPs on the structure determination process and the utility of disorder prediction in selecting and improving proteins for structural characterization. Examination of the extent of intrinsic disorder in existing crystal structures found that relatively few protein crystal structures contain extensive regions of intrinsic disorder. Although intrinsic disorder is not the only cause of crystallization failures and many structured proteins cannot be crystallized, filtering out highly disordered proteins from structure-determination target lists is still likely to be cost effective. Therefore it is desirable to avoid highly disordered proteins from structure-determination target lists and we show that disorder prediction can be applied effectively to enrich structure determination pipelines with proteins more likely to yield crystal structures. For structural investigation of specific proteins, disorder prediction can be used to improve targets for structure determination. Finally, a framework for considering intrinsic disorder in the structure determination pipeline is proposed. PMID:23232152
Structure based alignment and clustering of proteins (STRALCP)
Zemla, Adam T.; Zhou, Carol E.; Smith, Jason R.; Lam, Marisa W.
2013-06-18
Disclosed are computational methods of clustering a set of protein structures based on local and pair-wise global similarity values. Pair-wise local and global similarity values are generated based on pair-wise structural alignments for each protein in the set of protein structures. Initially, the protein structures are clustered based on pair-wise local similarity values. The protein structures are then clustered based on pair-wise global similarity values. For each given cluster both a representative structure and spans of conserved residues are identified. The representative protein structure is used to assign newly-solved protein structures to a group. The spans are used to characterize conservation and assign a "structural footprint" to the cluster.
Kato, Koichi; Nakayoshi, Tomoki; Fukuyoshi, Shuichi; Kurimoto, Eiji; Oda, Akifumi
2017-10-12
Although various higher-order protein structure prediction methods have been developed, almost all of them were developed based on the three-dimensional (3D) structure information of known proteins. Here we predicted the short protein structures by molecular dynamics (MD) simulations in which only Newton's equations of motion were used and 3D structural information of known proteins was not required. To evaluate the ability of MD simulationto predict protein structures, we calculated seven short test protein (10-46 residues) in the denatured state and compared their predicted and experimental structures. The predicted structure for Trp-cage (20 residues) was close to the experimental structure by 200-ns MD simulation. For proteins shorter or longer than Trp-cage, root-mean square deviation values were larger than those for Trp-cage. However, secondary structures could be reproduced by MD simulations for proteins with 10-34 residues. Simulations by replica exchange MD were performed, but the results were similar to those from normal MD simulations. These results suggest that normal MD simulations can roughly predict short protein structures and 200-ns simulations are frequently sufficient for estimating the secondary structures of protein (approximately 20 residues). Structural prediction method using only fundamental physical laws are useful for investigating non-natural proteins, such as primitive proteins and artificial proteins for peptide-based drug delivery systems.
Lee, Hasup; Baek, Minkyung; Lee, Gyu Rie; Park, Sangwoo; Seok, Chaok
2017-03-01
Many proteins function as homo- or hetero-oligomers; therefore, attempts to understand and regulate protein functions require knowledge of protein oligomer structures. The number of available experimental protein structures is increasing, and oligomer structures can be predicted using the experimental structures of related proteins as templates. However, template-based models may have errors due to sequence differences between the target and template proteins, which can lead to functional differences. Such structural differences may be predicted by loop modeling of local regions or refinement of the overall structure. In CAPRI (Critical Assessment of PRotein Interactions) round 30, we used recently developed features of the GALAXY protein modeling package, including template-based structure prediction, loop modeling, model refinement, and protein-protein docking to predict protein complex structures from amino acid sequences. Out of the 25 CAPRI targets, medium and acceptable quality models were obtained for 14 and 1 target(s), respectively, for which proper oligomer or monomer templates could be detected. Symmetric interface loop modeling on oligomer model structures successfully improved model quality, while loop modeling on monomer model structures failed. Overall refinement of the predicted oligomer structures consistently improved the model quality, in particular in interface contacts. Proteins 2017; 85:399-407. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Protein flexibility in the light of structural alphabets
Craveur, Pierrick; Joseph, Agnel P.; Esque, Jeremy; Narwani, Tarun J.; Noël, Floriane; Shinada, Nicolas; Goguet, Matthieu; Leonard, Sylvain; Poulain, Pierre; Bertrand, Olivier; Faure, Guilhem; Rebehmed, Joseph; Ghozlane, Amine; Swapna, Lakshmipuram S.; Bhaskara, Ramachandra M.; Barnoud, Jonathan; Téletchéa, Stéphane; Jallu, Vincent; Cerny, Jiri; Schneider, Bohdan; Etchebest, Catherine; Srinivasan, Narayanaswamy; Gelly, Jean-Christophe; de Brevern, Alexandre G.
2015-01-01
Protein structures are valuable tools to understand protein function. Nonetheless, proteins are often considered as rigid macromolecules while their structures exhibit specific flexibility, which is essential to complete their functions. Analyses of protein structures and dynamics are often performed with a simplified three-state description, i.e., the classical secondary structures. More precise and complete description of protein backbone conformation can be obtained using libraries of small protein fragments that are able to approximate every part of protein structures. These libraries, called structural alphabets (SAs), have been widely used in structure analysis field, from definition of ligand binding sites to superimposition of protein structures. SAs are also well suited to analyze the dynamics of protein structures. Here, we review innovative approaches that investigate protein flexibility based on SAs description. Coupled to various sources of experimental data (e.g., B-factor) and computational methodology (e.g., Molecular Dynamic simulation), SAs turn out to be powerful tools to analyze protein dynamics, e.g., to examine allosteric mechanisms in large set of structures in complexes, to identify order/disorder transition. SAs were also shown to be quite efficient to predict protein flexibility from amino-acid sequence. Finally, in this review, we exemplify the interest of SAs for studying flexibility with different cases of proteins implicated in pathologies and diseases. PMID:26075209
A Template-Based Protein Structure Reconstruction Method Using Deep Autoencoder Learning.
Li, Haiou; Lyu, Qiang; Cheng, Jianlin
2016-12-01
Protein structure prediction is an important problem in computational biology, and is widely applied to various biomedical problems such as protein function study, protein design, and drug design. In this work, we developed a novel deep learning approach based on a deeply stacked denoising autoencoder for protein structure reconstruction. We applied our approach to a template-based protein structure prediction using only the 3D structural coordinates of homologous template proteins as input. The templates were identified for a target protein by a PSI-BLAST search. 3DRobot (a program that automatically generates diverse and well-packed protein structure decoys) was used to generate initial decoy models for the target from the templates. A stacked denoising autoencoder was trained on the decoys to obtain a deep learning model for the target protein. The trained deep model was then used to reconstruct the final structural model for the target sequence. With target proteins that have highly similar template proteins as benchmarks, the GDT-TS score of the predicted structures is greater than 0.7, suggesting that the deep autoencoder is a promising method for protein structure reconstruction.
Protein structure similarity from Principle Component Correlation analysis.
Zhou, Xiaobo; Chou, James; Wong, Stephen T C
2006-01-25
Owing to rapid expansion of protein structure databases in recent years, methods of structure comparison are becoming increasingly effective and important in revealing novel information on functional properties of proteins and their roles in the grand scheme of evolutionary biology. Currently, the structural similarity between two proteins is measured by the root-mean-square-deviation (RMSD) in their best-superimposed atomic coordinates. RMSD is the golden rule of measuring structural similarity when the structures are nearly identical; it, however, fails to detect the higher order topological similarities in proteins evolved into different shapes. We propose new algorithms for extracting geometrical invariants of proteins that can be effectively used to identify homologous protein structures or topologies in order to quantify both close and remote structural similarities. We measure structural similarity between proteins by correlating the principle components of their secondary structure interaction matrix. In our approach, the Principle Component Correlation (PCC) analysis, a symmetric interaction matrix for a protein structure is constructed with relationship parameters between secondary elements that can take the form of distance, orientation, or other relevant structural invariants. When using a distance-based construction in the presence or absence of encoded N to C terminal sense, there are strong correlations between the principle components of interaction matrices of structurally or topologically similar proteins. The PCC method is extensively tested for protein structures that belong to the same topological class but are significantly different by RMSD measure. The PCC analysis can also differentiate proteins having similar shapes but different topological arrangements. Additionally, we demonstrate that when using two independently defined interaction matrices, comparison of their maximum eigenvalues can be highly effective in clustering structurally or topologically similar proteins. We believe that the PCC analysis of interaction matrix is highly flexible in adopting various structural parameters for protein structure comparison.
Loving, Kathryn A.; Lin, Andy; Cheng, Alan C.
2014-01-01
Advances reported over the last few years and the increasing availability of protein crystal structure data have greatly improved structure-based druggability approaches. However, in practice, nearly all druggability estimation methods are applied to protein crystal structures as rigid proteins, with protein flexibility often not directly addressed. The inclusion of protein flexibility is important in correctly identifying the druggability of pockets that would be missed by methods based solely on the rigid crystal structure. These include cryptic pockets and flexible pockets often found at protein-protein interaction interfaces. Here, we apply an approach that uses protein modeling in concert with druggability estimation to account for light protein backbone movement and protein side-chain flexibility in protein binding sites. We assess the advantages and limitations of this approach on widely-used protein druggability sets. Applying the approach to all mammalian protein crystal structures in the PDB results in identification of 69 proteins with potential druggable cryptic pockets. PMID:25079060
What are the structural features that drive partitioning of proteins in aqueous two-phase systems?
Wu, Zhonghua; Hu, Gang; Wang, Kui; Zaslavsky, Boris Yu; Kurgan, Lukasz; Uversky, Vladimir N
2017-01-01
Protein partitioning in aqueous two-phase systems (ATPSs) represents a convenient, inexpensive, and easy to scale-up protein separation technique. Since partition behavior of a protein dramatically depends on an ATPS composition, it would be highly beneficial to have reliable means for (even qualitative) prediction of partitioning of a target protein under different conditions. Our aim was to understand which structural features of proteins contribute to partitioning of a query protein in a given ATPS. We undertook a systematic empirical analysis of relations between 57 numerical structural descriptors derived from the corresponding amino acid sequences and crystal structures of 10 well-characterized proteins and the partition behavior of these proteins in 29 different ATPSs. This analysis revealed that just a few structural characteristics of proteins can accurately determine behavior of these proteins in a given ATPS. However, partition behavior of proteins in different ATPSs relies on different structural features. In other words, we could not find a unique set of protein structural features derived from their crystal structures that could be used for the description of the protein partition behavior of all proteins in all ATPSs analyzed in this study. We likely need to gain better insight into relationships between protein-solvent interactions and protein structure peculiarities, in particular given limitations of the used here crystal structures, to be able to construct a model that accurately predicts protein partition behavior across all ATPSs. Copyright © 2016 Elsevier B.V. All rights reserved.
Ban, Yajing; L Prates, Luciana; Yu, Peiqiang
2017-10-18
This study was conducted to (1) determine protein and carbohydrate molecular structure profiles and (2) quantify the relationship between structural features and protein bioavailability of newly developed carinata and canola seeds for dairy cows by using Fourier transform infrared molecular spectroscopy. Results showed similarity in protein structural makeup within the entire protein structural region between carinata and canola seeds. The highest area ratios related to structural CHO, total CHO, and cellulosic compounds were obtained for carinata seeds. Carinata and canola seeds showed similar carbohydrate and protein molecular structures by multivariate analyses. Carbohydrate molecular structure profiles were highly correlated to protein rumen degradation and intestinal digestion characteristics. In conclusion, the molecular spectroscopy can detect inherent structural characteristics in carinata and canola seeds in which carbohydrate-relative structural features are related to protein metabolism and utilization. Protein and carbohydrate spectral profiles could be used as predictors of rumen protein bioavailability in cows.
Exploring Human Diseases and Biological Mechanisms by Protein Structure Prediction and Modeling.
Wang, Juexin; Luttrell, Joseph; Zhang, Ning; Khan, Saad; Shi, NianQing; Wang, Michael X; Kang, Jing-Qiong; Wang, Zheng; Xu, Dong
2016-01-01
Protein structure prediction and modeling provide a tool for understanding protein functions by computationally constructing protein structures from amino acid sequences and analyzing them. With help from protein prediction tools and web servers, users can obtain the three-dimensional protein structure models and gain knowledge of functions from the proteins. In this chapter, we will provide several examples of such studies. As an example, structure modeling methods were used to investigate the relation between mutation-caused misfolding of protein and human diseases including epilepsy and leukemia. Protein structure prediction and modeling were also applied in nucleotide-gated channels and their interaction interfaces to investigate their roles in brain and heart cells. In molecular mechanism studies of plants, rice salinity tolerance mechanism was studied via structure modeling on crucial proteins identified by systems biology analysis; trait-associated protein-protein interactions were modeled, which sheds some light on the roles of mutations in soybean oil/protein content. In the age of precision medicine, we believe protein structure prediction and modeling will play more and more important roles in investigating biomedical mechanism of diseases and drug design.
Shen, Hong-Bin; Yi, Dong-Liang; Yao, Li-Xiu; Yang, Jie; Chou, Kuo-Chen
2008-10-01
In the postgenomic age, with the avalanche of protein sequences generated and relatively slow progress in determining their structures by experiments, it is important to develop automated methods to predict the structure of a protein from its sequence. The membrane proteins are a special group in the protein family that accounts for approximately 30% of all proteins; however, solved membrane protein structures only represent less than 1% of known protein structures to date. Although a great success has been achieved for developing computational intelligence techniques to predict secondary structures in both globular and membrane proteins, there is still much challenging work in this regard. In this review article, we firstly summarize the recent progress of automation methodology development in predicting protein secondary structures, especially in membrane proteins; we will then give some future directions in this research field.
Genshaft, Alexander; Moser, Joe-Ann S.; D'Antonio, Edward L.; Bowman, Christine M.; Christianson, David W.
2013-01-01
The reversible acetylation of lysine to form N6-acetyllysine in the regulation of protein function is a hallmark of epigenetics. Acetylation of the positively charged amino group of the lysine side chain generates a neutral N-alkylacetamide moiety that serves as a molecular “switch” for the modulation of protein function and protein-protein interactions. We now report the analysis of 381 N6-acetyllysine side chain amide conformations as found in 79 protein crystal structures and 11 protein NMR structures deposited in the Protein Data Bank (PDB) of the Research Collaboratory for Structural Bioinformatics. We find that only 74.3% of N6-acetyllysine residues in protein crystal structures and 46.5% in protein NMR structures contain amide groups with energetically preferred trans or generously trans conformations. Surprisingly, 17.6% of N6-acetyllysine residues in protein crystal structures and 5.3% in protein NMR structures contain amide groups with energetically unfavorable cis or generously cis conformations. Even more surprisingly, 8.1% of N6-acetyllysine residues in protein crystal structures and 48.2% in NMR structures contain amide groups with energetically prohibitive twisted conformations that approach the transition state structure for cis-trans isomerization. In contrast, 109 unique N-alkylacetamide groups contained in 84 highly-accurate small molecule crystal structures retrieved from the Cambridge Structural Database exclusively adopt energetically preferred trans conformations. Therefore, we conclude that cis and twisted N6-acetyllysine amides in protein structures deposited in the PDB are erroneously modeled due to their energetically unfavorable or prohibitive conformations. PMID:23401043
The Prediction of Botulinum Toxin Structure Based on in Silico and in Vitro Analysis
NASA Astrophysics Data System (ADS)
Suzuki, Tomonori; Miyazaki, Satoru
2011-01-01
Many of biological system mediated through protein-protein interactions. Knowledge of protein-protein complex structure is required for understanding the function. The determination of huge size and flexible protein-protein complex structure by experimental studies remains difficult, costly and five-consuming, therefore computational prediction of protein structures by homolog modeling and docking studies is valuable method. In addition, MD simulation is also one of the most powerful methods allowing to see the real dynamics of proteins. Here, we predict protein-protein complex structure of botulinum toxin to analyze its property. These bioinformatics methods are useful to report the relation between the flexibility of backbone structure and the activity.
Music, Nedzad; Gagnon, Carl A
2010-12-01
Porcine reproductive and respiratory syndrome (PRRS) is an economically devastating viral disease affecting the swine industry worldwide. The etiological agent, PRRS virus (PRRSV), possesses a RNA viral genome with nine open reading frames (ORFs). The ORF1a and ORF1b replicase-associated genes encode the polyproteins pp1a and pp1ab, respectively. The pp1a is processed in nine non-structural proteins (nsps): nsp1α, nsp1β, and nsp2 to nsp8. Proteolytic cleavage of pp1ab generates products nsp9 to nsp12. The proteolytic pp1a cleavage products process and cleave pp1a and pp1ab into nsp products. The nsp9 to nsp12 are involved in virus genome transcription and replication. The 3' end of the viral genome encodes four minor and three major structural proteins. The GP(2a), GP₃ and GP₄ (encoded by ORF2a, 3 and 4), are glycosylated membrane associated minor structural proteins. The fourth minor structural protein, the E protein (encoded by ORF2b), is an unglycosylated membrane associated protein. The viral envelope contains two major structural proteins: a glycosylated major envelope protein GP₅ (encoded by ORF5) and an unglycosylated membrane M protein (encoded by ORF6). The third major structural protein is the nucleocapsid N protein (encoded by ORF7). All PRRSV non-structural and structural proteins are essential for virus replication, and PRRSV infectivity is relatively intolerant to subtle changes within the structural proteins. PRRSV virulence is multigenic and resides in both the non-structural and structural viral proteins. This review discusses the molecular characteristics, biological and immunological functions of the PRRSV structural and nsps and their involvement in the virus pathogenesis.
Martin, Juliette; Regad, Leslie; Etchebest, Catherine; Camproux, Anne-Claude
2008-11-15
Interresidue protein contacts in proteins structures and at protein-protein interface are classically described by the amino acid types of interacting residues and the local structural context of the contact, if any, is described using secondary structures. In this study, we present an alternate analysis of interresidue contact using local structures defined by the structural alphabet introduced by Camproux et al. This structural alphabet allows to describe a 3D structure as a sequence of prototype fragments called structural letters, of 27 different types. Each residue can then be assigned to a particular local structure, even in loop regions. The analysis of interresidue contacts within protein structures defined using Voronoï tessellations reveals that pairwise contact specificity is greater in terms of structural letters than amino acids. Using a simple heuristic based on specificity score comparison, we find that 74% of the long-range contacts within protein structures are better described using structural letters than amino acid types. The investigation is extended to a set of protein-protein complexes, showing that the similar global rules apply as for intraprotein contacts, with 64% of the interprotein contacts best described by local structures. We then present an evaluation of pairing functions integrating structural letters to decoy scoring and show that some complexes could benefit from the use of structural letter-based pairing functions.
New paradigm in ankyrin repeats: Beyond protein-protein interaction module.
Islam, Zeyaul; Nagampalli, Raghavendra Sashi Krishna; Fatima, Munazza Tamkeen; Ashraf, Ghulam Md
2018-04-01
Classically, ankyrin repeat (ANK) proteins are built from tandems of two or more repeats and form curved solenoid structures that are associated with protein-protein interactions. These are short, widespread structural motif of around 33 amino acids repeats in tandem, having a canonical helix-loop-helix fold, found individually or in combination with other domains. The multiplicity of structural pattern enables it to form assemblies of diverse sizes, required for their abilities to confer multiple binding and structural roles of proteins. Three-dimensional structures of these repeats determined to date reveal a degree of structural variability that translates into the considerable functional versatility of this protein superfamily. Recent work on the ANK has proposed novel structural information, especially protein-lipid, protein-sugar and protein-protein interaction. Self-assembly of these repeats was also shown to prevent the associated protein in forming filaments. In this review, we summarize the latest findings and how the new structural information has increased our understanding of the structural determinants of ANK proteins. We discussed latest findings on how these proteins participate in various interactions to diversify the ANK roles in numerous biological processes, and explored the emerging and evolving field of designer ankyrins and its framework for protein engineering emphasizing on biotechnological applications. Copyright © 2017 Elsevier B.V. All rights reserved.
Classification of proteins: available structural space for molecular modeling.
Andreeva, Antonina
2012-01-01
The wealth of available protein structural data provides unprecedented opportunity to study and better understand the underlying principles of protein folding and protein structure evolution. A key to achieving this lies in the ability to analyse these data and to organize them in a coherent classification scheme. Over the past years several protein classifications have been developed that aim to group proteins based on their structural relationships. Some of these classification schemes explore the concept of structural neighbourhood (structural continuum), whereas other utilize the notion of protein evolution and thus provide a discrete rather than continuum view of protein structure space. This chapter presents a strategy for classification of proteins with known three-dimensional structure. Steps in the classification process along with basic definitions are introduced. Examples illustrating some fundamental concepts of protein folding and evolution with a special focus on the exceptions to them are presented.
NASA Astrophysics Data System (ADS)
Bozkurt, Ozlem; Haman Bayari, Sevgi; Severcan, Mete; Krafft, Christoph; Popp, Jürgen; Severcan, Feride
2012-07-01
The relation between protein structural alterations and tissue dysfunction is a major concern as protein fibrillation and/or aggregation due to structural alterations has been reported in many disease states. In the current study, Fourier transform infrared microspectroscopic imaging has been used to investigate diabetes-induced changes on protein secondary structure and macromolecular content in streptozotocin-induced diabetic rat liver. Protein secondary structural alterations were predicted using neural network approach utilizing the amide I region. Moreover, the role of selenium in the recovery of diabetes-induced alterations on macromolecular content and protein secondary structure was also studied. The results revealed that diabetes induced a decrease in lipid to protein and glycogen to protein ratios in diabetic livers. Significant alterations in protein secondary structure were observed with a decrease in α-helical and an increase in β-sheet content. Both doses of selenium restored diabetes-induced changes in lipid to protein and glycogen to protein ratios. However, low-dose selenium supplementation was not sufficient to recover the effects of diabetes on protein secondary structure, while a higher dose of selenium fully restored diabetes-induced alterations in protein structure.
Gaia: automated quality assessment of protein structure models.
Kota, Pradeep; Ding, Feng; Ramachandran, Srinivas; Dokholyan, Nikolay V
2011-08-15
Increasing use of structural modeling for understanding structure-function relationships in proteins has led to the need to ensure that the protein models being used are of acceptable quality. Quality of a given protein structure can be assessed by comparing various intrinsic structural properties of the protein to those observed in high-resolution protein structures. In this study, we present tools to compare a given structure to high-resolution crystal structures. We assess packing by calculating the total void volume, the percentage of unsatisfied hydrogen bonds, the number of steric clashes and the scaling of the accessible surface area. We assess covalent geometry by determining bond lengths, angles, dihedrals and rotamers. The statistical parameters for the above measures, obtained from high-resolution crystal structures enable us to provide a quality-score that points to specific areas where a given protein structural model needs improvement. We provide these tools that appraise protein structures in the form of a web server Gaia (http://chiron.dokhlab.org). Gaia evaluates the packing and covalent geometry of a given protein structure and provides quantitative comparison of the given structure to high-resolution crystal structures. dokh@unc.edu Supplementary data are available at Bioinformatics online.
Protein Structure Prediction by Protein Threading
NASA Astrophysics Data System (ADS)
Xu, Ying; Liu, Zhijie; Cai, Liming; Xu, Dong
The seminal work of Bowie, Lüthy, and Eisenberg (Bowie et al., 1991) on "the inverse protein folding problem" laid the foundation of protein structure prediction by protein threading. By using simple measures for fitness of different amino acid types to local structural environments defined in terms of solvent accessibility and protein secondary structure, the authors derived a simple and yet profoundly novel approach to assessing if a protein sequence fits well with a given protein structural fold. Their follow-up work (Elofsson et al., 1996; Fischer and Eisenberg, 1996; Fischer et al., 1996a,b) and the work by Jones, Taylor, and Thornton (Jones et al., 1992) on protein fold recognition led to the development of a new brand of powerful tools for protein structure prediction, which we now term "protein threading." These computational tools have played a key role in extending the utility of all the experimentally solved structures by X-ray crystallography and nuclear magnetic resonance (NMR), providing structural models and functional predictions for many of the proteins encoded in the hundreds of genomes that have been sequenced up to now.
3D-SURFER 2.0: web platform for real-time search and characterization of protein surfaces.
Xiong, Yi; Esquivel-Rodriguez, Juan; Sael, Lee; Kihara, Daisuke
2014-01-01
The increasing number of uncharacterized protein structures necessitates the development of computational approaches for function annotation using the protein tertiary structures. Protein structure database search is the basis of any structure-based functional elucidation of proteins. 3D-SURFER is a web platform for real-time protein surface comparison of a given protein structure against the entire PDB using 3D Zernike descriptors. It can smoothly navigate the protein structure space in real-time from one query structure to another. A major new feature of Release 2.0 is the ability to compare the protein surface of a single chain, a single domain, or a single complex against databases of protein chains, domains, complexes, or a combination of all three in the latest PDB. Additionally, two types of protein structures can now be compared: all-atom-surface and backbone-atom-surface. The server can also accept a batch job for a large number of database searches. Pockets in protein surfaces can be identified by VisGrid and LIGSITE (csc) . The server is available at http://kiharalab.org/3d-surfer/.
Protein Structure Determination using Metagenome sequence data
Ovchinnikov, Sergey; Park, Hahnbeom; Varghese, Neha; Huang, Po-Ssu; Pavlopoulos, Georgios A.; Kim, David E.; Kamisetty, Hetunandan; Kyrpides, Nikos C.; Baker, David
2017-01-01
Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families, and that metagenome sequence data more than triples the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact based structure matching and Rosetta structure calculations to generate models for 614 protein families with currently unknown structures; 206 are membrane proteins and 137 have folds not represented in the PDB. This approach provides the representative models for large protein families originally envisioned as the goal of the protein structure initiative at a fraction of the cost. PMID:28104891
2014-01-01
Background Due to rapid sequencing of genomes, there are now millions of deposited protein sequences with no known function. Fast sequence-based comparisons allow detecting close homologs for a protein of interest to transfer functional information from the homologs to the given protein. Sequence-based comparison cannot detect remote homologs, in which evolution has adjusted the sequence while largely preserving structure. Structure-based comparisons can detect remote homologs but most methods for doing so are too expensive to apply at a large scale over structural databases of proteins. Recently, fragment-based structural representations have been proposed that allow fast detection of remote homologs with reasonable accuracy. These representations have also been used to obtain linearly-reducible maps of protein structure space. It has been shown, as additionally supported from analysis in this paper that such maps preserve functional co-localization of the protein structure space. Methods Inspired by a recent application of the Latent Dirichlet Allocation (LDA) model for conducting structural comparisons of proteins, we propose higher-order LDA-obtained topic-based representations of protein structures to provide an alternative route for remote homology detection and organization of the protein structure space in few dimensions. Various techniques based on natural language processing are proposed and employed to aid the analysis of topics in the protein structure domain. Results We show that a topic-based representation is just as effective as a fragment-based one at automated detection of remote homologs and organization of protein structure space. We conduct a detailed analysis of the information content in the topic-based representation, showing that topics have semantic meaning. The fragment-based and topic-based representations are also shown to allow prediction of superfamily membership. Conclusions This work opens exciting venues in designing novel representations to extract information about protein structures, as well as organizing and mining protein structure space with mature text mining tools. PMID:25080993
Andreeva, Antonina
2016-06-15
The Structural Classification of Proteins (SCOP) database has facilitated the development of many tools and algorithms and it has been successfully used in protein structure prediction and large-scale genome annotations. During the development of SCOP, numerous exceptions were found to topological rules, along with complex evolutionary scenarios and peculiarities in proteins including the ability to fold into alternative structures. This article reviews cases of structural variations observed for individual proteins and among groups of homologues, knowledge of which is essential for protein structure modelling. © 2016 The Author(s). published by Portland Press Limited on behalf of the Biochemical Society.
SDSL-ESR-based protein structure characterization.
Strancar, Janez; Kavalenka, Aleh; Urbancic, Iztok; Ljubetic, Ajasja; Hemminga, Marcus A
2010-03-01
As proteins are key molecules in living cells, knowledge about their structure can provide important insights and applications in science, biotechnology, and medicine. However, many protein structures are still a big challenge for existing high-resolution structure-determination methods, as can be seen in the number of protein structures published in the Protein Data Bank. This is especially the case for less-ordered, more hydrophobic and more flexible protein systems. The lack of efficient methods for structure determination calls for urgent development of a new class of biophysical techniques. This work attempts to address this problem with a novel combination of site-directed spin labelling electron spin resonance spectroscopy (SDSL-ESR) and protein structure modelling, which is coupled by restriction of the conformational spaces of the amino acid side chains. Comparison of the application to four different protein systems enables us to generalize the new method and to establish a general procedure for determination of protein structure.
Dissecting the relationship between protein structure and sequence variation
NASA Astrophysics Data System (ADS)
Shahmoradi, Amir; Wilke, Claus; Wilke Lab Team
2015-03-01
Over the past decade several independent works have shown that some structural properties of proteins are capable of predicting protein evolution. The strength and significance of these structure-sequence relations, however, appear to vary widely among different proteins, with absolute correlation strengths ranging from 0 . 1 to 0 . 8 . Here we present the results from a comprehensive search for the potential biophysical and structural determinants of protein evolution by studying more than 200 structural and evolutionary properties in a dataset of 209 monomeric enzymes. We discuss the main protein characteristics responsible for the general patterns of protein evolution, and identify sequence divergence as the main determinant of the strengths of virtually all structure-evolution relationships, explaining ~ 10 - 30 % of observed variation in sequence-structure relations. In addition to sequence divergence, we identify several protein structural properties that are moderately but significantly coupled with the strength of sequence-structure relations. In particular, proteins with more homogeneous back-bone hydrogen bond energies, large fractions of helical secondary structures and low fraction of beta sheets tend to have the strongest sequence-structure relation. BEACON-NSF center for the study of evolution in action.
Protein enriched pasta: structure and digestibility of its protein network.
Laleg, Karima; Barron, Cécile; Santé-Lhoutellier, Véronique; Walrand, Stéphane; Micard, Valérie
2016-02-01
Wheat (W) pasta was enriched in 6% gluten (G), 35% faba (F) or 5% egg (E) to increase its protein content (13% to 17%). The impact of the enrichment on the multiscale structure of the pasta and on in vitro protein digestibility was studied. Increasing the protein content (W- vs. G-pasta) strengthened pasta structure at molecular and macroscopic scales but reduced its protein digestibility by 3% by forming a higher covalently linked protein network. Greater changes in the macroscopic and molecular structure of the pasta were obtained by varying the nature of protein used for enrichment. Proteins in G- and E-pasta were highly covalently linked (28-32%) resulting in a strong pasta structure. Conversely, F-protein (98% SDS-soluble) altered the pasta structure by diluting gluten and formed a weak protein network (18% covalent link). As a result, protein digestibility in F-pasta was significantly higher (46%) than in E- (44%) and G-pasta (39%). The effect of low (55 °C, LT) vs. very high temperature (90 °C, VHT) drying on the protein network structure and digestibility was shown to cause greater molecular changes than pasta formulation. Whatever the pasta, a general strengthening of its structure, a 33% to 47% increase in covalently linked proteins and a higher β-sheet structure were observed. However, these structural differences were evened out after the pasta was cooked, resulting in identical protein digestibility in LT and VHT pasta. Even after VHT drying, F-pasta had the best amino acid profile with the highest protein digestibility, proof of its nutritional interest.
Cui, Xuefeng; Lu, Zhiwu; Wang, Sheng; Jing-Yan Wang, Jim; Gao, Xin
2016-06-15
Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information. We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence-structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration. We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM-HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods. Our program is freely available for download from http://sfb.kaust.edu.sa/Pages/Software.aspx : xin.gao@kaust.edu.sa Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Identification of Conserved Water Sites in Protein Structures for Drug Design.
Jukič, Marko; Konc, Janez; Gobec, Stanislav; Janežič, Dušanka
2017-12-26
Identification of conserved waters in protein structures is a challenging task with applications in molecular docking and protein stability prediction. As an alternative to computationally demanding simulations of proteins in water, experimental cocrystallized waters in the Protein Data Bank (PDB) in combination with a local structure alignment algorithm can be used for reliable prediction of conserved water sites. We developed the ProBiS H2O approach based on the previously developed ProBiS algorithm, which enables identification of conserved water sites in proteins using experimental protein structures from the PDB or a set of custom protein structures available to the user. With a protein structure, a binding site, or an individual water molecule as a query, ProBiS H2O collects similar proteins from the PDB and performs local or binding site-specific superimpositions of the query structure with similar proteins using the ProBiS algorithm. It collects the experimental water molecules from the similar proteins and transposes them to the query protein. Transposed waters are clustered by their mutual proximity, which enables identification of discrete sites in the query protein with high water conservation. ProBiS H2O is a robust and fast new approach that uses existing experimental structural data to identify conserved water sites on the interfaces of protein complexes, for example protein-small molecule interfaces, and elsewhere on the protein structures. It has been successfully validated in several reported proteins in which conserved water molecules were found to play an important role in ligand binding with applications in drug design.
Functional assignment to JEV proteins using SVM.
Sahoo, Ganesh Chandra; Dikhit, Manas Ranjan; Das, Pradeep
2008-01-01
Identification of different protein functions facilitates a mechanistic understanding of Japanese encephalitis virus (JEV) infection and opens novel means for drug development. Support vector machines (SVM), useful for predicting the functional class of distantly related proteins, is employed to ascribe a possible functional class to Japanese encephalitis virus protein. Our study from SVMProt and available JE virus sequences suggests that structural and nonstructural proteins of JEV genome possibly belong to diverse protein functions, are expected to occur in the life cycle of JE virus. Protein functions common to both structural and non-structural proteins are iron-binding, metal-binding, lipid-binding, copper-binding, transmembrane, outer membrane, channels/Pores - Pore-forming toxins (proteins and peptides) group of proteins. Non-structural proteins perform functions like actin binding, zinc-binding, calcium-binding, hydrolases, Carbon-Oxygen Lyases, P-type ATPase, proteins belonging to major facilitator family (MFS), secreting main terminal branch (MTB) family, phosphotransfer-driven group translocators and ATP-binding cassette (ABC) family group of proteins. Whereas structural proteins besides belonging to same structural group of proteins (capsid, structural, envelope), they also perform functions like nuclear receptor, antibiotic resistance, RNA-binding, DNA-binding, magnesium-binding, isomerase (intra-molecular), oxidoreductase and participate in type II (general) secretory pathway (IISP).
Functional assignment to JEV proteins using SVM
Sahoo, Ganesh Chandra; Dikhit, Manas Ranjan; Das, Pradeep
2008-01-01
Identification of different protein functions facilitates a mechanistic understanding of Japanese encephalitis virus (JEV) infection and opens novel means for drug development. Support vector machines (SVM), useful for predicting the functional class of distantly related proteins, is employed to ascribe a possible functional class to Japanese encephalitis virus protein. Our study from SVMProt and available JE virus sequences suggests that structural and nonstructural proteins of JEV genome possibly belong to diverse protein functions, are expected to occur in the life cycle of JE virus. Protein functions common to both structural and non-structural proteins are iron-binding, metal-binding, lipid-binding, copper-binding, transmembrane, outer membrane, channels/Pores - Pore-forming toxins (proteins and peptides) group of proteins. Non-structural proteins perform functions like actin binding, zinc-binding, calcium-binding, hydrolases, Carbon-Oxygen Lyases, P-type ATPase, proteins belonging to major facilitator family (MFS), secreting main terminal branch (MTB) family, phosphotransfer-driven group translocators and ATP-binding cassette (ABC) family group of proteins. Whereas structural proteins besides belonging to same structural group of proteins (capsid, structural, envelope), they also perform functions like nuclear receptor, antibiotic resistance, RNA-binding, DNA-binding, magnesium-binding, isomerase (intra-molecular), oxidoreductase and participate in type II (general) secretory pathway (IISP). PMID:19052658
NASA Astrophysics Data System (ADS)
Finkelstein, A. V.; Galzitskaya, O. V.
2004-04-01
Protein physics is grounded on three fundamental experimental facts: protein, this long heteropolymer, has a well defined compact three-dimensional structure; this structure can spontaneously arise from the unfolded protein chain in appropriate environment; and this structure is separated from the unfolded state of the chain by the “all-or-none” phase transition, which ensures robustness of protein structure and therefore of its action. The aim of this review is to consider modern understanding of physical principles of self-organization of protein structures and to overview such important features of this process, as finding out the unique protein structure among zillions alternatives, nucleation of the folding process and metastable folding intermediates. Towards this end we will consider the main experimental facts and simple, mostly phenomenological theoretical models. We will concentrate on relatively small (single-domain) water-soluble globular proteins (whose structure and especially folding are much better studied and understood than those of large or membrane and fibrous proteins) and consider kinetic and structural aspects of transition of initially unfolded protein chains into their final solid (“native”) 3D structures.
General overview on structure prediction of twilight-zone proteins.
Khor, Bee Yin; Tye, Gee Jun; Lim, Theam Soon; Choong, Yee Siew
2015-09-04
Protein structure prediction from amino acid sequence has been one of the most challenging aspects in computational structural biology despite significant progress in recent years showed by critical assessment of protein structure prediction (CASP) experiments. When experimentally determined structures are unavailable, the predictive structures may serve as starting points to study a protein. If the target protein consists of homologous region, high-resolution (typically <1.5 Å) model can be built via comparative modelling. However, when confronted with low sequence similarity of the target protein (also known as twilight-zone protein, sequence identity with available templates is less than 30%), the protein structure prediction has to be initiated from scratch. Traditionally, twilight-zone proteins can be predicted via threading or ab initio method. Based on the current trend, combination of different methods brings an improved success in the prediction of twilight-zone proteins. In this mini review, the methods, progresses and challenges for the prediction of twilight-zone proteins were discussed.
Structure-Based Characterization of Multiprotein Complexes
Wiederstein, Markus; Gruber, Markus; Frank, Karl; Melo, Francisco; Sippl, Manfred J.
2014-01-01
Summary Multiprotein complexes govern virtually all cellular processes. Their 3D structures provide important clues to their biological roles, especially through structural correlations among protein molecules and complexes. The detection of such correlations generally requires comprehensive searches in databases of known protein structures by means of appropriate structure-matching techniques. Here, we present a high-speed structure search engine capable of instantly matching large protein oligomers against the complete and up-to-date database of biologically functional assemblies of protein molecules. We use this tool to reveal unseen structural correlations on the level of protein quaternary structure and demonstrate its general usefulness for efficiently exploring complex structural relationships among known protein assemblies. PMID:24954616
Mathematical methods for protein science
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hart, W.; Istrail, S.; Atkins, J.
1997-12-31
Understanding the structure and function of proteins is a fundamental endeavor in molecular biology. Currently, over 100,000 protein sequences have been determined by experimental methods. The three dimensional structure of the protein determines its function, but there are currently less than 4,000 structures known to atomic resolution. Accordingly, techniques to predict protein structure from sequence have an important role in aiding the understanding of the Genome and the effects of mutations in genetic disease. The authors describe current efforts at Sandia to better understand the structure of proteins through rigorous mathematical analyses of simple lattice models. The efforts have focusedmore » on two aspects of protein science: mathematical structure prediction, and inverse protein folding.« less
Resource for structure related information on transmembrane proteins
NASA Astrophysics Data System (ADS)
Tusnády, Gábor E.; Simon, István
Transmembrane proteins are involved in a wide variety of vital biological processes including transport of water-soluble molecules, flow of information and energy production. Despite significant efforts to determine the structures of these proteins, only a few thousand solved structures are known so far. Here, we review the various resources for structure-related information on these types of proteins ranging from the 3D structure to the topology and from the up-to-date databases to the various Internet sites and servers dealing with structure prediction and structure analysis. Abbreviations: 3D, three dimensional; PDB, Protein Data Bank; TMP, transmembrane protein.
G-LoSA for Prediction of Protein-Ligand Binding Sites and Structures.
Lee, Hui Sun; Im, Wonpil
2017-01-01
Recent advances in high-throughput structure determination and computational protein structure prediction have significantly enriched the universe of protein structure. However, there is still a large gap between the number of available protein structures and that of proteins with annotated function in high accuracy. Computational structure-based protein function prediction has emerged to reduce this knowledge gap. The identification of a ligand binding site and its structure is critical to the determination of a protein's molecular function. We present a computational methodology for predicting small molecule ligand binding site and ligand structure using G-LoSA, our protein local structure alignment and similarity measurement tool. All the computational procedures described here can be easily implemented using G-LoSA Toolkit, a package of standalone software programs and preprocessed PDB structure libraries. G-LoSA and G-LoSA Toolkit are freely available to academic users at http://compbio.lehigh.edu/GLoSA . We also illustrate a case study to show the potential of our template-based approach harnessing G-LoSA for protein function prediction.
A structural-alphabet-based strategy for finding structural motifs across protein families
Wu, Chih Yuan; Chen, Yao Chi; Lim, Carmay
2010-01-01
Proteins with insignificant sequence and overall structure similarity may still share locally conserved contiguous structural segments; i.e. structural/3D motifs. Most methods for finding 3D motifs require a known motif to search for other similar structures or functionally/structurally crucial residues. Here, without requiring a query motif or essential residues, a fully automated method for discovering 3D motifs of various sizes across protein families with different folds based on a 16-letter structural alphabet is presented. It was applied to structurally non-redundant proteins bound to DNA, RNA, obligate/non-obligate proteins as well as free DNA-binding proteins (DBPs) and proteins with known structures but unknown function. Its usefulness was illustrated by analyzing the 3D motifs found in DBPs. A non-specific motif was found with a ‘corner’ architecture that confers a stable scaffold and enables diverse interactions, making it suitable for binding not only DNA but also RNA and proteins. Furthermore, DNA-specific motifs present ‘only’ in DBPs were discovered. The motifs found can provide useful guidelines in detecting binding sites and computational protein redesign. PMID:20525797
An overview of the structures of protein-DNA complexes
Luscombe, Nicholas M; Austin, Susan E; Berman , Helen M; Thornton, Janet M
2000-01-01
On the basis of a structural analysis of 240 protein-DNA complexes contained in the Protein Data Bank (PDB), we have classified the DNA-binding proteins involved into eight different structural/functional groups, which are further classified into 54 structural families. Here we present this classification and review the functions, structures and binding interactions of these protein-DNA complexes. PMID:11104519
Kinjo, Akira R; Nakamura, Haruki
2013-01-01
Protein functions are mediated by interactions between proteins and other molecules. One useful approach to analyze protein functions is to compare and classify the structures of interaction interfaces of proteins. Here, we describe the procedures for compiling a database of interface structures and efficiently comparing the interface structures. To do so requires a good understanding of the data structures of the Protein Data Bank (PDB). Therefore, we also provide a detailed account of the PDB exchange dictionary necessary for extracting data that are relevant for analyzing interaction interfaces and secondary structures. We identify recurring structural motifs by classifying similar interface structures, and we define a coarse-grained representation of supersecondary structures (SSS) which represents a sequence of two or three secondary structure elements including their relative orientations as a string of four to seven letters. By examining the correspondence between structural motifs and SSS strings, we show that no SSS string has particularly high propensity to be found interaction interfaces in general, indicating any SSS can be used as a binding interface. When individual structural motifs are examined, there are some SSS strings that have high propensity for particular groups of structural motifs. In addition, it is shown that while the SSS strings found in particular structural motifs for nonpolymer and protein interfaces are as abundant as in other structural motifs that belong to the same subunit, structural motifs for nucleic acid interfaces exhibit somewhat stronger preference for SSS strings. In regard to protein folds, many motif-specific SSS strings were found across many folds, suggesting that SSS may be a useful description to investigate the universality of ligand binding modes.
Course 12: Proteins: Structural, Thermodynamic and Kinetic Aspects
NASA Astrophysics Data System (ADS)
Finkelstein, A. V.
1 Introduction 2 Overview of protein architectures and discussion of physical background of their natural selection 2.1 Protein structures 2.2 Physical selection of protein structures 3 Thermodynamic aspects of protein folding 3.1 Reversible denaturation of protein structures 3.2 What do denatured proteins look like? 3.3 Why denaturation of a globular protein is the first-order phase transition 3.4 "Gap" in energy spectrum: The main characteristic that distinguishes protein chains from random polymers 4 Kinetic aspects of protein folding 4.1 Protein folding in vivo 4.2 Protein folding in vitro (in the test-tube) 4.3 Theory of protein folding rates and solution of the Levinthal paradox
An Evolution-Based Approach to De Novo Protein Design and Case Study on Mycobacterium tuberculosis
Brender, Jeffrey R.; Czajka, Jeff; Marsh, David; Gray, Felicia; Cierpicki, Tomasz; Zhang, Yang
2013-01-01
Computational protein design is a reverse procedure of protein folding and structure prediction, where constructing structures from evolutionarily related proteins has been demonstrated to be the most reliable method for protein 3-dimensional structure prediction. Following this spirit, we developed a novel method to design new protein sequences based on evolutionarily related protein families. For a given target structure, a set of proteins having similar fold are identified from the PDB library by structural alignments. A structural profile is then constructed from the protein templates and used to guide the conformational search of amino acid sequence space, where physicochemical packing is accommodated by single-sequence based solvation, torsion angle, and secondary structure predictions. The method was tested on a computational folding experiment based on a large set of 87 protein structures covering different fold classes, which showed that the evolution-based design significantly enhances the foldability and biological functionality of the designed sequences compared to the traditional physics-based force field methods. Without using homologous proteins, the designed sequences can be folded with an average root-mean-square-deviation of 2.1 Å to the target. As a case study, the method is extended to redesign all 243 structurally resolved proteins in the pathogenic bacteria Mycobacterium tuberculosis, which is the second leading cause of death from infectious disease. On a smaller scale, five sequences were randomly selected from the design pool and subjected to experimental validation. The results showed that all the designed proteins are soluble with distinct secondary structure and three have well ordered tertiary structure, as demonstrated by circular dichroism and NMR spectroscopy. Together, these results demonstrate a new avenue in computational protein design that uses knowledge of evolutionary conservation from protein structural families to engineer new protein molecules of improved fold stability and biological functionality. PMID:24204234
Konc, Janez; Cesnik, Tomo; Konc, Joanna Trykowska; Penca, Matej; Janežič, Dušanka
2012-02-27
ProBiS-Database is a searchable repository of precalculated local structural alignments in proteins detected by the ProBiS algorithm in the Protein Data Bank. Identification of functionally important binding regions of the protein is facilitated by structural similarity scores mapped to the query protein structure. PDB structures that have been aligned with a query protein may be rapidly retrieved from the ProBiS-Database, which is thus able to generate hypotheses concerning the roles of uncharacterized proteins. Presented with uncharacterized protein structure, ProBiS-Database can discern relationships between such a query protein and other better known proteins in the PDB. Fast access and a user-friendly graphical interface promote easy exploration of this database of over 420 million local structural alignments. The ProBiS-Database is updated weekly and is freely available online at http://probis.cmm.ki.si/database.
A Method for WD40 Repeat Detection and Secondary Structure Prediction
Wang, Yang; Jiang, Fan; Zhuo, Zhu; Wu, Xian-Hui; Wu, Yun-Dong
2013-01-01
WD40-repeat proteins (WD40s), as one of the largest protein families in eukaryotes, play vital roles in assembling protein-protein/DNA/RNA complexes. WD40s fold into similar β-propeller structures despite diversified sequences. A program WDSP (WD40 repeat protein Structure Predictor) has been developed to accurately identify WD40 repeats and predict their secondary structures. The method is designed specifically for WD40 proteins by incorporating both local residue information and non-local family-specific structural features. It overcomes the problem of highly diversified protein sequences and variable loops. In addition, WDSP achieves a better prediction in identifying multiple WD40-domain proteins by taking the global combination of repeats into consideration. In secondary structure prediction, the average Q3 accuracy of WDSP in jack-knife test reaches 93.7%. A disease related protein LRRK2 was used as a representive example to demonstrate the structure prediction. PMID:23776530
Protein domain assignment from the recurrence of locally similar structures
Tai, Chin-Hsien; Sam, Vichetra; Gibrat, Jean-Francois; Garnier, Jean; Munson, Peter J.
2010-01-01
Domains are basic units of protein structure and essential for exploring protein fold space and structure evolution. With the structural genomics initiative, the number of protein structures in the Protein Databank (PDB) is increasing dramatically and domain assignments need to be done automatically. Most existing structural domain assignment programs define domains using the compactness of the domains and/or the number and strength of intra-domain versus inter-domain contacts. Here we present a different approach based on the recurrence of locally similar structural pieces (LSSPs) found by one-against-all structure comparisons with a dataset of 6,373 protein chains from the PDB. Residues of the query protein are clustered using LSSPs via three different procedures to define domains. This approach gives results that are comparable to several existing programs that use geometrical and other structural information explicitly. Remarkably, most of the proteins that contribute the LSSPs defining a domain do not themselves contain the domain of interest. This study shows that domains can be defined by a collection of relatively small locally similar structural pieces containing, on average, four secondary structure elements. In addition, it indicates that domains are indeed made of recurrent small structural pieces that are used to build protein structures of many different folds as suggested by recent studies. PMID:21287617
Zhou, Ren-Bin; Lu, Hui-Meng; Liu, Jie; Shi, Jian-Yu; Zhu, Jing; Lu, Qin-Qin; Yin, Da-Chuan
2016-01-01
Recombinant expression of proteins has become an indispensable tool in modern day research. The large yields of recombinantly expressed proteins accelerate the structural and functional characterization of proteins. Nevertheless, there are literature reported that the recombinant proteins show some differences in structure and function as compared with the native ones. Now there have been more than 100,000 structures (from both recombinant and native sources) publicly available in the Protein Data Bank (PDB) archive, which makes it possible to investigate if there exist any proteins in the RCSB PDB archive that have identical sequence but have some difference in structures. In this paper, we present the results of a systematic comparative study of the 3D structures of identical naturally purified versus recombinantly expressed proteins. The structural data and sequence information of the proteins were mined from the RCSB PDB archive. The combinatorial extension (CE), FATCAT-flexible and TM-Align methods were employed to align the protein structures. The root-mean-square distance (RMSD), TM-score, P-value, Z-score, secondary structural elements and hydrogen bonds were used to assess the structure similarity. A thorough analysis of the PDB archive generated five-hundred-seventeen pairs of native and recombinant proteins that have identical sequence. There were no pairs of proteins that had the same sequence and significantly different structural fold, which support the hypothesis that expression in a heterologous host usually could fold correctly into their native forms.
Zhou, Ren-Bin; Lu, Hui-Meng; Liu, Jie; Shi, Jian-Yu; Zhu, Jing; Lu, Qin-Qin; Yin, Da-Chuan
2016-01-01
Recombinant expression of proteins has become an indispensable tool in modern day research. The large yields of recombinantly expressed proteins accelerate the structural and functional characterization of proteins. Nevertheless, there are literature reported that the recombinant proteins show some differences in structure and function as compared with the native ones. Now there have been more than 100,000 structures (from both recombinant and native sources) publicly available in the Protein Data Bank (PDB) archive, which makes it possible to investigate if there exist any proteins in the RCSB PDB archive that have identical sequence but have some difference in structures. In this paper, we present the results of a systematic comparative study of the 3D structures of identical naturally purified versus recombinantly expressed proteins. The structural data and sequence information of the proteins were mined from the RCSB PDB archive. The combinatorial extension (CE), FATCAT-flexible and TM-Align methods were employed to align the protein structures. The root-mean-square distance (RMSD), TM-score, P-value, Z-score, secondary structural elements and hydrogen bonds were used to assess the structure similarity. A thorough analysis of the PDB archive generated five-hundred-seventeen pairs of native and recombinant proteins that have identical sequence. There were no pairs of proteins that had the same sequence and significantly different structural fold, which support the hypothesis that expression in a heterologous host usually could fold correctly into their native forms. PMID:27517583
Visualizing and Clustering Protein Similarity Networks: Sequences, Structures, and Functions.
Mai, Te-Lun; Hu, Geng-Ming; Chen, Chi-Ming
2016-07-01
Research in the recent decade has demonstrated the usefulness of protein network knowledge in furthering the study of molecular evolution of proteins, understanding the robustness of cells to perturbation, and annotating new protein functions. In this study, we aimed to provide a general clustering approach to visualize the sequence-structure-function relationship of protein networks, and investigate possible causes for inconsistency in the protein classifications based on sequences, structures, and functions. Such visualization of protein networks could facilitate our understanding of the overall relationship among proteins and help researchers comprehend various protein databases. As a demonstration, we clustered 1437 enzymes by their sequences and structures using the minimum span clustering (MSC) method. The general structure of this protein network was delineated at two clustering resolutions, and the second level MSC clustering was found to be highly similar to existing enzyme classifications. The clustering of these enzymes based on sequence, structure, and function information is consistent with each other. For proteases, the Jaccard's similarity coefficient is 0.86 between sequence and function classifications, 0.82 between sequence and structure classifications, and 0.78 between structure and function classifications. From our clustering results, we discussed possible examples of divergent evolution and convergent evolution of enzymes. Our clustering approach provides a panoramic view of the sequence-structure-function network of proteins, helps visualize the relation between related proteins intuitively, and is useful in predicting the structure and function of newly determined protein sequences.
Structural determination of intact proteins using mass spectrometry
Kruppa, Gary [San Francisco, CA; Schoeniger, Joseph S [Oakland, CA; Young, Malin M [Livermore, CA
2008-05-06
The present invention relates to novel methods of determining the sequence and structure of proteins. Specifically, the present invention allows for the analysis of intact proteins within a mass spectrometer. Therefore, preparatory separations need not be performed prior to introducing a protein sample into the mass spectrometer. Also disclosed herein are new instrumental developments for enhancing the signal from the desired modified proteins, methods for producing controlled protein fragments in the mass spectrometer, eliminating complex microseparations, and protein preparatory chemical steps necessary for cross-linking based protein structure determination.Additionally, the preferred method of the present invention involves the determination of protein structures utilizing a top-down analysis of protein structures to search for covalent modifications. In the preferred method, intact proteins are ionized and fragmented within the mass spectrometer.
Understand protein functions by comparing the similarity of local structural environments.
Chen, Jiawen; Xie, Zhong-Ru; Wu, Yinghao
2017-02-01
The three-dimensional structures of proteins play an essential role in regulating binding between proteins and their partners, offering a direct relationship between structures and functions of proteins. It is widely accepted that the function of a protein can be determined if its structure is similar to other proteins whose functions are known. However, it is also observed that proteins with similar global structures do not necessarily correspond to the same function, while proteins with very different folds can share similar functions. This indicates that function similarity is originated from the local structural information of proteins instead of their global shapes. We assume that proteins with similar local environments prefer binding to similar types of molecular targets. In order to testify this assumption, we designed a new structural indicator to define the similarity of local environment between residues in different proteins. This indicator was further used to calculate the probability that a given residue binds to a specific type of structural neighbors, including DNA, RNA, small molecules and proteins. After applying the method to a large-scale non-redundant database of proteins, we show that the positive signal of binding probability calculated from the local structural indicator is statistically meaningful. In summary, our studies suggested that the local environment of residues in a protein is a good indicator to recognize specific binding partners of the protein. The new method could be a potential addition to a suite of existing template-based approaches for protein function prediction. Copyright © 2016 Elsevier B.V. All rights reserved.
Zhang, Gaihua; Su, Zhen
2012-01-01
Work on protein structure prediction is very useful in biological research. To evaluate their accuracy, experimental protein structures or their derived data are used as the 'gold standard'. However, as proteins are dynamic molecular machines with structural flexibility such a standard may be unreliable. To investigate the influence of the structure flexibility, we analysed 3,652 protein structures of 137 unique sequences from 24 protein families. The results showed that (1) the three-dimensional (3D) protein structures were not rigid: the root-mean-square deviation (RMSD) of the backbone Cα of structures with identical sequences was relatively large, with the average of the maximum RMSD from each of the 137 sequences being 1.06 Å; (2) the derived data of the 3D structure was not constant, e.g. the highest ratio of the secondary structure wobble site was 60.69%, with the sequence alignments from structural comparisons of two proteins in the same family sometimes being completely different. Proteins may have several stable conformations and the data derived from resolved structures as a 'gold standard' should be optimized before being utilized as criteria to evaluate the prediction methods, e.g. sequence alignment from structural comparison. Helix/β-sheet transition exists in normal free proteins. The coil ratio of the 3D structure could affect its resolution as determined by X-ray crystallography.
Structure-based characterization of multiprotein complexes.
Wiederstein, Markus; Gruber, Markus; Frank, Karl; Melo, Francisco; Sippl, Manfred J
2014-07-08
Multiprotein complexes govern virtually all cellular processes. Their 3D structures provide important clues to their biological roles, especially through structural correlations among protein molecules and complexes. The detection of such correlations generally requires comprehensive searches in databases of known protein structures by means of appropriate structure-matching techniques. Here, we present a high-speed structure search engine capable of instantly matching large protein oligomers against the complete and up-to-date database of biologically functional assemblies of protein molecules. We use this tool to reveal unseen structural correlations on the level of protein quaternary structure and demonstrate its general usefulness for efficiently exploring complex structural relationships among known protein assemblies. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
An ambiguity principle for assigning protein structural domains.
Postic, Guillaume; Ghouzam, Yassine; Chebrek, Romain; Gelly, Jean-Christophe
2017-01-01
Ambiguity is the quality of being open to several interpretations. For an image, it arises when the contained elements can be delimited in two or more distinct ways, which may cause confusion. We postulate that it also applies to the analysis of protein three-dimensional structure, which consists in dividing the molecule into subunits called domains. Because different definitions of what constitutes a domain can be used to partition a given structure, the same protein may have different but equally valid domain annotations. However, knowledge and experience generally displace our ability to accept more than one way to decompose the structure of an object-in this case, a protein. This human bias in structure analysis is particularly harmful because it leads to ignoring potential avenues of research. We present an automated method capable of producing multiple alternative decompositions of protein structure (web server and source code available at www.dsimb.inserm.fr/sword/). Our innovative algorithm assigns structural domains through the hierarchical merging of protein units, which are evolutionarily preserved substructures that describe protein architecture at an intermediate level, between domain and secondary structure. To validate the use of these protein units for decomposing protein structures into domains, we set up an extensive benchmark made of expert annotations of structural domains and including state-of-the-art domain parsing algorithms. The relevance of our "multipartitioning" approach is shown through numerous examples of applications covering protein function, evolution, folding, and structure prediction. Finally, we introduce a measure for the structural ambiguity of protein molecules.
Ensemble-based evaluation for protein structure models.
Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke
2016-06-15
Comparing protein tertiary structures is a fundamental procedure in structural biology and protein bioinformatics. Structure comparison is important particularly for evaluating computational protein structure models. Most of the model structure evaluation methods perform rigid body superimposition of a structure model to its crystal structure and measure the difference of the corresponding residue or atom positions between them. However, these methods neglect intrinsic flexibility of proteins by treating the native structure as a rigid molecule. Because different parts of proteins have different levels of flexibility, for example, exposed loop regions are usually more flexible than the core region of a protein structure, disagreement of a model to the native needs to be evaluated differently depending on the flexibility of residues in a protein. We propose a score named FlexScore for comparing protein structures that consider flexibility of each residue in the native state of proteins. Flexibility information may be extracted from experiments such as NMR or molecular dynamics simulation. FlexScore considers an ensemble of conformations of a protein described as a multivariate Gaussian distribution of atomic displacements and compares a query computational model with the ensemble. We compare FlexScore with other commonly used structure similarity scores over various examples. FlexScore agrees with experts' intuitive assessment of computational models and provides information of practical usefulness of models. https://bitbucket.org/mjamroz/flexscore dkihara@purdue.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Ensemble-based evaluation for protein structure models
Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke
2016-01-01
Motivation: Comparing protein tertiary structures is a fundamental procedure in structural biology and protein bioinformatics. Structure comparison is important particularly for evaluating computational protein structure models. Most of the model structure evaluation methods perform rigid body superimposition of a structure model to its crystal structure and measure the difference of the corresponding residue or atom positions between them. However, these methods neglect intrinsic flexibility of proteins by treating the native structure as a rigid molecule. Because different parts of proteins have different levels of flexibility, for example, exposed loop regions are usually more flexible than the core region of a protein structure, disagreement of a model to the native needs to be evaluated differently depending on the flexibility of residues in a protein. Results: We propose a score named FlexScore for comparing protein structures that consider flexibility of each residue in the native state of proteins. Flexibility information may be extracted from experiments such as NMR or molecular dynamics simulation. FlexScore considers an ensemble of conformations of a protein described as a multivariate Gaussian distribution of atomic displacements and compares a query computational model with the ensemble. We compare FlexScore with other commonly used structure similarity scores over various examples. FlexScore agrees with experts’ intuitive assessment of computational models and provides information of practical usefulness of models. Availability and implementation: https://bitbucket.org/mjamroz/flexscore Contact: dkihara@purdue.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307633
Sinz, Andrea
2018-05-28
Structural mass spectrometry (MS) is gaining increasing importance for deriving valuable three-dimensional structural information on proteins and protein complexes, and it complements existing techniques, such as NMR spectroscopy and X-ray crystallography. Structural MS unites different MS-based techniques, such as hydrogen/deuterium exchange, native MS, ion-mobility MS, protein footprinting, and chemical cross-linking/MS, and it allows fundamental questions in structural biology to be addressed. In this Minireview, I will focus on the cross-linking/MS strategy. This method not only delivers tertiary structural information on proteins, but is also increasingly being used to decipher protein interaction networks, both in vitro and in vivo. Cross-linking/MS is currently one of the most promising MS-based approaches to derive structural information on very large and transient protein assemblies and intrinsically disordered proteins. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Fourier-based classification of protein secondary structures.
Shu, Jian-Jun; Yong, Kian Yan
2017-04-15
The correct prediction of protein secondary structures is one of the key issues in predicting the correct protein folded shape, which is used for determining gene function. Existing methods make use of amino acids properties as indices to classify protein secondary structures, but are faced with a significant number of misclassifications. The paper presents a technique for the classification of protein secondary structures based on protein "signal-plotting" and the use of the Fourier technique for digital signal processing. New indices are proposed to classify protein secondary structures by analyzing hydrophobicity profiles. The approach is simple and straightforward. Results show that the more types of protein secondary structures can be classified by means of these newly-proposed indices. Copyright © 2017 Elsevier Inc. All rights reserved.
ProTSAV: A protein tertiary structure analysis and validation server.
Singh, Ankita; Kaushik, Rahul; Mishra, Avinash; Shanker, Asheesh; Jayaram, B
2016-01-01
Quality assessment of predicted model structures of proteins is as important as the protein tertiary structure prediction. A highly efficient quality assessment of predicted model structures directs further research on function. Here we present a new server ProTSAV, capable of evaluating predicted model structures based on some popular online servers and standalone tools. ProTSAV furnishes the user with a single quality score in case of individual protein structure along with a graphical representation and ranking in case of multiple protein structure assessment. The server is validated on ~64,446 protein structures including experimental structures from RCSB and predicted model structures for CASP targets and from public decoy sets. ProTSAV succeeds in predicting quality of protein structures with a specificity of 100% and a sensitivity of 98% on experimentally solved structures and achieves a specificity of 88%and a sensitivity of 91% on predicted protein structures of CASP11 targets under 2Å.The server overcomes the limitations of any single server/method and is seen to be robust in helping in quality assessment. ProTSAV is freely available at http://www.scfbio-iitd.res.in/software/proteomics/protsav.jsp. Copyright © 2015 Elsevier B.V. All rights reserved.
Recent developments in structural proteomics for protein structure determination.
Liu, Hsuan-Liang; Hsu, Jyh-Ping
2005-05-01
The major challenges in structural proteomics include identifying all the proteins on the genome-wide scale, determining their structure-function relationships, and outlining the precise three-dimensional structures of the proteins. Protein structures are typically determined by experimental approaches such as X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. However, the knowledge of three-dimensional space by these techniques is still limited. Thus, computational methods such as comparative and de novo approaches and molecular dynamic simulations are intensively used as alternative tools to predict the three-dimensional structures and dynamic behavior of proteins. This review summarizes recent developments in structural proteomics for protein structure determination; including instrumental methods such as X-ray crystallography and NMR spectroscopy, and computational methods such as comparative and de novo structure prediction and molecular dynamics simulations.
Modularity in protein structures: study on all-alpha proteins.
Khan, Taushif; Ghosh, Indira
2015-01-01
Modularity is known as one of the most important features of protein's robust and efficient design. The architecture and topology of proteins play a vital role by providing necessary robust scaffolds to support organism's growth and survival in constant evolutionary pressure. These complex biomolecules can be represented by several layers of modular architecture, but it is pivotal to understand and explore the smallest biologically relevant structural component. In the present study, we have developed a component-based method, using protein's secondary structures and their arrangements (i.e. patterns) in order to investigate its structural space. Our result on all-alpha protein shows that the known structural space is highly populated with limited set of structural patterns. We have also noticed that these frequently observed structural patterns are present as modules or "building blocks" in large proteins (i.e. higher secondary structure content). From structural descriptor analysis, observed patterns are found to be within similar deviation; however, frequent patterns are found to be distinctly occurring in diverse functions e.g. in enzymatic classes and reactions. In this study, we are introducing a simple approach to explore protein structural space using combinatorial- and graph-based geometry methods, which can be used to describe modularity in protein structures. Moreover, analysis indicates that protein function seems to be the driving force that shapes the known structure space.
An approach to large scale identification of non-obvious structural similarities between proteins
Cherkasov, Artem; Jones, Steven JM
2004-01-01
Background A new sequence independent bioinformatics approach allowing genome-wide search for proteins with similar three dimensional structures has been developed. By utilizing the numerical output of the sequence threading it establishes putative non-obvious structural similarities between proteins. When applied to the testing set of proteins with known three dimensional structures the developed approach was able to recognize structurally similar proteins with high accuracy. Results The method has been developed to identify pathogenic proteins with low sequence identity and high structural similarity to host analogues. Such protein structure relationships would be hypothesized to arise through convergent evolution or through ancient horizontal gene transfer events, now undetectable using current sequence alignment techniques. The pathogen proteins, which could mimic or interfere with host activities, would represent candidate virulence factors. The developed approach utilizes the numerical outputs from the sequence-structure threading. It identifies the potential structural similarity between a pair of proteins by correlating the threading scores of the corresponding two primary sequences against the library of the standard folds. This approach allowed up to 64% sensitivity and 99.9% specificity in distinguishing protein pairs with high structural similarity. Conclusion Preliminary results obtained by comparison of the genomes of Homo sapiens and several strains of Chlamydia trachomatis have demonstrated the potential usefulness of the method in the identification of bacterial proteins with known or potential roles in virulence. PMID:15147578
Structure and function of seed storage proteins in faba bean (Vicia faba L.).
Liu, Yujiao; Wu, Xuexia; Hou, Wanwei; Li, Ping; Sha, Weichao; Tian, Yingying
2017-05-01
The protein subunit is the most important basic unit of protein, and its study can unravel the structure and function of seed storage proteins in faba bean. In this study, we identified six specific protein subunits in Faba bean (cv. Qinghai 13) combining liquid chromatography (LC), liquid chromatography-electronic spray ionization mass (LC-ESI-MS/MS) and bio-information technology. The results suggested a diversity of seed storage proteins in faba bean, and a total of 16 proteins (four GroEL molecular chaperones and 12 plant-specific proteins) were identified from 97-, 96-, 64-, 47-, 42-, and 38-kD-specific protein subunits in faba bean based on the peptide sequence. We also analyzed the composition and abundance of the amino acids, the physicochemical characteristics, secondary structure, three-dimensional structure, transmembrane domain, and possible subcellular localization of these identified proteins in faba bean seed, and finally predicted function and structure. The three-dimensional structures were generated based on homologous modeling, and the protein function was analyzed based on the annotation from the non-redundant protein database (NR database, NCBI) and function analysis of optimal modeling. The objective of this study was to identify the seed storage proteins in faba bean and confirm the structure and function of these proteins. Our results can be useful for the study of protein nutrition and achieve breeding goals for optimal protein quality in faba bean.
Khafizov, Kamil; Madrid-Aliste, Carlos; Almo, Steven C; Fiser, Andras
2014-03-11
The exponential growth of protein sequence data provides an ever-expanding body of unannotated and misannotated proteins. The National Institutes of Health-supported Protein Structure Initiative and related worldwide structural genomics efforts facilitate functional annotation of proteins through structural characterization. Recently there have been profound changes in the taxonomic composition of sequence databases, which are effectively redefining the scope and contribution of these large-scale structure-based efforts. The faster-growing bacterial genomic entries have overtaken the eukaryotic entries over the last 5 y, but also have become more redundant. Despite the enormous increase in the number of sequences, the overall structural coverage of proteins--including proteins for which reliable homology models can be generated--on the residue level has increased from 30% to 40% over the last 10 y. Structural genomics efforts contributed ∼50% of this new structural coverage, despite determining only ∼10% of all new structures. Based on current trends, it is expected that ∼55% structural coverage (the level required for significant functional insight) will be achieved within 15 y, whereas without structural genomics efforts, realizing this goal will take approximately twice as long.
USDA-ARS?s Scientific Manuscript database
The Rift Valley fever virus (RVFV) encodes structural proteins, nucleoprotein (N), N-terminus glycoprotein (Gn), C-terminus glycoprotein (Gc) and L protein, 78-kDa and non-structural proteins NSm and NSs. Using the baculovirus system we expressed the full-length coding sequence of N, NSs, NSm, Gc an...
Adjusting protein graphs based on graph entropy.
Peng, Sheng-Lung; Tsay, Yu-Wei
2014-01-01
Measuring protein structural similarity attempts to establish a relationship of equivalence between polymer structures based on their conformations. In several recent studies, researchers have explored protein-graph remodeling, instead of looking a minimum superimposition for pairwise proteins. When graphs are used to represent structured objects, the problem of measuring object similarity become one of computing the similarity between graphs. Graph theory provides an alternative perspective as well as efficiency. Once a protein graph has been created, its structural stability must be verified. Therefore, a criterion is needed to determine if a protein graph can be used for structural comparison. In this paper, we propose a measurement for protein graph remodeling based on graph entropy. We extend the concept of graph entropy to determine whether a graph is suitable for representing a protein. The experimental results suggest that when applied, graph entropy helps a conformational on protein graph modeling. Furthermore, it indirectly contributes to protein structural comparison if a protein graph is solid.
Adjusting protein graphs based on graph entropy
2014-01-01
Measuring protein structural similarity attempts to establish a relationship of equivalence between polymer structures based on their conformations. In several recent studies, researchers have explored protein-graph remodeling, instead of looking a minimum superimposition for pairwise proteins. When graphs are used to represent structured objects, the problem of measuring object similarity become one of computing the similarity between graphs. Graph theory provides an alternative perspective as well as efficiency. Once a protein graph has been created, its structural stability must be verified. Therefore, a criterion is needed to determine if a protein graph can be used for structural comparison. In this paper, we propose a measurement for protein graph remodeling based on graph entropy. We extend the concept of graph entropy to determine whether a graph is suitable for representing a protein. The experimental results suggest that when applied, graph entropy helps a conformational on protein graph modeling. Furthermore, it indirectly contributes to protein structural comparison if a protein graph is solid. PMID:25474347
Advances in Homology Protein Structure Modeling
Xiang, Zhexin
2007-01-01
Homology modeling plays a central role in determining protein structure in the structural genomics project. The importance of homology modeling has been steadily increasing because of the large gap that exists between the overwhelming number of available protein sequences and experimentally solved protein structures, and also, more importantly, because of the increasing reliability and accuracy of the method. In fact, a protein sequence with over 30% identity to a known structure can often be predicted with an accuracy equivalent to a low-resolution X-ray structure. The recent advances in homology modeling, especially in detecting distant homologues, aligning sequences with template structures, modeling of loops and side chains, as well as detecting errors in a model, have contributed to reliable prediction of protein structure, which was not possible even several years ago. The ongoing efforts in solving protein structures, which can be time-consuming and often difficult, will continue to spur the development of a host of new computational methods that can fill in the gap and further contribute to understanding the relationship between protein structure and function. PMID:16787261
Structural deformation upon protein-protein interaction: A structural alphabet approach
Martin, Juliette; Regad, Leslie; Lecornet, Hélène; Camproux, Anne-Claude
2008-01-01
Background In a number of protein-protein complexes, the 3D structures of bound and unbound partners significantly differ, supporting the induced fit hypothesis for protein-protein binding. Results In this study, we explore the induced fit modifications on a set of 124 proteins available in both bound and unbound forms, in terms of local structure. The local structure is described thanks to a structural alphabet of 27 structural letters that allows a detailed description of the backbone. Using a control set to distinguish induced fit from experimental error and natural protein flexibility, we show that the fraction of structural letters modified upon binding is significantly greater than in the control set (36% versus 28%). This proportion is even greater in the interface regions (41%). Interface regions preferentially involve coils. Our analysis further reveals that some structural letters in coil are not favored in the interface. We show that certain structural letters in coil are particularly subject to modifications at the interface, and that the severity of structural change also varies. These information are used to derive a structural letter substitution matrix that summarizes the local structural changes observed in our data set. We also illustrate the usefulness of our approach to identify common binding motifs in unrelated proteins. Conclusion Our study provides qualitative information about induced fit. These results could be of help for flexible docking. PMID:18307769
Structural deformation upon protein-protein interaction: a structural alphabet approach.
Martin, Juliette; Regad, Leslie; Lecornet, Hélène; Camproux, Anne-Claude
2008-02-28
In a number of protein-protein complexes, the 3D structures of bound and unbound partners significantly differ, supporting the induced fit hypothesis for protein-protein binding. In this study, we explore the induced fit modifications on a set of 124 proteins available in both bound and unbound forms, in terms of local structure. The local structure is described thanks to a structural alphabet of 27 structural letters that allows a detailed description of the backbone. Using a control set to distinguish induced fit from experimental error and natural protein flexibility, we show that the fraction of structural letters modified upon binding is significantly greater than in the control set (36% versus 28%). This proportion is even greater in the interface regions (41%). Interface regions preferentially involve coils. Our analysis further reveals that some structural letters in coil are not favored in the interface. We show that certain structural letters in coil are particularly subject to modifications at the interface, and that the severity of structural change also varies. These information are used to derive a structural letter substitution matrix that summarizes the local structural changes observed in our data set. We also illustrate the usefulness of our approach to identify common binding motifs in unrelated proteins. Our study provides qualitative information about induced fit. These results could be of help for flexible docking.
Life in the fast lane for protein crystallization and X-ray crystallography
NASA Technical Reports Server (NTRS)
Pusey, Marc L.; Liu, Zhi-Jie; Tempel, Wolfram; Praissman, Jeremy; Lin, Dawei; Wang, Bi-Cheng; Gavira, Jose A.; Ng, Joseph D.
2005-01-01
The common goal for structural genomic centers and consortiums is to decipher as quickly as possible the three-dimensional structures for a multitude of recombinant proteins derived from known genomic sequences. Since X-ray crystallography is the foremost method to acquire atomic resolution for macromolecules, the limiting step is obtaining protein crystals that can be useful of structure determination. High-throughput methods have been developed in recent years to clone, express, purify, crystallize and determine the three-dimensional structure of a protein gene product rapidly using automated devices, commercialized kits and consolidated protocols. However, the average number of protein structures obtained for most structural genomic groups has been very low compared to the total number of proteins purified. As more entire genomic sequences are obtained for different organisms from the three kingdoms of life, only the proteins that can be crystallized and whose structures can be obtained easily are studied. Consequently, an astonishing number of genomic proteins remain unexamined. In the era of high-throughput processes, traditional methods in molecular biology, protein chemistry and crystallization are eclipsed by automation and pipeline practices. The necessity for high-rate production of protein crystals and structures has prevented the usage of more intellectual strategies and creative approaches in experimental executions. Fundamental principles and personal experiences in protein chemistry and crystallization are minimally exploited only to obtain "low-hanging fruit" protein structures. We review the practical aspects of today's high-throughput manipulations and discuss the challenges in fast pace protein crystallization and tools for crystallography. Structural genomic pipelines can be improved with information gained from low-throughput tactics that may help us reach the higher-bearing fruits. Examples of recent developments in this area are reported from the efforts of the Southeast Collaboratory for Structural Genomics (SECSG).
Life in the Fast Lane for Protein Crystallization and X-Ray Crystallography
NASA Technical Reports Server (NTRS)
Pusey, Marc L.; Liu, Zhi-Jie; Tempel, Wolfram; Praissman, Jeremy; Lin, Dawei; Wang, Bi-Cheng; Gavira, Jose A.; Ng, Joseph D.
2004-01-01
The common goal for structural genomic centers and consortiums is to decipher as quickly as possible the three-dimensional structures for a multitude of recombinant proteins derived from known genomic sequences. Since X-ray crystallography is the foremost method to acquire atomic resolution for macromolecules, the limiting step is obtaining protein crystals that can be useful of structure determination. High-throughput methods have been developed in recent years to clone, express, purify, crystallize and determine the three-dimensional structure of a protein gene product rapidly using automated devices, commercialized kits and consolidated protocols. However, the average number of protein structures obtained for most structural genomic groups has been very low compared to the total number of proteins purified. As more entire genomic sequences are obtained for different organisms from the three kingdoms of life, only the proteins that can be crystallized and whose structures can be obtained easily are studied. Consequently, an astonishing number of genomic proteins remain unexamined. In the era of high-throughput processes, traditional methods in molecular biology, protein chemistry and crystallization are eclipsed by automation and pipeline practices. The necessity for high rate production of protein crystals and structures has prevented the usage of more intellectual strategies and creative approaches in experimental executions. Fundamental principles and personal experiences in protein chemistry and crystallization are minimally exploited only to obtain "low-hanging fruit" protein structures. We review the practical aspects of today s high-throughput manipulations and discuss the challenges in fast pace protein crystallization and tools for crystallography. Structural genomic pipelines can be improved with information gained from low-throughput tactics that may help us reach the higher-bearing fruits. Examples of recent developments in this area are reported from the efforts of the Southeast Collaboratory for Structural Genomics (SECSG).
PDBFlex: exploring flexibility in protein structures
Hrabe, Thomas; Li, Zhanwen; Sedova, Mayya; Rotkiewicz, Piotr; Jaroszewski, Lukasz; Godzik, Adam
2016-01-01
The PDBFlex database, available freely and with no login requirements at http://pdbflex.org, provides information on flexibility of protein structures as revealed by the analysis of variations between depositions of different structural models of the same protein in the Protein Data Bank (PDB). PDBFlex collects information on all instances of such depositions, identifying them by a 95% sequence identity threshold, performs analysis of their structural differences and clusters them according to their structural similarities for easy analysis. The PDBFlex contains tools and viewers enabling in-depth examination of structural variability including: 2D-scaling visualization of RMSD distances between structures of the same protein, graphs of average local RMSD in the aligned structures of protein chains, graphical presentation of differences in secondary structure and observed structural disorder (unresolved residues), difference distance maps between all sets of coordinates and 3D views of individual structures and simulated transitions between different conformations, the latter displayed using JSMol visualization software. PMID:26615193
An ambiguity principle for assigning protein structural domains
Postic, Guillaume; Ghouzam, Yassine; Chebrek, Romain; Gelly, Jean-Christophe
2017-01-01
Ambiguity is the quality of being open to several interpretations. For an image, it arises when the contained elements can be delimited in two or more distinct ways, which may cause confusion. We postulate that it also applies to the analysis of protein three-dimensional structure, which consists in dividing the molecule into subunits called domains. Because different definitions of what constitutes a domain can be used to partition a given structure, the same protein may have different but equally valid domain annotations. However, knowledge and experience generally displace our ability to accept more than one way to decompose the structure of an object—in this case, a protein. This human bias in structure analysis is particularly harmful because it leads to ignoring potential avenues of research. We present an automated method capable of producing multiple alternative decompositions of protein structure (web server and source code available at www.dsimb.inserm.fr/sword/). Our innovative algorithm assigns structural domains through the hierarchical merging of protein units, which are evolutionarily preserved substructures that describe protein architecture at an intermediate level, between domain and secondary structure. To validate the use of these protein units for decomposing protein structures into domains, we set up an extensive benchmark made of expert annotations of structural domains and including state-of-the-art domain parsing algorithms. The relevance of our “multipartitioning” approach is shown through numerous examples of applications covering protein function, evolution, folding, and structure prediction. Finally, we introduce a measure for the structural ambiguity of protein molecules. PMID:28097215
Unraveling the meaning of chemical shifts in protein NMR.
Berjanskii, Mark V; Wishart, David S
2017-11-01
Chemical shifts are among the most informative parameters in protein NMR. They provide wealth of information about protein secondary and tertiary structure, protein flexibility, and protein-ligand binding. In this report, we review the progress in interpreting and utilizing protein chemical shifts that has occurred over the past 25years, with a particular focus on the large body of work arising from our group and other Canadian NMR laboratories. More specifically, this review focuses on describing, assessing, and providing some historical context for various chemical shift-based methods to: (1) determine protein secondary and super-secondary structure; (2) derive protein torsion angles; (3) assess protein flexibility; (4) predict residue accessible surface area; (5) refine 3D protein structures; (6) determine 3D protein structures and (7) characterize intrinsically disordered proteins. This review also briefly covers some of the methods that we previously developed to predict chemical shifts from 3D protein structures and/or protein sequence data. It is hoped that this review will help to increase awareness of the considerable utility of NMR chemical shifts in structural biology and facilitate more widespread adoption of chemical-shift based methods by the NMR spectroscopists, structural biologists, protein biophysicists, and biochemists worldwide. This article is part of a Special Issue entitled: Biophysics in Canada, edited by Lewis Kay, John Baenziger, Albert Berghuis and Peter Tieleman. Copyright © 2017 Elsevier B.V. All rights reserved.
Efficient protein structure search using indexing methods
2013-01-01
Understanding functions of proteins is one of the most important challenges in many studies of biological processes. The function of a protein can be predicted by analyzing the functions of structurally similar proteins, thus finding structurally similar proteins accurately and efficiently from a large set of proteins is crucial. A protein structure can be represented as a vector by 3D-Zernike Descriptor (3DZD) which compactly represents the surface shape of the protein tertiary structure. This simplified representation accelerates the searching process. However, computing the similarity of two protein structures is still computationally expensive, thus it is hard to efficiently process many simultaneous requests of structurally similar protein search. This paper proposes indexing techniques which substantially reduce the search time to find structurally similar proteins. In particular, we first exploit two indexing techniques, i.e., iDistance and iKernel, on the 3DZDs. After that, we extend the techniques to further improve the search speed for protein structures. The extended indexing techniques build and utilize an reduced index constructed from the first few attributes of 3DZDs of protein structures. To retrieve top-k similar structures, top-10 × k similar structures are first found using the reduced index, and top-k structures are selected among them. We also modify the indexing techniques to support θ-based nearest neighbor search, which returns data points less than θ to the query point. The results show that both iDistance and iKernel significantly enhance the searching speed. In top-k nearest neighbor search, the searching time is reduced 69.6%, 77%, 77.4% and 87.9%, respectively using iDistance, iKernel, the extended iDistance, and the extended iKernel. In θ-based nearest neighbor serach, the searching time is reduced 80%, 81%, 95.6% and 95.6% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively. PMID:23691543
Efficient protein structure search using indexing methods.
Kim, Sungchul; Sael, Lee; Yu, Hwanjo
2013-01-01
Understanding functions of proteins is one of the most important challenges in many studies of biological processes. The function of a protein can be predicted by analyzing the functions of structurally similar proteins, thus finding structurally similar proteins accurately and efficiently from a large set of proteins is crucial. A protein structure can be represented as a vector by 3D-Zernike Descriptor (3DZD) which compactly represents the surface shape of the protein tertiary structure. This simplified representation accelerates the searching process. However, computing the similarity of two protein structures is still computationally expensive, thus it is hard to efficiently process many simultaneous requests of structurally similar protein search. This paper proposes indexing techniques which substantially reduce the search time to find structurally similar proteins. In particular, we first exploit two indexing techniques, i.e., iDistance and iKernel, on the 3DZDs. After that, we extend the techniques to further improve the search speed for protein structures. The extended indexing techniques build and utilize an reduced index constructed from the first few attributes of 3DZDs of protein structures. To retrieve top-k similar structures, top-10 × k similar structures are first found using the reduced index, and top-k structures are selected among them. We also modify the indexing techniques to support θ-based nearest neighbor search, which returns data points less than θ to the query point. The results show that both iDistance and iKernel significantly enhance the searching speed. In top-k nearest neighbor search, the searching time is reduced 69.6%, 77%, 77.4% and 87.9%, respectively using iDistance, iKernel, the extended iDistance, and the extended iKernel. In θ-based nearest neighbor serach, the searching time is reduced 80%, 81%, 95.6% and 95.6% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively.
Structure-based barcoding of proteins.
Metri, Rahul; Jerath, Gaurav; Kailas, Govind; Gacche, Nitin; Pal, Adityabarna; Ramakrishnan, Vibin
2014-01-01
A reduced representation in the format of a barcode has been developed to provide an overview of the topological nature of a given protein structure from 3D coordinate file. The molecular structure of a protein coordinate file from Protein Data Bank is first expressed in terms of an alpha-numero code and further converted to a barcode image. The barcode representation can be used to compare and contrast different proteins based on their structure. The utility of this method has been exemplified by comparing structural barcodes of proteins that belong to same fold family, and across different folds. In addition to this, we have attempted to provide an illustration to (i) the structural changes often seen in a given protein molecule upon interaction with ligands and (ii) Modifications in overall topology of a given protein during evolution. The program is fully downloadable from the website http://www.iitg.ac.in/probar/. © 2013 The Protein Society.
Chikenji, George; Fujitsuka, Yoshimi; Takada, Shoji
2006-02-28
Predicting protein tertiary structure by folding-like simulations is one of the most stringent tests of how much we understand the principle of protein folding. Currently, the most successful method for folding-based structure prediction is the fragment assembly (FA) method. Here, we address why the FA method is so successful and its lesson for the folding problem. To do so, using the FA method, we designed a structure prediction test of "chimera proteins." In the chimera proteins, local structural preference is specific to the target sequences, whereas nonlocal interactions are only sequence-independent compaction forces. We find that these chimera proteins can find the native folds of the intact sequences with high probability indicating dominant roles of the local interactions. We further explore roles of local structural preference by exact calculation of the HP lattice model of proteins. From these results, we suggest principles of protein folding: For small proteins, compact structures that are fully compatible with local structural preference are few, one of which is the native fold. These local biases shape up the funnel-like energy landscape.
Shaping up the protein folding funnel by local interaction: Lesson from a structure prediction study
Chikenji, George; Fujitsuka, Yoshimi; Takada, Shoji
2006-01-01
Predicting protein tertiary structure by folding-like simulations is one of the most stringent tests of how much we understand the principle of protein folding. Currently, the most successful method for folding-based structure prediction is the fragment assembly (FA) method. Here, we address why the FA method is so successful and its lesson for the folding problem. To do so, using the FA method, we designed a structure prediction test of “chimera proteins.” In the chimera proteins, local structural preference is specific to the target sequences, whereas nonlocal interactions are only sequence-independent compaction forces. We find that these chimera proteins can find the native folds of the intact sequences with high probability indicating dominant roles of the local interactions. We further explore roles of local structural preference by exact calculation of the HP lattice model of proteins. From these results, we suggest principles of protein folding: For small proteins, compact structures that are fully compatible with local structural preference are few, one of which is the native fold. These local biases shape up the funnel-like energy landscape. PMID:16488978
Modeling complexes of modeled proteins.
Anishchenko, Ivan; Kundrotas, Petras J; Vakser, Ilya A
2017-03-01
Structural characterization of proteins is essential for understanding life processes at the molecular level. However, only a fraction of known proteins have experimentally determined structures. This fraction is even smaller for protein-protein complexes. Thus, structural modeling of protein-protein interactions (docking) primarily has to rely on modeled structures of the individual proteins, which typically are less accurate than the experimentally determined ones. Such "double" modeling is the Grand Challenge of structural reconstruction of the interactome. Yet it remains so far largely untested in a systematic way. We present a comprehensive validation of template-based and free docking on a set of 165 complexes, where each protein model has six levels of structural accuracy, from 1 to 6 Å C α RMSD. Many template-based docking predictions fall into acceptable quality category, according to the CAPRI criteria, even for highly inaccurate proteins (5-6 Å RMSD), although the number of such models (and, consequently, the docking success rate) drops significantly for models with RMSD > 4 Å. The results show that the existing docking methodologies can be successfully applied to protein models with a broad range of structural accuracy, and the template-based docking is much less sensitive to inaccuracies of protein models than the free docking. Proteins 2017; 85:470-478. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Salvage of failed protein targets by reductive alkylation.
Tan, Kemin; Kim, Youngchang; Hatzos-Skintges, Catherine; Chang, Changsoo; Cuff, Marianne; Chhor, Gekleng; Osipiuk, Jerzy; Michalska, Karolina; Nocek, Boguslaw; An, Hao; Babnigg, Gyorgy; Bigelow, Lance; Joachimiak, Grazyna; Li, Hui; Mack, Jamey; Makowska-Grzyska, Magdalena; Maltseva, Natalia; Mulligan, Rory; Tesar, Christine; Zhou, Min; Joachimiak, Andrzej
2014-01-01
The growth of diffraction-quality single crystals is of primary importance in protein X-ray crystallography. Chemical modification of proteins can alter their surface properties and crystallization behavior. The Midwest Center for Structural Genomics (MCSG) has previously reported how reductive methylation of lysine residues in proteins can improve crystallization of unique proteins that initially failed to produce diffraction-quality crystals. Recently, this approach has been expanded to include ethylation and isopropylation in the MCSG protein crystallization pipeline. Applying standard methods, 180 unique proteins were alkylated and screened using standard crystallization procedures. Crystal structures of 12 new proteins were determined, including the first ethylated and the first isopropylated protein structures. In a few cases, the structures of native and methylated or ethylated states were obtained and the impact of reductive alkylation of lysine residues was assessed. Reductive methylation tends to be more efficient and produces the most alkylated protein structures. Structures of methylated proteins typically have higher resolution limits. A number of well-ordered alkylated lysine residues have been identified, which make both intermolecular and intramolecular contacts. The previous report is updated and complemented with the following new data; a description of a detailed alkylation protocol with results, structural features, and roles of alkylated lysine residues in protein crystals. These contribute to improved crystallization properties of some proteins.
Salvage of Failed Protein Targets by Reductive Alkylation
Tan, Kemin; Kim, Youngchang; Hatzos-Skintges, Catherine; Chang, Changsoo; Cuff, Marianne; Chhor, Gekleng; Osipiuk, Jerzy; Michalska, Karolina; Nocek, Boguslaw; An, Hao; Babnigg, Gyorgy; Bigelow, Lance; Joachimiak, Grazyna; Li, Hui; Mack, Jamey; Makowska-Grzyska, Magdalena; Maltseva, Natalia; Mulligan, Rory; Tesar, Christine; Zhou, Min; Joachimiak, Andrzej
2014-01-01
The growth of diffraction-quality single crystals is of primary importance in protein X-ray crystallography. Chemical modification of proteins can alter their surface properties and crystallization behavior. The Midwest Center for Structural Genomics (MCSG) has previously reported how reductive methylation of lysine residues in proteins can improve crystallization of unique proteins that initially failed to produce diffraction-quality crystals. Recently, this approach has been expanded to include ethylation and isopropylation in the MCSG protein crystallization pipeline. Applying standard methods, 180 unique proteins were alkylated and screened using standard crystallization procedures. Crystal structures of 12 new proteins were determined, including the first ethylated and the first isopropylated protein structures. In a few cases, the structures of native and methylated or ethylated states were obtained and the impact of reductive alkylation of lysine residues was assessed. Reductive methylation tends to be more efficient and produces the most alkylated protein structures. Structures of methylated proteins typically have higher resolution limits. A number of well-ordered alkylated lysine residues have been identified, which make both intermolecular and intramolecular contacts. The previous report is updated and complemented with the following new data; a description of a detailed alkylation protocol with results, structural features, and roles of alkylated lysine residues in protein crystals. These contribute to improved crystallization properties of some proteins. PMID:24590719
Dynamic New World: Refining Our View of Protein Structure, Function and Evolution
Mannige, Ranjan V.
2014-01-01
Proteins are crucial to the functioning of all lifeforms. Traditional understanding posits that a single protein occupies a single structure (“fold”), which performs a single function. This view is radically challenged with the recognition that high structural dynamism—the capacity to be extra “floppy”—is more prevalent in functional proteins than previously assumed. As reviewed here, this dynamic take on proteins affects our understanding of protein “structure”, function, and evolution, and even gives us a glimpse into protein origination. Specifically, this review will discuss historical developments concerning protein structure, and important new relationships between dynamism and aspects of protein sequence, structure, binding modes, binding promiscuity, evolvability, and origination. Along the way, suggestions will be provided for how key parts of textbook definitions—that so far have excluded membership to intrinsically disordered proteins (IDPs)—could be modified to accommodate our more dynamic understanding of proteins. PMID:28250374
Scheraga, H A; Paine, G H
1986-01-01
We are using a variety of theoretical and computational techniques to study protein structure, protein folding, and higher-order structures. Our earlier work involved treatments of liquid water and aqueous solutions of nonpolar and polar solutes, computations of the stabilities of the fundamental structures of proteins and their packing arrangements, conformations of small cyclic and open-chain peptides, structures of fibrous proteins (collagen), structures of homologous globular proteins, introduction of special procedures as constraints during energy minimization of globular proteins, and structures of enzyme-substrate complexes. Recently, we presented a new methodology for predicting polypeptide structure (described here); the method is based on the calculation of the probable and average conformation of a polypeptide chain by the application of equilibrium statistical mechanics in conjunction with an adaptive, importance sampling Monte Carlo algorithm. As a test, it was applied to Met-enkephalin.
Neutron protein crystallography: A complementary tool for locating hydrogens in proteins.
O'Dell, William B; Bodenheimer, Annette M; Meilleur, Flora
2016-07-15
Neutron protein crystallography is a powerful tool for investigating protein chemistry because it directly locates hydrogen atom positions in a protein structure. The visibility of hydrogen and deuterium atoms arises from the strong interaction of neutrons with the nuclei of these isotopes. Positions can be unambiguously assigned from diffraction at resolutions typical of protein crystals. Neutrons have the additional benefit to structural biology of not inducing radiation damage in protein crystals. The same crystal could be measured multiple times for parametric studies. Here, we review the basic principles of neutron protein crystallography. The information that can be gained from a neutron structure is presented in balance with practical considerations. Methods to produce isotopically-substituted proteins and to grow large crystals are provided in the context of neutron structures reported in the literature. Available instruments for data collection and software for data processing and structure refinement are described along with technique-specific strategies including joint X-ray/neutron structure refinement. Examples are given to illustrate, ultimately, the unique scientific value of neutron protein crystal structures. Copyright © 2015 Elsevier Inc. All rights reserved.
Knowledge-based prediction of protein backbone conformation using a structural alphabet.
Vetrivel, Iyanar; Mahajan, Swapnil; Tyagi, Manoj; Hoffmann, Lionel; Sanejouand, Yves-Henri; Srinivasan, Narayanaswamy; de Brevern, Alexandre G; Cadet, Frédéric; Offmann, Bernard
2017-01-01
Libraries of structural prototypes that abstract protein local structures are known as structural alphabets and have proven to be very useful in various aspects of protein structure analyses and predictions. One such library, Protein Blocks, is composed of 16 standard 5-residues long structural prototypes. This form of analyzing proteins involves drafting its structure as a string of Protein Blocks. Predicting the local structure of a protein in terms of protein blocks is the general objective of this work. A new approach, PB-kPRED is proposed towards this aim. It involves (i) organizing the structural knowledge in the form of a database of pentapeptide fragments extracted from all protein structures in the PDB and (ii) applying a knowledge-based algorithm that does not rely on any secondary structure predictions and/or sequence alignment profiles, to scan this database and predict most probable backbone conformations for the protein local structures. Though PB-kPRED uses the structural information from homologues in preference, if available. The predictions were evaluated rigorously on 15,544 query proteins representing a non-redundant subset of the PDB filtered at 30% sequence identity cut-off. We have shown that the kPRED method was able to achieve mean accuracies ranging from 40.8% to 66.3% depending on the availability of homologues. The impact of the different strategies for scanning the database on the prediction was evaluated and is discussed. Our results highlight the usefulness of the method in the context of proteins without any known structural homologues. A scoring function that gives a good estimate of the accuracy of prediction was further developed. This score estimates very well the accuracy of the algorithm (R2 of 0.82). An online version of the tool is provided freely for non-commercial usage at http://www.bo-protscience.fr/kpred/.
Improved in-cell structure determination of proteins at near-physiological concentration
Ikeya, Teppei; Hanashima, Tomomi; Hosoya, Saori; Shimazaki, Manato; Ikeda, Shiro; Mishima, Masaki; Güntert, Peter; Ito, Yutaka
2016-01-01
Investigating three-dimensional (3D) structures of proteins in living cells by in-cell nuclear magnetic resonance (NMR) spectroscopy opens an avenue towards understanding the structural basis of their functions and physical properties under physiological conditions inside cells. In-cell NMR provides data at atomic resolution non-invasively, and has been used to detect protein-protein interactions, thermodynamics of protein stability, the behavior of intrinsically disordered proteins, etc. in cells. However, so far only a single de novo 3D protein structure could be determined based on data derived only from in-cell NMR. Here we introduce methods that enable in-cell NMR protein structure determination for a larger number of proteins at concentrations that approach physiological ones. The new methods comprise (1) advances in the processing of non-uniformly sampled NMR data, which reduces the measurement time for the intrinsically short-lived in-cell NMR samples, (2) automatic chemical shift assignment for obtaining an optimal resonance assignment, and (3) structure refinement with Bayesian inference, which makes it possible to calculate accurate 3D protein structures from sparse data sets of conformational restraints. As an example application we determined the structure of the B1 domain of protein G at about 250 μM concentration in living E. coli cells. PMID:27910948
Structure-sequence based analysis for identification of conserved regions in proteins
Zemla, Adam T; Zhou, Carol E; Lam, Marisa W; Smith, Jason R; Pardes, Elizabeth
2013-05-28
Disclosed are computational methods, and associated hardware and software products for scoring conservation in a protein structure based on a computationally identified family or cluster of protein structures. A method of computationally identifying a family or cluster of protein structures in also disclosed herein.
Use of designed sequences in protein structure recognition.
Kumar, Gayatri; Mudgal, Richa; Srinivasan, Narayanaswamy; Sandhya, Sankaran
2018-05-09
Knowledge of the protein structure is a pre-requisite for improved understanding of molecular function. The gap in the sequence-structure space has increased in the post-genomic era. Grouping related protein sequences into families can aid in narrowing the gap. In the Pfam database, structure description is provided for part or full-length proteins of 7726 families. For the remaining 52% of the families, information on 3-D structure is not yet available. We use the computationally designed sequences that are intermediately related to two protein domain families, which are already known to share the same fold. These strategically designed sequences enable detection of distant relationships and here, we have employed them for the purpose of structure recognition of protein families of yet unknown structure. We first measured the success rate of our approach using a dataset of protein families of known fold and achieved a success rate of 88%. Next, for 1392 families of yet unknown structure, we made structural assignments for part/full length of the proteins. Fold association for 423 domains of unknown function (DUFs) are provided as a step towards functional annotation. The results indicate that knowledge-based filling of gaps in protein sequence space is a lucrative approach for structure recognition. Such sequences assist in traversal through protein sequence space and effectively function as 'linkers', where natural linkers between distant proteins are unavailable. This article was reviewed by Oliviero Carugo, Christine Orengo and Srikrishna Subramanian.
Homology modeling a fast tool for drug discovery: current perspectives.
Vyas, V K; Ukawala, R D; Ghate, M; Chintha, C
2012-01-01
Major goal of structural biology involve formation of protein-ligand complexes; in which the protein molecules act energetically in the course of binding. Therefore, perceptive of protein-ligand interaction will be very important for structure based drug design. Lack of knowledge of 3D structures has hindered efforts to understand the binding specificities of ligands with protein. With increasing in modeling software and the growing number of known protein structures, homology modeling is rapidly becoming the method of choice for obtaining 3D coordinates of proteins. Homology modeling is a representation of the similarity of environmental residues at topologically corresponding positions in the reference proteins. In the absence of experimental data, model building on the basis of a known 3D structure of a homologous protein is at present the only reliable method to obtain the structural information. Knowledge of the 3D structures of proteins provides invaluable insights into the molecular basis of their functions. The recent advances in homology modeling, particularly in detecting and aligning sequences with template structures, distant homologues, modeling of loops and side chains as well as detecting errors in a model contributed to consistent prediction of protein structure, which was not possible even several years ago. This review focused on the features and a role of homology modeling in predicting protein structure and described current developments in this field with victorious applications at the different stages of the drug design and discovery.
Homology Modeling a Fast Tool for Drug Discovery: Current Perspectives
Vyas, V. K.; Ukawala, R. D.; Ghate, M.; Chintha, C.
2012-01-01
Major goal of structural biology involve formation of protein-ligand complexes; in which the protein molecules act energetically in the course of binding. Therefore, perceptive of protein-ligand interaction will be very important for structure based drug design. Lack of knowledge of 3D structures has hindered efforts to understand the binding specificities of ligands with protein. With increasing in modeling software and the growing number of known protein structures, homology modeling is rapidly becoming the method of choice for obtaining 3D coordinates of proteins. Homology modeling is a representation of the similarity of environmental residues at topologically corresponding positions in the reference proteins. In the absence of experimental data, model building on the basis of a known 3D structure of a homologous protein is at present the only reliable method to obtain the structural information. Knowledge of the 3D structures of proteins provides invaluable insights into the molecular basis of their functions. The recent advances in homology modeling, particularly in detecting and aligning sequences with template structures, distant homologues, modeling of loops and side chains as well as detecting errors in a model contributed to consistent prediction of protein structure, which was not possible even several years ago. This review focused on the features and a role of homology modeling in predicting protein structure and described current developments in this field with victorious applications at the different stages of the drug design and discovery. PMID:23204616
Matveev, Vladimir V
2010-06-09
According to the hypothesis explored in this paper, native aggregation is genetically controlled (programmed) reversible aggregation that occurs when interacting proteins form new temporary structures through highly specific interactions. It is assumed that Anfinsen's dogma may be extended to protein aggregation: composition and amino acid sequence determine not only the secondary and tertiary structure of single protein, but also the structure of protein aggregates (associates). Cell function is considered as a transition between two states (two states model), the resting state and state of activity (this applies to the cell as a whole and to its individual structures). In the resting state, the key proteins are found in the following inactive forms: natively unfolded and globular. When the cell is activated, secondary structures appear in natively unfolded proteins (including unfolded regions in other proteins), and globular proteins begin to melt and their secondary structures become available for interaction with the secondary structures of other proteins. These temporary secondary structures provide a means for highly specific interactions between proteins. As a result, native aggregation creates temporary structures necessary for cell activity."One of the principal objects of theoretical research in any department of knowledge is to find the point of view from which the subject appears in its greatest simplicity."Josiah Willard Gibbs (1839-1903).
Chira, Camelia; Horvath, Dragos; Dumitrescu, D
2011-07-30
Proteins are complex structures made of amino acids having a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a protein structure with minimum-energy starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic - polar (HP) model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optimum. The main features of the resulting evolutionary algorithm - hill-climbing mechanism and diversification strategy - are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact to the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.
Direct protein photoinduced conformational changes using porphyrins.
NASA Astrophysics Data System (ADS)
Brancaleon, Lorenzo; Silva, Ivan; Fernandez, Nicholas; Johnson, Eric; Sansone, Samuel
2008-03-01
Most proteins functions depend on the interaction with other ligands. These interactions depend on uniquely structured binding sites formed by the folding of the proteins. Ligands can often prompt intended as well as ``accidental'' protein structural changes. One can foresee that the ability to prompt and control post-translational protein folding could be a powerful tool to investigate protein folding mechanisms but also to inhibit certain proteins or induce new properties to proteins. One possible way to produce such structural disruption is the combination of light and photoactive ligands. This option has been investigated in recent years by exploiting photoisomerization and other properties of non-physiological dyes. We used an alternative approach which uses porphyrins as the ``triggers'' of structural changes. The advantage of porphyrins is that they can be found naturally in living cells. The photophysical properties of porphyrins can induce local as well as long range effects on the structure of the bound protein. Porphyrins are known to produce structural changes in porphyrin-specific proteins, however the novelty of our results is that we demonstrated that these dyes can also produce structural changes in non-porphyrin-specific globular proteins. We will present an overview of our research to-date in this field and its potential applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yu,P.
2007-01-01
Studying the secondary structure of proteins leads to an understanding of the components that make up a whole protein, and such an understanding of the structure of the whole protein is often vital to understanding its digestive behaviour and nutritive value in animals. The main protein secondary structures are the {alpha}-helix and {beta}-sheet. The percentage of these two structures in protein secondary structures influences protein nutritive value, quality and digestive behaviour. A high percentage of {beta}-sheet structure may partly cause a low access to gastrointestinal digestive enzymes, which results in a low protein value. The objectives of the present studymore » were to use advanced synchrotron-based Fourier transform IR (S-FTIR) microspectroscopy as a new approach to reveal the molecular chemistry of the protein secondary structures of feed tissues affected by heat-processing within intact tissue at a cellular level, and to quantify protein secondary structures using multicomponent peak modelling Gaussian and Lorentzian methods, in relation to protein digestive behaviours and nutritive value in the rumen, which was determined using the Cornell Net Carbohydrate Protein System. The synchrotron-based molecular chemistry research experiment was performed at the National Synchrotron Light Source at Brookhaven National Laboratory, US Department of Energy. The results showed that, with S-FTIR microspectroscopy, the molecular chemistry, ultrastructural chemical make-up and nutritive characteristics could be revealed at a high ultraspatial resolution ({approx}10 {mu}m). S-FTIR microspectroscopy revealed that the secondary structure of protein differed between raw and roasted golden flaxseeds in terms of the percentages and ratio of {alpha}-helixes and {beta}-sheets in the mid-IR range at the cellular level. By using multicomponent peak modelling, the results show that the roasting reduced (P <0.05) the percentage of {alpha}-helixes (from 47.1% to 36.1%: S-FTIR absorption intensity), increased the percentage of {beta}-sheets (from 37.2% to 49.8%: S-FTIR absorption intensity) and reduced the {alpha}-helix to {beta}-sheet ratio (from 0.3 to 0.7) in the golden flaxseeds, which indicated a negative effect of the roasting on protein values, utilisation and bioavailability. These results were proved by the Cornell Net Carbohydrate Protein System in situ animal trial, which also revealed that roasting increased the amount of protein bound to lignin, and well as of the Maillard reaction protein (both of which are poorly used by ruminants), and increased the level of indigestible and undegradable protein in ruminants. The present results demonstrate the potential of highly spatially resolved synchrotron-based infrared microspectroscopy to locate 'pure' protein in feed tissues, and reveal protein secondary structures and digestive behaviour, making a significant step forward in and an important contribution to protein nutritional research. Further study is needed to determine the sensitivities of protein secondary structures to various heat-processing conditions, and to quantify the relationship between protein secondary structures and the nutrient availability and digestive behaviour of various protein sources. Information from the present study arising from the synchrotron-based IR probing of the protein secondary structures of protein sources at the cellular level will be valuable as a guide to maintaining protein quality and predicting digestive behaviours.« less
Density functional study of molecular interactions in secondary structures of proteins.
Takano, Yu; Kusaka, Ayumi; Nakamura, Haruki
2016-01-01
Proteins play diverse and vital roles in biology, which are dominated by their three-dimensional structures. The three-dimensional structure of a protein determines its functions and chemical properties. Protein secondary structures, including α-helices and β-sheets, are key components of the protein architecture. Molecular interactions, in particular hydrogen bonds, play significant roles in the formation of protein secondary structures. Precise and quantitative estimations of these interactions are required to understand the principles underlying the formation of three-dimensional protein structures. In the present study, we have investigated the molecular interactions in α-helices and β-sheets, using ab initio wave function-based methods, the Hartree-Fock method (HF) and the second-order Møller-Plesset perturbation theory (MP2), density functional theory, and molecular mechanics. The characteristic interactions essential for forming the secondary structures are discussed quantitatively.
AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide
2015-11-19
Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. This database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.
Fast protein tertiary structure retrieval based on global surface shape similarity.
Sael, Lee; Li, Bin; La, David; Fang, Yi; Ramani, Karthik; Rustamov, Raif; Kihara, Daisuke
2008-09-01
Characterization and identification of similar tertiary structure of proteins provides rich information for investigating function and evolution. The importance of structure similarity searches is increasing as structure databases continue to expand, partly due to the structural genomics projects. A crucial drawback of conventional protein structure comparison methods, which compare structures by their main-chain orientation or the spatial arrangement of secondary structure, is that a database search is too slow to be done in real-time. Here we introduce a global surface shape representation by three-dimensional (3D) Zernike descriptors, which represent a protein structure compactly as a series expansion of 3D functions. With this simplified representation, the search speed against a few thousand structures takes less than a minute. To investigate the agreement between surface representation defined by 3D Zernike descriptor and conventional main-chain based representation, a benchmark was performed against a protein classification generated by the combinatorial extension algorithm. Despite the different representation, 3D Zernike descriptor retrieved proteins of the same conformation defined by combinatorial extension in 89.6% of the cases within the top five closest structures. The real-time protein structure search by 3D Zernike descriptor will open up new possibility of large-scale global and local protein surface shape comparison. 2008 Wiley-Liss, Inc.
Pandey, Aditya; Shin, Kyungsoo; Patterson, Robin E; Liu, Xiang-Qin; Rainey, Jan K
2016-12-01
Membrane proteins are still heavily under-represented in the protein data bank (PDB), owing to multiple bottlenecks. The typical low abundance of membrane proteins in their natural hosts makes it necessary to overexpress these proteins either in heterologous systems or through in vitro translation/cell-free expression. Heterologous expression of proteins, in turn, leads to multiple obstacles, owing to the unpredictability of compatibility of the target protein for expression in a given host. The highly hydrophobic and (or) amphipathic nature of membrane proteins also leads to challenges in producing a homogeneous, stable, and pure sample for structural studies. Circumventing these hurdles has become possible through the introduction of novel protein production protocols; efficient protein isolation and sample preparation methods; and, improvement in hardware and software for structural characterization. Combined, these advances have made the past 10-15 years very exciting and eventful for the field of membrane protein structural biology, with an exponential growth in the number of solved membrane protein structures. In this review, we focus on both the advances and diversity of protein production and purification methods that have allowed this growth in structural knowledge of membrane proteins through X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, and cryo-electron microscopy (cryo-EM).
Pandey, Aditya; Shin, Kyungsoo; Patterson, Robin E.; Liu, Xiang-Qin; Rainey, Jan K.
2017-01-01
Membrane proteins are still heavily underrepresented in the protein data bank (PDB) due to multiple bottlenecks. The typical low abundance of membrane proteins in their natural hosts makes it necessary to overexpress these proteins either in heterologous systems or through in vitro translation/cell-free expression. Heterologous expression of proteins, in turn, leads to multiple obstacles due to the unpredictability of compatibility of the target protein for expression in a given host. The highly hydrophobic and/or amphipathic nature of membrane proteins also leads to challenges in producing a homogeneous, stable, and pure sample for structural studies. Circumventing these hurdles has become possible through introduction of novel protein production protocols; efficient protein isolation and sample preparation methods; and, improvement in hardware and software for structural characterization. Combined, these advances have made the past 10–15 years very exciting and eventful for the field of membrane protein structural biology, with an exponential growth in the number of solved membrane protein structures. In this review, we focus on both the advances and diversity of protein production and purification methods that have allowed this growth in structural knowledge of membrane proteins through X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, and cryo-electron microscopy (cryo-EM). PMID:27010607
Gabanyi, Margaret J; Adams, Paul D; Arnold, Konstantin; Bordoli, Lorenza; Carter, Lester G; Flippen-Andersen, Judith; Gifford, Lida; Haas, Juergen; Kouranov, Andrei; McLaughlin, William A; Micallef, David I; Minor, Wladek; Shah, Raship; Schwede, Torsten; Tao, Yi-Ping; Westbrook, John D; Zimmerman, Matthew; Berman, Helen M
2011-07-01
The Protein Structure Initiative's Structural Biology Knowledgebase (SBKB, URL: http://sbkb.org ) is an open web resource designed to turn the products of the structural genomics and structural biology efforts into knowledge that can be used by the biological community to understand living systems and disease. Here we will present examples on how to use the SBKB to enable biological research. For example, a protein sequence or Protein Data Bank (PDB) structure ID search will provide a list of related protein structures in the PDB, associated biological descriptions (annotations), homology models, structural genomics protein target status, experimental protocols, and the ability to order available DNA clones from the PSI:Biology-Materials Repository. A text search will find publication and technology reports resulting from the PSI's high-throughput research efforts. Web tools that aid in research, including a system that accepts protein structure requests from the community, will also be described. Created in collaboration with the Nature Publishing Group, the Structural Biology Knowledgebase monthly update also provides a research library, editorials about new research advances, news, and an events calendar to present a broader view of structural genomics and structural biology.
An Interactive Introduction to Protein Structure
ERIC Educational Resources Information Center
Lee, W. Theodore
2004-01-01
To improve student understanding of protein structure and the significance of noncovalent interactions in protein structure and function, students are assigned a project to write a paper complemented with computer-generated images. The assignment provides an opportunity for students to select a protein structure that is of interest and detail…
Serrano, Pedro; Dutta, Samit K; Proudfoot, Andrew; Mohanty, Biswaranjan; Susac, Lukas; Martin, Bryan; Geralt, Michael; Jaroszewski, Lukasz; Godzik, Adam; Elsliger, Marc; Wilson, Ian A; Wüthrich, Kurt
2016-11-01
For more than a decade, the Joint Center for Structural Genomics (JCSG; www.jcsg.org) worked toward increased three-dimensional structure coverage of the protein universe. This coordinated quest was one of the main goals of the four high-throughput (HT) structure determination centers of the Protein Structure Initiative (PSI; www.nigms.nih.gov/Research/specificareas/PSI). To achieve the goals of the PSI, the JCSG made use of the complementarity of structure determination by X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy to increase and diversify the range of targets entering the HT structure determination pipeline. The overall strategy, for both techniques, was to determine atomic resolution structures for representatives of large protein families, as defined by the Pfam database, which had no structural coverage and could make significant contributions to biological and biomedical research. Furthermore, the experimental structures could be leveraged by homology modeling to further expand the structural coverage of the protein universe and increase biological insights. Here, we describe what could be achieved by this structural genomics approach, using as an illustration the contributions from 20 NMR structure determinations out of a total of 98 JCSG NMR structures, which were selected because they are the first three-dimensional structure representations of the respective Pfam protein families. The information from this small sample is representative for the overall results from crystal and NMR structure determination in the JCSG. There are five new folds, which were classified as domains of unknown functions (DUF), three of the proteins could be functionally annotated based on three-dimensional structure similarity with previously characterized proteins, and 12 proteins showed only limited similarity with previous deposits in the Protein Data Bank (PDB) and were classified as DUFs. © 2016 Federation of European Biochemical Societies.
Mobilio, Dominick; Walker, Gary; Brooijmans, Natasja; Nilakantan, Ramaswamy; Denny, R Aldrin; Dejoannis, Jason; Feyfant, Eric; Kowticwar, Rupesh K; Mankala, Jyoti; Palli, Satish; Punyamantula, Sairam; Tatipally, Maneesh; John, Reji K; Humblet, Christine
2010-08-01
The Protein Data Bank is the most comprehensive source of experimental macromolecular structures. It can, however, be difficult at times to locate relevant structures with the Protein Data Bank search interface. This is particularly true when searching for complexes containing specific interactions between protein and ligand atoms. Moreover, searching within a family of proteins can be tedious. For example, one cannot search for some conserved residue as residue numbers vary across structures. We describe herein three databases, Protein Relational Database, Kinase Knowledge Base, and Matrix Metalloproteinase Knowledge Base, containing protein structures from the Protein Data Bank. In Protein Relational Database, atom-atom distances between protein and ligand have been precalculated allowing for millisecond retrieval based on atom identity and distance constraints. Ring centroids, centroid-centroid and centroid-atom distances and angles have also been included permitting queries for pi-stacking interactions and other structural motifs involving rings. Other geometric features can be searched through the inclusion of residue pair and triplet distances. In Kinase Knowledge Base and Matrix Metalloproteinase Knowledge Base, the catalytic domains have been aligned into common residue numbering schemes. Thus, by searching across Protein Relational Database and Kinase Knowledge Base, one can easily retrieve structures wherein, for example, a ligand of interest is making contact with the gatekeeper residue.
LenVarDB: database of length-variant protein domains.
Mutt, Eshita; Mathew, Oommen K; Sowdhamini, Ramanathan
2014-01-01
Protein domains are functionally and structurally independent modules, which add to the functional variety of proteins. This array of functional diversity has been enabled by evolutionary changes, such as amino acid substitutions or insertions or deletions, occurring in these protein domains. Length variations (indels) can introduce changes at structural, functional and interaction levels. LenVarDB (freely available at http://caps.ncbs.res.in/lenvardb/) traces these length variations, starting from structure-based sequence alignments in our Protein Alignments organized as Structural Superfamilies (PASS2) database, across 731 structural classification of proteins (SCOP)-based protein domain superfamilies connected to 2 730 625 sequence homologues. Alignment of sequence homologues corresponding to a structural domain is available, starting from a structure-based sequence alignment of the superfamily. Orientation of the length-variant (indel) regions in protein domains can be visualized by mapping them on the structure and on the alignment. Knowledge about location of length variations within protein domains and their visual representation will be useful in predicting changes within structurally or functionally relevant sites, which may ultimately regulate protein function. Non-technical summary: Evolutionary changes bring about natural changes to proteins that may be found in many organisms. Such changes could be reflected as amino acid substitutions or insertions-deletions (indels) in protein sequences. LenVarDB is a database that provides an early overview of observed length variations that were set among 731 protein families and after examining >2 million sequences. Indels are followed up to observe if they are close to the active site such that they can affect the activity of proteins. Inclusion of such information can aid the design of bioengineering experiments.
Lorenzo, J Ramiro; Alonso, Leonardo G; Sánchez, Ignacio E
2015-01-01
Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non-enzymatic deamidation of internal asparagine residues in proteins in the absence of structural data, using sequence-based predictions of secondary structure and intrinsic disorder. Compared to previous algorithms, NGOME does not require three-dimensional structures yet yields better predictions than available sequence-only methods. Four case studies of specific proteins show how NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. A fifth case study applies NGOME at a proteomic scale and unveils a correlation between asparagine deamidation and protein degradation in yeast. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage "Protein and nucleic acid structure and sequence analysis".
Evaluation of variability in high-resolution protein structures by global distance scoring.
Anzai, Risa; Asami, Yoshiki; Inoue, Waka; Ueno, Hina; Yamada, Koya; Okada, Tetsuji
2018-01-01
Systematic analysis of the statistical and dynamical properties of proteins is critical to understanding cellular events. Extraction of biologically relevant information from a set of high-resolution structures is important because it can provide mechanistic details behind the functional properties of protein families, enabling rational comparison between families. Most of the current structural comparisons are pairwise-based, which hampers the global analysis of increasing contents in the Protein Data Bank. Additionally, pairing of protein structures introduces uncertainty with respect to reproducibility because it frequently accompanies other settings for superimposition. This study introduces intramolecular distance scoring for the global analysis of proteins, for each of which at least several high-resolution structures are available. As a pilot study, we have tested 300 human proteins and showed that the method is comprehensively used to overview advances in each protein and protein family at the atomic level. This method, together with the interpretation of the model calculations, provide new criteria for understanding specific structural variation in a protein, enabling global comparison of the variability in proteins from different species.
Rincon, Sergio A; Paoletti, Anne
2016-01-01
Unveiling the function of a novel protein is a challenging task that requires careful experimental design. Yeast cytokinesis is a conserved process that involves modular structural and regulatory proteins. For such proteins, an important step is to identify their domains and structural organization. Here we briefly discuss a collection of methods commonly used for sequence alignment and prediction of protein structure that represent powerful tools for the identification homologous domains and design of structure-function approaches to test experimentally the function of multi-domain proteins such as those implicated in yeast cytokinesis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yu,P.
2007-01-01
The utilization and availability of protein depended on the types of protein and their specific susceptibility to enzymatic hydrolysis (inhibitory activities) in the gastrointestine and was highly associated with protein molecular structures. Studying internal protein structure and protein subfraction profiles leaded to an understanding of the components that make up a whole protein. An understanding of the molecular structure of the whole protein was often vital to understanding its digestive behavior and nutritive value in animals. In this review, recently obtained information on protein molecular structural effects of heat processing was reviewed, in relation to protein characteristics affecting digestive behaviormore » and nutrient utilization and availability. The emphasis of this review was on (1) using the newly advanced synchrotron technology (S-FTIR) as a novel approach to reveal protein molecular chemistry affected by heat processing within intact plant tissues; (2) revealing the effects of heat processing on the profile changes of protein subfractions associated with digestive behaviors and kinetics manipulated by heat processing; (3) prediction of the changes of protein availability and supply after heat processing, using the advanced DVE/OEB and NRC-2001 models, and (4) obtaining information on optimal processing conditions of protein as intestinal protein source to achieve target values for potential high net absorbable protein in the small intestine. The information described in this article may give better insight in the mechanisms involved and the intrinsic protein molecular structural changes occurring upon processing.« less
Quality assessment of protein model-structures based on structural and functional similarities.
Konopka, Bogumil M; Nebel, Jean-Christophe; Kotulska, Malgorzata
2012-09-21
Experimental determination of protein 3D structures is expensive, time consuming and sometimes impossible. A gap between number of protein structures deposited in the World Wide Protein Data Bank and the number of sequenced proteins constantly broadens. Computational modeling is deemed to be one of the ways to deal with the problem. Although protein 3D structure prediction is a difficult task, many tools are available. These tools can model it from a sequence or partial structural information, e.g. contact maps. Consequently, biologists have the ability to generate automatically a putative 3D structure model of any protein. However, the main issue becomes evaluation of the model quality, which is one of the most important challenges of structural biology. GOBA--Gene Ontology-Based Assessment is a novel Protein Model Quality Assessment Program. It estimates the compatibility between a model-structure and its expected function. GOBA is based on the assumption that a high quality model is expected to be structurally similar to proteins functionally similar to the prediction target. Whereas DALI is used to measure structure similarity, protein functional similarity is quantified using standardized and hierarchical description of proteins provided by Gene Ontology combined with Wang's algorithm for calculating semantic similarity. Two approaches are proposed to express the quality of protein model-structures. One is a single model quality assessment method, the other is its modification, which provides a relative measure of model quality. Exhaustive evaluation is performed on data sets of model-structures submitted to the CASP8 and CASP9 contests. The validation shows that the method is able to discriminate between good and bad model-structures. The best of tested GOBA scores achieved 0.74 and 0.8 as a mean Pearson correlation to the observed quality of models in our CASP8 and CASP9-based validation sets. GOBA also obtained the best result for two targets of CASP8, and one of CASP9, compared to the contest participants. Consequently, GOBA offers a novel single model quality assessment program that addresses the practical needs of biologists. In conjunction with other Model Quality Assessment Programs (MQAPs), it would prove useful for the evaluation of single protein models.
Objective identification of residue ranges for the superposition of protein structures
2011-01-01
Background The automation of objectively selecting amino acid residue ranges for structure superpositions is important for meaningful and consistent protein structure analyses. So far there is no widely-used standard for choosing these residue ranges for experimentally determined protein structures, where the manual selection of residue ranges or the use of suboptimal criteria remain commonplace. Results We present an automated and objective method for finding amino acid residue ranges for the superposition and analysis of protein structures, in particular for structure bundles resulting from NMR structure calculations. The method is implemented in an algorithm, CYRANGE, that yields, without protein-specific parameter adjustment, appropriate residue ranges in most commonly occurring situations, including low-precision structure bundles, multi-domain proteins, symmetric multimers, and protein complexes. Residue ranges are chosen to comprise as many residues of a protein domain that increasing their number would lead to a steep rise in the RMSD value. Residue ranges are determined by first clustering residues into domains based on the distance variance matrix, and then refining for each domain the initial choice of residues by excluding residues one by one until the relative decrease of the RMSD value becomes insignificant. A penalty for the opening of gaps favours contiguous residue ranges in order to obtain a result that is as simple as possible, but not simpler. Results are given for a set of 37 proteins and compared with those of commonly used protein structure validation packages. We also provide residue ranges for 6351 NMR structures in the Protein Data Bank. Conclusions The CYRANGE method is capable of automatically determining residue ranges for the superposition of protein structure bundles for a large variety of protein structures. The method correctly identifies ordered regions. Global structure superpositions based on the CYRANGE residue ranges allow a clear presentation of the structure, and unnecessary small gaps within the selected ranges are absent. In the majority of cases, the residue ranges from CYRANGE contain fewer gaps and cover considerably larger parts of the sequence than those from other methods without significantly increasing the RMSD values. CYRANGE thus provides an objective and automatic method for standardizing the choice of residue ranges for the superposition of protein structures. PMID:21592348
Protein-Protein Docking in Drug Design and Discovery.
Kaczor, Agnieszka A; Bartuzi, Damian; Stępniewski, Tomasz Maciej; Matosiuk, Dariusz; Selent, Jana
2018-01-01
Protein-protein interactions (PPIs) are responsible for a number of key physiological processes in the living cells and underlie the pathomechanism of many diseases. Nowadays, along with the concept of so-called "hot spots" in protein-protein interactions, which are well-defined interface regions responsible for most of the binding energy, these interfaces can be targeted with modulators. In order to apply structure-based design techniques to design PPIs modulators, a three-dimensional structure of protein complex has to be available. In this context in silico approaches, in particular protein-protein docking, are a valuable complement to experimental methods for elucidating 3D structure of protein complexes. Protein-protein docking is easy to use and does not require significant computer resources and time (in contrast to molecular dynamics) and it results in 3D structure of a protein complex (in contrast to sequence-based methods of predicting binding interfaces). However, protein-protein docking cannot address all the aspects of protein dynamics, in particular the global conformational changes during protein complex formation. In spite of this fact, protein-protein docking is widely used to model complexes of water-soluble proteins and less commonly to predict structures of transmembrane protein assemblies, including dimers and oligomers of G protein-coupled receptors (GPCRs). In this chapter we review the principles of protein-protein docking, available algorithms and software and discuss the recent examples, benefits, and drawbacks of protein-protein docking application to water-soluble proteins, membrane anchoring and transmembrane proteins, including GPCRs.
NIAS-Server: Neighbors Influence of Amino acids and Secondary Structures in Proteins.
Borguesan, Bruno; Inostroza-Ponta, Mario; Dorn, Márcio
2017-03-01
The exponential growth in the number of experimentally determined three-dimensional protein structures provide a new and relevant knowledge about the conformation of amino acids in proteins. Only a few of probability densities of amino acids are publicly available for use in structure validation and prediction methods. NIAS (Neighbors Influence of Amino acids and Secondary structures) is a web-based tool used to extract information about conformational preferences of amino acid residues and secondary structures in experimental-determined protein templates. This information is useful, for example, to characterize folds and local motifs in proteins, molecular folding, and can help the solution of complex problems such as protein structure prediction, protein design, among others. The NIAS-Server and supplementary data are available at http://sbcb.inf.ufrgs.br/nias .
Residue-residue contacts: application to analysis of secondary structure interactions.
Potapov, Vladimir; Edelman, Marvin; Sobolev, Vladimir
2013-01-01
Protein structures and their complexes are formed and stabilized by interactions, both inside and outside of the protein. Analysis of such interactions helps in understanding different levels of structures (secondary, super-secondary, and oligomeric states). It can also assist molecular biologists in understanding structural consequences of modifying proteins and/or ligands. In this chapter, our definition of atom-atom and residue-residue contacts is described and applied to analysis of protein-protein interactions in dimeric β-sandwich proteins.
HDAPD: a web tool for searching the disease-associated protein structures
2010-01-01
Background The protein structures of the disease-associated proteins are important for proceeding with the structure-based drug design to against a particular disease. Up until now, proteins structures are usually searched through a PDB id or some sequence information. However, in the HDAPD database presented here the protein structure of a disease-associated protein can be directly searched through the associated disease name keyed in. Description The search in HDAPD can be easily initiated by keying some key words of a disease, protein name, protein type, or PDB id. The protein sequence can be presented in FASTA format and directly copied for a BLAST search. HDAPD is also interfaced with Jmol so that users can observe and operate a protein structure with Jmol. The gene ontological data such as cellular components, molecular functions, and biological processes are provided once a hyperlink to Gene Ontology (GO) is clicked. Further, HDAPD provides a link to the KEGG map such that where the protein is placed and its relationship with other proteins in a metabolic pathway can be found from the map. The latest literatures namely titles, journals, authors, and abstracts searched from PubMed for the protein are also presented as a length controllable list. Conclusions Since the HDAPD data content can be routinely updated through a PHP-MySQL web page built, the new database presented is useful for searching the structures for some disease-associated proteins that may play important roles in the disease developing process for performing the structure-based drug design to against the diseases. PMID:20158919
Template-based structure modeling of protein-protein interactions
Szilagyi, Andras; Zhang, Yang
2014-01-01
The structure of protein-protein complexes can be constructed by using the known structure of other protein complexes as a template. The complex structure templates are generally detected either by homology-based sequence alignments or, given the structure of monomer components, by structure-based comparisons. Critical improvements have been made in recent years by utilizing interface recognition and by recombining monomer and complex template libraries. Encouraging progress has also been witnessed in genome-wide applications of template-based modeling, with modeling accuracy comparable to high-throughput experimental data. Nevertheless, bottlenecks exist due to the incompleteness of the proteinprotein complex structure library and the lack of methods for distant homologous template identification and full-length complex structure refinement. PMID:24721449
Challenges in NMR-based structural genomics
NASA Astrophysics Data System (ADS)
Sue, Shih-Che; Chang, Chi-Fon; Huang, Yao-Te; Chou, Ching-Yu; Huang, Tai-huang
2005-05-01
Understanding the functions of the vast number of proteins encoded in many genomes that have been completely sequenced recently is the main challenge for biologists in the post-genomics era. Since the function of a protein is determined by its exact three-dimensional structure it is paramount to determine the 3D structures of all proteins. This need has driven structural biologists to undertake the structural genomics project aimed at determining the structures of all known proteins. Several centers for structural genomics studies have been established throughout the world. Nuclear magnetic resonance (NMR) spectroscopy has played a major role in determining protein structures in atomic details and in a physiologically relevant solution state. Since the number of new genes being discovered daily far exceeds the number of structures determined by both NMR and X-ray crystallography, a high-throughput method for speeding up the process of protein structure determination is essential for the success of the structural genomics effort. In this article we will describe NMR methods currently being employed for protein structure determination. We will also describe methods under development which may drastically increase the throughput, as well as point out areas where opportunities exist for biophysicists to make significant contribution in this important field.
Khafizov, Kamil; Madrid-Aliste, Carlos; Almo, Steven C.; Fiser, Andras
2014-01-01
The exponential growth of protein sequence data provides an ever-expanding body of unannotated and misannotated proteins. The National Institutes of Health-supported Protein Structure Initiative and related worldwide structural genomics efforts facilitate functional annotation of proteins through structural characterization. Recently there have been profound changes in the taxonomic composition of sequence databases, which are effectively redefining the scope and contribution of these large-scale structure-based efforts. The faster-growing bacterial genomic entries have overtaken the eukaryotic entries over the last 5 y, but also have become more redundant. Despite the enormous increase in the number of sequences, the overall structural coverage of proteins—including proteins for which reliable homology models can be generated—on the residue level has increased from 30% to 40% over the last 10 y. Structural genomics efforts contributed ∼50% of this new structural coverage, despite determining only ∼10% of all new structures. Based on current trends, it is expected that ∼55% structural coverage (the level required for significant functional insight) will be achieved within 15 y, whereas without structural genomics efforts, realizing this goal will take approximately twice as long. PMID:24567391
Yeh, Chun-Ting; Brunette, T J; Baker, David; McIntosh-Smith, Simon; Parmeggiani, Fabio
2018-02-01
Computational protein design methods have enabled the design of novel protein structures, but they are often still limited to small proteins and symmetric systems. To expand the size of designable proteins while controlling the overall structure, we developed Elfin, a genetic algorithm for the design of novel proteins with custom shapes using structural building blocks derived from experimentally verified repeat proteins. By combining building blocks with compatible interfaces, it is possible to rapidly build non-symmetric large structures (>1000 amino acids) that match three-dimensional geometric descriptions provided by the user. A run time of about 20min on a laptop computer for a 3000 amino acid structure makes Elfin accessible to users with limited computational resources. Protein structures with controlled geometry will allow the systematic study of the effect of spatial arrangement of enzymes and signaling molecules, and provide new scaffolds for functional nanomaterials. Copyright © 2017 Elsevier Inc. All rights reserved.
Relation between native ensembles and experimental structures of proteins
Best, Robert B.; Lindorff-Larsen, Kresten; DePristo, Mark A.; Vendruscolo, Michele
2006-01-01
Different experimental structures of the same protein or of proteins with high sequence similarity contain many small variations. Here we construct ensembles of “high-sequence similarity Protein Data Bank” (HSP) structures and consider the extent to which such ensembles represent the structural heterogeneity of the native state in solution. We find that different NMR measurements probing structure and dynamics of given proteins in solution, including order parameters, scalar couplings, and residual dipolar couplings, are remarkably well reproduced by their respective high-sequence similarity Protein Data Bank ensembles; moreover, we show that the effects of uncertainties in structure determination are insufficient to explain the results. These results highlight the importance of accounting for native-state protein dynamics in making comparisons with ensemble-averaged experimental data and suggest that even a modest number of structures of a protein determined under different conditions, or with small variations in sequence, capture a representative subset of the true native-state ensemble. PMID:16829580
Ikeya, Teppei; Terauchi, Tsutomu; Güntert, Peter; Kainosho, Masatsune
2006-07-01
Recently we have developed the stereo-array isotope labeling (SAIL) technique to overcome the conventional molecular size limitation in NMR protein structure determination by employing complete stereo- and regiospecific patterns of stable isotopes. SAIL sharpens signals and simplifies spectra without the loss of requisite structural information, thus making large classes of proteins newly accessible to detailed solution structure determination. The automated structure calculation program CYANA can efficiently analyze SAIL-NOESY spectra and calculate structures without manual analysis. Nevertheless, the original SAIL method might not be capable of determining the structures of proteins larger than 50 kDa or membrane proteins, for which the spectra are characterized by many broadened and overlapped peaks. Here we have carried out simulations of new SAIL patterns optimized for minimal relaxation and overlap, to evaluate the combined use of SAIL and CYANA for solving the structures of larger proteins and membrane proteins. The modified approach reduces the number of peaks to nearly half of that observed with uniform labeling, while still yielding well-defined structures and is expected to enable NMR structure determinations of these challenging systems.
von Grotthuss, Marcin; Plewczynski, Dariusz; Ginalski, Krzysztof; Rychlewski, Leszek; Shakhnovich, Eugene I
2006-02-06
The number of protein structures from structural genomics centers dramatically increases in the Protein Data Bank (PDB). Many of these structures are functionally unannotated because they have no sequence similarity to proteins of known function. However, it is possible to successfully infer function using only structural similarity. Here we present the PDB-UF database, a web-accessible collection of predictions of enzymatic properties using structure-function relationship. The assignments were conducted for three-dimensional protein structures of unknown function that come from structural genomics initiatives. We show that 4 hypothetical proteins (with PDB accession codes: 1VH0, 1NS5, 1O6D, and 1TO0), for which standard BLAST tools such as PSI-BLAST or RPS-BLAST failed to assign any function, are probably methyltransferase enzymes. We suggest that the structure-based prediction of an EC number should be conducted having the different similarity score cutoff for different protein folds. Moreover, performing the annotation using two different algorithms can reduce the rate of false positive assignments. We believe, that the presented web-based repository will help to decrease the number of protein structures that have functions marked as "unknown" in the PDB file. http://paradox.harvard.edu/PDB-UF and http://bioinfo.pl/PDB-UF.
Merckel, Michael C; Huiskonen, Juha T; Bamford, Dennis H; Goldman, Adrian; Tuma, Roman
2005-04-15
Comparisons of bacteriophage PRD1 and adenovirus protein structures and virion architectures have been instrumental in unraveling an evolutionary relationship and have led to a proposal of a phylogeny-based virus classification. The structure of the PRD1 spike protein P5 provides further insight into the evolution of viral proteins. The crystallized P5 fragment comprises two structural domains: a globular knob and a fibrous shaft. The head folds into a ten-stranded jelly roll beta barrel, which is structurally related to the tumor necrosis factor (TNF) and the PRD1 coat protein domains. The shaft domain is a structural counterpart to the adenovirus spike shaft. The structural relationships between PRD1, TNF, and adenovirus proteins suggest that the vertex proteins may have originated from an ancestral TNF-like jelly roll coat protein via a combination of gene duplication and deletion.
Implementation of a parallel protein structure alignment service on cloud.
Hung, Che-Lun; Lin, Yaw-Ling
2013-01-01
Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform.
Implementation of a Parallel Protein Structure Alignment Service on Cloud
Hung, Che-Lun; Lin, Yaw-Ling
2013-01-01
Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform. PMID:23671842
Protein Structure and Function Prediction Using I-TASSER
Yang, Jianyi; Zhang, Yang
2016-01-01
I-TASSER is a hierarchical protocol for automated protein structure prediction and structure-based function annotation. Starting from the amino acid sequence of target proteins, I-TASSER first generates full-length atomic structural models from multiple threading alignments and iterative structural assembly simulations followed by atomic-level structure refinement. The biological functions of the protein, including ligand-binding sites, enzyme commission number, and gene ontology terms, are then inferred from known protein function databases based on sequence and structure profile comparisons. I-TASSER is freely available as both an on-line server and a stand-alone package. This unit describes how to use the I-TASSER protocol to generate structure and function prediction and how to interpret the prediction results, as well as alternative approaches for further improving the I-TASSER modeling quality for distant-homologous and multi-domain protein targets. PMID:26678386
Kryshtafovych, Andriy; Moult, John; Bales, Patrick; Bazan, J. Fernando; Biasini, Marco; Burgin, Alex; Chen, Chen; Cochran, Frank V.; Craig, Timothy K.; Das, Rhiju; Fass, Deborah; Garcia-Doval, Carmela; Herzberg, Osnat; Lorimer, Donald; Luecke, Hartmut; Ma, Xiaolei; Nelson, Daniel C.; van Raaij, Mark J.; Rohwer, Forest; Segall, Anca; Seguritan, Victor; Zeth, Kornelius; Schwede, Torsten
2014-01-01
For the last two decades, CASP has assessed the state of the art in techniques for protein structure prediction and identified areas which required further development. CASP would not have been possible without the prediction targets provided by the experimental structural biology community. In the latest experiment, CASP10, over 100 structures were suggested as prediction targets, some of which appeared to be extraordinarily difficult for modeling. In this paper, authors of some of the most challenging targets discuss which specific scientific question motivated the experimental structure determination of the target protein, which structural features were especially interesting from a structural or functional perspective, and to what extent these features were correctly reproduced in the predictions submitted to CASP10. Specifically, the following targets will be presented: the acid-gated urea channel, a difficult to predict trans-membrane protein from the important human pathogen Helicobacter pylori; the structure of human interleukin IL-34, a recently discovered helical cytokine; the structure of a functionally uncharacterized enzyme OrfY from Thermoproteus tenax formed by a gene duplication and a novel fold; an ORFan domain of mimivirus sulfhydryl oxidase R596; the fibre protein gp17 from bacteriophage T7; the Bacteriophage CBA-120 tailspike protein; a virus coat protein from metagenomic samples of the marine environment; and finally an unprecedented class of structure prediction targets based on engineered disulfide-rich small proteins. PMID:24318984
A protein block based fold recognition method for the annotation of twilight zone sequences.
Suresh, V; Ganesan, K; Parthasarathy, S
2013-03-01
The description of protein backbone was recently improved with a group of structural fragments called Structural Alphabets instead of the regular three states (Helix, Sheet and Coil) secondary structure description. Protein Blocks is one of the Structural Alphabets used to describe each and every region of protein backbone including the coil. According to de Brevern (2000) the Protein Blocks has 16 structural fragments and each one has 5 residues in length. Protein Blocks fragments are highly informative among the available Structural Alphabets and it has been used for many applications. Here, we present a protein fold recognition method based on Protein Blocks for the annotation of twilight zone sequences. In our method, we align the predicted Protein Blocks of a query amino acid sequence with a library of assigned Protein Blocks of 953 known folds using the local pair-wise alignment. The alignment results with z-value ≥ 2.5 and P-value ≤ 0.08 are predicted as possible folds. Our method is able to recognize the possible folds for nearly 35.5% of the twilight zone sequences with their predicted Protein Block sequence obtained by pb_prediction, which is available at Protein Block Export server.
SFG analysis of surface bound proteins: a route towards structure determination.
Weidner, Tobias; Castner, David G
2013-08-14
The surface of a material is rapidly covered with proteins once that material is placed in a biological environment. The structure and function of these bound proteins play a key role in the interactions and communications of the material with the biological environment. Thus, it is crucial to gain a molecular level understanding of surface bound protein structure. While X-ray diffraction and solution phase NMR methods are well established for determining the structure of proteins in the crystalline or solution phase, there is not a corresponding single technique that can provide the same level of structural detail about proteins at surfaces or interfaces. However, recent advances in sum frequency generation (SFG) vibrational spectroscopy have significantly increased our ability to obtain structural information about surface bound proteins and peptides. A multi-technique approach of combining SFG with (1) protein engineering methods to selectively introduce mutations and isotopic labels, (2) other experimental methods such as time-of-flight secondary ion mass spectrometry (ToF-SIMS) and near edge X-ray absorption fine structure (NEXAFS) to provide complementary information, and (3) molecular dynamic (MD) simulations to extend the molecular level experimental results is a particularly promising route for structural characterization of surface bound proteins and peptides. By using model peptides and small proteins with well-defined structures, methods have been developed to determine the orientation of both backbone and side chains to the surface.
SFG analysis of surface bound proteins: A route towards structure determination
Weidner, Tobias; Castner, David G.
2013-01-01
The surface of a material is rapidly covered with proteins once that material is placed in a biological environment. The structure and function of these bound proteins play a key role in the interactions and communications of the material with the biological environment. Thus, it is crucial to gain a molecular level understanding of surface bound protein structure. While X-ray diffraction and solution phase NMR methods are well established for determining the structure of proteins in the crystalline or solution phase, there is not a corresponding single technique that can provide the same level of structural detail about proteins at surfaces or interfaces. However, recent advances in sum frequency generation (SFG) vibrational spectroscopy have significantly increased our ability to obtain structural information about surface bound proteins and peptides. A multi-technique approach of combining SFG with (1) protein engineering methods to selectively introduce mutations and isotopic labels, (2) other experimental methods such as time-of-flight secondary ion mass spectrometry (ToF-SIMS) and near edge x-ray absorption fine structure (NEXAFS) to provide complementary information, and (3) molecular dynamic (MD) simulations to extend the molecular level experimental results is a particularly promising route for structural characterization of surface bound proteins and peptides. By using model peptides and small proteins with well-defined structures, methods have been developed to determine the orientation of both backbone and side chains to the surface. PMID:23727992
Jian, Jhih-Wei; Elumalai, Pavadai; Pitti, Thejkiran; Wu, Chih Yuan; Tsai, Keng-Chang; Chang, Jeng-Yih; Peng, Hung-Pin; Yang, An-Suei
2016-01-01
Predicting ligand binding sites (LBSs) on protein structures, which are obtained either from experimental or computational methods, is a useful first step in functional annotation or structure-based drug design for the protein structures. In this work, the structure-based machine learning algorithm ISMBLab-LIG was developed to predict LBSs on protein surfaces with input attributes derived from the three-dimensional probability density maps of interacting atoms, which were reconstructed on the query protein surfaces and were relatively insensitive to local conformational variations of the tentative ligand binding sites. The prediction accuracy of the ISMBLab-LIG predictors is comparable to that of the best LBS predictors benchmarked on several well-established testing datasets. More importantly, the ISMBLab-LIG algorithm has substantial tolerance to the prediction uncertainties of computationally derived protein structure models. As such, the method is particularly useful for predicting LBSs not only on experimental protein structures without known LBS templates in the database but also on computationally predicted model protein structures with structural uncertainties in the tentative ligand binding sites. PMID:27513851
Optimization of protein-protein docking for predicting Fc-protein interactions.
Agostino, Mark; Mancera, Ricardo L; Ramsland, Paul A; Fernández-Recio, Juan
2016-11-01
The antibody crystallizable fragment (Fc) is recognized by effector proteins as part of the immune system. Pathogens produce proteins that bind Fc in order to subvert or evade the immune response. The structural characterization of the determinants of Fc-protein association is essential to improve our understanding of the immune system at the molecular level and to develop new therapeutic agents. Furthermore, Fc-binding peptides and proteins are frequently used to purify therapeutic antibodies. Although several structures of Fc-protein complexes are available, numerous others have not yet been determined. Protein-protein docking could be used to investigate Fc-protein complexes; however, improved approaches are necessary to efficiently model such cases. In this study, a docking-based structural bioinformatics approach is developed for predicting the structures of Fc-protein complexes. Based on the available set of X-ray structures of Fc-protein complexes, three regions of the Fc, loosely corresponding to three turns within the structure, were defined as containing the essential features for protein recognition and used as restraints to filter the initial docking search. Rescoring the filtered poses with an optimal scoring strategy provided a success rate of approximately 80% of the test cases examined within the top ranked 20 poses, compared to approximately 20% by the initial unrestrained docking. The developed docking protocol provides a significant improvement over the initial unrestrained docking and will be valuable for predicting the structures of currently undetermined Fc-protein complexes, as well as in the design of peptides and proteins that target Fc. Copyright © 2016 John Wiley & Sons, Ltd.
Wei, Yang; Thyparambil, Aby A.; Latour, Robert A.
2013-01-01
While protein-surface interactions have been widely studied, relatively little is understood at this time regarding how protein-surface interaction effects are influenced by protein-protein interactions and how these effects combine with the internal stability of a protein to influence its adsorbed-state structure and bioactivity. The objectives of this study were to develop a method to study these combined effects under widely varying protein-protein interaction conditions using hen egg-white lysozyme (HEWL) adsorbed on silica glass, poly(methyl methacrylate), and polyethylene as our model systems. In order to vary protein-protein interaction effects over a wide range, HEWL was first adsorbed to each surface type under widely varying protein solution concentrations for 2 h to saturate the surface, followed by immersion in pure buffer solution for 15 h to equilibrate the adsorbed protein layers in the absence of additionally adsorbing protein. Periodic measurements were made at selected time points of the areal density of the adsorbed protein layer as an indicator of the level of protein-protein interaction effects within the layer, and these values were then correlated with measurements of the adsorbed protein’s secondary structure and bioactivity. The results from these studies indicate that protein-protein interaction effects help stabilize the structure of HEWL adsorbed on silica glass, have little influence on the structural behavior of HEWL on HDPE, and actually serve to destabilize HEWL’s structure on PMMA. The bioactivity of HEWL on silica glass and HDPE was found to decrease in direct proportion to the degree of adsorption-induce protein unfolding. A direct correlation between bioactivity and the conformational state of adsorbed HEWL was less apparent on PMMA, thus suggesting that other factors influenced HEWL’s bioactivity on this surface, such as the accessibility of HEWL’s bioactive site being blocked by neighboring proteins or the surface itself. The developed methods provide an effective means to characterize the influence of protein-protein interaction effects and provide new molecular-level insights into how protein-protein interaction effects combine with protein-surface interaction and internal protein stability effects to influence the structure and bioactivity of adsorbed protein. PMID:23751416
Rapid search for tertiary fragments reveals protein sequence–structure relationships
Zhou, Jianfu; Grigoryan, Gevorg
2015-01-01
Finding backbone substructures from the Protein Data Bank that match an arbitrary query structural motif, composed of multiple disjoint segments, is a problem of growing relevance in structure prediction and protein design. Although numerous protein structure search approaches have been proposed, methods that address this specific task without additional restrictions and on practical time scales are generally lacking. Here, we propose a solution, dubbed MASTER, that is both rapid, enabling searches over the Protein Data Bank in a matter of seconds, and provably correct, finding all matches below a user-specified root-mean-square deviation cutoff. We show that despite the potentially exponential time complexity of the problem, running times in practice are modest even for queries with many segments. The ability to explore naturally plausible structural and sequence variations around a given motif has the potential to synthesize its design principles in an automated manner; so we go on to illustrate the utility of MASTER to protein structural biology. We demonstrate its capacity to rapidly establish structure–sequence relationships, uncover the native designability landscapes of tertiary structural motifs, identify structural signatures of binding, and automatically rewire protein topologies. Given the broad utility of protein tertiary fragment searches, we hope that providing MASTER in an open-source format will enable novel advances in understanding, predicting, and designing protein structure. PMID:25420575
Lazim, Raudah; Mei, Ye; Zhang, Dawei
2012-03-01
Replica exchange molecular dynamics (REMD) simulation provides an efficient conformational sampling tool for the study of protein folding. In this study, we explore the mechanism directing the structure variation from α/4β-fold protein to 3α-fold protein after mutation by conducting REMD simulation on 42 replicas with temperatures ranging from 270 K to 710 K. The simulation began from a protein possessing the primary structure of GA88 but the tertiary structure of GB88, two G proteins with "high sequence identity." Albeit the large Cα-root mean square deviation (RMSD) of the folded protein (4.34 Å at 270 K and 4.75 Å at 304 K), a variation in tertiary structure was observed. Together with the analysis of secondary structure assignment, cluster analysis and principal component, it provides insights to the folding and unfolding pathway of 3α-fold protein and α/4β-fold protein respectively paving the way toward the understanding of the ongoings during conformational variation.
CARd-3D: Carbon Distribution in 3D Structure Program for Globular Proteins
Ekambaram, Rajasekaran; Kannaiyan, Akila; Marimuthu, Vijayasarathy; Swaminathan, Vinobha Chinnaiah; Renganathan, Senthil; Perumal, Ananda Gopu
2014-01-01
Spatial arrangement of carbon in protein structure is analyzed here. Particularly, the carbon fractions around individual atoms are compared. It is hoped that it follows the principle of 31.45% carbon around individual atoms. The results reveal that globular protein's atoms follow this principle. A comparative study on monomer versus dimer reveal that carbon is better distributed in dimeric form than in its monomeric form. Similar study on solid versus liquid structures reveals that the liquid (NMR) structure has better carbon distribution over the corresponding solid (X-Ray) structure. The carbon fraction distributions in fiber and toxin protein are compared. Fiber proteins follow the principle of carbon fraction distribution. At the same time it has another broad spectrum of carbon distribution than in globular proteins. The toxin protein follows an abnormal carbon fraction distribution. The carbon fraction distribution plays an important role in deciding the structure and shape of proteins. It is hoped to help in understanding the protein folding and function. PMID:24748753
Wei, Qing; La, David; Kihara, Daisuke
2017-01-01
Prediction of protein-protein interaction sites in a protein structure provides important information for elucidating the mechanism of protein function and can also be useful in guiding a modeling or design procedures of protein complex structures. Since prediction methods essentially assess the propensity of amino acids that are likely to be part of a protein docking interface, they can help in designing protein-protein interactions. Here, we introduce BindML and BindML+ protein-protein interaction sites prediction methods. BindML predicts protein-protein interaction sites by identifying mutation patterns found in known protein-protein complexes using phylogenetic substitution models. BindML+ is an extension of BindML for distinguishing permanent and transient types of protein-protein interaction sites. We developed an interactive web-server that provides a convenient interface to assist in structural visualization of protein-protein interactions site predictions. The input data for the web-server are a tertiary structure of interest. BindML and BindML+ are available at http://kiharalab.org/bindml/ and http://kiharalab.org/bindml/plus/ .
Wong, Sienna; Jin, J-P
2017-01-01
Study of folded structure of proteins provides insights into their biological functions, conformational dynamics and molecular evolution. Current methods of elucidating folded structure of proteins are laborious, low-throughput, and constrained by various limitations. Arising from these methods is the need for a sensitive, quantitative, rapid and high-throughput method not only analysing the folded structure of proteins, but also to monitor dynamic changes under physiological or experimental conditions. In this focused review, we outline the foundation and limitations of current protein structure-determination methods prior to discussing the advantages of an emerging antibody epitope analysis for applications in structural, conformational and evolutionary studies of proteins. We discuss the application of this method using representative examples in monitoring allosteric conformation of regulatory proteins and the determination of the evolutionary lineage of related proteins and protein isoforms. The versatility of the method described herein is validated by the ability to modulate a variety of assay parameters to meet the needs of the user in order to monitor protein conformation. Furthermore, the assay has been used to clarify the lineage of troponin isoforms beyond what has been depicted by sequence homology alone, demonstrating the nonlinear evolutionary relationship between primary structure and tertiary structure of proteins. The antibody epitope analysis method is a highly adaptable technique of protein conformation elucidation, which can be easily applied without the need for specialized equipment or technical expertise. When applied in a systematic and strategic manner, this method has the potential to reveal novel and biomedically meaningful information for structure-function relationship and evolutionary lineage of proteins. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Chandonia, John-Marc; Fox, Naomi K; Brenner, Steven E
2017-02-03
SCOPe (Structural Classification of Proteins-extended, http://scop.berkeley.edu) is a database of relationships between protein structures that extends the Structural Classification of Proteins (SCOP) database. SCOP is an expert-curated ordering of domains from the majority of proteins of known structure in a hierarchy according to structural and evolutionary relationships. SCOPe classifies the majority of protein structures released since SCOP development concluded in 2009, using a combination of manual curation and highly precise automated tools, aiming to have the same accuracy as fully hand-curated SCOP releases. SCOPe also incorporates and updates the ASTRAL compendium, which provides several databases and tools to aid in the analysis of the sequences and structures of proteins classified in SCOPe. SCOPe continues high-quality manual classification of new superfamilies, a key feature of SCOP. Artifacts such as expression tags are now separated into their own class, in order to distinguish them from the homology-based annotations in the remainder of the SCOPe hierarchy. SCOPe 2.06 contains 77,439 Protein Data Bank entries, double the 38,221 structures classified in SCOP. Copyright © 2016 The Author(s). Published by Elsevier Ltd.. All rights reserved.
Predicting nucleic acid binding interfaces from structural models of proteins
Dror, Iris; Shazman, Shula; Mukherjee, Srayanta; Zhang, Yang; Glaser, Fabian; Mandel-Gutfreund, Yael
2011-01-01
The function of DNA- and RNA-binding proteins can be inferred from the characterization and accurate prediction of their binding interfaces. However the main pitfall of various structure-based methods for predicting nucleic acid binding function is that they are all limited to a relatively small number of proteins for which high-resolution three dimensional structures are available. In this study, we developed a pipeline for extracting functional electrostatic patches from surfaces of protein structural models, obtained using the I-TASSER protein structure predictor. The largest positive patches are extracted from the protein surface using the patchfinder algorithm. We show that functional electrostatic patches extracted from an ensemble of structural models highly overlap the patches extracted from high-resolution structures. Furthermore, by testing our pipeline on a set of 55 known nucleic acid binding proteins for which I-TASSER produces high-quality models, we show that the method accurately identifies the nucleic acids binding interface on structural models of proteins. Employing a combined patch approach we show that patches extracted from an ensemble of models better predicts the real nucleic acid binding interfaces compared to patches extracted from independent models. Overall, these results suggest that combining information from a collection of low-resolution structural models could be a valuable approach for functional annotation. We suggest that our method will be further applicable for predicting other functional surfaces of proteins with unknown structure. PMID:22086767
Metamorphic Proteins: Emergence of Dual Protein Folds from One Primary Sequence.
Lella, Muralikrishna; Mahalakshmi, Radhakrishnan
2017-06-20
Every amino acid exhibits a different propensity for distinct structural conformations. Hence, decoding how the primary amino acid sequence undergoes the transition to a defined secondary structure and its final three-dimensional fold is presently considered predictable with reasonable certainty. However, protein sequences that defy the first principles of secondary structure prediction (they attain two different folds) have recently been discovered. Such proteins, aptly named metamorphic proteins, decrease the conformational constraint by increasing flexibility in the secondary structure and thereby result in efficient functionality. In this review, we discuss the major factors driving the conformational switch related both to protein sequence and to structure using illustrative examples. We discuss the concept of an evolutionary transition in sequence and structure, the functional impact of the tertiary fold, and the pressure of intrinsic and external factors that give rise to metamorphic proteins. We mainly focus on the major components of protein architecture, namely, the α-helix and β-sheet segments, which are involved in conformational switching within the same or highly similar sequences. These chameleonic sequences are widespread in both cytosolic and membrane proteins, and these folds are equally important for protein structure and function. We discuss the implications of metamorphic proteins and chameleonic peptide sequences in de novo peptide design.
Huang, Wenxi; Liu, Wanting; Jin, Jingjie; Xiao, Qilan; Lu, Ruibin; Chen, Wei; Xiong, Sheng; Zhang, Gong
2018-03-25
Translational pausing coordinates protein synthesis and co-translational folding. It is a common factor that facilitates the correct folding of large, multi-domain proteins. For small proteins, pausing sites rarely occurs in the gene body, and the 3'-end pausing sites are only essential for the folding of a fraction of proteins. The determinant of the necessity of the pausings remains obscure. In this study, we demonstrated that the steady-state structural fluctuation is a predictor of the necessity of pausing-mediated co-translational folding for small proteins. Validated by experiments with 5 model proteins, we found that the rigid protein structures do not, while the flexible structures do need 3'-end pausings to fold correctly. Therefore, rational optimization of translational pausing can improve soluble expression of small proteins with flexible structures, but not the rigid ones. The rigidity of the structure can be quantitatively estimated in silico using molecular dynamic simulation. Nevertheless, we also found that the translational pausing optimization increases the fitness of the expression host, and thus benefits the recombinant protein production, independent from the soluble expression. These results shed light on the structural basis of the translational pausing and provided a practical tool for industrial protein fermentation. Copyright © 2017. Published by Elsevier Inc.
Protein 3D Structure and Electron Microscopy Map Retrieval Using 3D-SURFER2.0 and EM-SURFER.
Han, Xusi; Wei, Qing; Kihara, Daisuke
2017-12-08
With the rapid growth in the number of solved protein structures stored in the Protein Data Bank (PDB) and the Electron Microscopy Data Bank (EMDB), it is essential to develop tools to perform real-time structure similarity searches against the entire structure database. Since conventional structure alignment methods need to sample different orientations of proteins in the three-dimensional space, they are time consuming and unsuitable for rapid, real-time database searches. To this end, we have developed 3D-SURFER and EM-SURFER, which utilize 3D Zernike descriptors (3DZD) to conduct high-throughput protein structure comparison, visualization, and analysis. Taking an atomic structure or an electron microscopy map of a protein or a protein complex as input, the 3DZD of a query protein is computed and compared with the 3DZD of all other proteins in PDB or EMDB. In addition, local geometrical characteristics of a query protein can be analyzed using VisGrid and LIGSITE CSC in 3D-SURFER. This article describes how to use 3D-SURFER and EM-SURFER to carry out protein surface shape similarity searches, local geometric feature analysis, and interpretation of the search results. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.
Introduction to Protein Structure through Genetic Diseases
ERIC Educational Resources Information Center
Schneider, Tanya L.; Linton, Brian R.
2008-01-01
An illuminating way to learn about protein function is to explore high-resolution protein structures. Analysis of the proteins involved in genetic diseases has been used to introduce students to protein structure and the role that individual mutations can play in the onset of disease. Known mutations can be correlated to changes in protein…
Theodoridou, Katerina; Yu, Peiqiang
2013-06-12
Protein quality relies not only on total protein but also on protein inherent structures. The most commonly occurring protein secondary structures (α-helix and β-sheet) may influence protein quality, nutrient utilization, and digestive behavior. The objectives of this study were to reveal the protein molecular structures of canola meal (yellow and brown) and presscake as affected by the heat-processing methods and to investigate the relationship between structure changes and protein rumen degradations kinetics, estimated protein intestinal digestibility, degraded protein balance, and metabolizable protein. Heat-processing conditions resulted in a higher value for α-helix and β-sheet for brown canola presscake compared to brown canola meal. The multivariate molecular spectral analyses (PCA, CLA) showed that there were significant molecular structural differences in the protein amide I and II fingerprint region (ca. 1700-1480 cm(-1)) between the brown canola meal and presscake. The in situ degradation parameters, amide I and II, and α-helix to β-sheet ratio (R_a_β) were positively correlated with the degradable fraction and the degradation rate. Modeling results showed that α-helix was positively correlated with the truly absorbed rumen synthesized microbial protein in the small intestine when using both the Dutch DVE/OEB system and the NRC-2001 model. Concerning the protein profiles, R_a_β was a better predictor for crude protein (79%) and for neutral detergent insoluble crude protein (68%). In conclusion, ATR-FT/IR molecular spectroscopy may be used to rapidly characterize feed structures at the molecular level and also as a potential predictor of feed functionality, digestive behavior, and nutrient utilization of canola feed.
Structure of the Newcastle disease virus F protein in the post-fusion conformation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Swanson, Kurt; Wen, Xiaolin; Leser, George P.
2010-11-17
The paramyxovirus F protein is a class I viral membrane fusion protein which undergoes a significant refolding transition during virus entry. Previous studies of the Newcastle disease virus, human parainfluenza virus 3 and parainfluenza virus 5 F proteins revealed differences in the pre- and post-fusion structures. The NDV Queensland (Q) F structure lacked structural elements observed in the other two structures, which are key to the refolding and fusogenic activity of F. Here we present the NDV Australia-Victoria (AV) F protein post-fusion structure and provide EM evidence for its folding to a pre-fusion form. The NDV AV F structure containsmore » heptad repeat elements missing in the previous NDV Q F structure, forming a post-fusion six-helix bundle (6HB) similar to the post-fusion hPIV3 F structure. Electrostatic and temperature factor analysis of the F structures points to regions of these proteins that may be functionally important in their membrane fusion activity.« less
Computational design of chimeric protein libraries for directed evolution.
Silberg, Jonathan J; Nguyen, Peter Q; Stevenson, Taylor
2010-01-01
The best approach for creating libraries of functional proteins with large numbers of nondisruptive amino acid substitutions is protein recombination, in which structurally related polypeptides are swapped among homologous proteins. Unfortunately, as more distantly related proteins are recombined, the fraction of variants having a disrupted structure increases. One way to enrich the fraction of folded and potentially interesting chimeras in these libraries is to use computational algorithms to anticipate which structural elements can be swapped without disturbing the integrity of a protein's structure. Herein, we describe how the algorithm Schema uses the sequences and structures of the parent proteins recombined to predict the structural disruption of chimeras, and we outline how dynamic programming can be used to find libraries with a range of amino acid substitution levels that are enriched in variants with low Schema disruption.
SSEP: secondary structural elements of proteins
Shanthi, V.; Selvarani, P.; Kiran Kumar, Ch.; Mohire, C. S.; Sekar, K.
2003-01-01
SSEP is a comprehensive resource for accessing information related to the secondary structural elements present in the 25 and 90% non-redundant protein chains. The database contains 1771 protein chains from 1670 protein structures and 6182 protein chains from 5425 protein structures in 25 and 90% non-redundant protein chains, respectively. The current version provides information about the α-helical segments and β-strand fragments of varying lengths. In addition, it also contains the information about 310-helix, β- and ν-turns and hairpin loops. The free graphics program RASMOL has been interfaced with the search engine to visualize the three-dimensional structures of the user queried secondary structural fragment. The database is updated regularly and is available through Bioinformatics web server at http://cluster.physics.iisc.ernet.in/ssep/ or http://144.16.71.148/ssep/. PMID:12824336
2016-01-01
Biologically active but floppy proteins represent a new reality of modern protein science. These intrinsically disordered proteins (IDPs) and hybrid proteins containing ordered and intrinsically disordered protein regions (IDPRs) constitute a noticeable part of any given proteome. Functionally, they complement ordered proteins, and their conformational flexibility and structural plasticity allow them to perform impossible tricks and be engaged in biological activities that are inaccessible to well folded proteins with their unique structures. The major goals of this minireview are to show that, despite their simplified amino acid sequences, IDPs/IDPRs are complex entities often resembling chaotic systems, are structurally and functionally heterogeneous, and can be considered an important part of the structure-function continuum. Furthermore, IDPs/IDPRs are everywhere, and are ubiquitously engaged in various interactions characterized by a wide spectrum of binding scenarios and an even wider spectrum of structural and functional outputs. PMID:26851286
Ashford, Paul; Moss, David S; Alex, Alexander; Yeap, Siew K; Povia, Alice; Nobeli, Irene; Williams, Mark A
2012-03-14
Protein structures provide a valuable resource for rational drug design. For a protein with no known ligand, computational tools can predict surface pockets that are of suitable size and shape to accommodate a complementary small-molecule drug. However, pocket prediction against single static structures may miss features of pockets that arise from proteins' dynamic behaviour. In particular, ligand-binding conformations can be observed as transiently populated states of the apo protein, so it is possible to gain insight into ligand-bound forms by considering conformational variation in apo proteins. This variation can be explored by considering sets of related structures: computationally generated conformers, solution NMR ensembles, multiple crystal structures, homologues or homology models. It is non-trivial to compare pockets, either from different programs or across sets of structures. For a single structure, difficulties arise in defining particular pocket's boundaries. For a set of conformationally distinct structures the challenge is how to make reasonable comparisons between them given that a perfect structural alignment is not possible. We have developed a computational method, Provar, that provides a consistent representation of predicted binding pockets across sets of related protein structures. The outputs are probabilities that each atom or residue of the protein borders a predicted pocket. These probabilities can be readily visualised on a protein using existing molecular graphics software. We show how Provar simplifies comparison of the outputs of different pocket prediction algorithms, of pockets across multiple simulated conformations and between homologous structures. We demonstrate the benefits of use of multiple structures for protein-ligand and protein-protein interface analysis on a set of complexes and consider three case studies in detail: i) analysis of a kinase superfamily highlights the conserved occurrence of surface pockets at the active and regulatory sites; ii) a simulated ensemble of unliganded Bcl2 structures reveals extensions of a known ligand-binding pocket not apparent in the apo crystal structure; iii) visualisations of interleukin-2 and its homologues highlight conserved pockets at the known receptor interfaces and regions whose conformation is known to change on inhibitor binding. Through post-processing of the output of a variety of pocket prediction software, Provar provides a flexible approach to the analysis and visualization of the persistence or variability of pockets in sets of related protein structures.
Ayuso-Tejedor, Sara; Angarica, Vladimir Espinosa; Bueno, Marta; Campos, Luis A; Abián, Olga; Bernadó, Pau; Sancho, Javier; Jiménez, M Angeles
2010-07-23
Partly unfolded protein conformations close to the native state may play important roles in protein function and in protein misfolding. Structural analyses of such conformations which are essential for their fully physicochemical understanding are complicated by their characteristic low populations at equilibrium. We stabilize here with a single mutation the equilibrium intermediate of apoflavodoxin thermal unfolding and determine its solution structure by NMR. It consists of a large native region identical with that observed in the X-ray structure of the wild-type protein plus an unfolded region. Small-angle X-ray scattering analysis indicates that the calculated ensemble of structures is consistent with the actual degree of expansion of the intermediate. The unfolded region encompasses discontinuous sequence segments that cluster in the 3D structure of the native protein forming the FMN cofactor binding loops and the binding site of a variety of partner proteins. Analysis of the apoflavodoxin inner interfaces reveals that those becoming destabilized in the intermediate are more polar than other inner interfaces of the protein. Natively folded proteins contain hydrophobic cores formed by the packing of hydrophobic surfaces, while natively unfolded proteins are rich in polar residues. The structure of the apoflavodoxin thermal intermediate suggests that the regions of natively folded proteins that are easily responsive to thermal activation may contain cores of intermediate hydrophobicity. Copyright (c) 2010 Elsevier Ltd. All rights reserved.
Feyzi, Samira; Varidi, Mehdi; Zare, Fatemeh; Varidi, Mohammad Javad
2018-03-01
Different drying methods due to protein denaturation could alter the functional properties of proteins, as well as their structure. So, this study focused on the effect of different drying methods on amino acid content, thermo and functional properties, and protein structure of fenugreek protein isolate. Freeze and spray drying methods resulted in comparable protein solubility, dynamic surface and interfacial tensions, foaming and emulsifying properties except for emulsion stability. Vacuum oven drying promoted emulsion stability, surface hydrophobicity and viscosity of fenugreek protein isolate at the expanse of its protein solubility. Vacuum oven process caused a higher level of Maillard reaction followed by the spray drying process, which was confirmed by the lower amount of lysine content and less lightness, also more browning intensity. ΔH of fenugreek protein isolates was higher than soy protein isolate, which confirmed the presence of more ordered structures. Also, the bands which are attributed to the α-helix structures in the FTIR spectrum were in the shorter wave number region for freeze and spray dried fenugreek protein isolates that show more possibility of such structures. This research suggests that any drying method must be conducted in its gentle state in order to sustain native structure of proteins and promote their functionalities. © 2017 Society of Chemical Industry. © 2017 Society of Chemical Industry.
Kemege, Kyle E.; Hickey, John M.; Lovell, Scott; Battaile, Kevin P.; Zhang, Yang; Hefty, P. Scott
2011-01-01
Chlamydia trachomatis is a medically important pathogen that encodes a relatively high percentage of proteins with unknown function. The three-dimensional structure of a protein can be very informative regarding the protein's functional characteristics; however, determining protein structures experimentally can be very challenging. Computational methods that model protein structures with sufficient accuracy to facilitate functional studies have had notable successes. To evaluate the accuracy and potential impact of computational protein structure modeling of hypothetical proteins encoded by Chlamydia, a successful computational method termed I-TASSER was utilized to model the three-dimensional structure of a hypothetical protein encoded by open reading frame (ORF) CT296. CT296 has been reported to exhibit functional properties of a divalent cation transcription repressor (DcrA), with similarity to the Escherichia coli iron-responsive transcriptional repressor, Fur. Unexpectedly, the I-TASSER model of CT296 exhibited no structural similarity to any DNA-interacting proteins or motifs. To validate the I-TASSER-generated model, the structure of CT296 was solved experimentally using X-ray crystallography. Impressively, the ab initio I-TASSER-generated model closely matched (2.72-Å Cα root mean square deviation [RMSD]) the high-resolution (1.8-Å) crystal structure of CT296. Modeled and experimentally determined structures of CT296 share structural characteristics of non-heme Fe(II) 2-oxoglutarate-dependent enzymes, although key enzymatic residues are not conserved, suggesting a unique biochemical process is likely associated with CT296 function. Additionally, functional analyses did not support prior reports that CT296 has properties shared with divalent cation repressors such as Fur. PMID:21965559
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kemege, Kyle E.; Hickey, John M.; Lovell, Scott
2012-02-13
Chlamydia trachomatis is a medically important pathogen that encodes a relatively high percentage of proteins with unknown function. The three-dimensional structure of a protein can be very informative regarding the protein's functional characteristics; however, determining protein structures experimentally can be very challenging. Computational methods that model protein structures with sufficient accuracy to facilitate functional studies have had notable successes. To evaluate the accuracy and potential impact of computational protein structure modeling of hypothetical proteins encoded by Chlamydia, a successful computational method termed I-TASSER was utilized to model the three-dimensional structure of a hypothetical protein encoded by open reading frame (ORF)more » CT296. CT296 has been reported to exhibit functional properties of a divalent cation transcription repressor (DcrA), with similarity to the Escherichia coli iron-responsive transcriptional repressor, Fur. Unexpectedly, the I-TASSER model of CT296 exhibited no structural similarity to any DNA-interacting proteins or motifs. To validate the I-TASSER-generated model, the structure of CT296 was solved experimentally using X-ray crystallography. Impressively, the ab initio I-TASSER-generated model closely matched (2.72-{angstrom} C{alpha} root mean square deviation [RMSD]) the high-resolution (1.8-{angstrom}) crystal structure of CT296. Modeled and experimentally determined structures of CT296 share structural characteristics of non-heme Fe(II) 2-oxoglutarate-dependent enzymes, although key enzymatic residues are not conserved, suggesting a unique biochemical process is likely associated with CT296 function. Additionally, functional analyses did not support prior reports that CT296 has properties shared with divalent cation repressors such as Fur.« less
Mixture models for protein structure ensembles.
Hirsch, Michael; Habeck, Michael
2008-10-01
Protein structure ensembles provide important insight into the dynamics and function of a protein and contain information that is not captured with a single static structure. However, it is not clear a priori to what extent the variability within an ensemble is caused by internal structural changes. Additional variability results from overall translations and rotations of the molecule. And most experimental data do not provide information to relate the structures to a common reference frame. To report meaningful values of intrinsic dynamics, structural precision, conformational entropy, etc., it is therefore important to disentangle local from global conformational heterogeneity. We consider the task of disentangling local from global heterogeneity as an inference problem. We use probabilistic methods to infer from the protein ensemble missing information on reference frames and stable conformational sub-states. To this end, we model a protein ensemble as a mixture of Gaussian probability distributions of either entire conformations or structural segments. We learn these models from a protein ensemble using the expectation-maximization algorithm. Our first model can be used to find multiple conformers in a structure ensemble. The second model partitions the protein chain into locally stable structural segments or core elements and less structured regions typically found in loops. Both models are simple to implement and contain only a single free parameter: the number of conformers or structural segments. Our models can be used to analyse experimental ensembles, molecular dynamics trajectories and conformational change in proteins. The Python source code for protein ensemble analysis is available from the authors upon request.
Meyer, Austin G; Wilke, Claus O
2015-10-06
Protein structure acts as a general constraint on the evolution of viral proteins. One widely recognized structural constraint explaining evolutionary variation among sites is the relative solvent accessibility (RSA) of residues in the folded protein. In influenza virus, the distance from functional sites has been found to explain an additional portion of the evolutionary variation in the external antigenic proteins. However, to what extent RSA and distance from a reference site in the protein can be used more generally to explain protein adaptation in other viruses and in the different proteins of any given virus remains an open question. To address this question, we have carried out an analysis of the distribution and structural predictors of site-wise dN/dS in HIV-1. Our results indicate that the distribution of dN/dS in HIV follows a smooth gamma distribution, with no special enrichment or depletion of sites with dN/dS at or above one. The variation in dN/dS can be partially explained by RSA and distance from a reference site in the protein, but these structural constraints do not act uniformly among the different HIV-1 proteins. Structural constraints are highly predictive in just one of the three enzymes and one of three structural proteins in HIV-1. For these two proteins, the protease enzyme and the gp120 structural protein, structure explains between 30 and 40% of the variation in dN/dS. Finally, for the gp120 protein of the receptor-binding complex, we also find that glycosylation sites explain just 2% of the variation in dN/dS and do not explain gp120 evolution independently of either RSA or distance from the apical surface. © 2015 The Author(s).
HBNG: Graph theory based visualization of hydrogen bond networks in protein structures.
Tiwari, Abhishek; Tiwari, Vivek
2007-07-09
HBNG is a graph theory based tool for visualization of hydrogen bond network in 2D. Digraphs generated by HBNG facilitate visualization of cooperativity and anticooperativity chains and rings in protein structures. HBNG takes hydrogen bonds list files (output from HBAT, HBEXPLORE, HBPLUS and STRIDE) as input and generates a DOT language script and constructs digraphs using freeware AT and T Graphviz tool. HBNG is useful in the enumeration of favorable topologies of hydrogen bond networks in protein structures and determining the effect of cooperativity and anticooperativity on protein stability and folding. HBNG can be applied to protein structure comparison and in the identification of secondary structural regions in protein structures. Program is available from the authors for non-commercial purposes.
Gorai, Biswajit; Prabhavadhni, Arasu; Sivaraman, Thirunavukkarasu
2015-09-01
Unfolding stabilities of two homologous proteins, cardiotoxin III and short-neurotoxin (SNTX) belonging to three-finger toxin (TFT) superfamily, have been probed by means of molecular dynamics (MD) simulations. Combined analysis of data obtained from steered MD and all-atom MD simulations at various temperatures in near physiological conditions on the proteins suggested that overall structural stabilities of the two proteins were different from each other and the MD results are consistent with experimental data of the proteins reported in the literature. Rationalization for the differential structural stabilities of the structurally similar proteins has been chiefly attributed to the differences in the structural contacts between C- and N-termini regions in their three-dimensional structures, and the findings endorse the 'CN network' hypothesis proposed to qualitatively analyse the thermodynamic stabilities of proteins belonging to TFT superfamily of snake venoms. Moreover, the 'CN network' hypothesis has been revisited and the present study suggested that 'CN network' should be accounted in terms of 'structural contacts' and 'structural strengths' in order to precisely describe order of structural stabilities of TFTs.
G-protein-coupled receptor structures were not built in a day.
Blois, Tracy M; Bowie, James U
2009-07-01
Among the most exciting recent developments in structural biology is the structure determination of G-protein-coupled receptors (GPCRs), which comprise the largest class of membrane proteins in mammalian cells and have enormous importance for disease and drug development. The GPCR structures are perhaps the most visible examples of a nascent revolution in membrane protein structure determination. Like other major milestones in science, however, such as the sequencing of the human genome, these achievements were built on a hidden foundation of technological developments. Here, we describe some of the methods that are fueling the membrane protein structure revolution and have enabled the determination of the current GPCR structures, along with new techniques that may lead to future structures.
Protein crystallization X-ray diffraction data collection Protein structure determination Obtaining structures of protein-ligand complexes Site-directed mutagenesis Structure-function relationship Enzymatic CelA," Science (2013) "Sequence, Structure, and Evolution of Cellulases in Glycoside
A Review on Structures and Functions of Bcl-2 Family Proteins from Homo sapiens.
Sivakumar, Dakshinamurthy; Sivaraman, Thirunavukkarasu
2016-01-01
Cancer cells evade apoptosis, which is regulated by proteins of Bcl-2 family in the intrinsic pathways. Numerous experimental three-dimensional (3D) structures of the apoptotic proteins and the proteins bound with small chemical molecules/peptides/proteins have been reported in the literature. In this review article, the 3D structures of the Bcl-2 family proteins from Homo sapiens and as well complex structures of the anti-apoptotic proteins bound with small molecular inhibitors reported in the literature to date have been comprehensively listed out and described in detail. Moreover, the molecular mechanisms by which the Bcl-2 family proteins modulate the apoptotic processes and strategies for designing antagonists to anti-apoptotic proteins have been concisely discussed.
Structures of membrane proteins
Vinothkumar, Kutti R.; Henderson, Richard
2010-01-01
In reviewing the structures of membrane proteins determined up to the end of 2009, we present in words and pictures the most informative examples from each family. We group the structures together according to their function and architecture to provide an overview of the major principles and variations on the most common themes. The first structures, determined 20 years ago, were those of naturally abundant proteins with limited conformational variability, and each membrane protein structure determined was a major landmark. With the advent of complete genome sequences and efficient expression systems, there has been an explosion in the rate of membrane protein structure determination, with many classes represented. New structures are published every month and more than 150 unique membrane protein structures have been determined. This review analyses the reasons for this success, discusses the challenges that still lie ahead, and presents a concise summary of the key achievements with illustrated examples selected from each class. PMID:20667175
Structural Genomics of Protein Phosphatases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Almo,S.; Bonanno, J.; Sauder, J.
The New York SGX Research Center for Structural Genomics (NYSGXRC) of the NIGMS Protein Structure Initiative (PSI) has applied its high-throughput X-ray crystallographic structure determination platform to systematic studies of all human protein phosphatases and protein phosphatases from biomedically-relevant pathogens. To date, the NYSGXRC has determined structures of 21 distinct protein phosphatases: 14 from human, 2 from mouse, 2 from the pathogen Toxoplasma gondii, 1 from Trypanosoma brucei, the parasite responsible for African sleeping sickness, and 2 from the principal mosquito vector of malaria in Africa, Anopheles gambiae. These structures provide insights into both normal and pathophysiologic processes, including transcriptionalmore » regulation, regulation of major signaling pathways, neural development, and type 1 diabetes. In conjunction with the contributions of other international structural genomics consortia, these efforts promise to provide an unprecedented database and materials repository for structure-guided experimental and computational discovery of inhibitors for all classes of protein phosphatases.« less
Robic, Srebrenka
2010-01-01
To fully understand the roles proteins play in cellular processes, students need to grasp complex ideas about protein structure, folding, and stability. Our current understanding of these topics is based on mathematical models and experimental data. However, protein structure, folding, and stability are often introduced as descriptive, qualitative phenomena in undergraduate classes. In the process of learning about these topics, students often form incorrect ideas. For example, by learning about protein folding in the context of protein synthesis, students may come to an incorrect conclusion that once synthesized on the ribosome, a protein spends its entire cellular life time in its fully folded native confirmation. This is clearly not true; proteins are dynamic structures that undergo both local fluctuations and global unfolding events. To prevent and address such misconceptions, basic concepts of protein science can be introduced in the context of simple mathematical models and hands-on explorations of publicly available data sets. Ten common misconceptions about proteins are presented, along with suggestions for using equations, models, sequence, structure, and thermodynamic data to help students gain a deeper understanding of basic concepts relating to protein structure, folding, and stability.
Knowledge-based model building of proteins: concepts and examples.
Bajorath, J.; Stenkamp, R.; Aruffo, A.
1993-01-01
We describe how to build protein models from structural templates. Methods to identify structural similarities between proteins in cases of significant, moderate to low, or virtually absent sequence similarity are discussed. The detection and evaluation of structural relationships is emphasized as a central aspect of protein modeling, distinct from the more technical aspects of model building. Computational techniques to generate and complement comparative protein models are also reviewed. Two examples, P-selectin and gp39, are presented to illustrate the derivation of protein model structures and their use in experimental studies. PMID:7505680
Overcoming barriers to membrane protein structure determination.
Bill, Roslyn M; Henderson, Peter J F; Iwata, So; Kunji, Edmund R S; Michel, Hartmut; Neutze, Richard; Newstead, Simon; Poolman, Bert; Tate, Christopher G; Vogel, Horst
2011-04-01
After decades of slow progress, the pace of research on membrane protein structures is beginning to quicken thanks to various improvements in technology, including protein engineering and microfocus X-ray diffraction. Here we review these developments and, where possible, highlight generic new approaches to solving membrane protein structures based on recent technological advances. Rational approaches to overcoming the bottlenecks in the field are urgently required as membrane proteins, which typically comprise ~30% of the proteomes of organisms, are dramatically under-represented in the structural database of the Protein Data Bank.
Núñez-Vivanco, Gabriel; Valdés-Jiménez, Alejandro; Besoaín, Felipe; Reyes-Parada, Miguel
2016-01-01
Since the structure of proteins is more conserved than the sequence, the identification of conserved three-dimensional (3D) patterns among a set of proteins, can be important for protein function prediction, protein clustering, drug discovery and the establishment of evolutionary relationships. Thus, several computational applications to identify, describe and compare 3D patterns (or motifs) have been developed. Often, these tools consider a 3D pattern as that described by the residues surrounding co-crystallized/docked ligands available from X-ray crystal structures or homology models. Nevertheless, many of the protein structures stored in public databases do not provide information about the location and characteristics of ligand binding sites and/or other important 3D patterns such as allosteric sites, enzyme-cofactor interaction motifs, etc. This makes necessary the development of new ligand-independent methods to search and compare 3D patterns in all available protein structures. Here we introduce Geomfinder, an intuitive, flexible, alignment-free and ligand-independent web server for detailed estimation of similarities between all pairs of 3D patterns detected in any two given protein structures. We used around 1100 protein structures to form pairs of proteins which were assessed with Geomfinder. In these analyses each protein was considered in only one pair (e.g. in a subset of 100 different proteins, 50 pairs of proteins can be defined). Thus: (a) Geomfinder detected identical pairs of 3D patterns in a series of monoamine oxidase-B structures, which corresponded to the effectively similar ligand binding sites at these proteins; (b) we identified structural similarities among pairs of protein structures which are targets of compounds such as acarbose, benzamidine, adenosine triphosphate and pyridoxal phosphate; these similar 3D patterns are not detected using sequence-based methods; (c) the detailed evaluation of three specific cases showed the versatility of Geomfinder, which was able to discriminate between similar and different 3D patterns related to binding sites of common substrates in a range of diverse proteins. Geomfinder allows detecting similar 3D patterns between any two pair of protein structures, regardless of the divergency among their amino acids sequences. Although the software is not intended for simultaneous multiple comparisons in a large number of proteins, it can be particularly useful in cases such as the structure-based design of multitarget drugs, where a detailed analysis of 3D patterns similarities between a few selected protein targets is essential.
NASA Astrophysics Data System (ADS)
Cheng, Kwan; Cheng, Sara
We used molecular dynamics simulations to examine the effects of transbilayer distribution of lipid molecules, particularly anionic lipids with negatively charged headgroups, on the structure and binding kinetics of an amyloidogenic protein on the membrane surface and subsequent protein-induced structural disruption of the membrane. Our systems consisted of a model beta-sheet rich dimeric protein absorbed on asymmetric bilayers with neutral and anionic lipids and symmetric bilayers with neutral lipids. We observed larger folding, domain aggregation, and tilt angle of the absorbed protein on the asymmetric bilayer surfaces. We also detected more focused bilayer thinning in the asymmetric bilayer due to weak lipid-protein interactions. Our results support the mechanism that the higher lipid packing in the protein-contacting lipid leaflet promotes stronger protein-protein but weaker protein-lipid interactions of an amyloidogenic protein on the membrane surface. We speculate that the observed surface-induced structural and protein-lipid interaction of our model amyloidogenic protein may play a role in the early membrane-associated amyloid cascade pathway that leads to membrane structural damage of neurons in Alzheimer's disease. NSF ACI-1531594.
A novel structural tree for wrap-proteins, a subclass of (α+β)-proteins.
Boshkova, Eugenia A; Gordeev, Alexey B; Efimov, Alexander V
2014-01-01
In this paper, a novel structural subclass of (α+β)-proteins is presented. A characteristic feature of these proteins and domains is that they consist of strongly twisted and coiled β-sheets wrapped around one or two α-helices, so they are referred to here as wrap-proteins. It is shown that overall folds of the wrap-proteins can be obtained by stepwise addition of α-helices and/or β-strands to the strongly twisted and coiled β-hairpin taken as the starting structure in modeling. As a result of modeling, a structural tree for the wrap-proteins was constructed that includes 201 folds of which 49 occur in known nonhomologous proteins.
Uchikoga, Nobuyuki; Hirokawa, Takatsugu
2010-05-11
Protein-protein docking for proteins with large conformational changes was analyzed by using interaction fingerprints, one of the scales for measuring similarities among complex structures, utilized especially for searching near-native protein-ligand or protein-protein complex structures. Here, we have proposed a combined method for analyzing protein-protein docking by taking large conformational changes into consideration. This combined method consists of ensemble soft docking with multiple protein structures, refinement of complexes, and cluster analysis using interaction fingerprints and energy profiles. To test for the applicability of this combined method, various CaM-ligand complexes were reconstructed from the NMR structures of unbound CaM. For the purpose of reconstruction, we used three known CaM-ligands, namely, the CaM-binding peptides of cyclic nucleotide gateway (CNG), CaM kinase kinase (CaMKK) and the plasma membrane Ca2+ ATPase pump (PMCA), and thirty-one structurally diverse CaM conformations. For each ligand, 62000 CaM-ligand complexes were generated in the docking step and the relationship between their energy profiles and structural similarities to the native complex were analyzed using interaction fingerprint and RMSD. Near-native clusters were obtained in the case of CNG and CaMKK. The interaction fingerprint method discriminated near-native structures better than the RMSD method in cluster analysis. We showed that a combined method that includes the interaction fingerprint is very useful for protein-protein docking analysis of certain cases.
A hidden markov model derived structural alphabet for proteins.
Camproux, A C; Gautier, R; Tufféry, P
2004-06-04
Understanding and predicting protein structures depends on the complexity and the accuracy of the models used to represent them. We have set up a hidden Markov model that discretizes protein backbone conformation as series of overlapping fragments (states) of four residues length. This approach learns simultaneously the geometry of the states and their connections. We obtain, using a statistical criterion, an optimal systematic decomposition of the conformational variability of the protein peptidic chain in 27 states with strong connection logic. This result is stable over different protein sets. Our model fits well the previous knowledge related to protein architecture organisation and seems able to grab some subtle details of protein organisation, such as helix sub-level organisation schemes. Taking into account the dependence between the states results in a description of local protein structure of low complexity. On an average, the model makes use of only 8.3 states among 27 to describe each position of a protein structure. Although we use short fragments, the learning process on entire protein conformations captures the logic of the assembly on a larger scale. Using such a model, the structure of proteins can be reconstructed with an average accuracy close to 1.1A root-mean-square deviation and for a complexity of only 3. Finally, we also observe that sequence specificity increases with the number of states of the structural alphabet. Such models can constitute a very relevant approach to the analysis of protein architecture in particular for protein structure prediction.
Protein sectors: evolutionary units of three-dimensional structure
Halabi, Najeeb; Rivoire, Olivier; Leibler, Stanislas; Ranganathan, Rama
2011-01-01
Proteins display a hierarchy of structural features at primary, secondary, tertiary, and higher-order levels, an organization that guides our current understanding of their biological properties and evolutionary origins. Here, we reveal a structural organization distinct from this traditional hierarchy by statistical analysis of correlated evolution between amino acids. Applied to the S1A serine proteases, the analysis indicates a decomposition of the protein into three quasi-independent groups of correlated amino acids that we term “protein sectors”. Each sector is physically connected in the tertiary structure, has a distinct functional role, and constitutes an independent mode of sequence divergence in the protein family. Functionally relevant sectors are evident in other protein families as well, suggesting that they may be general features of proteins. We propose that sectors represent a structural organization of proteins that reflects their evolutionary histories. PMID:19703402
Weininger, Arthur; Weininger, Susan
2015-01-01
The ability to identify the functional correlates of structural and sequence variation in proteins is a critical capability. We related structures of influenza A N10 and N11 proteins that have no established function to structures of proteins with known function by identifying spatially conserved atoms. We identified atoms with common distributed spatial occupancy in PDB structures of N10 protein, N11 protein, an influenza A neuraminidase, an influenza B neuraminidase, and a bacterial neuraminidase. By superposing these spatially conserved atoms, we aligned the structures and associated molecules. We report spatially and sequence invariant residues in the aligned structures. Spatially invariant residues in the N6 and influenza B neuraminidase active sites were found in previously unidentified spatially equivalent sites in the N10 and N11 proteins. We found the corresponding secondary and tertiary structures of the aligned proteins to be largely identical despite significant sequence divergence. We found structural precedent in known non-neuraminidase structures for residues exhibiting structural and sequence divergence in the aligned structures. In N10 protein, we identified staphylococcal enterotoxin I-like domains. In N11 protein, we identified hepatitis E E2S-like domains, SARS spike protein-like domains, and toxin components shared by alpha-bungarotoxin, staphylococcal enterotoxin I, anthrax lethal factor, clostridium botulinum neurotoxin, and clostridium tetanus toxin. The presence of active site components common to the N6, influenza B, and S. pneumoniae neuraminidases in the N10 and N11 proteins, combined with the absence of apparent neuraminidase function, suggests that the role of neuraminidases in H17N10 and H18N11 emerging influenza A viruses may have changed. The presentation of E2S-like, SARS spike protein-like, or toxin-like domains by the N10 and N11 proteins in these emerging viruses may indicate that H17N10 and H18N11 sialidase-facilitated cell entry has been supplemented or replaced by sialidase-independent receptor binding to an expanded cell population that may include neurons and T-cells. PMID:25706124
Kryshtafovych, Andriy; Moult, John; Bales, Patrick; Bazan, J Fernando; Biasini, Marco; Burgin, Alex; Chen, Chen; Cochran, Frank V; Craig, Timothy K; Das, Rhiju; Fass, Deborah; Garcia-Doval, Carmela; Herzberg, Osnat; Lorimer, Donald; Luecke, Hartmut; Ma, Xiaolei; Nelson, Daniel C; van Raaij, Mark J; Rohwer, Forest; Segall, Anca; Seguritan, Victor; Zeth, Kornelius; Schwede, Torsten
2014-02-01
For the last two decades, CASP has assessed the state of the art in techniques for protein structure prediction and identified areas which required further development. CASP would not have been possible without the prediction targets provided by the experimental structural biology community. In the latest experiment, CASP10, more than 100 structures were suggested as prediction targets, some of which appeared to be extraordinarily difficult for modeling. In this article, authors of some of the most challenging targets discuss which specific scientific question motivated the experimental structure determination of the target protein, which structural features were especially interesting from a structural or functional perspective, and to what extent these features were correctly reproduced in the predictions submitted to CASP10. Specifically, the following targets will be presented: the acid-gated urea channel, a difficult to predict transmembrane protein from the important human pathogen Helicobacter pylori; the structure of human interleukin (IL)-34, a recently discovered helical cytokine; the structure of a functionally uncharacterized enzyme OrfY from Thermoproteus tenax formed by a gene duplication and a novel fold; an ORFan domain of mimivirus sulfhydryl oxidase R596; the fiber protein gene product 17 from bacteriophage T7; the bacteriophage CBA-120 tailspike protein; a virus coat protein from metagenomic samples of the marine environment; and finally, an unprecedented class of structure prediction targets based on engineered disulfide-rich small proteins. Copyright © 2013 The Authors. Wiley Periodicals, Inc.
Accurate secondary structure prediction and fold recognition for circular dichroism spectroscopy
Micsonai, András; Wien, Frank; Kernya, Linda; Lee, Young-Ho; Goto, Yuji; Réfrégiers, Matthieu; Kardos, József
2015-01-01
Circular dichroism (CD) spectroscopy is a widely used technique for the study of protein structure. Numerous algorithms have been developed for the estimation of the secondary structure composition from the CD spectra. These methods often fail to provide acceptable results on α/β-mixed or β-structure–rich proteins. The problem arises from the spectral diversity of β-structures, which has hitherto been considered as an intrinsic limitation of the technique. The predictions are less reliable for proteins of unusual β-structures such as membrane proteins, protein aggregates, and amyloid fibrils. Here, we show that the parallel/antiparallel orientation and the twisting of the β-sheets account for the observed spectral diversity. We have developed a method called β-structure selection (BeStSel) for the secondary structure estimation that takes into account the twist of β-structures. This method can reliably distinguish parallel and antiparallel β-sheets and accurately estimates the secondary structure for a broad range of proteins. Moreover, the secondary structure components applied by the method are characteristic to the protein fold, and thus the fold can be predicted to the level of topology in the CATH classification from a single CD spectrum. By constructing a web server, we offer a general tool for a quick and reliable structure analysis using conventional CD or synchrotron radiation CD (SRCD) spectroscopy for the protein science research community. The method is especially useful when X-ray or NMR techniques fail. Using BeStSel on data collected by SRCD spectroscopy, we investigated the structure of amyloid fibrils of various disease-related proteins and peptides. PMID:26038575
An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system
DOE Office of Scientific and Technical Information (OSTI.GOV)
AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide
Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database inmore » which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. Lastly, this database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.« less
An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system
AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide
2015-11-19
Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database inmore » which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. Lastly, this database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.« less
Amyloidogenesis of Natively Unfolded Proteins
Uversky, Vladimir N.
2009-01-01
Aggregation and subsequent development of protein deposition diseases originate from conformational changes in corresponding amyloidogenic proteins. The accumulated data support the model where protein fibrillogenesis proceeds via the formation of a relatively unfolded amyloidogenic conformation, which shares many structural properties with the pre-molten globule state, a partially folded intermediate first found during the equilibrium and kinetic (un)folding studies of several globular proteins and later described as one of the structural forms of natively unfolded proteins. The flexibility of this structural form is essential for the conformational rearrangements driving the formation of the core cross-beta structure of the amyloid fibril. Obviously, molecular mechanisms describing amyloidogenesis of ordered and natively unfolded proteins are different. For ordered protein to fibrillate, its unique and rigid structure has to be destabilized and partially unfolded. On the other hand, fibrillogenesis of a natively unfolded protein involves the formation of partially folded conformation; i.e., partial folding rather than unfolding. In this review recent findings are surveyed to illustrate some unique features of the natively unfolded proteins amyloidogenesis. PMID:18537543
Mechanisms of Formation and Structure of Chromophores of Fluorescent Proteins from Anthoza Species
2005-03-01
green fluorescent protein (GFP) from the jellyfish Aequorea victoria ...investigate chemical structures and mechanisms of formation of chromophores within proteins of Green Fluorescent Protein (GFP) family. The project... structure . In this part of work we have shown that a fluorescent protein from Dendronephthya sp. transforms from the green - to the red-emitting
Synchrotron IR microspectroscopy for protein structure analysis: Potential and questions
Yu, Peiqiang
2006-01-01
Synchrotron radiation-based Fourier transform infrared microspectroscopy (S-FTIR) has been developed as a rapid, direct, non-destructive, bioanalytical technique. This technique takes advantage of synchrotron light brightness and small effective source size and is capable of exploring the molecular chemical make-up within microstructures of a biological tissue without destruction of inherent structures at ultra-spatial resolutions within cellular dimension. To date there has been very little application of this advanced technique to the study of pure protein inherent structure at a cellular level in biological tissues. In this review, a novel approach was introduced to show the potential of the newly developed, advancedmore » synchrotron-based analytical technology, which can be used to localize relatively “pure“ protein in the plant tissues and relatively reveal protein inherent structure and protein molecular chemical make-up within intact tissue at cellular and subcellular levels. Several complex protein IR spectra data analytical techniques (Gaussian and Lorentzian multi-component peak modeling, univariate and multivariate analysis, principal component analysis (PCA), and hierarchical cluster analysis (CLA) are employed to relatively reveal features of protein inherent structure and distinguish protein inherent structure differences between varieties/species and treatments in plant tissues. By using a multi-peak modeling procedure, RELATIVE estimates (but not EXACT determinations) for protein secondary structure analysis can be made for comparison purpose. The issues of pro- and anti-multi-peaking modeling/fitting procedure for relative estimation of protein structure were discussed. By using the PCA and CLA analyses, the plant molecular structure can be qualitatively separate one group from another, statistically, even though the spectral assignments are not known. The synchrotron-based technology provides a new approach for protein structure research in biological tissues at ultraspatial resolutions.« less
Kryshtafovych, Andriy; Moult, John; Bartual, Sergio G.; Bazan, J. Fernando; Berman, Helen; Casteel, Darren E.; Christodoulou, Evangelos; Everett, John K.; Hausmann, Jens; Heidebrecht, Tatjana; Hills, Tanya; Hui, Raymond; Hunt, John F.; Jayaraman, Seetharaman; Joachimiak, Andrzej; Kennedy, Michael A.; Kim, Choel; Lingel, Andreas; Michalska, Karolina; Montelione, Gaetano T.; Otero, José M.; Perrakis, Anastassis; Pizarro, Juan C.; van Raaij, Mark J.; Ramelot, Theresa A.; Rousseau, Francois; Tong, Liang; Wernimont, Amy K.; Young, Jasmine; Schwede, Torsten
2011-01-01
One goal of the CASP Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction is to identify the current state of the art in protein structure prediction and modeling. A fundamental principle of CASP is blind prediction on a set of relevant protein targets, i.e. the participating computational methods are tested on a common set of experimental target proteins, for which the experimental structures are not known at the time of modeling. Therefore, the CASP experiment would not have been possible without broad support of the experimental protein structural biology community. In this manuscript, several experimental groups discuss the structures of the proteins which they provided as prediction targets for CASP9, highlighting structural and functional peculiarities of these structures: the long tail fibre protein gp37 from bacteriophage T4, the cyclic GMP-dependent protein kinase Iβ (PKGIβ) dimerization/docking domain, the ectodomain of the JTB (Jumping Translocation Breakpoint) transmembrane receptor, Autotaxin (ATX) in complex with an inhibitor, the DNA-Binding J-Binding Protein 1 (JBP1) domain essential for biosynthesis and maintenance of DNA base-J (β-D-glucosyl-hydroxymethyluracil) in Trypanosoma and Leishmania, an so far uncharacterized 73 residue domain from Ruminococcus gnavus with a fold typical for PDZ-like domains, a domain from the Phycobilisome (PBS) core-membrane linker (LCM) phycobiliprotein ApcE from Synechocystis, the Heat shock protein 90 (Hsp90) activators PFC0360w and PFC0270w from Plasmodium falciparum, and 2-oxo-3-deoxygalactonate kinase from Klebsiella pneumoniae. PMID:22020785
Structure Refinement of Protein Low Resolution Models Using the GNEIMO Constrained Dynamics Method
Park, In-Hee; Gangupomu, Vamshi; Wagner, Jeffrey; Jain, Abhinandan; Vaidehi, Nagara-jan
2012-01-01
The challenge in protein structure prediction using homology modeling is the lack of reliable methods to refine the low resolution homology models. Unconstrained all-atom molecular dynamics (MD) does not serve well for structure refinement due to its limited conformational search. We have developed and tested the constrained MD method, based on the Generalized Newton-Euler Inverse Mass Operator (GNEIMO) algorithm for protein structure refinement. In this method, the high-frequency degrees of freedom are replaced with hard holonomic constraints and a protein is modeled as a collection of rigid body clusters connected by flexible torsional hinges. This allows larger integration time steps and enhances the conformational search space. In this work, we have demonstrated the use of a constraint free GNEIMO method for protein structure refinement that starts from low-resolution decoy sets derived from homology methods. In the eight proteins with three decoys for each, we observed an improvement of ~2 Å in the RMSD to the known experimental structures of these proteins. The GNEIMO method also showed enrichment in the population density of native-like conformations. In addition, we demonstrated structural refinement using a “Freeze and Thaw” clustering scheme with the GNEIMO framework as a viable tool for enhancing localized conformational search. We have derived a robust protocol based on the GNEIMO replica exchange method for protein structure refinement that can be readily extended to other proteins and possibly applicable for high throughput protein structure refinement. PMID:22260550
DOE Office of Scientific and Technical Information (OSTI.GOV)
Osipiuk, J.; Gornicki, P.; Maj, L.
The structure of the YlxR protein of unknown function from Streptococcus pneumonia was determined to 1.35 Angstroms. YlxR is expressed from the nusA/infB operon in bacteria and belongs to a small protein family (COG2740) that shares a conserved sequence motif GRGA(Y/W). The family shows no significant amino-acid sequence similarity with other proteins. Three-wavelength diffraction MAD data were collected to 1.7 Angstroms from orthorhombic crystals using synchrotron radiation and the structure was determined using a semi-automated approach. The YlxR structure resembles a two-layer {alpha}/{beta} sandwich with the overall shape of a cylinder and shows no structural homology to proteins of knownmore » structure. Structural analysis revealed that the YlxR structure represents a new protein fold that belongs to the {alpha}-{beta} plait superfamily. The distribution of the electrostatic surface potential shows a large positively charged patch on one side of the protein, a feature often found in nucleic acid-binding proteins. Three sulfate ions bind to this positively charged surface. Analysis of potential binding sites uncovered several substantial clefts, with the largest spanning 3/4 of the protein. A similar distribution of binding sites and a large sharply bent cleft are observed in RNA-binding proteins that are unrelated in sequence and structure. It is proposed that YlxR is an RNA-binding protein.« less
Streptococcus pneumonia YlxR at 1.35 A shows a putative new fold.
Osipiuk, J; Górnicki, P; Maj, L; Dementieva, I; Laskowski, R; Joachimiak, A
2001-11-01
The structure of the YlxR protein of unknown function from Streptococcus pneumonia was determined to 1.35 A. YlxR is expressed from the nusA/infB operon in bacteria and belongs to a small protein family (COG2740) that shares a conserved sequence motif GRGA(Y/W). The family shows no significant amino-acid sequence similarity with other proteins. Three-wavelength diffraction MAD data were collected to 1.7 A from orthorhombic crystals using synchrotron radiation and the structure was determined using a semi-automated approach. The YlxR structure resembles a two-layer alpha/beta sandwich with the overall shape of a cylinder and shows no structural homology to proteins of known structure. Structural analysis revealed that the YlxR structure represents a new protein fold that belongs to the alpha-beta plait superfamily. The distribution of the electrostatic surface potential shows a large positively charged patch on one side of the protein, a feature often found in nucleic acid-binding proteins. Three sulfate ions bind to this positively charged surface. Analysis of potential binding sites uncovered several substantial clefts, with the largest spanning 3/4 of the protein. A similar distribution of binding sites and a large sharply bent cleft are observed in RNA-binding proteins that are unrelated in sequence and structure. It is proposed that YlxR is an RNA-binding protein.
Impact of genetic variation on three dimensional structure and function of proteins
Bhattacharya, Roshni; Rose, Peter W.; Burley, Stephen K.
2017-01-01
The Protein Data Bank (PDB; http://wwpdb.org) was established in 1971 as the first open access digital data resource in biology with seven protein structures as its initial holdings. The global PDB archive now contains more than 126,000 experimentally determined atomic level three-dimensional (3D) structures of biological macromolecules (proteins, DNA, RNA), all of which are freely accessible via the Internet. Knowledge of the 3D structure of the gene product can help in understanding its function and role in disease. Of particular interest in the PDB archive are proteins for which 3D structures of genetic variant proteins have been determined, thus revealing atomic-level structural differences caused by the variation at the DNA level. Herein, we present a systematic and qualitative analysis of such cases. We observe a wide range of structural and functional changes caused by single amino acid differences, including changes in enzyme activity, aggregation propensity, structural stability, binding, and dissociation, some in the context of large assemblies. Structural comparison of wild type and mutated proteins, when both are available, provide insights into atomic-level structural differences caused by the genetic variation. PMID:28296894
Structural alphabets derived from attractors in conformational space
2010-01-01
Background The hierarchical and partially redundant nature of protein structures justifies the definition of frequently occurring conformations of short fragments as 'states'. Collections of selected representatives for these states define Structural Alphabets, describing the most typical local conformations within protein structures. These alphabets form a bridge between the string-oriented methods of sequence analysis and the coordinate-oriented methods of protein structure analysis. Results A Structural Alphabet has been derived by clustering all four-residue fragments of a high-resolution subset of the protein data bank and extracting the high-density states as representative conformational states. Each fragment is uniquely defined by a set of three independent angles corresponding to its degrees of freedom, capturing in simple and intuitive terms the properties of the conformational space. The fragments of the Structural Alphabet are equivalent to the conformational attractors and therefore yield a most informative encoding of proteins. Proteins can be reconstructed within the experimental uncertainty in structure determination and ensembles of structures can be encoded with accuracy and robustness. Conclusions The density-based Structural Alphabet provides a novel tool to describe local conformations and it is specifically suitable for application in studies of protein dynamics. PMID:20170534
Classification of protein quaternary structure by functional domain composition
Yu, Xiaojing; Wang, Chuan; Li, Yixue
2006-01-01
Background The number and the arrangement of subunits that form a protein are referred to as quaternary structure. Quaternary structure is an important protein attribute that is closely related to its function. Proteins with quaternary structure are called oligomeric proteins. Oligomeric proteins are involved in various biological processes, such as metabolism, signal transduction, and chromosome replication. Thus, it is highly desirable to develop some computational methods to automatically classify the quaternary structure of proteins from their sequences. Results To explore this problem, we adopted an approach based on the functional domain composition of proteins. Every protein was represented by a vector calculated from the domains in the PFAM database. The nearest neighbor algorithm (NNA) was used for classifying the quaternary structure of proteins from this information. The jackknife cross-validation test was performed on the non-redundant protein dataset in which the sequence identity was less than 25%. The overall success rate obtained is 75.17%. Additionally, to demonstrate the effectiveness of this method, we predicted the proteins in an independent dataset and achieved an overall success rate of 84.11% Conclusion Compared with the amino acid composition method and Blast, the results indicate that the domain composition approach may be a more effective and promising high-throughput method in dealing with this complicated problem in bioinformatics. PMID:16584572
Xiaodan, Chen; Xiurong, Zhan; Xinyu, Wu; Chunyan, Zhao; Wanghong, Zhao
2015-04-01
The aim of this study is to analyze the three-dimensional crystal structure of SMU.2055 protein, a putative acetyltransferase from the major caries pathogen Streptococcus mutans (S. mutans). The design and selection of the structure-based small molecule inhibitors are also studied. The three-dimensional crystal structure of SMU.2055 protein was obtained by structural genomics research methods of gene cloning and expression, protein purification with Ni²⁺-chelating affinity chromatography, crystal screening, and X-ray diffraction data collection. An inhibitor virtual model matching with its target protein structure was set up using computer-aided drug design methods, virtual screening and fine docking, and Libdock and Autodock procedures. The crystal of SMU.2055 protein was obtained, and its three-dimensional crystal structure was analyzed. This crystal was diffracted to a resolution of 0.23 nm. It belongs to orthorhombic space group C222(1), with unit cell parameters of a = 9.20 nm, b = 9.46 nm, and c = 19.39 nm. The asymmetric unit contained four molecules, with a solvent content of 56.7%. Moreover, five small molecule compounds, whose structure matched with that of the target protein in high degree, were designed and selected. Protein crystallography research of S. mutans SMU.2055 helps to understand the structures and functions of proteins from S. mutans at the atomic level. These five compounds may be considered as effective inhibitors to SMU.2055. The virtual model of small molecule inhibitors we built will lay a foundation to the anticaries research based on the crystal structure of proteins.
Dong, Zheng; Zhou, Hongyu; Tao, Peng
2018-02-01
PAS domains are widespread in archaea, bacteria, and eukaryota, and play important roles in various functions. In this study, we aim to explore functional evolutionary relationship among proteins in the PAS domain superfamily in view of the sequence-structure-dynamics-function relationship. We collected protein sequences and crystal structure data from RCSB Protein Data Bank of the PAS domain superfamily belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence-conserved residues and build phylogenetic tree. Three-dimensional structure alignment was also applied to obtain structure-conserved residues. The protein dynamics were analyzed using elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The result showed that the proteins with same function could be grouped by sequence similarity, and proteins in different functional groups displayed statistically significant difference in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally have a lower fluctuation than other residues. In addition, the fluctuation of conserved residues in each biological function group was strongly correlated with the corresponding biological function. This research suggested a direct connection in which the protein sequences were related to various functions through structural dynamics. This is a new attempt to delineate functional evolution of proteins using the integrated information of sequence, structure, and dynamics. © 2017 The Protein Society.
Quality assessment of protein model-structures based on structural and functional similarities
2012-01-01
Background Experimental determination of protein 3D structures is expensive, time consuming and sometimes impossible. A gap between number of protein structures deposited in the World Wide Protein Data Bank and the number of sequenced proteins constantly broadens. Computational modeling is deemed to be one of the ways to deal with the problem. Although protein 3D structure prediction is a difficult task, many tools are available. These tools can model it from a sequence or partial structural information, e.g. contact maps. Consequently, biologists have the ability to generate automatically a putative 3D structure model of any protein. However, the main issue becomes evaluation of the model quality, which is one of the most important challenges of structural biology. Results GOBA - Gene Ontology-Based Assessment is a novel Protein Model Quality Assessment Program. It estimates the compatibility between a model-structure and its expected function. GOBA is based on the assumption that a high quality model is expected to be structurally similar to proteins functionally similar to the prediction target. Whereas DALI is used to measure structure similarity, protein functional similarity is quantified using standardized and hierarchical description of proteins provided by Gene Ontology combined with Wang's algorithm for calculating semantic similarity. Two approaches are proposed to express the quality of protein model-structures. One is a single model quality assessment method, the other is its modification, which provides a relative measure of model quality. Exhaustive evaluation is performed on data sets of model-structures submitted to the CASP8 and CASP9 contests. Conclusions The validation shows that the method is able to discriminate between good and bad model-structures. The best of tested GOBA scores achieved 0.74 and 0.8 as a mean Pearson correlation to the observed quality of models in our CASP8 and CASP9-based validation sets. GOBA also obtained the best result for two targets of CASP8, and one of CASP9, compared to the contest participants. Consequently, GOBA offers a novel single model quality assessment program that addresses the practical needs of biologists. In conjunction with other Model Quality Assessment Programs (MQAPs), it would prove useful for the evaluation of single protein models. PMID:22998498
Xu, Xianzhong; Pulavarti, Surya V S R K; Eletsky, Alexander; Huang, Yuanpeng Janet; Acton, Thomas B; Xiao, Rong; Everett, John K; Montelione, Gaetano T; Szyperski, Thomas
2014-12-01
High-quality solution NMR structures of three homeodomains from human proteins ALX4, ZHX1 and CASP8AP2 were solved. These domains were chosen as targets of a biomedical theme project pursued by the Northeast Structural Genomics Consortium. This project focuses on increasing the structural coverage of human proteins associated with cancer.
Protein domain definition should allow for conditional disorder
Yegambaram, Kavestri; Bulloch, Esther MM; Kingston, Richard L
2013-01-01
Abstract: Proteins are often classified in a binary fashion as either structured or disordered. However this approach has several deficits. Firstly, protein folding is always conditional on the physiochemical environment. A protein which is structured in some circumstances will be disordered in others. Secondly, it hides a fundamental asymmetry in behavior. While all structured proteins can be unfolded through a change in environment, not all disordered proteins have the capacity for folding. Failure to accommodate these complexities confuses the definition of both protein structural domains and intrinsically disordered regions. We illustrate these points with an experimental study of a family of small binding domains, drawn from the RNA polymerase of mumps virus and its closest relatives. Assessed at face value the domains fall on a structural continuum, with folded, partially folded, and near unstructured members. Yet the disorder present in the family is conditional, and these closely related polypeptides can access the same folded state under appropriate conditions. Any heuristic definition of the protein domain emphasizing conformational stability divides this domain family in two, in a way that makes no biological sense. Structural domains would be better defined by their ability to adopt a specific tertiary structure: a structure that may or may not be realized, dependent on the circumstances. This explicitly allows for the conditional nature of protein folding, and more clearly demarcates structural domains from intrinsically disordered regions that may function without folding. PMID:23963781
Medvedeva, Irina V; Demenkov, Pavel S; Ivanisenko, Vladimir A
2017-04-01
Functional sites define the diversity of protein functions and are the central object of research of the structural and functional organization of proteins. The mechanisms underlying protein functional sites emergence and their variability during evolution are distinguished by duplication, shuffling, insertion and deletion of the exons in genes. The study of the correlation between a site structure and exon structure serves as the basis for the in-depth understanding of sites organization. In this regard, the development of programming resources that allow the realization of the mutual projection of exon structure of genes and primary and tertiary structures of encoded proteins is still the actual problem. Previously, we developed the SitEx system that provides information about protein and gene sequences with mapped exon borders and protein functional sites amino acid positions. The database included information on proteins with known 3D structure. However, data with respect to orthologs was not available. Therefore, we added the projection of sites positions to the exon structures of orthologs in SitEx 2.0. We implemented a search through database using site conservation variability and site discontinuity through exon structure. Inclusion of the information on orthologs allowed to expand the possibilities of SitEx usage for solving problems regarding the analysis of the structural and functional organization of proteins. Database URL: http://www-bionet.sscc.ru/sitex/ .
NASA Astrophysics Data System (ADS)
Yu, Peiqiang; Jonker, Arjan; Gruber, Margaret
2009-09-01
To date there has been very little application of synchrotron radiation-based Fourier transform infrared microspectroscopy (SRFTIRM) to the study of molecular structures in plant forage in relation to livestock digestive behavior and nutrient availability. Protein inherent structure, among other factors such as protein matrix, affects nutritive quality, fermentation and degradation behavior in both humans and animals. The relative percentage of protein secondary structure influences protein value. A high percentage of β-sheets usually reduce the access of gastrointestinal digestive enzymes to the protein. Reduced accessibility results in poor digestibility and as a result, low protein value. The objective of this study was to use SRFTIRM to compare protein molecular structure of alfalfa plant tissues transformed with the maize Lc regulatory gene with non-transgenic alfalfa protein within cellular and subcellular dimensions and to quantify protein inherent structure profiles using Gaussian and Lorentzian methods of multi-component peak modeling. Protein molecular structure revealed by this method included α-helices, β-sheets and other structures such as β-turns and random coils. Hierarchical cluster analysis and principal component analysis of the synchrotron data, as well as accurate spectral analysis based on curve fitting, showed that transgenic alfalfa contained a relatively lower ( P < 0.05) percentage of the model-fitted α-helices (29 vs. 34) and model-fitted β-sheets (22 vs. 27) and a higher ( P < 0.05) percentage of other model-fitted structures (49 vs. 39). Transgenic alfalfa protein displayed no difference ( P > 0.05) in the ratio of α-helices to β-sheets (average: 1.4) and higher ( P < 0.05) ratios of α-helices to others (0.7 vs. 0.9) and β-sheets to others (0.5 vs. 0.8) than the non-transgenic alfalfa protein. The transgenic protein structures also exhibited no difference ( P > 0.05) in the vibrational intensity of protein amide I (average of 24) and amide II areas (average of 10) and their ratio (average of 2.4) compared with non-transgenic alfalfa. Cluster analysis and principal component analysis showed no significant differences between the two genotypes in the broad molecular fingerprint region, amides I and II regions, and the carbohydrate molecular region, indicating they are highly related to each other. The results suggest that transgenic Lc-alfalfa leaves contain similar proteins to non-transgenic alfalfa (because amide I and II intensities were identical), but a subtle difference in protein molecular structure after freeze drying. Further study is needed to understand the relationship between these structural profiles and biological features such as protein nutrient availability, protein bypass and digestive behavior of livestock fed with this type of forage.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yu, P.; Jonker, A; Gruber, M
2009-01-01
To date there has been very little application of synchrotron radiation-based Fourier transform infrared microspectroscopy (SRFTIRM) to the study of molecular structures in plant forage in relation to livestock digestive behavior and nutrient availability. Protein inherent structure, among other factors such as protein matrix, affects nutritive quality, fermentation and degradation behavior in both humans and animals. The relative percentage of protein secondary structure influences protein value. A high percentage of e-sheets usually reduce the access of gastrointestinal digestive enzymes to the protein. Reduced accessibility results in poor digestibility and as a result, low protein value. The objective of this studymore » was to use SRFTIRM to compare protein molecular structure of alfalfa plant tissues transformed with the maize Lc regulatory gene with non-transgenic alfalfa protein within cellular and subcellular dimensions and to quantify protein inherent structure profiles using Gaussian and Lorentzian methods of multi-component peak modeling. Protein molecular structure revealed by this method included a-helices, e-sheets and other structures such as e-turns and random coils. Hierarchical cluster analysis and principal component analysis of the synchrotron data, as well as accurate spectral analysis based on curve fitting, showed that transgenic alfalfa contained a relatively lower (P < 0.05) percentage of the model-fitted a-helices (29 vs. 34) and model-fitted e-sheets (22 vs. 27) and a higher (P < 0.05) percentage of other model-fitted structures (49 vs. 39). Transgenic alfalfa protein displayed no difference (P > 0.05) in the ratio of a-helices to e-sheets (average: 1.4) and higher (P < 0.05) ratios of a-helices to others (0.7 vs. 0.9) and e-sheets to others (0.5 vs. 0.8) than the non-transgenic alfalfa protein. The transgenic protein structures also exhibited no difference (P > 0.05) in the vibrational intensity of protein amide I (average of 24) and amide II areas (average of 10) and their ratio (average of 2.4) compared with non-transgenic alfalfa. Cluster analysis and principal component analysis showed no significant differences between the two genotypes in the broad molecular fingerprint region, amides I and II regions, and the carbohydrate molecular region, indicating they are highly related to each other. The results suggest that transgenic Lc-alfalfa leaves contain similar proteins to non-transgenic alfalfa (because amide I and II intensities were identical), but a subtle difference in protein molecular structure after freeze drying. Further study is needed to understand the relationship between these structural profiles and biological features such as protein nutrient availability, protein bypass and digestive behavior of livestock fed with this type of forage.« less
Zhu, Tong; Zhang, John Z H; He, Xiao
2014-09-14
In this work, protein side chain (1)H chemical shifts are used as probes to detect and correct side-chain packing errors in protein's NMR structures through structural refinement. By applying the automated fragmentation quantum mechanics/molecular mechanics (AF-QM/MM) method for ab initio calculation of chemical shifts, incorrect side chain packing was detected in the NMR structures of the Pin1 WW domain. The NMR structure is then refined by using molecular dynamics simulation and the polarized protein-specific charge (PPC) model. The computationally refined structure of the Pin1 WW domain is in excellent agreement with the corresponding X-ray structure. In particular, the use of the PPC model yields a more accurate structure than that using the standard (nonpolarizable) force field. For comparison, some of the widely used empirical models for chemical shift calculations are unable to correctly describe the relationship between the particular proton chemical shift and protein structures. The AF-QM/MM method can be used as a powerful tool for protein NMR structure validation and structural flaw detection.
Wang, Edina; Chinni, Suresh; Bhore, Subhash Janardhan
2014-01-01
Background: The fatty-acid profile of the vegetable oils determines its properties and nutritional value. Palm-oil obtained from the African oil-palm [Elaeis guineensis Jacq. (Tenera)] contains 44% palmitic acid (C16:0), but, palm-oil obtained from the American oilpalm [Elaeis oleifera] contains only 25% C16:0. In part, the b-ketoacyl-[ACP] synthase II (KASII) [EC: 2.3.1.179] protein is responsible for the high level of C16:0 in palm-oil derived from the African oil-palm. To understand more about E. guineensis KASII (EgKASII) and E. oleifera KASII (EoKASII) proteins, it is essential to know its structures. Hence, this study was undertaken. Objective: The objective of this study was to predict three-dimensional (3D) structure of EgKASII and EoKASII proteins using molecular modelling tools. Materials and Methods: The amino-acid sequences for KASII proteins were retrieved from the protein database of National Center for Biotechnology Information (NCBI), USA. The 3D structures were predicted for both proteins using homology modelling and ab-initio technique approach of protein structure prediction. The molecular dynamics (MD) simulation was performed to refine the predicted structures. The predicted structure models were evaluated and root mean square deviation (RMSD) and root mean square fluctuation (RMSF) values were calculated. Results: The homology modelling showed that EgKASII and EoKASII proteins are 78% and 74% similar with Streptococcus pneumonia KASII and Brucella melitensis KASII, respectively. The EgKASII and EoKASII structures predicted by using ab-initio technique approach shows 6% and 9% deviation to its structures predicted by homology modelling, respectively. The structure refinement and validation confirmed that the predicted structures are accurate. Conclusion: The 3D structures for EgKASII and EoKASII proteins were predicted. However, further research is essential to understand the interaction of EgKASII and EoKASII proteins with its substrates. PMID:24748752
Wang, Edina; Chinni, Suresh; Bhore, Subhash Janardhan
2014-01-01
The fatty-acid profile of the vegetable oils determines its properties and nutritional value. Palm-oil obtained from the African oil-palm [Elaeis guineensis Jacq. (Tenera)] contains 44% palmitic acid (C16:0), but, palm-oil obtained from the American oilpalm [Elaeis oleifera] contains only 25% C16:0. In part, the b-ketoacyl-[ACP] synthase II (KASII) [EC: 2.3.1.179] protein is responsible for the high level of C16:0 in palm-oil derived from the African oil-palm. To understand more about E. guineensis KASII (EgKASII) and E. oleifera KASII (EoKASII) proteins, it is essential to know its structures. Hence, this study was undertaken. The objective of this study was to predict three-dimensional (3D) structure of EgKASII and EoKASII proteins using molecular modelling tools. The amino-acid sequences for KASII proteins were retrieved from the protein database of National Center for Biotechnology Information (NCBI), USA. The 3D structures were predicted for both proteins using homology modelling and ab-initio technique approach of protein structure prediction. The molecular dynamics (MD) simulation was performed to refine the predicted structures. The predicted structure models were evaluated and root mean square deviation (RMSD) and root mean square fluctuation (RMSF) values were calculated. The homology modelling showed that EgKASII and EoKASII proteins are 78% and 74% similar with Streptococcus pneumonia KASII and Brucella melitensis KASII, respectively. The EgKASII and EoKASII structures predicted by using ab-initio technique approach shows 6% and 9% deviation to its structures predicted by homology modelling, respectively. The structure refinement and validation confirmed that the predicted structures are accurate. The 3D structures for EgKASII and EoKASII proteins were predicted. However, further research is essential to understand the interaction of EgKASII and EoKASII proteins with its substrates.
Lappala, Anna; Nishima, Wataru; Miner, Jacob; Fenimore, Paul; Fischer, Will; Hraber, Peter; Zhang, Ming; McMahon, Benjamin; Tung, Chang-Shung
2018-05-10
Membrane fusion proteins are responsible for viral entry into host cells—a crucial first step in viral infection. These proteins undergo large conformational changes from pre-fusion to fusion-initiation structures, and, despite differences in viral genomes and disease etiology, many fusion proteins are arranged as trimers. Structural information for both pre-fusion and fusion-initiation states is critical for understanding virus neutralization by the host immune system. In the case of Ebola virus glycoprotein (EBOV GP) and Zika virus envelope protein (ZIKV E), pre-fusion state structures have been identified experimentally, but only partial structures of fusion-initiation states have been described. While the fusion-initiation structure is in an energetically unfavorable state that is difficult to solve experimentally, the existing structural information combined with computational approaches enabled the modeling of fusion-initiation state structures of both proteins. These structural models provide an improved understanding of four different neutralizing antibodies in the prevention of viral host entry.
The Differential Response of Proteins to Macromolecular Crowding
Candotti, Michela; Orozco, Modesto
2016-01-01
The habitat in which proteins exert their function contains up to 400 g/L of macromolecules, most of which are proteins. The repercussions of this dense environment on protein behavior are often overlooked or addressed using synthetic agents such as poly(ethylene glycol), whose ability to mimic protein crowders has not been demonstrated. Here we performed a comprehensive atomistic molecular dynamic analysis of the effect of protein crowders on the structure and dynamics of three proteins, namely an intrinsically disordered protein (ACTR), a molten globule conformation (NCBD), and a one-fold structure (IRF-3) protein. We found that crowding does not stabilize the native compact structure, and, in fact, often prevents structural collapse. Poly(ethylene glycol) PEG500 failed to reproduce many aspects of the physiologically-relevant protein crowders, thus indicating its unsuitability to mimic the cell interior. Instead, the impact of protein crowding on the structure and dynamics of a protein depends on its degree of disorder and results from two competing effects: the excluded volume, which favors compact states, and quinary interactions, which favor extended conformers. Such a viscous environment slows down protein flexibility and restricts the conformational landscape, often biasing it towards bioactive conformations but hindering biologically relevant protein-protein contacts. Overall, the protein crowders used here act as unspecific chaperons that modulate the protein conformational space, thus having relevant consequences for disordered proteins. PMID:27471851
Structural perturbation of proteins in low denaturant concentrations.
Basak, S; Debnath, D; Haque, E; Ray, S; Chakrabarti, A
2001-01-01
The presence of very low concentrations of the widely used chemical denaturants, guanidinium chloride and urea, induce changes in the tertiary structure of proteins. We have presented results on such changes in four structurally unrelated proteins to show that such structural perturbations are common irrespective of their origin. Data representative of such structural changes are shown for the monomeric globular proteins such as horseradish peroxidase (HRP) from a plant, human serum albumin (HSA) and prothrombin from ovine blood serum, and for the membrane-associated, worm-like elongated protein, spectrin, from ovine erythrocytes. Structural alterations in these proteins were reflected in quenching studies of tryptophan fluorescence using the widely used quencher acrylamide. Stern-Volmer quenching constants measured in presence of the denaturants, even at concentrations below 100 mM, were higher than those measured in absence of the denaturants. Both steady-state and time-resolved fluorescence emission properties of tryptophan and of the extrinsic probe PRODAN were used for monitoring conformational changes in the proteins in presence of different low concentrations of the denaturants. These results are consistent with earlier studies from our laboratory indicating structural perturbations in proteins at the tertiary level, keeping their native-like secondary structure and their biological activity more or less intact.
ERIC Educational Resources Information Center
Asmus, Elaine Garbarino
2007-01-01
Individual students model specific amino acids and then, through dehydration synthesis, a class of students models a protein. The students clearly learn amino acid structure, primary, secondary, tertiary, and quaternary structure in proteins and the nature of the bonds maintaining a protein's shape. This activity is fun, concrete, inexpensive and…
The use of polyoxometalates in protein crystallography – An attempt to widen a well-known bottleneck
Bijelic, Aleksandar; Rompel, Annette
2015-01-01
Polyoxometalates (POMs) are discrete polynuclear metal-oxo anions with a fascinating variety of structures and unique chemical and physical properties. Their application in various fields is well covered in the literature, however little information about their usage in protein crystallization is available. This review summarizes the impact of the vast class of POMs on the formation of protein crystals, a well-known (frustrating) bottleneck in macromolecular crystallography, with the associated structure elucidation and a particular emphasis focused on POM's potential as a powerful crystallization additive for future research. The Protein Data Bank (PDB) was scanned for protein structures with incorporated POMs which were assigned a PDB ligand ID resulting in 30 PDB entries. These structures have been analyzed with regard to (i) the structure of POM itself in the immediate protein environment, (ii) the kind of interaction and position of the POM within the protein structure and (iii) the beneficial effects of POM on protein crystallography apparent so far. PMID:26339074
Yu, Peiqiang; Doiron, Kevin; Liu, Dasen
2008-05-14
The objective of this study was to use advanced synchrotron-sourced FTIR microspectroscopy (SFTIRM) as a novel approach to identify the differences in protein and carbohydrate molecular structure (chemical makeup) between these two varieties of barley and illustrate the exact causes for their significantly different degradation kinetics. Items assessed included (1) molecular structural differences in protein amide I to amide II intensities and their ratio within cellular dimensions, (2) molecular structural differences in protein secondary structure profile and their ratios, and (3) molecular structural differences in carbohydrate component peak profile. Our hypothesis was that molecular structure (chemical makeup) affects barley quality, fermentation, and degradation behavior in both humans and animals. Using SFTIRM, the protein and carbohydrate molecular structural chemical makeup of barley was revealed and identified. The protein molecular structural chemical makeup differed significantly between the two varieties of barleys. No difference in carbohydrate molecular structural chemical makeup was detected. Harrington was lower than Valier in protein amide I, amide II, and protein amide I to amide II ratio, while Harrington was relatively higher in model-fitted protein alpha-helix and beta-sheet, but lower in the others (beta-turn and random coil). These results indicated that it is the molecular structure of protein (chemical makeup) that may play a major role in the different degradation kinetics between the two varieties of barleys (not the molecular structure of carbohydrate). It is believed that use of the advanced synchrotron technology will make a significant step and an important contribution to research in examining the molecular structure (chemical makeup) of plant, feed, and seeds.
Fang, Jing; Nevin, Philip; Kairys, Visvaldas; Venclovas, Česlovas; Engen, John R; Beuning, Penny J
2014-04-08
The relationship between protein sequence, structure, and dynamics has been elusive. Here, we report a comprehensive analysis using an in-solution experimental approach to study how the conservation of tertiary structure correlates with protein dynamics. Hydrogen exchange measurements of eight processivity clamp proteins from different species revealed that, despite highly similar three-dimensional structures, clamp proteins display a wide range of dynamic behavior. Differences were apparent both for structurally similar domains within proteins and for corresponding domains of different proteins. Several of the clamps contained regions that underwent local unfolding with different half-lives. We also observed a conserved pattern of alternating dynamics of the α helices lining the inner pore of the clamps as well as a correlation between dynamics and the number of salt bridges in these α helices. Our observations reveal that tertiary structure and dynamics are not directly correlated and that primary structure plays an important role in dynamics. Copyright © 2014 Elsevier Ltd. All rights reserved.
Brown, Simon H J; Mitchell, Todd W; Oakley, Aaron J; Pham, Huong T; Blanksby, Stephen J
2012-09-01
Since the 1950s, X-ray crystallography has been the mainstay of structural biology, providing detailed atomic-level structures that continue to revolutionize our understanding of protein function. From recent advances in this discipline, a picture has emerged of intimate and specific interactions between lipids and proteins that has driven renewed interest in the structure of lipids themselves and raised intriguing questions as to the specificity and stoichiometry in lipid-protein complexes. Herein we demonstrate some of the limitations of crystallography in resolving critical structural features of ligated lipids and thus determining how these motifs impact protein binding. As a consequence, mass spectrometry must play an important and complementary role in unraveling the complexities of lipid-protein interactions. We evaluate recent advances and highlight ongoing challenges towards the twin goals of (1) complete structure elucidation of low, abundant, and structurally diverse lipids by mass spectrometry alone, and (2) assignment of stoichiometry and specificity of lipid interactions within protein complexes.
NASA Astrophysics Data System (ADS)
Brown, Simon H. J.; Mitchell, Todd W.; Oakley, Aaron J.; Pham, Huong T.; Blanksby, Stephen J.
2012-09-01
Since the 1950s, X-ray crystallography has been the mainstay of structural biology, providing detailed atomic-level structures that continue to revolutionize our understanding of protein function. From recent advances in this discipline, a picture has emerged of intimate and specific interactions between lipids and proteins that has driven renewed interest in the structure of lipids themselves and raised intriguing questions as to the specificity and stoichiometry in lipid-protein complexes. Herein we demonstrate some of the limitations of crystallography in resolving critical structural features of ligated lipids and thus determining how these motifs impact protein binding. As a consequence, mass spectrometry must play an important and complementary role in unraveling the complexities of lipid-protein interactions. We evaluate recent advances and highlight ongoing challenges towards the twin goals of (1) complete structure elucidation of low, abundant, and structurally diverse lipids by mass spectrometry alone, and (2) assignment of stoichiometry and specificity of lipid interactions within protein complexes.
Predicting nucleic acid binding interfaces from structural models of proteins.
Dror, Iris; Shazman, Shula; Mukherjee, Srayanta; Zhang, Yang; Glaser, Fabian; Mandel-Gutfreund, Yael
2012-02-01
The function of DNA- and RNA-binding proteins can be inferred from the characterization and accurate prediction of their binding interfaces. However, the main pitfall of various structure-based methods for predicting nucleic acid binding function is that they are all limited to a relatively small number of proteins for which high-resolution three-dimensional structures are available. In this study, we developed a pipeline for extracting functional electrostatic patches from surfaces of protein structural models, obtained using the I-TASSER protein structure predictor. The largest positive patches are extracted from the protein surface using the patchfinder algorithm. We show that functional electrostatic patches extracted from an ensemble of structural models highly overlap the patches extracted from high-resolution structures. Furthermore, by testing our pipeline on a set of 55 known nucleic acid binding proteins for which I-TASSER produces high-quality models, we show that the method accurately identifies the nucleic acids binding interface on structural models of proteins. Employing a combined patch approach we show that patches extracted from an ensemble of models better predicts the real nucleic acid binding interfaces compared with patches extracted from independent models. Overall, these results suggest that combining information from a collection of low-resolution structural models could be a valuable approach for functional annotation. We suggest that our method will be further applicable for predicting other functional surfaces of proteins with unknown structure. Copyright © 2011 Wiley Periodicals, Inc.
Fast computational methods for predicting protein structure from primary amino acid sequence
Agarwal, Pratul Kumar [Knoxville, TN
2011-07-19
The present invention provides a method utilizing primary amino acid sequence of a protein, energy minimization, molecular dynamics and protein vibrational modes to predict three-dimensional structure of a protein. The present invention also determines possible intermediates in the protein folding pathway. The present invention has important applications to the design of novel drugs as well as protein engineering. The present invention predicts the three-dimensional structure of a protein independent of size of the protein, overcoming a significant limitation in the prior art.
Structural hot spots for the solubility of globular proteins
Ganesan, Ashok; Siekierska, Aleksandra; Beerten, Jacinte; Brams, Marijke; Van Durme, Joost; De Baets, Greet; Van der Kant, Rob; Gallardo, Rodrigo; Ramakers, Meine; Langenberg, Tobias; Wilkinson, Hannah; De Smet, Frederik; Ulens, Chris; Rousseau, Frederic; Schymkowitz, Joost
2016-01-01
Natural selection shapes protein solubility to physiological requirements and recombinant applications that require higher protein concentrations are often problematic. This raises the question whether the solubility of natural protein sequences can be improved. We here show an anti-correlation between the number of aggregation prone regions (APRs) in a protein sequence and its solubility, suggesting that mutational suppression of APRs provides a simple strategy to increase protein solubility. We show that mutations at specific positions within a protein structure can act as APR suppressors without affecting protein stability. These hot spots for protein solubility are both structure and sequence dependent but can be computationally predicted. We demonstrate this by reducing the aggregation of human α-galactosidase and protective antigen of Bacillus anthracis through mutation. Our results indicate that many proteins possess hot spots allowing to adapt protein solubility independently of structure and function. PMID:26905391
The synthesis of recombinant membrane proteins in yeast for structural studies.
Routledge, Sarah J; Mikaliunaite, Lina; Patel, Anjana; Clare, Michelle; Cartwright, Stephanie P; Bawa, Zharain; Wilks, Martin D B; Low, Floren; Hardy, David; Rothnie, Alice J; Bill, Roslyn M
2016-02-15
Historically, recombinant membrane protein production has been a major challenge meaning that many fewer membrane protein structures have been published than those of soluble proteins. However, there has been a recent, almost exponential increase in the number of membrane protein structures being deposited in the Protein Data Bank. This suggests that empirical methods are now available that can ensure the required protein supply for these difficult targets. This review focuses on methods that are available for protein production in yeast, which is an important source of recombinant eukaryotic membrane proteins. We provide an overview of approaches to optimize the expression plasmid, host cell and culture conditions, as well as the extraction and purification of functional protein for crystallization trials in preparation for structural studies. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Brodie, Nicholas I; Popov, Konstantin I; Petrotchenko, Evgeniy V; Dokholyan, Nikolay V; Borchers, Christoph H
2017-07-01
We present an integrated experimental and computational approach for de novo protein structure determination in which short-distance cross-linking data are incorporated into rapid discrete molecular dynamics (DMD) simulations as constraints, reducing the conformational space and achieving the correct protein folding on practical time scales. We tested our approach on myoglobin and FK506 binding protein-models for α helix-rich and β sheet-rich proteins, respectively-and found that the lowest-energy structures obtained were in agreement with the crystal structure, hydrogen-deuterium exchange, surface modification, and long-distance cross-linking validation data. Our approach is readily applicable to other proteins with unknown structures.
Advances in structural and functional analysis of membrane proteins by electron crystallography
Wisedchaisri, Goragot; Reichow, Steve L.; Gonen, Tamir
2011-01-01
Summary Electron crystallography is a powerful technique for the study of membrane protein structure and function in the lipid environment. When well-ordered two-dimensional crystals are obtained the structure of both protein and lipid can be determined and lipid-protein interactions analyzed. Protons and ionic charges can be visualized by electron crystallography and the protein of interest can be captured for structural analysis in a variety of physiologically distinct states. This review highlights the strengths of electron crystallography and the momentum that is building up in automation and the development of high throughput tools and methods for structural and functional analysis of membrane proteins by electron crystallography. PMID:22000511
Advances in structural and functional analysis of membrane proteins by electron crystallography.
Wisedchaisri, Goragot; Reichow, Steve L; Gonen, Tamir
2011-10-12
Electron crystallography is a powerful technique for the study of membrane protein structure and function in the lipid environment. When well-ordered two-dimensional crystals are obtained the structure of both protein and lipid can be determined and lipid-protein interactions analyzed. Protons and ionic charges can be visualized by electron crystallography and the protein of interest can be captured for structural analysis in a variety of physiologically distinct states. This review highlights the strengths of electron crystallography and the momentum that is building up in automation and the development of high throughput tools and methods for structural and functional analysis of membrane proteins by electron crystallography. Copyright © 2011 Elsevier Ltd. All rights reserved.
Treatment of Second-Order Structures of Proteins Using Oxygen Radio Frequency Plasma
NASA Astrophysics Data System (ADS)
Hayashi, Nobuya; Nakahigashi, Akari; Liu, Hao; Goto, Masaaki
2010-08-01
Decomposition characteristics of second-order structures of proteins are determined using an oxygen radio frequency (RF) plasma sterilizer in order to prevent infectious proteins from contaminating medical equipment in hospitals. The removal of casein protein as a test protein with a concentration of 50 mg/cm2 on the plane substrate requires approximately 8 h when singlet atomic oxygen is irradiated. The peak intensity of Fourier transform infrared spectroscopy (FTIR) spectra of the β-sheet structures decreases at approximately the same rate as those of the α-helix and first-order structures of proteins. Active oxygen has a sufficient oxidation energy to dissociate hydrogen bonds within the β-sheet structure.
CHENG, JIANLIN; EICKHOLT, JESSE; WANG, ZHENG; DENG, XIN
2013-01-01
After decades of research, protein structure prediction remains a very challenging problem. In order to address the different levels of complexity of structural modeling, two types of modeling techniques — template-based modeling and template-free modeling — have been developed. Template-based modeling can often generate a moderate- to high-resolution model when a similar, homologous template structure is found for a query protein but fails if no template or only incorrect templates are found. Template-free modeling, such as fragment-based assembly, may generate models of moderate resolution for small proteins of low topological complexity. Seldom have the two techniques been integrated together to improve protein modeling. Here we develop a recursive protein modeling approach to selectively and collaboratively apply template-based and template-free modeling methods to model template-covered (i.e. certain) and template-free (i.e. uncertain) regions of a protein. A preliminary implementation of the approach was tested on a number of hard modeling cases during the 9th Critical Assessment of Techniques for Protein Structure Prediction (CASP9) and successfully improved the quality of modeling in most of these cases. Recursive modeling can signicantly reduce the complexity of protein structure modeling and integrate template-based and template-free modeling to improve the quality and efficiency of protein structure prediction. PMID:22809379
NASA Astrophysics Data System (ADS)
Takemura, Kazuhiro; Guo, Hao; Sakuraba, Shun; Matubayasi, Nobuyuki; Kitao, Akio
2012-12-01
We propose a method to evaluate binding free energy differences among distinct protein-protein complex model structures through all-atom molecular dynamics simulations in explicit water using the solution theory in the energy representation. Complex model structures are generated from a pair of monomeric structures using the rigid-body docking program ZDOCK. After structure refinement by side chain optimization and all-atom molecular dynamics simulations in explicit water, complex models are evaluated based on the sum of their conformational and solvation free energies, the latter calculated from the energy distribution functions obtained from relatively short molecular dynamics simulations of the complex in water and of pure water based on the solution theory in the energy representation. We examined protein-protein complex model structures of two protein-protein complex systems, bovine trypsin/CMTI-1 squash inhibitor (PDB ID: 1PPE) and RNase SA/barstar (PDB ID: 1AY7), for which both complex and monomer structures were determined experimentally. For each system, we calculated the energies for the crystal complex structure and twelve generated model structures including the model most similar to the crystal structure and very different from it. In both systems, the sum of the conformational and solvation free energies tended to be lower for the structure similar to the crystal. We concluded that our energy calculation method is useful for selecting low energy complex models similar to the crystal structure from among a set of generated models.
Takemura, Kazuhiro; Guo, Hao; Sakuraba, Shun; Matubayasi, Nobuyuki; Kitao, Akio
2012-12-07
We propose a method to evaluate binding free energy differences among distinct protein-protein complex model structures through all-atom molecular dynamics simulations in explicit water using the solution theory in the energy representation. Complex model structures are generated from a pair of monomeric structures using the rigid-body docking program ZDOCK. After structure refinement by side chain optimization and all-atom molecular dynamics simulations in explicit water, complex models are evaluated based on the sum of their conformational and solvation free energies, the latter calculated from the energy distribution functions obtained from relatively short molecular dynamics simulations of the complex in water and of pure water based on the solution theory in the energy representation. We examined protein-protein complex model structures of two protein-protein complex systems, bovine trypsin/CMTI-1 squash inhibitor (PDB ID: 1PPE) and RNase SA/barstar (PDB ID: 1AY7), for which both complex and monomer structures were determined experimentally. For each system, we calculated the energies for the crystal complex structure and twelve generated model structures including the model most similar to the crystal structure and very different from it. In both systems, the sum of the conformational and solvation free energies tended to be lower for the structure similar to the crystal. We concluded that our energy calculation method is useful for selecting low energy complex models similar to the crystal structure from among a set of generated models.
González-Díaz, Humberto; Munteanu, Cristian R; Postelnicu, Lucian; Prado-Prado, Francisco; Gestal, Marcos; Pazos, Alejandro
2012-03-01
Lipid-Binding Proteins (LIBPs) or Fatty Acid-Binding Proteins (FABPs) play an important role in many diseases such as different types of cancer, kidney injury, atherosclerosis, diabetes, intestinal ischemia and parasitic infections. Thus, the computational methods that can predict LIBPs based on 3D structure parameters became a goal of major importance for drug-target discovery, vaccine design and biomarker selection. In addition, the Protein Data Bank (PDB) contains 3000+ protein 3D structures with unknown function. This list, as well as new experimental outcomes in proteomics research, is a very interesting source to discover relevant proteins, including LIBPs. However, to the best of our knowledge, there are no general models to predict new LIBPs based on 3D structures. We developed new Quantitative Structure-Activity Relationship (QSAR) models based on 3D electrostatic parameters of 1801 different proteins, including 801 LIBPs. We calculated these electrostatic parameters with the MARCH-INSIDE software and they correspond to the entire protein or to specific protein regions named core, inner, middle, and surface. We used these parameters as inputs to develop a simple Linear Discriminant Analysis (LDA) classifier to discriminate 3D structure of LIBPs from other proteins. We implemented this predictor in the web server named LIBP-Pred, freely available at , along with other important web servers of the Bio-AIMS portal. The users can carry out an automatic retrieval of protein structures from PDB or upload their custom protein structural models from their disk created with LOMETS server. We demonstrated the PDB mining option performing a predictive study of 2000+ proteins with unknown function. Interesting results regarding the discovery of new Cancer Biomarkers in humans or drug targets in parasites have been discussed here in this sense.
A Circular Dichroism Reference Database for Membrane Proteins
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wallace,B.; Wien, F.; Stone, T.
2006-01-01
Membrane proteins are a major product of most genomes and the target of a large number of current pharmaceuticals, yet little information exists on their structures because of the difficulty of crystallising them; hence for the most part they have been excluded from structural genomics programme targets. Furthermore, even methods such as circular dichroism (CD) spectroscopy which seek to define secondary structure have not been fully exploited because of technical limitations to their interpretation for membrane embedded proteins. Empirical analyses of circular dichroism (CD) spectra are valuable for providing information on secondary structures of proteins. However, the accuracy of themore » results depends on the appropriateness of the reference databases used in the analyses. Membrane proteins have different spectral characteristics than do soluble proteins as a result of the low dielectric constants of membrane bilayers relative to those of aqueous solutions (Chen & Wallace (1997) Biophys. Chem. 65:65-74). To date, no CD reference database exists exclusively for the analysis of membrane proteins, and hence empirical analyses based on current reference databases derived from soluble proteins are not adequate for accurate analyses of membrane protein secondary structures (Wallace et al (2003) Prot. Sci. 12:875-884). We have therefore created a new reference database of CD spectra of integral membrane proteins whose crystal structures have been determined. To date it contains more than 20 proteins, and spans the range of secondary structures from mostly helical to mostly sheet proteins. This reference database should enable more accurate secondary structure determinations of membrane embedded proteins and will become one of the reference database options in the CD calculation server DICHROWEB (Whitmore & Wallace (2004) NAR 32:W668-673).« less
Functional classification of protein structures by local structure matching in graph representation.
Mills, Caitlyn L; Garg, Rohan; Lee, Joslynn S; Tian, Liang; Suciu, Alexandru; Cooperman, Gene; Beuning, Penny J; Ondrechen, Mary Jo
2018-03-31
As a result of high-throughput protein structure initiatives, over 14,400 protein structures have been solved by structural genomics (SG) centers and participating research groups. While the totality of SG data represents a tremendous contribution to genomics and structural biology, reliable functional information for these proteins is generally lacking. Better functional predictions for SG proteins will add substantial value to the structural information already obtained. Our method described herein, Graph Representation of Active Sites for Prediction of Function (GRASP-Func), predicts quickly and accurately the biochemical function of proteins by representing residues at the predicted local active site as graphs rather than in Cartesian coordinates. We compare the GRASP-Func method to our previously reported method, structurally aligned local sites of activity (SALSA), using the ribulose phosphate binding barrel (RPBB), 6-hairpin glycosidase (6-HG), and Concanavalin A-like Lectins/Glucanase (CAL/G) superfamilies as test cases. In each of the superfamilies, SALSA and the much faster method GRASP-Func yield similar correct classification of previously characterized proteins, providing a validated benchmark for the new method. In addition, we analyzed SG proteins using our SALSA and GRASP-Func methods to predict function. Forty-one SG proteins in the RPBB superfamily, nine SG proteins in the 6-HG superfamily, and one SG protein in the CAL/G superfamily were successfully classified into one of the functional families in their respective superfamily by both methods. This improved, faster, validated computational method can yield more reliable predictions of function that can be used for a wide variety of applications by the community. © 2018 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
Horejs, Christine; Pum, Dietmar; Sleytr, Uwe B; Peterlik, Herwig; Jungbauer, Alois; Tscheliessnig, Rupert
2010-11-07
Surface layers (S-layers) are the most commonly observed cell surface structure of prokaryotic organisms. They are made up of proteins that spontaneously self-assemble into functional crystalline lattices in solution, on various solid surfaces, and interfaces. While classical experimental techniques failed to recover a complete structural model of an unmodified S-layer protein, small angle x-ray scattering (SAXS) provides an opportunity to study the structure of S-layer monomers in solution and of self-assembled two-dimensional sheets. For the protein under investigation we recently suggested an atomistic structural model by the use of molecular dynamics simulations. This structural model is now refined on the basis of SAXS data together with a fractal assembly approach. Here we show that a nondiluted critical system of proteins, which crystallize into monomolecular structures, might be analyzed by SAXS if protein-protein interactions are taken into account by relating a fractal local density distribution to a fractal local mean potential, which has to fulfill the Poisson equation. The present work demonstrates an important step into the elucidation of the structure of S-layers and offers a tool to analyze the structure of self-assembling systems in solution by means of SAXS and computer simulations.
NASA Astrophysics Data System (ADS)
Horejs, Christine; Pum, Dietmar; Sleytr, Uwe B.; Peterlik, Herwig; Jungbauer, Alois; Tscheliessnig, Rupert
2010-11-01
Surface layers (S-layers) are the most commonly observed cell surface structure of prokaryotic organisms. They are made up of proteins that spontaneously self-assemble into functional crystalline lattices in solution, on various solid surfaces, and interfaces. While classical experimental techniques failed to recover a complete structural model of an unmodified S-layer protein, small angle x-ray scattering (SAXS) provides an opportunity to study the structure of S-layer monomers in solution and of self-assembled two-dimensional sheets. For the protein under investigation we recently suggested an atomistic structural model by the use of molecular dynamics simulations. This structural model is now refined on the basis of SAXS data together with a fractal assembly approach. Here we show that a nondiluted critical system of proteins, which crystallize into monomolecular structures, might be analyzed by SAXS if protein-protein interactions are taken into account by relating a fractal local density distribution to a fractal local mean potential, which has to fulfill the Poisson equation. The present work demonstrates an important step into the elucidation of the structure of S-layers and offers a tool to analyze the structure of self-assembling systems in solution by means of SAXS and computer simulations.
Future directions of electron crystallography.
Fujiyoshi, Yoshinori
2013-01-01
In biological science, there are still many interesting and fundamental yet difficult questions, such as those in neuroscience, remaining to be answered. Structural and functional studies of membrane proteins, which are key molecules of signal transduction in neural and other cells, are essential for understanding the molecular mechanisms of many fundamental biological processes. Technological and instrumental advancements of electron microscopy have facilitated comprehension of structural studies of biological components, such as membrane proteins. While X-ray crystallography has been the main method of structure analysis of proteins including membrane proteins, electron crystallography is now an established technique to analyze structures of membrane proteins in the lipid bilayer, which is close to their natural biological environment. By utilizing cryo-electron microscopes with helium-cooled specimen stages, structures of membrane proteins were analyzed at a resolution better than 3 Å. Such high-resolution structural analysis of membrane proteins by electron crystallography opens up the new research field of structural physiology. Considering the fact that the structures of integral membrane proteins in their native membrane environment without artifacts from crystal contacts are critical in understanding their physiological functions, electron crystallography will continue to be an important technology for structural analysis. In this chapter, I will present several examples to highlight important advantages and to suggest future directions of this technique.
BAYESIAN PROTEIN STRUCTURE ALIGNMENT.
Rodriguez, Abel; Schmidler, Scott C
The analysis of the three-dimensional structure of proteins is an important topic in molecular biochemistry. Structure plays a critical role in defining the function of proteins and is more strongly conserved than amino acid sequence over evolutionary timescales. A key challenge is the identification and evaluation of structural similarity between proteins; such analysis can aid in understanding the role of newly discovered proteins and help elucidate evolutionary relationships between organisms. Computational biologists have developed many clever algorithmic techniques for comparing protein structures, however, all are based on heuristic optimization criteria, making statistical interpretation somewhat difficult. Here we present a fully probabilistic framework for pairwise structural alignment of proteins. Our approach has several advantages, including the ability to capture alignment uncertainty and to estimate key "gap" parameters which critically affect the quality of the alignment. We show that several existing alignment methods arise as maximum a posteriori estimates under specific choices of prior distributions and error models. Our probabilistic framework is also easily extended to incorporate additional information, which we demonstrate by including primary sequence information to generate simultaneous sequence-structure alignments that can resolve ambiguities obtained using structure alone. This combined model also provides a natural approach for the difficult task of estimating evolutionary distance based on structural alignments. The model is illustrated by comparison with well-established methods on several challenging protein alignment examples.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Horejs, Christine; Pum, Dietmar; Sleytr, Uwe B.
2010-11-07
Surface layers (S-layers) are the most commonly observed cell surface structure of prokaryotic organisms. They are made up of proteins that spontaneously self-assemble into functional crystalline lattices in solution, on various solid surfaces, and interfaces. While classical experimental techniques failed to recover a complete structural model of an unmodified S-layer protein, small angle x-ray scattering (SAXS) provides an opportunity to study the structure of S-layer monomers in solution and of self-assembled two-dimensional sheets. For the protein under investigation we recently suggested an atomistic structural model by the use of molecular dynamics simulations. This structural model is now refined on themore » basis of SAXS data together with a fractal assembly approach. Here we show that a nondiluted critical system of proteins, which crystallize into monomolecular structures, might be analyzed by SAXS if protein-protein interactions are taken into account by relating a fractal local density distribution to a fractal local mean potential, which has to fulfill the Poisson equation. The present work demonstrates an important step into the elucidation of the structure of S-layers and offers a tool to analyze the structure of self-assembling systems in solution by means of SAXS and computer simulations.« less
Modelling and enhanced molecular dynamics to steer structure-based drug discovery.
Kalyaanamoorthy, Subha; Chen, Yi-Ping Phoebe
2014-05-01
The ever-increasing gap between the availabilities of the genome sequences and the crystal structures of proteins remains one of the significant challenges to the modern drug discovery efforts. The knowledge of structure-dynamics-functionalities of proteins is important in order to understand several key aspects of structure-based drug discovery, such as drug-protein interactions, drug binding and unbinding mechanisms and protein-protein interactions. This review presents a brief overview on the different state of the art computational approaches that are applied for protein structure modelling and molecular dynamics simulations of biological systems. We give an essence of how different enhanced sampling molecular dynamics approaches, together with regular molecular dynamics methods, assist in steering the structure based drug discovery processes. Copyright © 2013 Elsevier Ltd. All rights reserved.
Design of structurally distinct proteins using strategies inspired by evolution
Jacobs, T. M.; Williams, B.; Williams, T.; ...
2016-05-06
Natural recombination combines pieces of preexisting proteins to create new tertiary structures and functions. In this paper, we describe a computational protocol, called SEWING, which is inspired by this process and builds new proteins from connected or disconnected pieces of existing structures. Helical proteins designed with SEWING contain structural features absent from other de novo designed proteins and, in some cases, remain folded at more than 100°C. High-resolution structures of the designed proteins CA01 and DA05R1 were solved by x-ray crystallography (2.2 angstrom resolution) and nuclear magnetic resonance, respectively, and there was excellent agreement with the design models. Finally, thismore » method provides a new strategy to rapidly create large numbers of diverse and designable protein scaffolds.« less
Structural anatomy of telomere OB proteins.
Horvath, Martin P
2011-10-01
Telomere DNA-binding proteins protect the ends of chromosomes in eukaryotes. A subset of these proteins are constructed with one or more OB folds and bind with G+T-rich single-stranded DNA found at the extreme termini. The resulting DNA-OB protein complex interacts with other telomere components to coordinate critical telomere functions of DNA protection and DNA synthesis. While the first crystal and NMR structures readily explained protection of telomere ends, the picture of how single-stranded DNA becomes available to serve as primer and template for synthesis of new telomere DNA is only recently coming into focus. New structures of telomere OB fold proteins alongside insights from genetic and biochemical experiments have made significant contributions towards understanding how protein-binding OB proteins collaborate with DNA-binding OB proteins to recruit telomerase and DNA polymerase for telomere homeostasis. This review surveys telomere OB protein structures alongside highly comparable structures derived from replication protein A (RPA) components, with the goal of providing a molecular context for understanding telomere OB protein evolution and mechanism of action in protection and synthesis of telomere DNA.
Structural anatomy of telomere OB proteins
Horvath, Martin P.
2015-01-01
Telomere DNA-binding proteins protect the ends of chromosomes in eukaryotes. A subset of these proteins are constructed with one or more OB folds and bind with G+T-rich single-stranded DNA found at the extreme termini. The resulting DNA-OB protein complex interacts with other telomere components to coordinate critical telomere functions of DNA protection and DNA synthesis. While the first crystal and NMR structures readily explained protection of telomere ends, the picture of how single-stranded DNA becomes available to serve as primer and template for synthesis of new telomere DNA is only recently coming into focus. New structures of telomere OB fold proteins alongside insights from genetic and biochemical experiments have made significant contributions towards understanding how protein-binding OB proteins collaborate with DNA-binding OB proteins to recruit telomerase and DNA polymerase for telomere homeostasis. This review surveys telomere OB protein structures alongside highly comparable structures derived from replication protein A (RPA) components, with the goal of providing a molecular context for understanding telomere OB protein evolution and mechanism of action in protection and synthesis of telomere DNA. PMID:21950380
Kister, Alexander
2015-01-01
We present an alternative approach to protein 3D folding prediction based on determination of rules that specify distribution of “favorable” residues, that are mainly responsible for a given fold formation, and “unfavorable” residues, that are incompatible with that fold, in polypeptide sequences. The process of determining favorable and unfavorable residues is iterative. The starting assumptions are based on the general principles of protein structure formation as well as structural features peculiar to a protein fold under investigation. The initial assumptions are tested one-by-one for a set of all known proteins with a given structure. The assumption is accepted as a “rule of amino acid distribution” for the protein fold if it holds true for all, or near all, structures. If the assumption is not accepted as a rule, it can be modified to better fit the data and then tested again in the next step of the iterative search algorithm, or rejected. We determined the set of amino acid distribution rules for a large group of beta sandwich-like proteins characterized by a specific arrangement of strands in two beta sheets. It was shown that this set of rules is highly sensitive (~90%) and very specific (~99%) for identifying sequences of proteins with specified beta sandwich fold structure. The advantage of the proposed approach is that it does not require that query proteins have a high degree of homology to proteins with known structure. So long as the query protein satisfies residue distribution rules, it can be confidently assigned to its respective protein fold. Another advantage of our approach is that it allows for a better understanding of which residues play an essential role in protein fold formation. It may, therefore, facilitate rational protein engineering design. PMID:25625198
Dewhurst, Henry M.; Choudhury, Shilpa; Torres, Matthew P.
2015-01-01
Predicting the biological function potential of post-translational modifications (PTMs) is becoming increasingly important in light of the exponential increase in available PTM data from high-throughput proteomics. We developed structural analysis of PTM hotspots (SAPH-ire)—a quantitative PTM ranking method that integrates experimental PTM observations, sequence conservation, protein structure, and interaction data to allow rank order comparisons within or between protein families. Here, we applied SAPH-ire to the study of PTMs in diverse G protein families, a conserved and ubiquitous class of proteins essential for maintenance of intracellular structure (tubulins) and signal transduction (large and small Ras-like G proteins). A total of 1728 experimentally verified PTMs from eight unique G protein families were clustered into 451 unique hotspots, 51 of which have a known and cited biological function or response. Using customized software, the hotspots were analyzed in the context of 598 unique protein structures. By comparing distributions of hotspots with known versus unknown function, we show that SAPH-ire analysis is predictive for PTM biological function. Notably, SAPH-ire revealed high-ranking hotspots for which a functional impact has not yet been determined, including phosphorylation hotspots in the N-terminal tails of G protein gamma subunits—conserved protein structures never before reported as regulators of G protein coupled receptor signaling. To validate this prediction we used the yeast model system for G protein coupled receptor signaling, revealing that gamma subunit–N-terminal tail phosphorylation is activated in response to G protein coupled receptor stimulation and regulates protein stability in vivo. These results demonstrate the utility of integrating protein structural and sequence features into PTM prioritization schemes that can improve the analysis and functional power of modification-specific proteomics data. PMID:26070665
PreSSAPro: a software for the prediction of secondary structure by amino acid properties.
Costantini, Susan; Colonna, Giovanni; Facchiano, Angelo M
2007-10-01
PreSSAPro is a software, available to the scientific community as a free web service designed to provide predictions of secondary structures starting from the amino acid sequence of a given protein. Predictions are based on our recently published work on the amino acid propensities for secondary structures in either large but not homogeneous protein data sets, as well as in smaller but homogeneous data sets corresponding to protein structural classes, i.e. all-alpha, all-beta, or alpha-beta proteins. Predictions result improved by the use of propensities evaluated for the right protein class. PreSSAPro predicts the secondary structure according to the right protein class, if known, or gives a multiple prediction with reference to the different structural classes. The comparison of these predictions represents a novel tool to evaluate what sequence regions can assume different secondary structures depending on the structural class assignment, in the perspective of identifying proteins able to fold in different conformations. The service is available at the URL http://bioinformatica.isa.cnr.it/PRESSAPRO/.
Protein docking by the interface structure similarity: how much structure is needed?
Sinha, Rohita; Kundrotas, Petras J; Vakser, Ilya A
2012-01-01
The increasing availability of co-crystallized protein-protein complexes provides an opportunity to use template-based modeling for protein-protein docking. Structure alignment techniques are useful in detection of remote target-template similarities. The size of the structure involved in the alignment is important for the success in modeling. This paper describes a systematic large-scale study to find the optimal definition/size of the interfaces for the structure alignment-based docking applications. The results showed that structural areas corresponding to the cutoff values <12 Å across the interface inadequately represent structural details of the interfaces. With the increase of the cutoff beyond 12 Å, the success rate for the benchmark set of 99 protein complexes, did not increase significantly for higher accuracy models, and decreased for lower-accuracy models. The 12 Å cutoff was optimal in our interface alignment-based docking, and a likely best choice for the large-scale (e.g., on the scale of the entire genome) applications to protein interaction networks. The results provide guidelines for the docking approaches, including high-throughput applications to modeled structures.
T-RMSD: a web server for automated fine-grained protein structural classification.
Magis, Cedrik; Di Tommaso, Paolo; Notredame, Cedric
2013-07-01
This article introduces the T-RMSD web server (tree-based on root-mean-square deviation), a service allowing the online computation of structure-based protein classification. It has been developed to address the relation between structural and functional similarity in proteins, and it allows a fine-grained structural clustering of a given protein family or group of structurally related proteins using distance RMSD (dRMSD) variations. These distances are computed between all pairs of equivalent residues, as defined by the ungapped columns within a given multiple sequence alignment. Using these generated distance matrices (one per equivalent position), T-RMSD produces a structural tree with support values for each cluster node, reminiscent of bootstrap values. These values, associated with the tree topology, allow a quantitative estimate of structural distances between proteins or group of proteins defined by the tree topology. The clusters thus defined have been shown to be structurally and functionally informative. The T-RMSD web server is a free website open to all users and available at http://tcoffee.crg.cat/apps/tcoffee/do:trmsd.
T-RMSD: a web server for automated fine-grained protein structural classification
Magis, Cedrik; Di Tommaso, Paolo; Notredame, Cedric
2013-01-01
This article introduces the T-RMSD web server (tree-based on root-mean-square deviation), a service allowing the online computation of structure-based protein classification. It has been developed to address the relation between structural and functional similarity in proteins, and it allows a fine-grained structural clustering of a given protein family or group of structurally related proteins using distance RMSD (dRMSD) variations. These distances are computed between all pairs of equivalent residues, as defined by the ungapped columns within a given multiple sequence alignment. Using these generated distance matrices (one per equivalent position), T-RMSD produces a structural tree with support values for each cluster node, reminiscent of bootstrap values. These values, associated with the tree topology, allow a quantitative estimate of structural distances between proteins or group of proteins defined by the tree topology. The clusters thus defined have been shown to be structurally and functionally informative. The T-RMSD web server is a free website open to all users and available at http://tcoffee.crg.cat/apps/tcoffee/do:trmsd. PMID:23716642
Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G.; Gelly, Jean-Christophe
2016-01-01
Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation —with Protein Blocks—, which give an accurate description of the local protein structure. ORION has recently been improved, increasing by 5% the quality of its results. The ORION web server accepts a single protein sequence as input and searches homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the ‘Hard’ category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/. PMID:27319297
Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G; Gelly, Jean-Christophe
2016-06-20
Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation -with Protein Blocks-, which give an accurate description of the local protein structure. ORION has recently been improved, increasing by 5% the quality of its results. The ORION web server accepts a single protein sequence as input and searches homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the 'Hard' category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/.
VoroMQA: Assessment of protein structure quality using interatomic contact areas.
Olechnovič, Kliment; Venclovas, Česlovas
2017-06-01
In the absence of experimentally determined protein structure many biological questions can be addressed using computational structural models. However, the utility of protein structural models depends on their quality. Therefore, the estimation of the quality of predicted structures is an important problem. One of the approaches to this problem is the use of knowledge-based statistical potentials. Such methods typically rely on the statistics of distances and angles of residue-residue or atom-atom interactions collected from experimentally determined structures. Here, we present VoroMQA (Voronoi tessellation-based Model Quality Assessment), a new method for the estimation of protein structure quality. Our method combines the idea of statistical potentials with the use of interatomic contact areas instead of distances. Contact areas, derived using Voronoi tessellation of protein structure, are used to describe and seamlessly integrate both explicit interactions between protein atoms and implicit interactions of protein atoms with solvent. VoroMQA produces scores at atomic, residue, and global levels, all in the fixed range from 0 to 1. The method was tested on the CASP data and compared to several other single-model quality assessment methods. VoroMQA showed strong performance in the recognition of the native structure and in the structural model selection tests, thus demonstrating the efficacy of interatomic contact areas in estimating protein structure quality. The software implementation of VoroMQA is freely available as a standalone application and as a web server at http://bioinformatics.lt/software/voromqa. Proteins 2017; 85:1131-1145. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.
Designing and benchmarking the MULTICOM protein structure prediction system
2013-01-01
Background Predicting protein structure from sequence is one of the most significant and challenging problems in bioinformatics. Numerous bioinformatics techniques and tools have been developed to tackle almost every aspect of protein structure prediction ranging from structural feature prediction, template identification and query-template alignment to structure sampling, model quality assessment, and model refinement. How to synergistically select, integrate and improve the strengths of the complementary techniques at each prediction stage and build a high-performance system is becoming a critical issue for constructing a successful, competitive protein structure predictor. Results Over the past several years, we have constructed a standalone protein structure prediction system MULTICOM that combines multiple sources of information and complementary methods at all five stages of the protein structure prediction process including template identification, template combination, model generation, model assessment, and model refinement. The system was blindly tested during the ninth Critical Assessment of Techniques for Protein Structure Prediction (CASP9) in 2010 and yielded very good performance. In addition to studying the overall performance on the CASP9 benchmark, we thoroughly investigated the performance and contributions of each component at each stage of prediction. Conclusions Our comprehensive and comparative study not only provides useful and practical insights about how to select, improve, and integrate complementary methods to build a cutting-edge protein structure prediction system but also identifies a few new sources of information that may help improve the design of a protein structure prediction system. Several components used in the MULTICOM system are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/. PMID:23442819
NASA Astrophysics Data System (ADS)
Ferrer, Evelina G.; Gómez, Analía V.; Añón, María C.; Puppo, María C.
2011-06-01
Food protein product, gluten protein, was chemically modified by varying levels of sodium stearoyl lactylate (SSL); and the extent of modifications (secondary and tertiary structures) of this protein was analyzed by using Raman spectroscopy. Analysis of the Amide I band showed an increase in its intensity mainly after the addition of the 0.25% of SSL to wheat flour to produced modified gluten protein, pointing the formation of a more ordered structure. Side chain vibrations also confirmed the observed changes.
Modeling repetitive, non‐globular proteins
Basu, Koli; Campbell, Robert L.; Guo, Shuaiqi; Sun, Tianjun
2016-01-01
Abstract While ab initio modeling of protein structures is not routine, certain types of proteins are more straightforward to model than others. Proteins with short repetitive sequences typically exhibit repetitive structures. These repetitive sequences can be more amenable to modeling if some information is known about the predominant secondary structure or other key features of the protein sequence. We have successfully built models of a number of repetitive structures with novel folds using knowledge of the consensus sequence within the sequence repeat and an understanding of the likely secondary structures that these may adopt. Our methods for achieving this success are reviewed here. PMID:26914323
PDBsum: Structural summaries of PDB entries.
Laskowski, Roman A; Jabłońska, Jagoda; Pravda, Lukáš; Vařeková, Radka Svobodová; Thornton, Janet M
2018-01-01
PDBsum is a web server providing structural information on the entries in the Protein Data Bank (PDB). The analyses are primarily image-based and include protein secondary structure, protein-ligand and protein-DNA interactions, PROCHECK analyses of structural quality, and many others. The 3D structures can be viewed interactively in RasMol, PyMOL, and a JavaScript viewer called 3Dmol.js. Users can upload their own PDB files and obtain a set of password-protected PDBsum analyses for each. The server is freely accessible to all at: http://www.ebi.ac.uk/pdbsum. © 2017 The Protein Society.
Topological characteristics of helical repeat proteins.
Groves, M R; Barford, D
1999-06-01
The recent elucidation of protein structures based upon repeating amino acid motifs, including the armadillo motif, the HEAT motif and tetratricopeptide repeats, reveals that they belong to the class of helical repeat proteins. These proteins share the common property of being assembled from tandem repeats of an alpha-helical structural unit, creating extended superhelical structures that are ideally suited to create a protein recognition interface.
Local Structural Differences in Homologous Proteins: Specificities in Different SCOP Classes
Joseph, Agnel Praveen; Valadié, Hélène; Srinivasan, Narayanaswamy; de Brevern, Alexandre G.
2012-01-01
The constant increase in the number of solved protein structures is of great help in understanding the basic principles behind protein folding and evolution. 3-D structural knowledge is valuable in designing and developing methods for comparison, modelling and prediction of protein structures. These approaches for structure analysis can be directly implicated in studying protein function and for drug design. The backbone of a protein structure favours certain local conformations which include α-helices, β-strands and turns. Libraries of limited number of local conformations (Structural Alphabets) were developed in the past to obtain a useful categorization of backbone conformation. Protein Block (PB) is one such Structural Alphabet that gave a reasonable structure approximation of 0.42 Å. In this study, we use PB description of local structures to analyse conformations that are preferred sites for structural variations and insertions, among group of related folds. This knowledge can be utilized in improving tools for structure comparison that work by analysing local structure similarities. Conformational differences between homologous proteins are known to occur often in the regions comprising turns and loops. Interestingly, these differences are found to have specific preferences depending upon the structural classes of proteins. Such class-specific preferences are mainly seen in the all-β class with changes involving short helical conformations and hairpin turns. A test carried out on a benchmark dataset also indicates that the use of knowledge on the class specific variations can improve the performance of a PB based structure comparison approach. The preference for the indel sites also seem to be confined to a few backbone conformations involving β-turns and helix C-caps. These are mainly associated with short loops joining the regular secondary structures that mediate a reversal in the chain direction. Rare β-turns of type I’ and II’ are also identified as preferred sites for insertions. PMID:22745680
Mao, Xiaoying; Hua, Yufei
2012-01-01
In this study, composition, structure and the functional properties of protein concentrate (WPC) and protein isolate (WPI) produced from defatted walnut flour (DFWF) were investigated. The results showed that the composition and structure of walnut protein concentrate (WPC) and walnut protein isolate (WPI) were significantly different. The molecular weight distribution of WPI was uniform and the protein composition of DFWF and WPC was complex with the protein aggregation. H(0) of WPC was significantly higher (p < 0.05) than those of DFWF and WPI, whilst WPI had a higher H(0) compared to DFWF. The secondary structure of WPI was similar to WPC. WPI showed big flaky plate like structures; whereas WPC appeared as a small flaky and more compact structure. The most functional properties of WPI were better than WPC. In comparing most functional properties of WPI and WPC with soybean protein concentrate and isolate, WPI and WPC showed higher fat absorption capacity (FAC). Emulsifying properties and foam properties of WPC and WPI in alkaline pH were comparable with that of soybean protein concentrate and isolate. Walnut protein concentrates and isolates can be considered as potential functional food ingredients.
Uversky, Vladimir N
2015-03-01
Intrinsically disordered proteins (IDPs) and intrinsically disordered protein regions (IDPRs) are functional proteins or regions that do not have unique 3D structures under functional conditions. Therefore, from the viewpoint of their lack of stable 3D structure, IDPs/IDPRs are inherently unstable. As much as structure and function of normal ordered globular proteins are determined by their amino acid sequences, the lack of unique 3D structure in IDPs/IDPRs and their disorder-based functionality are also encoded in the amino acid sequences. Because of their specific sequence features and distinctive conformational behavior, these intrinsically unstable proteins or regions have several applications in biotechnology. This review introduces some of the most characteristic features of IDPs/IDPRs (such as peculiarities of amino acid sequences of these proteins and regions, their major structural features, and peculiar responses to changes in their environment) and describes how these features can be used in the biotechnology, for example for the proteome-wide analysis of the abundance of extended IDPs, for recombinant protein isolation and purification, as polypeptide nanoparticles for drug delivery, as solubilization tools, and as thermally sensitive carriers of active peptides and proteins. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
PASS2: an automated database of protein alignments organised as structural superfamilies.
Bhaduri, Anirban; Pugalenthi, Ganesan; Sowdhamini, Ramanathan
2004-04-02
The functional selection and three-dimensional structural constraints of proteins in nature often relates to the retention of significant sequence similarity between proteins of similar fold and function despite poor sequence identity. Organization of structure-based sequence alignments for distantly related proteins, provides a map of the conserved and critical regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination. The Protein Alignment organised as Structural Superfamily (PASS2) database represents continuously updated, structural alignments for evolutionary related, sequentially distant proteins. An automated and updated version of PASS2 is, in direct correspondence with SCOP 1.63, consisting of sequences having identity below 40% among themselves. Protein domains have been grouped into 628 multi-member superfamilies and 566 single member superfamilies. Structure-based sequence alignments for the superfamilies have been obtained using COMPARER, while initial equivalencies have been derived from a preliminary superposition using LSQMAN or STAMP 4.0. The final sequence alignments have been annotated for structural features using JOY4.0. The database is supplemented with sequence relatives belonging to different genomes, conserved spatially interacting and structural motifs, probabilistic hidden markov models of superfamilies based on the alignments and useful links to other databases. Probabilistic models and sensitive position specific profiles obtained from reliable superfamily alignments aid annotation of remote homologues and are useful tools in structural and functional genomics. PASS2 presents the phylogeny of its members both based on sequence and structural dissimilarities. Clustering of members allows us to understand diversification of the family members. The search engine has been improved for simpler browsing of the database. The database resolves alignments among the structural domains consisting of evolutionarily diverged set of sequences. Availability of reliable sequence alignments of distantly related proteins despite poor sequence identity and single-member superfamilies permit better sampling of structures in libraries for fold recognition of new sequences and for the understanding of protein structure-function relationships of individual superfamilies. PASS2 is accessible at http://www.ncbs.res.in/~faculty/mini/campass/pass2.html
Falconi, M; Oteri, F; Eliseo, T; Cicero, D O; Desideri, A
2008-08-01
The structural dynamics of the DNA binding domains of the human papillomavirus strain 16 and the bovine papillomavirus strain 1, complexed with their DNA targets, has been investigated by modeling, molecular dynamics simulations, and nuclear magnetic resonance analysis. The simulations underline different dynamical features of the protein scaffolds and a different mechanical interaction of the two proteins with DNA. The two protein structures, although very similar, show differences in the relative mobility of secondary structure elements. Protein structural analyses, principal component analysis, and geometrical and energetic DNA analyses indicate that the two transcription factors utilize a different strategy in DNA recognition and deformation. Results show that the protein indirect DNA readout is not only addressable to the DNA molecule flexibility but it is finely tuned by the mechanical and dynamical properties of the protein scaffold involved in the interaction.
PBxplore: a tool to analyze local protein structure and deformability with Protein Blocks
Craveur, Pierrick; Joseph, Agnel Praveen; Jallu, Vincent
2017-01-01
This paper describes the development and application of a suite of tools, called PBxplore, to analyze the dynamics and deformability of protein structures using Protein Blocks (PBs). Proteins are highly dynamic macromolecules, and a classical way to analyze their inherent flexibility is to perform molecular dynamics simulations. The advantage of using small structural prototypes such as PBs is to give a good approximation of the local structure of the protein backbone. More importantly, by reducing the conformational complexity of protein structures, PBs allow analysis of local protein deformability which cannot be done with other methods and had been used efficiently in different applications. PBxplore is able to process large amounts of data such as those produced by molecular dynamics simulations. It produces frequencies, entropy and information logo outputs as text and graphics. PBxplore is available at https://github.com/pierrepo/PBxplore and is released under the open-source MIT license. PMID:29177113
Principles of assembly reveal a periodic table of protein complexes.
Ahnert, Sebastian E; Marsh, Joseph A; Hernández, Helena; Robinson, Carol V; Teichmann, Sarah A
2015-12-11
Structural insights into protein complexes have had a broad impact on our understanding of biological function and evolution. In this work, we sought a comprehensive understanding of the general principles underlying quaternary structure organization in protein complexes. We first examined the fundamental steps by which protein complexes can assemble, using experimental and structure-based characterization of assembly pathways. Most assembly transitions can be classified into three basic types, which can then be used to exhaustively enumerate a large set of possible quaternary structure topologies. These topologies, which include the vast majority of observed protein complex structures, enable a natural organization of protein complexes into a periodic table. On the basis of this table, we can accurately predict the expected frequencies of quaternary structure topologies, including those not yet observed. These results have important implications for quaternary structure prediction, modeling, and engineering. Copyright © 2015, American Association for the Advancement of Science.
The use of experimental structures to model protein dynamics.
Katebi, Ataur R; Sankar, Kannan; Jia, Kejue; Jernigan, Robert L
2015-01-01
The number of solved protein structures submitted in the Protein Data Bank (PDB) has increased dramatically in recent years. For some specific proteins, this number is very high-for example, there are over 550 solved structures for HIV-1 protease, one protein that is essential for the life cycle of human immunodeficiency virus (HIV) which causes acquired immunodeficiency syndrome (AIDS) in humans. The large number of structures for the same protein and its variants include a sample of different conformational states of the protein. A rich set of structures solved experimentally for the same protein has information buried within the dataset that can explain the functional dynamics and structural mechanism of the protein. To extract the dynamics information and functional mechanism from the experimental structures, this chapter focuses on two methods-Principal Component Analysis (PCA) and Elastic Network Models (ENM). PCA is a widely used statistical dimensionality reduction technique to classify and visualize high-dimensional data. On the other hand, ENMs are well-established simple biophysical method for modeling the functionally important global motions of proteins. This chapter covers the basics of these two. Moreover, an improved ENM version that utilizes the variations found within a given set of structures for a protein is described. As a practical example, we have extracted the functional dynamics and mechanism of HIV-1 protease dimeric structure by using a set of 329 PDB structures of this protein. We have described, step by step, how to select a set of protein structures, how to extract the needed information from the PDB files for PCA, how to extract the dynamics information using PCA, how to calculate ENM modes, how to measure the congruency between the dynamics computed from the principal components (PCs) and the ENM modes, and how to compute entropies using the PCs. We provide the computer programs or references to software tools to accomplish each step and show how to use these programs and tools. We also include computer programs to generate movies based on PCs and ENM modes and describe how to visualize them.
The Use of Experimental Structures to Model Protein Dynamics
Katebi, Ataur R.; Sankar, Kannan; Jia, Kejue; Jernigan, Robert L.
2014-01-01
Summary The number of solved protein structures submitted in the Protein Data Bank (PDB) has increased dramatically in recent years. For some specific proteins, this number is very high – for example, there are over 550 solved structures for HIV-1 protease, one protein that is essential for the life cycle of human immunodeficiency virus (HIV) which causes acquired immunodeficiency syndrome (AIDS) in humans. The large number of structures for the same protein and its variants include a sample of different conformational states of the protein. A rich set of structures solved experimentally for the same protein has information buried within the dataset that can explain the functional dynamics and structural mechanism of the protein. To extract the dynamics information and functional mechanism from the experimental structures, this chapter focuses on two methods – Principal Component Analysis (PCA) and Elastic Network Models (ENM). PCA is a widely used statistical dimensionality reduction technique to classify and visualize high-dimensional data. On the other hand, ENMs are well-established simple biophysical method for modeling the functionally important global motions of proteins. This chapter covers the basics of these two. Moreover, an improved ENM version that utilizes the variations found within a given set of structures for a protein is described. As a practical example, we have extracted the functional dynamics and mechanism of HIV-1 protease dimeric structure by using a set of 329 PDB structures of this protein. We have described, step by step, how to select a set of protein structures, how to extract the needed information from the PDB files for PCA, how to extract the dynamics information using PCA, how to calculate ENM modes, how to measure the congruency between the dynamics computed from the principal components (PCs) and the ENM modes, and how to compute entropies using the PCs. We provide the computer programs or references to software tools to accomplish each step and show how to use these programs and tools. We also include computer programs to generate movies based on PCs and ENM modes and describe how to visualize them. PMID:25330965
HARMONY: a server for the assessment of protein structures
Pugalenthi, G.; Shameer, K.; Srinivasan, N.; Sowdhamini, R.
2006-01-01
Protein structure validation is an important step in computational modeling and structure determination. Stereochemical assessment of protein structures examine internal parameters such as bond lengths and Ramachandran (φ,ψ) angles. Gross structure prediction methods such as inverse folding procedure and structure determination especially at low resolution can sometimes give rise to models that are incorrect due to assignment of misfolds or mistracing of electron density maps. Such errors are not reflected as strain in internal parameters. HARMONY is a procedure that examines the compatibility between the sequence and the structure of a protein by assigning scores to individual residues and their amino acid exchange patterns after considering their local environments. Local environments are described by the backbone conformation, solvent accessibility and hydrogen bonding patterns. We are now providing HARMONY through a web server such that users can submit their protein structure files and, if required, the alignment of homologous sequences. Scores are mapped on the structure for subsequent examination that is useful to also recognize regions of possible local errors in protein structures. HARMONY server is located at PMID:16844999
Deformation and Failure of Protein Materials in Physiologically Extreme Conditions and Disease
2009-03-01
resonance (NMR) spectroscopy and X- ray crystallography have advanced our ability to identify 3D protein structures57. Site-specific studies using NMR, a... ray crystallography, providing structural and temporal information about mechanisms of deformation and assembly (for example in intermediate...tens of thousands of 3D atomistic protein structures, identifying the structure of numerous proteins from varying species sources60. X- ray
Zhou, Carol L Ecale
2015-01-01
In order to better define regions of similarity among related protein structures, it is useful to identify the residue-residue correspondences among proteins. Few codes exist for constructing a one-to-many multiple sequence alignment derived from a set of structure or sequence alignments, and a need was evident for creating such a tool for combining pairwise structure alignments that would allow for insertion of gaps in the reference structure. This report describes a new Python code, CombAlign, which takes as input a set of pairwise sequence alignments (which may be structure based) and generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA). The use and utility of CombAlign was demonstrated by generating gapped MSSAs using sets of pairwise structure-based sequence alignments between structure models of the matrix protein (VP40) and pre-small/secreted glycoprotein (sGP) of Reston Ebolavirus and the corresponding proteins of several other filoviruses. The gapped MSSAs revealed structure-based residue-residue correspondences, which enabled identification of structurally similar versus differing regions in the Reston proteins compared to each of the other corresponding proteins. CombAlign is a new Python code that generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA) given a set of pairwise sequence alignments (which may be structure based). CombAlign has utility in assisting the user in distinguishing structurally conserved versus divergent regions on a reference protein structure relative to other closely related proteins. CombAlign was developed in Python 2.6, and the source code is available for download from the GitHub code repository.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Doiron, K.; Yu, P; McKinnon, J
2009-01-01
The objectives of this study were to reveal protein structures of feed tissues affected by heat processing at a cellular level, using the synchrotron-based Fourier transform infrared microspectroscopy as a novel approach, and quantify protein structure in relation to protein digestive kinetics and nutritive value in the rumen and intestine in dairy cattle. The parameters assessed included (1) protein structure a-helix to e-sheet ratio; (2) protein subfractions profiles; (3) protein degradation kinetics and effective degradability; (4) predicted nutrient supply using the intestinally absorbed protein supply (DVE)/degraded protein balance (OEB) system for dairy cattle. In this study, Vimy flaxseed protein wasmore » used as a model feed protein and was autoclave-heated at 120C for 20, 40, and 60 min in treatments T1, T2, and T3, respectively. The results showed that using the synchrotron-based Fourier transform infrared microspectroscopy revealed and identified the heat-induced protein structure changes. Heating at 120C for 40 and 60 min increased the protein structure a-helix to e-sheet ratio. There were linear effects of heating time on the ratio. The heating also changed chemical profiles, which showed soluble CP decreased upon heating with concomitant increases in nonprotein nitrogen, neutral, and acid detergent insoluble nitrogen. The protein subfractions with the greatest changes were PB1, which showed a dramatic reduction, and PB2, which showed a dramatic increase, demonstrating a decrease in overall protein degradability. In situ results showed a reduction in rumen-degradable protein and in rumen-degradable dry matter without differences between the treatments. Intestinal digestibility, determined using a 3-step in vitro procedure, showed no changes to rumen undegradable protein. Modeling results showed that heating increased total intestinally absorbable protein (feed DVE value) and decreased degraded protein balance (feed OEB value), but there were no differences between the treatments. There was a linear effect of heating time on the DVE and a cubic effect on the OEB value. Our results showed that heating changed chemical profiles, protein structure a-helix to e-sheet ratio, and protein subfractions; decreased rumen-degradable protein and rumen-degradable dry matter; and increased potential nutrient supply to dairy cattle. The protein structure a-helix to e-sheet ratio had a significant positive correlation with total intestinally absorbed protein supply and negative correlation with degraded protein balance.« less
Bandyopadhyay, Deepak; Huan, Jun; Prins, Jan; Snoeyink, Jack; Wang, Wei; Tropsha, Alexander
2009-11-01
Protein function prediction is one of the central problems in computational biology. We present a novel automated protein structure-based function prediction method using libraries of local residue packing patterns that are common to most proteins in a known functional family. Critical to this approach is the representation of a protein structure as a graph where residue vertices (residue name used as a vertex label) are connected by geometrical proximity edges. The approach employs two steps. First, it uses a fast subgraph mining algorithm to find all occurrences of family-specific labeled subgraphs for all well characterized protein structural and functional families. Second, it queries a new structure for occurrences of a set of motifs characteristic of a known family, using a graph index to speed up Ullman's subgraph isomorphism algorithm. The confidence of function inference from structure depends on the number of family-specific motifs found in the query structure compared with their distribution in a large non-redundant database of proteins. This method can assign a new structure to a specific functional family in cases where sequence alignments, sequence patterns, structural superposition and active site templates fail to provide accurate annotation.
Protein Engineering Approaches in the Post-Genomic Era.
Singh, Raushan K; Lee, Jung-Kul; Selvaraj, Chandrabose; Singh, Ranjitha; Li, Jinglin; Kim, Sang-Yong; Kalia, Vipin C
2018-01-01
Proteins are one of the most multifaceted macromolecules in living systems. Proteins have evolved to function under physiological conditions and, therefore, are not usually tolerant of harsh experimental and environmental conditions. The growing use of proteins in industrial processes as a greener alternative to chemical catalysts often demands constant innovation to improve their performance. Protein engineering aims to design new proteins or modify the sequence of a protein to create proteins with new or desirable functions. With the emergence of structural and functional genomics, protein engineering has been invigorated in the post-genomic era. The three-dimensional structures of proteins with known functions facilitate protein engineering approaches to design variants with desired properties. There are three major approaches of protein engineering research, namely, directed evolution, rational design, and de novo design. Rational design is an effective method of protein engineering when the threedimensional structure and mechanism of the protein is well known. In contrast, directed evolution does not require extensive information and a three-dimensional structure of the protein of interest. Instead, it involves random mutagenesis and selection to screen enzymes with desired properties. De novo design uses computational protein design algorithms to tailor synthetic proteins by using the three-dimensional structures of natural proteins and their folding rules. The present review highlights and summarizes recent protein engineering approaches, and their challenges and limitations in the post-genomic era. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Exploring the repeat protein universe through computational protein design
Brunette, TJ; Parmeggiani, Fabio; Huang, Po-Ssu; ...
2015-12-16
A central question in protein evolution is the extent to which naturally occurring proteins sample the space of folded structures accessible to the polypeptide chain. Repeat proteins composed of multiple tandem copies of a modular structure unit are widespread in nature and have critical roles in molecular recognition, signalling, and other essential biological processes. Naturally occurring repeat proteins have been re-engineered for molecular recognition and modular scaffolding applications. In this paper, we use computational protein design to investigate the space of folded structures that can be generated by tandem repeating a simple helix–loop–helix–loop structural motif. Eighty-three designs with sequences unrelatedmore » to known repeat proteins were experimentally characterized. Of these, 53 are monomeric and stable at 95 °C, and 43 have solution X-ray scattering spectra consistent with the design models. Crystal structures of 15 designs spanning a broad range of curvatures are in close agreement with the design models with root mean square deviations ranging from 0.7 to 2.5 Å. Finally, our results show that existing repeat proteins occupy only a small fraction of the possible repeat protein sequence and structure space and that it is possible to design novel repeat proteins with precisely specified geometries, opening up a wide array of new possibilities for biomolecular engineering.« less
Statistical mechanics of protein structural transitions: Insights from the island model
Kobayashi, Yukio
2016-01-01
The so-called island model of protein structural transition holds that hydrophobic interactions are the key to both the folding and function of proteins. Herein, the genesis and statistical mechanical basis of the island model of transitions are reviewed, by presenting the results of simulations of such transitions. Elucidating the physicochemical mechanism of protein structural formation is the foundation for understanding the hierarchical structure of life at the microscopic level. Based on the results obtained to date using the island model, remaining problems and future work in the field of protein structures are discussed, referencing Professor Saitô’s views on the hierarchic structure of science. PMID:28409078
Johnson, Derrick E.; Xue, Bin; Sickmeier, Megan D.; Meng, Jingwei; Cortese, Marc S.; Oldfield, Christopher J.; Le Gall, Tanguy; Dunker, A. Keith; Uversky, Vladimir N.
2012-01-01
The identification of intrinsically disordered proteins (IDPs) among the targets that fail to form satisfactory crystal structures in the Protein Structure Initiative represent a key to reducing the costs and time for determining three-dimensional structures of proteins. To help in this endeavor, several Protein Structure Initiative Centers were asked to send samples of both crystallizable proteins and proteins that failed to crystallize. The abundance of intrinsic disorder in these proteins was evaluated via computational analysis using Predictors of Natural Disordered Regions (PONDR®) and the potential cleavage sites and corresponding fragments were determined. Then, the target proteins were analyzed for intrinsic disorder by their resistance to limited proteolysis. The rates of tryptic digestion of sample target proteins were compared to those of lysozyme/myoglobin, apo-myoglobin and α-casein as standards of ordered, partially disordered and completely disordered proteins, respectively. At the next stage, the protein samples were subjected to both far-UV and near-UV circular dichroism (CD) analysis. For most of the samples, a good agreement between CD data, predictions of disorder and the rates of limited tryptic digestion was established. Further experimentation is being performed on a smaller subset of these samples in order to obtain more detailed information on the ordered/disordered nature of the proteins. PMID:22651963
Uversky, Vladimir N
2016-03-25
Biologically active but floppy proteins represent a new reality of modern protein science. These intrinsically disordered proteins (IDPs) and hybrid proteins containing ordered and intrinsically disordered protein regions (IDPRs) constitute a noticeable part of any given proteome. Functionally, they complement ordered proteins, and their conformational flexibility and structural plasticity allow them to perform impossible tricks and be engaged in biological activities that are inaccessible to well folded proteins with their unique structures. The major goals of this minireview are to show that, despite their simplified amino acid sequences, IDPs/IDPRs are complex entities often resembling chaotic systems, are structurally and functionally heterogeneous, and can be considered an important part of the structure-function continuum. Furthermore, IDPs/IDPRs are everywhere, and are ubiquitously engaged in various interactions characterized by a wide spectrum of binding scenarios and an even wider spectrum of structural and functional outputs. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.
CASTp 3.0: computed atlas of surface topography of proteins.
Tian, Wei; Chen, Chang; Lei, Xue; Zhao, Jieling; Liang, Jie
2018-06-01
Geometric and topological properties of protein structures, including surface pockets, interior cavities and cross channels, are of fundamental importance for proteins to carry out their functions. Computed Atlas of Surface Topography of proteins (CASTp) is a web server that provides online services for locating, delineating and measuring these geometric and topological properties of protein structures. It has been widely used since its inception in 2003. In this article, we present the latest version of the web server, CASTp 3.0. CASTp 3.0 continues to provide reliable and comprehensive identifications and quantifications of protein topography. In addition, it now provides: (i) imprints of the negative volumes of pockets, cavities and channels, (ii) topographic features of biological assemblies in the Protein Data Bank, (iii) improved visualization of protein structures and pockets, and (iv) more intuitive structural and annotated information, including information of secondary structure, functional sites, variant sites and other annotations of protein residues. The CASTp 3.0 web server is freely accessible at http://sts.bioe.uic.edu/castp/.
Zou, Wei; Sissons, Mike; Warren, Frederick J; Gidley, Michael J; Gilbert, Robert G
2016-11-05
The roles that the compact structure and proteins in pasta play in retarding evolution of starch molecular structure during in vitro digestion are explored, using four types of cooked samples: whole pasta, pasta powder, semolina (with proteins) and extracted starch without proteins. These were subjected to in vitro digestion with porcine α-amylase, collecting samples at different times and characterizing the weight distribution of branched starch molecules using size-exclusion chromatography. Measurement of α-amylase activity showed that a protein (or proteins) from semolina or pasta powder interacted with α-amylase, causing reduced enzymatic activity and retarding digestion of branched starch molecules with hydrodynamic radius (Rh)<100nm; this protein(s) was susceptible to proteolysis. Thus the compact structure of pasta protects the starch and proteins in the interior of the whole pasta, reducing the enzymatic degradation of starch molecules, especially for molecules with Rh>100nm. Copyright © 2016 Elsevier Ltd. All rights reserved.
Sequence co-evolution gives 3D contacts and structures of protein complexes
Hopf, Thomas A; Schärfe, Charlotta P I; Rodrigues, João P G L M; Green, Anna G; Kohlbacher, Oliver; Sander, Chris; Bonvin, Alexandre M J J; Marks, Debora S
2014-01-01
Protein–protein interactions are fundamental to many biological processes. Experimental screens have identified tens of thousands of interactions, and structural biology has provided detailed functional insight for select 3D protein complexes. An alternative rich source of information about protein interactions is the evolutionary sequence record. Building on earlier work, we show that analysis of correlated evolutionary sequence changes across proteins identifies residues that are close in space with sufficient accuracy to determine the three-dimensional structure of the protein complexes. We evaluate prediction performance in blinded tests on 76 complexes of known 3D structure, predict protein–protein contacts in 32 complexes of unknown structure, and demonstrate how evolutionary couplings can be used to distinguish between interacting and non-interacting protein pairs in a large complex. With the current growth of sequences, we expect that the method can be generalized to genome-wide elucidation of protein–protein interaction networks and used for interaction predictions at residue resolution. DOI: http://dx.doi.org/10.7554/eLife.03430.001 PMID:25255213
X-ray scattering data and structural genomics
NASA Astrophysics Data System (ADS)
Doniach, Sebastian
2003-03-01
High throughput structural genomics has the ambitious goal of determining the structure of all, or a very large number of protein folds using the high-resolution techniques of protein crystallography and NMR. However, the program is facing significant bottlenecks in reaching this goal, which include problems of protein expression and crystallization. In this talk, some preliminary results on how the low-resolution technique of small-angle X-ray solution scattering (SAXS) can help ameliorate some of these bottlenecks will be presented. One of the most significant bottlenecks arises from the difficulty of crystallizing integral membrane proteins, where only a handful of structures are available compared to thousands of structures for soluble proteins. By 3-dimensional reconstruction from SAXS data, the size and shape of detergent-solubilized integral membrane proteins can be characterized. This information can then be used to classify membrane proteins which constitute some 25% of all genomes. SAXS may also be used to study the dependence of interparticle interference scattering on solvent conditions so that regions of the protein solution phase diagram which favor crystallization can be elucidated. As a further application, SAXS may be used to provide physical constraints on computational methods for protein structure prediction based on primary sequence information. This in turn can help in identifying structural homologs of a given protein, which can then give clues to its function. D. Walther, F. Cohen and S. Doniach. "Reconstruction of low resolution three-dimensional density maps from one-dimensional small angle x-ray scattering data for biomolecules." J. Appl. Cryst. 33(2):350-363 (2000). Protein structure prediction constrained by solution X-ray scattering data and structural homology identification Zheng WJ, Doniach S JOURNAL OF MOLECULAR BIOLOGY , v. 316(#1) pp. 173-187 FEB 8, 2002
PDBStat: a universal restraint converter and restraint analysis software package for protein NMR.
Tejero, Roberto; Snyder, David; Mao, Binchen; Aramini, James M; Montelione, Gaetano T
2013-08-01
The heterogeneous array of software tools used in the process of protein NMR structure determination presents organizational challenges in the structure determination and validation processes, and creates a learning curve that limits the broader use of protein NMR in biology. These challenges, including accurate use of data in different data formats required by software carrying out similar tasks, continue to confound the efforts of novices and experts alike. These important issues need to be addressed robustly in order to standardize protein NMR structure determination and validation. PDBStat is a C/C++ computer program originally developed as a universal coordinate and protein NMR restraint converter. Its primary function is to provide a user-friendly tool for interconverting between protein coordinate and protein NMR restraint data formats. It also provides an integrated set of computational methods for protein NMR restraint analysis and structure quality assessment, relabeling of prochiral atoms with correct IUPAC names, as well as multiple methods for analysis of the consistency of atomic positions indicated by their convergence across a protein NMR ensemble. In this paper we provide a detailed description of the PDBStat software, and highlight some of its valuable computational capabilities. As an example, we demonstrate the use of the PDBStat restraint converter for restrained CS-Rosetta structure generation calculations, and compare the resulting protein NMR structure models with those generated from the same NMR restraint data using more traditional structure determination methods. These results demonstrate the value of a universal restraint converter in allowing the use of multiple structure generation methods with the same restraint data for consensus analysis of protein NMR structures and the underlying restraint data.
PDBStat: A Universal Restraint Converter and Restraint Analysis Software Package for Protein NMR
Tejero, Roberto; Snyder, David; Mao, Binchen; Aramini, James M.; Montelione, Gaetano T
2013-01-01
The heterogeneous array of software tools used in the process of protein NMR structure determination presents organizational challenges in the structure determination and validation processes, and creates a learning curve that limits the broader use of protein NMR in biology. These challenges, including accurate use of data in different data formats required by software carrying out similar tasks, continue to confound the efforts of novices and experts alike. These important issues need to be addressed robustly in order to standardize protein NMR structure determination and validation. PDBStat is a C/C++ computer program originally developed as a universal coordinate and protein NMR restraint converter. Its primary function is to provide a user-friendly tool for interconverting between protein coordinate and protein NMR restraint data formats. It also provides an integrated set of computational methods for protein NMR restraint analysis and structure quality assessment, relabeling of prochiral atoms with correct IUPAC names, as well as multiple methods for analysis of the consistency of atomic positions indicated by their convergence across a protein NMR ensemble. In this paper we provide a detailed description of the PDBStat software, and highlight some of its valuable computational capabilities. As an example, we demonstrate the use of the PDBStat restraint converter for restrained CS-Rosetta structure generation calculations, and compare the resulting protein NMR structure models with those generated from the same NMR restraint data using more traditional structure determination methods. These results demonstrate the value of a universal restraint converter in allowing the use of multiple structure generation methods with the same restraint data for consensus analysis of protein NMR structures and the underlying restraint data. PMID:23897031
Deciphering the shape and deformation of secondary structures through local conformation analysis
2011-01-01
Background Protein deformation has been extensively analysed through global methods based on RMSD, torsion angles and Principal Components Analysis calculations. Here we use a local approach, able to distinguish among the different backbone conformations within loops, α-helices and β-strands, to address the question of secondary structures' shape variation within proteins and deformation at interface upon complexation. Results Using a structural alphabet, we translated the 3 D structures of large sets of protein-protein complexes into sequences of structural letters. The shape of the secondary structures can be assessed by the structural letters that modeled them in the structural sequences. The distribution analysis of the structural letters in the three protein compartments (surface, core and interface) reveals that secondary structures tend to adopt preferential conformations that differ among the compartments. The local description of secondary structures highlights that curved conformations are preferred on the surface while straight ones are preferred in the core. Interfaces display a mixture of local conformations either preferred in core or surface. The analysis of the structural letters transition occurring between protein-bound and unbound conformations shows that the deformation of secondary structure is tightly linked to the compartment preference of the local conformations. Conclusion The conformation of secondary structures can be further analysed and detailed thanks to a structural alphabet which allows a better description of protein surface, core and interface in terms of secondary structures' shape and deformation. Induced-fit modification tendencies described here should be valuable information to identify and characterize regions under strong structural constraints for functional reasons. PMID:21284872
Deciphering the shape and deformation of secondary structures through local conformation analysis.
Baussand, Julie; Camproux, Anne-Claude
2011-02-01
Protein deformation has been extensively analysed through global methods based on RMSD, torsion angles and Principal Components Analysis calculations. Here we use a local approach, able to distinguish among the different backbone conformations within loops, α-helices and β-strands, to address the question of secondary structures' shape variation within proteins and deformation at interface upon complexation. Using a structural alphabet, we translated the 3 D structures of large sets of protein-protein complexes into sequences of structural letters. The shape of the secondary structures can be assessed by the structural letters that modeled them in the structural sequences. The distribution analysis of the structural letters in the three protein compartments (surface, core and interface) reveals that secondary structures tend to adopt preferential conformations that differ among the compartments. The local description of secondary structures highlights that curved conformations are preferred on the surface while straight ones are preferred in the core. Interfaces display a mixture of local conformations either preferred in core or surface. The analysis of the structural letters transition occurring between protein-bound and unbound conformations shows that the deformation of secondary structure is tightly linked to the compartment preference of the local conformations. The conformation of secondary structures can be further analysed and detailed thanks to a structural alphabet which allows a better description of protein surface, core and interface in terms of secondary structures' shape and deformation. Induced-fit modification tendencies described here should be valuable information to identify and characterize regions under strong structural constraints for functional reasons.
Xu, Dong; Zhang, Jian; Roy, Ambrish; Zhang, Yang
2011-01-01
I-TASSER is an automated pipeline for protein tertiary structure prediction using multiple threading alignments and iterative structure assembly simulations. In CASP9 experiments, two new algorithms, QUARK and FG-MD, were added to the I-TASSER pipeline for improving the structural modeling accuracy. QUARK is a de novo structure prediction algorithm used for structure modeling of proteins that lack detectable template structures. For distantly homologous targets, QUARK models are found useful as a reference structure for selecting good threading alignments and guiding the I-TASSER structure assembly simulations. FG-MD is an atomic-level structural refinement program that uses structural fragments collected from the PDB structures to guide molecular dynamics simulation and improve the local structure of predicted model, including hydrogen-bonding networks, torsion angles and steric clashes. Despite considerable progress in both the template-based and template-free structure modeling, significant improvements on protein target classification, domain parsing, model selection, and ab initio folding of beta-proteins are still needed to further improve the I-TASSER pipeline. PMID:22069036
Visualizing water molecules in transmembrane proteins using radiolytic labeling methods†
Orban, Tivadar; Gupta, Sayan; Palczewski, Krzysztof; Chance, Mark R.
2010-01-01
Essential to cells and their organelles, water is both shuttled to where it is needed and trapped within cellular compartments and structures. Moreover, ordered waters within protein structures often co-localize with strategically placed polar or charged groups critical for protein function. Yet it is unclear if these ordered water molecules provide structural stabilization, mediate conformational changes in signaling, neutralize charged residues, or carry out a combination of all these functions. Structures of many integral membrane proteins, including G protein-coupled receptors (GPCRs), reveal the presence of ordered water molecules that may act like prosthetic groups in a manner quite unlike bulk water. Identification of ‘ordered’ waters within a crystalline protein structure requires sufficient occupancy of water to enable its detection in the protein's X-ray diffraction pattern and thus the observed waters likely represent a subset of tightly-bound functional waters. In this review, we highlight recent studies that suggest the structures of ordered waters within GPCRs are as conserved (and thus as important) as conserved side chains. In addition, methods of radiolysis, coupled to structural mass spectrometry (protein footprinting), reveal dynamic changes in water structure that mediate transmembrane signaling. The idea of water as a prosthetic group mediating chemical reaction dynamics is not new in fields such as catalysis. However, the concept of water as a mediator of conformational dynamics in signaling is just emerging, owing to advances in both crystallographic structure determination and new methods of protein footprinting. Although oil and water do not mix, understanding the roles of water is essential to understanding the function of membrane proteins. PMID:20047303
3D Complex: A Structural Classification of Protein Complexes
Levy, Emmanuel D; Pereira-Leal, Jose B; Chothia, Cyrus; Teichmann, Sarah A
2006-01-01
Most of the proteins in a cell assemble into complexes to carry out their function. It is therefore crucial to understand the physicochemical properties as well as the evolution of interactions between proteins. The Protein Data Bank represents an important source of information for such studies, because more than half of the structures are homo- or heteromeric protein complexes. Here we propose the first hierarchical classification of whole protein complexes of known 3-D structure, based on representing their fundamental structural features as a graph. This classification provides the first overview of all the complexes in the Protein Data Bank and allows nonredundant sets to be derived at different levels of detail. This reveals that between one-half and two-thirds of known structures are multimeric, depending on the level of redundancy accepted. We also analyse the structures in terms of the topological arrangement of their subunits and find that they form a small number of arrangements compared with all theoretically possible ones. This is because most complexes contain four subunits or less, and the large majority are homomeric. In addition, there is a strong tendency for symmetry in complexes, even for heteromeric complexes. Finally, through comparison of Biological Units in the Protein Data Bank with the Protein Quaternary Structure database, we identified many possible errors in quaternary structure assignments. Our classification, available as a database and Web server at http://www.3Dcomplex.org, will be a starting point for future work aimed at understanding the structure and evolution of protein complexes. PMID:17112313
ECOD: An Evolutionary Classification of Protein Domains
Kinch, Lisa N.; Pei, Jimin; Shi, Shuoyong; Kim, Bong-Hyun; Grishin, Nick V.
2014-01-01
Understanding the evolution of a protein, including both close and distant relationships, often reveals insight into its structure and function. Fast and easy access to such up-to-date information facilitates research. We have developed a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures, and presented it as an interactive and updatable online database. ECOD (Evolutionary Classification of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationships (homology), rather than topology (or “fold”). This distinction highlights cases of homology between domains of differing topology to aid in understanding of protein structure evolution. ECOD uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary links among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as conserved functional sites. ECOD also recognizes closer sequence-based relationships between protein domains. Currently, approximately 100,000 protein structures are classified in ECOD into 9,000 sequence families clustered into close to 2,000 evolutionary groups. The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating the potential of how ECOD can be used to further biological and evolutionary studies. PMID:25474468
ECOD: an evolutionary classification of protein domains.
Cheng, Hua; Schaeffer, R Dustin; Liao, Yuxing; Kinch, Lisa N; Pei, Jimin; Shi, Shuoyong; Kim, Bong-Hyun; Grishin, Nick V
2014-12-01
Understanding the evolution of a protein, including both close and distant relationships, often reveals insight into its structure and function. Fast and easy access to such up-to-date information facilitates research. We have developed a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures, and presented it as an interactive and updatable online database. ECOD (Evolutionary Classification of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationships (homology), rather than topology (or "fold"). This distinction highlights cases of homology between domains of differing topology to aid in understanding of protein structure evolution. ECOD uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary links among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as conserved functional sites. ECOD also recognizes closer sequence-based relationships between protein domains. Currently, approximately 100,000 protein structures are classified in ECOD into 9,000 sequence families clustered into close to 2,000 evolutionary groups. The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating the potential of how ECOD can be used to further biological and evolutionary studies.
Tracing Primordial Protein Evolution through Structurally Guided Stepwise Segment Elongation*
Watanabe, Hideki; Yamasaki, Kazuhiko; Honda, Shinya
2014-01-01
The understanding of how primordial proteins emerged has been a fundamental and longstanding issue in biology and biochemistry. For a better understanding of primordial protein evolution, we synthesized an artificial protein on the basis of an evolutionary hypothesis, segment-based elongation starting from an autonomously foldable short peptide. A 10-residue protein, chignolin, the smallest foldable polypeptide ever reported, was used as a structural support to facilitate higher structural organization and gain-of-function in the development of an artificial protein. Repetitive cycles of segment elongation and subsequent phage display selection successfully produced a 25-residue protein, termed AF.2A1, with nanomolar affinity against the Fc region of immunoglobulin G. AF.2A1 shows exquisite molecular recognition ability such that it can distinguish conformational differences of the same molecule. The structure determined by NMR measurements demonstrated that AF.2A1 forms a globular protein-like conformation with the chignolin-derived β-hairpin and a tryptophan-mediated hydrophobic core. Using sequence analysis and a mutation study, we discovered that the structural organization and gain-of-function emerged from the vicinity of the chignolin segment, revealing that the structural support served as the core in both structural and functional development. Here, we propose an evolutionary model for primordial proteins in which a foldable segment serves as the evolving core to facilitate structural and functional evolution. This study provides insights into primordial protein evolution and also presents a novel methodology for designing small sized proteins useful for industrial and pharmaceutical applications. PMID:24356963
Xiao, Rong; Anderson, Stephen; Aramini, James; Belote, Rachel; Buchwald, William A.; Ciccosanti, Colleen; Conover, Ken; Everett, John K.; Hamilton, Keith; Huang, Yuanpeng Janet; Janjua, Haleema; Jiang, Mei; Kornhaber, Gregory J.; Lee, Dong Yup; Locke, Jessica Y.; Ma, Li-Chung; Maglaqui, Melissa; Mao, Lei; Mitra, Saheli; Patel, Dayaban; Rossi, Paolo; Sahdev, Seema; Sharma, Seema; Shastry, Ritu; Swapna, G.V.T.; Tong, Saichu N.; Wang, Dongyan; Wang, Huang; Zhao, Li; Montelione, Gaetano T.; Acton, Thomas B.
2014-01-01
We describe the core Protein Production Platform of the Northeast Structural Genomics Consortium (NESG) and outline the strategies used for producing high-quality protein samples. The platform is centered on the cloning, expression and purification of 6X-His-tagged proteins using T7-based Escherichia coli systems. The 6X-His tag allows for similar purification procedures for most targets and implementation of high-throughput (HTP) parallel methods. In most cases, the 6X-His-tagged proteins are sufficiently purified (> 97% homogeneity) using a HTP two-step purification protocol for most structural studies. Using this platform, the open reading frames of over 16,000 different targeted proteins (or domains) have been cloned as > 26,000 constructs. Over the past nine years, more than 16,000 of these expressed protein, and more than 4,400 proteins (or domains) have been purified to homogeneity in tens of milligram quantities (see Summary Statistics, http://nesg.org/statistics.html). Using these samples, the NESG has deposited more than 900 new protein structures to the Protein Data Bank (PDB). The methods described here are effective in producing eukaryotic and prokaryotic protein samples in E. coli. This paper summarizes some of the updates made to the protein production pipeline in the last five years, corresponding to phase 2 of the NIGMS Protein Structure Initiative (PSI-2) project. The NESG Protein Production Platform is suitable for implementation in a large individual laboratory or by a small group of collaborating investigators. These advanced automated and/or parallel cloning, expression, purification, and biophysical screening technologies are of broad value to the structural biology, functional proteomics, and structural genomics communities. PMID:20688167
Rapid and reliable protein structure determination via chemical shift threading.
Hafsa, Noor E; Berjanskii, Mark V; Arndt, David; Wishart, David S
2018-01-01
Protein structure determination using nuclear magnetic resonance (NMR) spectroscopy can be both time-consuming and labor intensive. Here we demonstrate how chemical shift threading can permit rapid, robust, and accurate protein structure determination using only chemical shift data. Threading is a relatively old bioinformatics technique that uses a combination of sequence information and predicted (or experimentally acquired) low-resolution structural data to generate high-resolution 3D protein structures. The key motivations behind using NMR chemical shifts for protein threading lie in the fact that they are easy to measure, they are available prior to 3D structure determination, and they contain vital structural information. The method we have developed uses not only sequence and chemical shift similarity but also chemical shift-derived secondary structure, shift-derived super-secondary structure, and shift-derived accessible surface area to generate a high quality protein structure regardless of the sequence similarity (or lack thereof) to a known structure already in the PDB. The method (called E-Thrifty) was found to be very fast (often < 10 min/structure) and to significantly outperform other shift-based or threading-based structure determination methods (in terms of top template model accuracy)-with an average TM-score performance of 0.68 (vs. 0.50-0.62 for other methods). Coupled with recent developments in chemical shift refinement, these results suggest that protein structure determination, using only NMR chemical shifts, is becoming increasingly practical and reliable. E-Thrifty is available as a web server at http://ethrifty.ca .
Mavridis, Lazaros; Janes, Robert W
2017-01-01
Circular dichroism (CD) spectroscopy is extensively utilized for determining the percentages of secondary structure content present in proteins. However, although a large contributor, secondary structure is not the only factor that influences the shape and magnitude of the CD spectrum produced. Other structural features can make contributions so an entire protein structural conformation can give rise to a CD spectrum. There is a need for an application capable of generating protein CD spectra from atomic coordinates. However, no empirically derived method to do this currently exists. PDB2CD has been created as an empirical-based approach to the generation of protein CD spectra from atomic coordinates. The method utilizes a combination of structural features within the conformation of a protein; not only its percentage secondary structure content, but also the juxtaposition of these structural components relative to one another, and the overall structure similarity of the query protein to proteins in our dataset, the SP175 dataset, the 'gold standard' set obtained from the Protein Circular Dichroism Data Bank (PCDDB). A significant number of the CD spectra associated with the 71 proteins in this dataset have been produced with excellent accuracy using a leave-one-out cross-validation process. The method also creates spectra in good agreement with those of a test set of 14 proteins from the PCDDB. The PDB2CD package provides a web-based, user friendly approach to enable researchers to produce CD spectra from protein atomic coordinates. http://pdb2cd.cryst.bbk.ac.uk CONTACT: r.w.janes@qmul.ac.ukSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Sequence-similar, structure-dissimilar protein pairs in the PDB.
Kosloff, Mickey; Kolodny, Rachel
2008-05-01
It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which "redundant" structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, if a template structure for modeling a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to create subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such proteins pairs that are identified and the magnitude of the dissimilarity depend on the approach that is used to calculate the differences; in particular sequence-based structure superpositioning will identify a larger number of structurally dissimilar pairs than geometry-based structural alignments. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superpositioning provides a meaningful measure of structural differences. This approach and geometry-based structure alignments reveal somewhat different information and one or the other might be preferable in a given application. Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information. We have established a data base of sequence-similar, structurally dissimilar protein pairs that will help address this problem (http://luna.bioc.columbia.edu/rachel/seqsimstrdiff.htm).
Automatic classification of protein structures relying on similarities between alignments
2012-01-01
Background Identification of protein structural cores requires isolation of sets of proteins all sharing a same subset of structural motifs. In the context of an ever growing number of available 3D protein structures, standard and automatic clustering algorithms require adaptations so as to allow for efficient identification of such sets of proteins. Results When considering a pair of 3D structures, they are stated as similar or not according to the local similarities of their matching substructures in a structural alignment. This binary relation can be represented in a graph of similarities where a node represents a 3D protein structure and an edge states that two 3D protein structures are similar. Therefore, classifying proteins into structural families can be viewed as a graph clustering task. Unfortunately, because such a graph encodes only pairwise similarity information, clustering algorithms may include in the same cluster a subset of 3D structures that do not share a common substructure. In order to overcome this drawback we first define a ternary similarity on a triple of 3D structures as a constraint to be satisfied by the graph of similarities. Such a ternary constraint takes into account similarities between pairwise alignments, so as to ensure that the three involved protein structures do have some common substructure. We propose hereunder a modification algorithm that eliminates edges from the original graph of similarities and gives a reduced graph in which no ternary constraints are violated. Our approach is then first to build a graph of similarities, then to reduce the graph according to the modification algorithm, and finally to apply to the reduced graph a standard graph clustering algorithm. Such method was used for classifying ASTRAL-40 non-redundant protein domains, identifying significant pairwise similarities with Yakusa, a program devised for rapid 3D structure alignments. Conclusions We show that filtering similarities prior to standard graph based clustering process by applying ternary similarity constraints i) improves the separation of proteins of different classes and consequently ii) improves the classification quality of standard graph based clustering algorithms according to the reference classification SCOP. PMID:22974051
Binding free energy analysis of protein-protein docking model structures by evERdock.
Takemura, Kazuhiro; Matubayasi, Nobuyuki; Kitao, Akio
2018-03-14
To aid the evaluation of protein-protein complex model structures generated by protein docking prediction (decoys), we previously developed a method to calculate the binding free energies for complexes. The method combines a short (2 ns) all-atom molecular dynamics simulation with explicit solvent and solution theory in the energy representation (ER). We showed that this method successfully selected structures similar to the native complex structure (near-native decoys) as the lowest binding free energy structures. In our current work, we applied this method (evERdock) to 100 or 300 model structures of four protein-protein complexes. The crystal structures and the near-native decoys showed the lowest binding free energy of all the examined structures, indicating that evERdock can successfully evaluate decoys. Several decoys that show low interface root-mean-square distance but relatively high binding free energy were also identified. Analysis of the fraction of native contacts, hydrogen bonds, and salt bridges at the protein-protein interface indicated that these decoys were insufficiently optimized at the interface. After optimizing the interactions around the interface by including interfacial water molecules, the binding free energies of these decoys were improved. We also investigated the effect of solute entropy on binding free energy and found that consideration of the entropy term does not necessarily improve the evaluations of decoys using the normal model analysis for entropy calculation.
Binding free energy analysis of protein-protein docking model structures by evERdock
NASA Astrophysics Data System (ADS)
Takemura, Kazuhiro; Matubayasi, Nobuyuki; Kitao, Akio
2018-03-01
To aid the evaluation of protein-protein complex model structures generated by protein docking prediction (decoys), we previously developed a method to calculate the binding free energies for complexes. The method combines a short (2 ns) all-atom molecular dynamics simulation with explicit solvent and solution theory in the energy representation (ER). We showed that this method successfully selected structures similar to the native complex structure (near-native decoys) as the lowest binding free energy structures. In our current work, we applied this method (evERdock) to 100 or 300 model structures of four protein-protein complexes. The crystal structures and the near-native decoys showed the lowest binding free energy of all the examined structures, indicating that evERdock can successfully evaluate decoys. Several decoys that show low interface root-mean-square distance but relatively high binding free energy were also identified. Analysis of the fraction of native contacts, hydrogen bonds, and salt bridges at the protein-protein interface indicated that these decoys were insufficiently optimized at the interface. After optimizing the interactions around the interface by including interfacial water molecules, the binding free energies of these decoys were improved. We also investigated the effect of solute entropy on binding free energy and found that consideration of the entropy term does not necessarily improve the evaluations of decoys using the normal model analysis for entropy calculation.
PDB2Graph: A toolbox for identifying critical amino acids map in proteins based on graph theory.
Niknam, Niloofar; Khakzad, Hamed; Arab, Seyed Shahriar; Naderi-Manesh, Hossein
2016-05-01
The integrative and cooperative nature of protein structure involves the assessment of topological and global features of constituent parts. Network concept takes complete advantage of both of these properties in the analysis concomitantly. High compatibility to structural concepts or physicochemical properties in addition to exploiting a remarkable simplification in the system has made network an ideal tool to explore biological systems. There are numerous examples in which different protein structural and functional characteristics have been clarified by the network approach. Here, we present an interactive and user-friendly Matlab-based toolbox, PDB2Graph, devoted to protein structure network construction, visualization, and analysis. Moreover, PDB2Graph is an appropriate tool for identifying critical nodes involved in protein structural robustness and function based on centrality indices. It maps critical amino acids in protein networks and can greatly aid structural biologists in selecting proper amino acid candidates for manipulating protein structures in a more reasonable and rational manner. To introduce the capability and efficiency of PDB2Graph in detail, the structural modification of Calmodulin through allosteric binding of Ca(2+) is considered. In addition, a mutational analysis for three well-identified model proteins including Phage T4 lysozyme, Barnase and Ribonuclease HI, was performed to inspect the influence of mutating important central residues on protein activity. Copyright © 2016 Elsevier Ltd. All rights reserved.
Goonesekere, Nalin Cw
2009-01-01
The large numbers of protein sequences generated by whole genome sequencing projects require rapid and accurate methods of annotation. The detection of homology through computational sequence analysis is a powerful tool in determining the complex evolutionary and functional relationships that exist between proteins. Homology search algorithms employ amino acid substitution matrices to detect similarity between proteins sequences. The substitution matrices in common use today are constructed using sequences aligned without reference to protein structure. Here we present amino acid substitution matrices constructed from the alignment of a large number of protein domain structures from the structural classification of proteins (SCOP) database. We show that when incorporated into the homology search algorithms BLAST and PSI-blast, the structure-based substitution matrices enhance the efficacy of detecting remote homologs.
Chemical cross-linking and native mass spectrometry: A fruitful combination for structural biology
Sinz, Andrea; Arlt, Christian; Chorev, Dror; Sharon, Michal
2015-01-01
Mass spectrometry (MS) is becoming increasingly popular in the field of structural biology for analyzing protein three-dimensional-structures and for mapping protein–protein interactions. In this review, the specific contributions of chemical crosslinking and native MS are outlined to reveal the structural features of proteins and protein assemblies. Both strategies are illustrated based on the examples of the tetrameric tumor suppressor protein p53 and multisubunit vinculin-Arp2/3 hybrid complexes. We describe the distinct advantages and limitations of each technique and highlight synergistic effects when both techniques are combined. Integrating both methods is especially useful for characterizing large protein assemblies and for capturing transient interactions. We also point out the future directions we foresee for a combination of in vivo crosslinking and native MS for structural investigation of intact protein assemblies. PMID:25970732
MyPMFs: a simple tool for creating statistical potentials to assess protein structural models.
Postic, Guillaume; Hamelryck, Thomas; Chomilier, Jacques; Stratmann, Dirk
2018-05-29
Evaluating the model quality of protein structures that evolve in environments with particular physicochemical properties requires scoring functions that are adapted to their specific residue compositions and/or structural characteristics. Thus, computational methods developed for structures from the cytosol cannot work properly on membrane or secreted proteins. Here, we present MyPMFs, an easy-to-use tool that allows users to train statistical potentials of mean force (PMFs) on the protein structures of their choice, with all parameters being adjustable. We demonstrate its use by creating an accurate statistical potential for transmembrane protein domains. We also show its usefulness to study the influence of the physical environment on residue interactions within protein structures. Our open-source software is freely available for download at https://github.com/bibip-impmc/mypmfs. Copyright © 2018. Published by Elsevier B.V.
DWARF – a data warehouse system for analyzing protein families
Fischer, Markus; Thai, Quan K; Grieb, Melanie; Pleiss, Jürgen
2006-01-01
Background The emerging field of integrative bioinformatics provides the tools to organize and systematically analyze vast amounts of highly diverse biological data and thus allows to gain a novel understanding of complex biological systems. The data warehouse DWARF applies integrative bioinformatics approaches to the analysis of large protein families. Description The data warehouse system DWARF integrates data on sequence, structure, and functional annotation for protein fold families. The underlying relational data model consists of three major sections representing entities related to the protein (biochemical function, source organism, classification to homologous families and superfamilies), the protein sequence (position-specific annotation, mutant information), and the protein structure (secondary structure information, superimposed tertiary structure). Tools for extracting, transforming and loading data from public available resources (ExPDB, GenBank, DSSP) are provided to populate the database. The data can be accessed by an interface for searching and browsing, and by analysis tools that operate on annotation, sequence, or structure. We applied DWARF to the family of α/β-hydrolases to host the Lipase Engineering database. Release 2.3 contains 6138 sequences and 167 experimentally determined protein structures, which are assigned to 37 superfamilies 103 homologous families. Conclusion DWARF has been designed for constructing databases of large structurally related protein families and for evaluating their sequence-structure-function relationships by a systematic analysis of sequence, structure and functional annotation. It has been applied to predict biochemical properties from sequence, and serves as a valuable tool for protein engineering. PMID:17094801
Hsing, Michael; Cherkasov, Artem
2008-06-25
Insertions and deletions (indels) represent a common type of sequence variations, which are less studied and pose many important biological questions. Recent research has shown that the presence of sizable indels in protein sequences may be indicative of protein essentiality and their role in protein interaction networks. Examples of utilization of indels for structure-based drug design have also been recently demonstrated. Nonetheless many structural and functional characteristics of indels remain less researched or unknown. We have created a web-based resource, Indel PDB, representing a structural database of insertions/deletions identified from the sequence alignments of highly similar proteins found in the Protein Data Bank (PDB). Indel PDB utilized large amounts of available structural information to characterize 1-, 2- and 3-dimensional features of indel sites. Indel PDB contains 117,266 non-redundant indel sites extracted from 11,294 indel-containing proteins. Unlike loop databases, Indel PDB features more indel sequences with secondary structures including alpha-helices and beta-sheets in addition to loops. The insertion fragments have been characterized by their sequences, lengths, locations, secondary structure composition, solvent accessibility, protein domain association and three dimensional structures. By utilizing the data available in Indel PDB, we have studied and presented here several sequence and structural features of indels. We anticipate that Indel PDB will not only enable future functional studies of indels, but will also assist protein modeling efforts and identification of indel-directed drug binding sites.
Kirshner, Daniel A.; Nilmeier, Jerome P.; Lightstone, Felice C.
2013-01-01
The catalytic site identification web server provides the innovative capability to find structural matches to a user-specified catalytic site among all Protein Data Bank proteins rapidly (in less than a minute). The server also can examine a user-specified protein structure or model to identify structural matches to a library of catalytic sites. Finally, the server provides a database of pre-calculated matches between all Protein Data Bank proteins and the library of catalytic sites. The database has been used to derive a set of hypothesized novel enzymatic function annotations. In all cases, matches and putative binding sites (protein structure and surfaces) can be visualized interactively online. The website can be accessed at http://catsid.llnl.gov. PMID:23680785
Kirshner, Daniel A; Nilmeier, Jerome P; Lightstone, Felice C
2013-07-01
The catalytic site identification web server provides the innovative capability to find structural matches to a user-specified catalytic site among all Protein Data Bank proteins rapidly (in less than a minute). The server also can examine a user-specified protein structure or model to identify structural matches to a library of catalytic sites. Finally, the server provides a database of pre-calculated matches between all Protein Data Bank proteins and the library of catalytic sites. The database has been used to derive a set of hypothesized novel enzymatic function annotations. In all cases, matches and putative binding sites (protein structure and surfaces) can be visualized interactively online. The website can be accessed at http://catsid.llnl.gov.
A tool for calculating binding-site residues on proteins from PDB structures.
Hu, Jing; Yan, Changhui
2009-08-03
In the research on protein functional sites, researchers often need to identify binding-site residues on a protein. A commonly used strategy is to find a complex structure from the Protein Data Bank (PDB) that consists of the protein of interest and its interacting partner(s) and calculate binding-site residues based on the complex structure. However, since a protein may participate in multiple interactions, the binding-site residues calculated based on one complex structure usually do not reveal all binding sites on a protein. Thus, this requires researchers to find all PDB complexes that contain the protein of interest and combine the binding-site information gleaned from them. This process is very time-consuming. Especially, combing binding-site information obtained from different PDB structures requires tedious work to align protein sequences. The process becomes overwhelmingly difficult when researchers have a large set of proteins to analyze, which is usually the case in practice. In this study, we have developed a tool for calculating binding-site residues on proteins, TCBRP http://yanbioinformatics.cs.usu.edu:8080/ppbindingsubmit. For an input protein, TCBRP can quickly find all binding-site residues on the protein by automatically combining the information obtained from all PDB structures that consist of the protein of interest. Additionally, TCBRP presents the binding-site residues in different categories according to the interaction type. TCBRP also allows researchers to set the definition of binding-site residues. The developed tool is very useful for the research on protein binding site analysis and prediction.
Perez, Romel B.; Tischer, Alexander; Auton, Matthew; Whitten, Steven T.
2014-01-01
Molecular transduction of biological signals is understood primarily in terms of the cooperative structural transitions of protein macromolecules, providing a mechanism through which discrete local structure perturbations affect global macromolecular properties. The recognition that proteins lacking tertiary stability, commonly referred to as intrinsically disordered proteins, mediate key signaling pathways suggests that protein structures without cooperative intramolecular interactions may also have the ability to couple local and global structure changes. Presented here are results from experiments that measured and tested the ability of disordered proteins to couple local changes in structure to global changes in structure. Using the intrinsically disordered N-terminal region of the p53 protein as an experimental model, a set of proline and alanine to glycine substitution variants were designed to modulate backbone conformational propensities without introducing non-native intramolecular interactions. The hydrodynamic radius (Rh) was used to monitor changes in global structure. Circular dichroism spectroscopy showed that the glycine substitutions decreased polyproline II (PPII) propensities relative to the wild type, as expected, and fluorescence methods indicated that substitution-induced changes in Rh were not associated with folding. The experiments showed that changes in local PPII structure cause changes in Rh that are variable and that depend on the intrinsic chain propensities of proline and alanine residues, demonstrating a mechanism for coupling local and global structure changes. Molecular simulations that model our results were used to extend the analysis to other proteins and illustrate the generality of the observed proline and alanine effects on the structures of intrinsically disordered proteins. PMID:25244701
Relationships between residue Voronoi volume and sequence conservation in proteins.
Liu, Jen-Wei; Cheng, Chih-Wen; Lin, Yu-Feng; Chen, Shao-Yu; Hwang, Jenn-Kang; Yen, Shih-Chung
2018-02-01
Functional and biophysical constraints can cause different levels of sequence conservation in proteins. Previously, structural properties, e.g., relative solvent accessibility (RSA) and packing density of the weighted contact number (WCN), have been found to be related to protein sequence conservation (CS). The Voronoi volume has recently been recognized as a new structural property of the local protein structural environment reflecting CS. However, for surface residues, it is sensitive to water molecules surrounding the protein structure. Herein, we present a simple structural determinant termed the relative space of Voronoi volume (RSV); it uses the Voronoi volume and the van der Waals volume of particular residues to quantify the local structural environment. RSV (range, 0-1) is defined as (Voronoi volume-van der Waals volume)/Voronoi volume of the target residue. The concept of RSV describes the extent of available space for every protein residue. RSV and Voronoi profiles with and without water molecules (RSVw, RSV, VOw, and VO) were compared for 554 non-homologous proteins. RSV (without water) showed better Pearson's correlations with CS than did RSVw, VO, or VOw values. The mean correlation coefficient between RSV and CS was 0.51, which is comparable to the correlation between RSA and CS (0.49) and that between WCN and CS (0.56). RSV is a robust structural descriptor with and without water molecules and can quantitatively reflect evolutionary information in a single protein structure. Therefore, it may represent a practical structural determinant to study protein sequence, structure, and function relationships. Copyright © 2017 Elsevier B.V. All rights reserved.
Representing and comparing protein structures as paths in three-dimensional space
Zhi, Degui; Krishna, S Sri; Cao, Haibo; Pevzner, Pavel; Godzik, Adam
2006-01-01
Background Most existing formulations of protein structure comparison are based on detailed atomic level descriptions of protein structures and bypass potential insights that arise from a higher-level abstraction. Results We propose a structure comparison approach based on a simplified representation of proteins that describes its three-dimensional path by local curvature along the generalized backbone of the polypeptide. We have implemented a dynamic programming procedure that aligns curvatures of proteins by optimizing a defined sum turning angle deviation measure. Conclusion Although our procedure does not directly optimize global structural similarity as measured by RMSD, our benchmarking results indicate that it can surprisingly well recover the structural similarity defined by structure classification databases and traditional structure alignment programs. In addition, our program can recognize similarities between structures with extensive conformation changes that are beyond the ability of traditional structure alignment programs. We demonstrate the applications of procedure to several contexts of structure comparison. An implementation of our procedure, CURVE, is available as a public webserver. PMID:17052359
Structure Prediction and Analysis of DNA Transposon and LINE Retrotransposon Proteins*
Abrusán, György; Zhang, Yang; Szilágyi, András
2013-01-01
Despite the considerable amount of research on transposable elements, no large-scale structural analyses of the TE proteome have been performed so far. We predicted the structures of hundreds of proteins from a representative set of DNA and LINE transposable elements and used the obtained structural data to provide the first general structural characterization of TE proteins and to estimate the frequency of TE domestication and horizontal transfer events. We show that 1) ORF1 and Gag proteins of retrotransposons contain high amounts of structural disorder; thus, despite their very low conservation, the presence of disordered regions and probably their chaperone function is conserved. 2) The distribution of SCOP classes in DNA transposons and LINEs indicates that the proteins of DNA transposons are more ancient, containing folds that already existed when the first cellular organisms appeared. 3) DNA transposon proteins have lower contact order than randomly selected reference proteins, indicating rapid folding, most likely to avoid protein aggregation. 4) Structure-based searches for TE homologs indicate that the overall frequency of TE domestication events is low, whereas we found a relatively high number of cases where horizontal transfer, frequently involving parasites, is the most likely explanation for the observed homology. PMID:23530042
Zook, James D.; Molugu, Trivikram R.; Jacobsen, Neil E.; Lin, Guangxin; Soll, Jürgen; Cherry, Brian R.; Brown, Michael F.; Fromme, Petra
2013-01-01
Solving high-resolution structures for membrane proteins continues to be a daunting challenge in the structural biology community. In this study we report our high-resolution NMR results for a transmembrane protein, outer envelope protein of molar mass 16 kDa (OEP16), an amino acid transporter from the outer membrane of chloroplasts. Three-dimensional, high-resolution NMR experiments on the 13C, 15N, 2H-triply-labeled protein were used to assign protein backbone resonances and to obtain secondary structure information. The results yield over 95% assignment of N, HN, CO, Cα, and Cβ chemical shifts, which is essential for obtaining a high resolution structure from NMR data. Chemical shift analysis from the assignment data reveals experimental evidence for the first time on the location of the secondary structure elements on a per residue basis. In addition T 1Z and T2 relaxation experiments were performed in order to better understand the protein dynamics. Arginine titration experiments yield an insight into the amino acid residues responsible for protein transporter function. The results provide the necessary basis for high-resolution structural determination of this important plant membrane protein. PMID:24205117
Lee, Hui Sun; Im, Wonpil
2016-04-01
Molecular recognition by protein mostly occurs in a local region on the protein surface. Thus, an efficient computational method for accurate characterization of protein local structural conservation is necessary to better understand biology and drug design. We present a novel local structure alignment tool, G-LoSA. G-LoSA aligns protein local structures in a sequence order independent way and provides a GA-score, a chemical feature-based and size-independent structure similarity score. Our benchmark validation shows the robust performance of G-LoSA to the local structures of diverse sizes and characteristics, demonstrating its universal applicability to local structure-centric comparative biology studies. In particular, G-LoSA is highly effective in detecting conserved local regions on the entire surface of a given protein. In addition, the applications of G-LoSA to identifying template ligands and predicting ligand and protein binding sites illustrate its strong potential for computer-aided drug design. We hope that G-LoSA can be a useful computational method for exploring interesting biological problems through large-scale comparison of protein local structures and facilitating drug discovery research and development. G-LoSA is freely available to academic users at http://im.compbio.ku.edu/GLoSA/. © 2016 The Protein Society.
Racemic & quasi-racemic protein crystallography enabled by chemical protein synthesis.
Kent, Stephen Bh
2018-04-04
A racemic protein mixture can be used to form centrosymmetric crystals for structure determination by X-ray diffraction. Both the unnatural d-protein and the corresponding natural l-protein are made by total chemical synthesis based on native chemical ligation-chemoselective condensation of unprotected synthetic peptide segments. Racemic protein crystallography is important for structure determination of the many natural protein molecules that are refractory to crystallization. Racemic mixtures facilitate the crystallization of recalcitrant proteins, and give diffraction-quality crystals. Quasi-racemic crystallization, using a single d-protein molecule, can facilitate the determination of the structures of a series of l-protein analog molecules. Copyright © 2018 Elsevier Ltd. All rights reserved.
Kim, Do Jin; Bitto, Eduard; Bingman, Craig A; Kim, Hyun-Jung; Han, Byung Woo; Phillips, George N
2015-07-01
Members of the universal stress protein (USP) family are conserved in a phylogenetically diverse range of prokaryotes, fungi, protists, and plants and confer abilities to respond to a wide range of environmental stresses. Arabidopsis thaliana contains 44 USP domain-containing proteins, and USP domain is found either in a small protein with unknown physiological function or in an N-terminal portion of a multi-domain protein, usually a protein kinase. Here, we report the first crystal structure of a eukaryotic USP-like protein encoded from the gene At3g01520. The crystal structure of the protein At3g01520 was determined by the single-wavelength anomalous dispersion method and refined to an R factor of 21.8% (Rfree = 26.1%) at 2.5 Å resolution. The crystal structure includes three At3g01520 protein dimers with one AMP molecule bound to each protomer, comprising a Rossmann-like α/β overall fold. The bound AMP and conservation of residues in the ATP-binding loop suggest that the protein At3g01520 also belongs to the ATP-binding USP subfamily members. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.
Bartho, Joseph D.; Bellini, Dom; Wuerges, Jochen; Demitri, Nicola; Toccafondi, Mirco; Schmitt, Armin O.; Zhao, Youfu; Walsh, Martin A.
2017-01-01
AmyR is a stress and virulence associated protein from the plant pathogenic Enterobacteriaceae species Erwinia amylovora, and is a functionally conserved ortholog of YbjN from Escherichia coli. The crystal structure of E. amylovora AmyR reveals a class I type III secretion chaperone-like fold, despite the lack of sequence similarity between these two classes of protein and lacking any evidence of a secretion-associated role. The results indicate that AmyR, and YbjN proteins in general, function through protein-protein interactions without any enzymatic action. The YbjN proteins of Enterobacteriaceae show remarkably low sequence similarity with other members of the YbjN protein family in Eubacteria, yet a high level of structural conservation is observed. Across the YbjN protein family sequence conservation is limited to residues stabilising the protein core and dimerization interface, while interacting regions are only conserved between closely related species. This study presents the first structure of a YbjN protein from Enterobacteriaceae, the most highly divergent and well-studied subgroup of YbjN proteins, and an in-depth sequence and structural analysis of this important but poorly understood protein family. PMID:28426806
Bartho, Joseph D; Bellini, Dom; Wuerges, Jochen; Demitri, Nicola; Toccafondi, Mirco; Schmitt, Armin O; Zhao, Youfu; Walsh, Martin A; Benini, Stefano
2017-01-01
AmyR is a stress and virulence associated protein from the plant pathogenic Enterobacteriaceae species Erwinia amylovora, and is a functionally conserved ortholog of YbjN from Escherichia coli. The crystal structure of E. amylovora AmyR reveals a class I type III secretion chaperone-like fold, despite the lack of sequence similarity between these two classes of protein and lacking any evidence of a secretion-associated role. The results indicate that AmyR, and YbjN proteins in general, function through protein-protein interactions without any enzymatic action. The YbjN proteins of Enterobacteriaceae show remarkably low sequence similarity with other members of the YbjN protein family in Eubacteria, yet a high level of structural conservation is observed. Across the YbjN protein family sequence conservation is limited to residues stabilising the protein core and dimerization interface, while interacting regions are only conserved between closely related species. This study presents the first structure of a YbjN protein from Enterobacteriaceae, the most highly divergent and well-studied subgroup of YbjN proteins, and an in-depth sequence and structural analysis of this important but poorly understood protein family.
Some of the most interesting CASP11 targets through the eyes of their authors.
Kryshtafovych, Andriy; Moult, John; Baslé, Arnaud; Burgin, Alex; Craig, Timothy K; Edwards, Robert A; Fass, Deborah; Hartmann, Marcus D; Korycinski, Mateusz; Lewis, Richard J; Lorimer, Donald; Lupas, Andrei N; Newman, Janet; Peat, Thomas S; Piepenbrink, Kurt H; Prahlad, Janani; van Raaij, Mark J; Rohwer, Forest; Segall, Anca M; Seguritan, Victor; Sundberg, Eric J; Singh, Abhimanyu K; Wilson, Mark A; Schwede, Torsten
2016-09-01
The Critical Assessment of protein Structure Prediction (CASP) experiment would not have been possible without the prediction targets provided by the experimental structural biology community. In this article, selected crystallographers providing targets for the CASP11 experiment discuss the functional and biological significance of the target proteins, highlight their most interesting structural features, and assess whether these features were correctly reproduced in the predictions submitted to CASP11. Proteins 2016; 84(Suppl 1):34-50. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.
Membrane-spanning α-helical barrels as tractable protein-design targets.
Niitsu, Ai; Heal, Jack W; Fauland, Kerstin; Thomson, Andrew R; Woolfson, Derek N
2017-08-05
The rational ( de novo ) design of membrane-spanning proteins lags behind that for water-soluble globular proteins. This is due to gaps in our knowledge of membrane-protein structure, and experimental difficulties in studying such proteins compared to water-soluble counterparts. One limiting factor is the small number of experimentally determined three-dimensional structures for transmembrane proteins. By contrast, many tens of thousands of globular protein structures provide a rich source of 'scaffolds' for protein design, and the means to garner sequence-to-structure relationships to guide the design process. The α-helical coiled coil is a protein-structure element found in both globular and membrane proteins, where it cements a variety of helix-helix interactions and helical bundles. Our deep understanding of coiled coils has enabled a large number of successful de novo designs. For one class, the α-helical barrels-that is, symmetric bundles of five or more helices with central accessible channels-there are both water-soluble and membrane-spanning examples. Recent computational designs of water-soluble α-helical barrels with five to seven helices have advanced the design field considerably. Here we identify and classify analogous and more complicated membrane-spanning α-helical barrels from the Protein Data Bank. These provide tantalizing but tractable targets for protein engineering and de novo protein design.This article is part of the themed issue 'Membrane pores: from structure and assembly, to medicine and technology'. © 2017 The Author(s).
Salazar-Villanea, S; Hendriks, W H; Bruininx, E M A M; Gruppen, H; van der Poel, A F B
2016-06-01
Protein structure influences the accessibility of enzymes for digestion. The proportion of intramolecular β-sheets in the secondary structure of native proteins has been related to a decrease in protein digestibility. Changes to proteins that can be considered positive (for example, denaturation and random coil formation) or negative (for example, aggregation and Maillard reactions) for protein digestibility can occur simultaneously during processing. The final result of these changes on digestibility seems to be a counterbalance of the occurrence of each phenomenon. Occurrence of each phenomenon depends on the conditions applied, but also on the source and type of the protein that is processed. The correlation between denaturation enthalpy after processing and protein digestibility seems to be dependent on the protein source. Heat seems to be the processing parameter with the largest influence on changes in the structure of proteins. The effect of moisture is usually limited to the simultaneous application of heat, but increasing level of moisture during processing usually increases structural changes in proteins. The effect of shear on protein structure is commonly studied using extrusion, although the multifactorial essence of this technology does not allow disentanglement of the separate effects of each processing parameter (for example, heat, shear, moisture). Although most of the available literature on the processing of feed ingredients reports effects on protein digestibility, the mechanisms that explain these effects are usually lacking. Clarifying these mechanisms could aid in the prediction of the nutritional consequences of processing conditions.
Masica, David L; Ash, Jason T; Ndao, Moise; Drobny, Gary P; Gray, Jeffrey J
2010-12-08
Protein-biomineral interactions are paramount to materials production in biology, including the mineral phase of hard tissue. Unfortunately, the structure of biomineral-associated proteins cannot be determined by X-ray crystallography or solution nuclear magnetic resonance (NMR). Here we report a method for determining the structure of biomineral-associated proteins. The method combines solid-state NMR (ssNMR) and ssNMR-biased computational structure prediction. In addition, the algorithm is able to identify lattice geometries most compatible with ssNMR constraints, representing a quantitative, novel method for investigating crystal-face binding specificity. We use this method to determine most of the structure of human salivary statherin interacting with the mineral phase of tooth enamel. Computation and experiment converge on an ensemble of related structures and identify preferential binding at three crystal surfaces. The work represents a significant advance toward determining structure of biomineral-adsorbed protein using experimentally biased structure prediction. This method is generally applicable to proteins that can be chemically synthesized. Copyright © 2010 Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Robic, Srebrenka
2010-01-01
To fully understand the roles proteins play in cellular processes, students need to grasp complex ideas about protein structure, folding, and stability. Our current understanding of these topics is based on mathematical models and experimental data. However, protein structure, folding, and stability are often introduced as descriptive, qualitative…
Fan, Ming; Zheng, Bin; Li, Lihua
2015-10-01
Knowledge of the structural class of a given protein is important for understanding its folding patterns. Although a lot of efforts have been made, it still remains a challenging problem for prediction of protein structural class solely from protein sequences. The feature extraction and classification of proteins are the main problems in prediction. In this research, we extended our earlier work regarding these two aspects. In protein feature extraction, we proposed a scheme by calculating the word frequency and word position from sequences of amino acid, reduced amino acid, and secondary structure. For an accurate classification of the structural class of protein, we developed a novel Multi-Agent Ada-Boost (MA-Ada) method by integrating the features of Multi-Agent system into Ada-Boost algorithm. Extensive experiments were taken to test and compare the proposed method using four benchmark datasets in low homology. The results showed classification accuracies of 88.5%, 96.0%, 88.4%, and 85.5%, respectively, which are much better compared with the existing methods. The source code and dataset are available on request.
Bhagavat, Raghu; Sankar, Santhosh; Srinivasan, Narayanaswamy; Chandra, Nagasuma
2018-03-06
Protein-ligand interactions form the basis of most cellular events. Identifying ligand binding pockets in proteins will greatly facilitate rationalizing and predicting protein function. Ligand binding sites are unknown for many proteins of known three-dimensional (3D) structure, creating a gap in our understanding of protein structure-function relationships. To bridge this gap, we detect pockets in proteins of known 3D structures, using computational techniques. This augmented pocketome (PocketDB) consists of 249,096 pockets, which is about seven times larger than what is currently known. We deduce possible ligand associations for about 46% of the newly identified pockets. The augmented pocketome, when subjected to clustering based on similarities among pockets, yielded 2,161 site types, which are associated with 1,037 ligand types, together providing fold-site-type-ligand-type associations. The PocketDB resource facilitates a structure-based function annotation, delineation of the structural basis of ligand recognition, and provides functional clues for domains of unknown functions, allosteric proteins, and druggable pockets. Copyright © 2018 Elsevier Ltd. All rights reserved.
Protein structure determination by exhaustive search of Protein Data Bank derived databases.
Stokes-Rees, Ian; Sliz, Piotr
2010-12-14
Parallel sequence and structure alignment tools have become ubiquitous and invaluable at all levels in the study of biological systems. We demonstrate the application and utility of this same parallel search paradigm to the process of protein structure determination, benefitting from the large and growing corpus of known structures. Such searches were previously computationally intractable. Through the method of Wide Search Molecular Replacement, developed here, they can be completed in a few hours with the aide of national-scale federated cyberinfrastructure. By dramatically expanding the range of models considered for structure determination, we show that small (less than 12% structural coverage) and low sequence identity (less than 20% identity) template structures can be identified through multidimensional template scoring metrics and used for structure determination. Many new macromolecular complexes can benefit significantly from such a technique due to the lack of known homologous protein folds or sequences. We demonstrate the effectiveness of the method by determining the structure of a full-length p97 homologue from Trichoplusia ni. Example cases with the MHC/T-cell receptor complex and the EmoB protein provide systematic estimates of minimum sequence identity, structure coverage, and structural similarity required for this method to succeed. We describe how this structure-search approach and other novel computationally intensive workflows are made tractable through integration with the US national computational cyberinfrastructure, allowing, for example, rapid processing of the entire Structural Classification of Proteins protein fragment database.
A discrete search algorithm for finding the structure of protein backbones and side chains.
Sallaume, Silas; Martins, Simone de Lima; Ochi, Luiz Satoru; Da Silva, Warley Gramacho; Lavor, Carlile; Liberti, Leo
2013-01-01
Some information about protein structure can be obtained by using Nuclear Magnetic Resonance (NMR) techniques, but they provide only a sparse set of distances between atoms in a protein. The Molecular Distance Geometry Problem (MDGP) consists in determining the three-dimensional structure of a molecule using a set of known distances between some atoms. Recently, a Branch and Prune (BP) algorithm was proposed to calculate the backbone of a protein, based on a discrete formulation for the MDGP. We present an extension of the BP algorithm that can calculate not only the protein backbone, but the whole three-dimensional structure of proteins.
Projections for fast protein structure retrieval
Bhattacharya, Sourangshu; Bhattacharyya, Chiranjib; Chandra, Nagasuma R
2006-01-01
Background In recent times, there has been an exponential rise in the number of protein structures in databases e.g. PDB. So, design of fast algorithms capable of querying such databases is becoming an increasingly important research issue. This paper reports an algorithm, motivated from spectral graph matching techniques, for retrieving protein structures similar to a query structure from a large protein structure database. Each protein structure is specified by the 3D coordinates of residues of the protein. The algorithm is based on a novel characterization of the residues, called projections, leading to a similarity measure between the residues of the two proteins. This measure is exploited to efficiently compute the optimal equivalences. Results Experimental results show that, the current algorithm outperforms the state of the art on benchmark datasets in terms of speed without losing accuracy. Search results on SCOP 95% nonredundant database, for fold similarity with 5 proteins from different SCOP classes show that the current method performs competitively with the standard algorithm CE. The algorithm is also capable of detecting non-topological similarities between two proteins which is not possible with most of the state of the art tools like Dali. PMID:17254310
Freezing-Induced Perturbation of Tertiary Structure of a Monoclonal Antibody
LIU, LU; BRAUN, LATOYA JONES; WANG, WEI; RANDOLPH, THEODORE W.; CARPENTER, JOHN F.
2014-01-01
We studied the effects of pH and solution additives on freezing-induced perturbations in the tertiary structure of a monoclonal antibody (mAb) by intrinsic tryptophan fluorescence spectroscopy. In general, freezing caused perturbations in the tertiary structure of the mAb, which were reversible or irreversible depending on the pH or excipients present in the formulation. Protein aggregation occurred in freeze–thawed samples in which perturbations of the tertiary structure were observed, but the levels of protein aggregates formed were not proportional to the degree of structural perturbation. Protein aggregation also occurred in freeze–thawed samples without obvious structural perturbations, most likely because of freeze concentration of protein and salts, and thus reduced protein colloidal stability. Therefore, freezing-induced protein aggregation may or may not first involve the perturbation of its native structure, followed by the assembly processes to form aggregates. Depending on the solution conditions, either step can be rate limiting. Finally, this study demonstrates the potential of fluorescence spectroscopy as a valuable tool for screening therapeutic protein formulations subjected to freeze–thaw stress. PMID:24832730
Su, Qiudong; Guo, Minzhuo; Jia, Zhiyuan; Qiu, Feng; Lu, Xuexin; Gao, Yan; Meng, Qingling; Tian, Ruiguang; Bi, Shengli; Yi, Yao
2016-07-01
Hepatitis A virus (HAV) infection can stimulate the production of antibodies to structural and non-structural proteins of the virus. However, vaccination with an inactivated or attenuated HAV vaccine produces antibodies mainly against structural proteins, whereas no or very limited antibodies are produced against the non-structural proteins. Current diagnostic assays to determine exposure to HAV, such as the Abbott HAV AB test, detect antibodies only to the structural proteins and so are not able to distinguish a natural infection from vaccination with an inactivated or attenuated virus. Here, we constructed a recombinant tandem multi-epitope diagnostic antigen (designated 'H1') based on the immune-dominant epitopes of the non-structural proteins of HAV to distinguish the two situations. H1 protein expressed in Escherichia coli and purified by affinity and anion exchange chromatography was applied in a double-antigen sandwich ELISA for the detection of anti-non-structural HAV proteins, which was confirmed to distinguish a natural infection from vaccination with an inactivated or attenuated HAV vaccine. Copyright © 2016 Elsevier B.V. All rights reserved.
Comparative Protein Structure Modeling Using MODELLER.
Webb, Benjamin; Sali, Andrej
2014-09-08
Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described. Copyright © 2014 John Wiley & Sons, Inc.
Self-assembly in the ferritin nano-cage protein superfamily.
Zhang, Yu; Orner, Brendan P
2011-01-01
Protein self-assembly, through specific, high affinity, and geometrically constraining protein-protein interactions, can control and lead to complex cellular nano-structures. Establishing an understanding of the underlying principles that govern protein self-assembly is not only essential to appreciate the fundamental biological functions of these structures, but could also provide a basis for their enhancement for nano-material applications. The ferritins are a superfamily of well studied proteins that self-assemble into hollow cage-like structures which are ubiquitously found in both prokaryotes and eukaryotes. Structural studies have revealed that many members of the ferritin family can self-assemble into nano-cages of two types. Maxi-ferritins form hollow spheres with octahedral symmetry composed of twenty-four monomers. Mini-ferritins, on the other hand, are tetrahedrally symmetric, hollow assemblies composed of twelve monomers. This review will focus on the structure of members of the ferritin superfamily, the mechanism of ferritin self-assembly and the structure-function relations of these proteins.
Automated structure determination of proteins with the SAIL-FLYA NMR method.
Takeda, Mitsuhiro; Ikeya, Teppei; Güntert, Peter; Kainosho, Masatsune
2007-01-01
The labeling of proteins with stable isotopes enhances the NMR method for the determination of 3D protein structures in solution. Stereo-array isotope labeling (SAIL) provides an optimal stereospecific and regiospecific pattern of stable isotopes that yields sharpened lines, spectral simplification without loss of information, and the ability to collect rapidly and evaluate fully automatically the structural restraints required to solve a high-quality solution structure for proteins up to twice as large as those that can be analyzed using conventional methods. Here, we describe a protocol for the preparation of SAIL proteins by cell-free methods, including the preparation of S30 extract and their automated structure analysis using the FLYA algorithm and the program CYANA. Once efficient cell-free expression of the unlabeled or uniformly labeled target protein has been achieved, the NMR sample preparation of a SAIL protein can be accomplished in 3 d. A fully automated FLYA structure calculation can be completed in 1 d on a powerful computer system.
Nakamura, Akira; Ohtsuka, Jun; Kashiwagi, Tatsuki; Numoto, Nobutaka; Hirota, Noriyuki; Ode, Takahiro; Okada, Hidehiko; Nagata, Koji; Kiyohara, Motosuke; Suzuki, Ei-Ichiro; Kita, Akiko; Wada, Hitoshi; Tanokura, Masaru
2016-02-26
Precise protein structure determination provides significant information on life science research, although high-quality crystals are not easily obtained. We developed a system for producing high-quality protein crystals with high throughput. Using this system, gravity-controlled crystallization are made possible by a magnetic microgravity environment. In addition, in-situ and real-time observation and time-lapse imaging of crystal growth are feasible for over 200 solution samples independently. In this paper, we also report results of crystallization experiments for two protein samples. Crystals grown in the system exhibited magnetic orientation and showed higher and more homogeneous quality compared with the control crystals. The structural analysis reveals that making use of the magnetic microgravity during the crystallization process helps us to build a well-refined protein structure model, which has no significant structural differences with a control structure. Therefore, the system contributes to improvement in efficiency of structural analysis for "difficult" proteins, such as membrane proteins and supermolecular complexes.
Pineda-Lucena, Antonio; Liao, Jack C C; Cort, John R; Yee, Adelinda; Kennedy, Michael A; Edwards, Aled M; Arrowsmith, Cheryl H
2003-05-01
As part of the Northeast Structural Genomics Consortium pilot project focused on small eukaryotic proteins and protein domains, we have determined the NMR structure of the protein encoded by ORF YML108W from Saccharomyces cerevisiae. YML108W belongs to one of the numerous structural proteomics targets whose biological function is unknown. Moreover, this protein does not have sequence similarity to any other protein. The NMR structure of YML108W consists of a four-stranded beta-sheet with strand order 2143 and two alpha-helices, with an overall topology of betabetaalphabetabetaalpha. Strand beta1 runs parallel to beta4, and beta2:beta1 and beta4:beta3 pairs are arranged in an antiparallel fashion. Although this fold belongs to the split betaalphabeta family, it appears to be unique among this family; it is a novel arrangement of secondary structure, thereby expanding the universe of protein folds.
Zhang, Yanpeng; Yang, Ruijin; Zhang, Weinong; Hu, Zhixiong; Zhao, Wei
2017-03-15
The aim of this work was to analyze the influence of steam flash-explosion (SFE) with dilute acid soaking pretreatment on the structural characteristics and physiochemical properties of protein from soybean meal (SBM). The pretreatment led to depolymerisation of soy protein isolate (SPI) and formation of new protein aggregation through non-disulfide covalent bonds, which resulted in broader MW distribution of SPI. The analysis of CD spectroscopy showed that the SFE treatment induced minor changes in secondary structure, however, the intrinsic tryptophan fluorescence revealed that acid soaking and SFE treatment pronouncedly altered the tertiary structure of SPI. The protein zeta potential was shown to be increased after SFE treatment attributed to the changes in protein structure and the covalent coupling between carbohydrate and protein. These results contribute to clarifying the mechanisms of the effect of pretreatment on SPI structure, thus moving further toward implementing SFE in the processing chain of SPI. Copyright © 2016 Elsevier Ltd. All rights reserved.
Budowski-Tal, Inbal; Nov, Yuval; Kolodny, Rachel
2010-02-23
Fast identification of protein structures that are similar to a specified query structure in the entire Protein Data Bank (PDB) is fundamental in structure and function prediction. We present FragBag: An ultrafast and accurate method for comparing protein structures. We describe a protein structure by the collection of its overlapping short contiguous backbone segments, and discretize this set using a library of fragments. Then, we succinctly represent the protein as a "bags-of-fragments"-a vector that counts the number of occurrences of each fragment-and measure the similarity between two structures by the similarity between their vectors. Our representation has two additional benefits: (i) it can be used to construct an inverted index, for implementing a fast structural search engine of the entire PDB, and (ii) one can specify a structure as a collection of substructures, without combining them into a single structure; this is valuable for structure prediction, when there are reliable predictions only of parts of the protein. We use receiver operating characteristic curve analysis to quantify the success of FragBag in identifying neighbor candidate sets in a dataset of over 2,900 structures. The gold standard is the set of neighbors found by six state of the art structural aligners. Our best FragBag library finds more accurate candidate sets than the three other filter methods: The SGM, PRIDE, and a method by Zotenko et al. More interestingly, FragBag performs on a par with the computationally expensive, yet highly trusted structural aligners STRUCTAL and CE.
My 65 years in protein chemistry.
Scheraga, Harold A
2015-05-01
This is a tour of a physical chemist through 65 years of protein chemistry from the time when emphasis was placed on the determination of the size and shape of the protein molecule as a colloidal particle, with an early breakthrough by James Sumner, followed by Linus Pauling and Fred Sanger, that a protein was a real molecule, albeit a macromolecule. It deals with the recognition of the nature and importance of hydrogen bonds and hydrophobic interactions in determining the structure, properties, and biological function of proteins until the present acquisition of an understanding of the structure, thermodynamics, and folding pathways from a linear array of amino acids to a biological entity. Along the way, with a combination of experiment and theoretical interpretation, a mechanism was elucidated for the thrombin-induced conversion of fibrinogen to a fibrin blood clot and for the oxidative-folding pathways of ribonuclease A. Before the atomic structure of a protein molecule was determined by x-ray diffraction or nuclear magnetic resonance spectroscopy, experimental studies of the fundamental interactions underlying protein structure led to several distance constraints which motivated the theoretical approach to determine protein structure, and culminated in the Empirical Conformational Energy Program for Peptides (ECEPP), an all-atom force field, with which the structures of fibrous collagen-like proteins and the 46-residue globular staphylococcal protein A were determined. To undertake the study of larger globular proteins, a physics-based coarse-grained UNited-RESidue (UNRES) force field was developed, and applied to the protein-folding problem in terms of structure, thermodynamics, dynamics, and folding pathways. Initially, single-chain and, ultimately, multiple-chain proteins were examined, and the methodology was extended to protein-protein interactions and to nucleic acids and to protein-nucleic acid interactions. The ultimate results led to an understanding of a variety of biological processes underlying natural and disease phenomena.
Folding and Stabilization of Native-Sequence-Reversed Proteins
Zhang, Yuanzhao; Weber, Jeffrey K; Zhou, Ruhong
2016-01-01
Though the problem of sequence-reversed protein folding is largely unexplored, one might speculate that reversed native protein sequences should be significantly more foldable than purely random heteropolymer sequences. In this article, we investigate how the reverse-sequences of native proteins might fold by examining a series of small proteins of increasing structural complexity (α-helix, β-hairpin, α-helix bundle, and α/β-protein). Employing a tandem protein structure prediction algorithmic and molecular dynamics simulation approach, we find that the ability of reverse sequences to adopt native-like folds is strongly influenced by protein size and the flexibility of the native hydrophobic core. For β-hairpins with reverse-sequences that fail to fold, we employ a simple mutational strategy for guiding stable hairpin formation that involves the insertion of amino acids into the β-turn region. This systematic look at reverse sequence duality sheds new light on the problem of protein sequence-structure mapping and may serve to inspire new protein design and protein structure prediction protocols. PMID:27113844
Folding and Stabilization of Native-Sequence-Reversed Proteins
NASA Astrophysics Data System (ADS)
Zhang, Yuanzhao; Weber, Jeffrey K.; Zhou, Ruhong
2016-04-01
Though the problem of sequence-reversed protein folding is largely unexplored, one might speculate that reversed native protein sequences should be significantly more foldable than purely random heteropolymer sequences. In this article, we investigate how the reverse-sequences of native proteins might fold by examining a series of small proteins of increasing structural complexity (α-helix, β-hairpin, α-helix bundle, and α/β-protein). Employing a tandem protein structure prediction algorithmic and molecular dynamics simulation approach, we find that the ability of reverse sequences to adopt native-like folds is strongly influenced by protein size and the flexibility of the native hydrophobic core. For β-hairpins with reverse-sequences that fail to fold, we employ a simple mutational strategy for guiding stable hairpin formation that involves the insertion of amino acids into the β-turn region. This systematic look at reverse sequence duality sheds new light on the problem of protein sequence-structure mapping and may serve to inspire new protein design and protein structure prediction protocols.
Structure Prediction and Analysis of Neuraminidase Sequence Variants
ERIC Educational Resources Information Center
Thayer, Kelly M.
2016-01-01
Analyzing protein structure has become an integral aspect of understanding systems of biochemical import. The laboratory experiment endeavors to introduce protein folding to ascertain structures of proteins for which the structure is unavailable, as well as to critically evaluate the quality of the prediction obtained. The model system used is the…
The structure of a cholesterol-trapping protein
Date February 28, 2003 Date Berkeley Lab Science Beat Berkeley Lab Science Beat The structure of a Institute researchers determined the three-dimensional structure of a protein that controls cholesterol level in the bloodstream. Knowing the structure of the protein, a cellular receptor that ensnares
Protein local structure alignment under the discrete Fréchet distance.
Zhu, Binhai
2007-12-01
Protein structure alignment is a fundamental problem in computational and structural biology. While there has been lots of experimental/heuristic methods and empirical results, very few results are known regarding the algorithmic/complexity aspects of the problem, especially on protein local structure alignment. A well-known measure to characterize the similarity of two polygonal chains is the famous Fréchet distance, and with the application of protein-related research, a related discrete Fréchet distance has been used recently. In this paper, following the recent work of Jiang et al. we investigate the protein local structural alignment problem using bounded discrete Fréchet distance. Given m proteins (or protein backbones, which are 3D polygonal chains), each of length O(n), our main results are summarized as follows: * If the number of proteins, m, is not part of the input, then the problem is NP-complete; moreover, under bounded discrete Fréchet distance it is NP-hard to approximate the maximum size common local structure within a factor of n(1-epsilon). These results hold both when all the proteins are static and when translation/rotation are allowed. * If the number of proteins, m, is a constant, then there is a polynomial time solution for the problem.
DNAproDB: an interactive tool for structural analysis of DNA–protein complexes
Sagendorf, Jared M.
2017-01-01
Abstract Many biological processes are mediated by complex interactions between DNA and proteins. Transcription factors, various polymerases, nucleases and histones recognize and bind DNA with different levels of binding specificity. To understand the physical mechanisms that allow proteins to recognize DNA and achieve their biological functions, it is important to analyze structures of DNA–protein complexes in detail. DNAproDB is a web-based interactive tool designed to help researchers study these complexes. DNAproDB provides an automated structure-processing pipeline that extracts structural features from DNA–protein complexes. The extracted features are organized in structured data files, which are easily parsed with any programming language or viewed in a browser. We processed a large number of DNA–protein complexes retrieved from the Protein Data Bank and created the DNAproDB database to store this data. Users can search the database by combining features of the DNA, protein or DNA–protein interactions at the interface. Additionally, users can upload their own structures for processing privately and securely. DNAproDB provides several interactive and customizable tools for creating visualizations of the DNA–protein interface at different levels of abstraction that can be exported as high quality figures. All functionality is documented and freely accessible at http://dnaprodb.usc.edu. PMID:28431131
Rysavy, Steven J; Beck, David A C; Daggett, Valerie
2014-11-01
Protein function is intimately linked to protein structure and dynamics yet experimentally determined structures frequently omit regions within a protein due to indeterminate data, which is often due protein dynamics. We propose that atomistic molecular dynamics simulations provide a diverse sampling of biologically relevant structures for these missing segments (and beyond) to improve structural modeling and structure prediction. Here we make use of the Dynameomics data warehouse, which contains simulations of representatives of essentially all known protein folds. We developed novel computational methods to efficiently identify, rank and retrieve small peptide structures, or fragments, from this database. We also created a novel data model to analyze and compare large repositories of structural data, such as contained within the Protein Data Bank and the Dynameomics data warehouse. Our evaluation compares these structural repositories for improving loop predictions and analyzes the utility of our methods and models. Using a standard set of loop structures, containing 510 loops, 30 for each loop length from 4 to 20 residues, we find that the inclusion of Dynameomics structures in fragment-based methods improves the quality of the loop predictions without being dependent on sequence homology. Depending on loop length, ∼ 25-75% of the best predictions came from the Dynameomics set, resulting in lower main chain root-mean-square deviations for all fragment lengths using the combined fragment library. We also provide specific cases where Dynameomics fragments provide better predictions for NMR loop structures than fragments from crystal structures. Online access to these fragment libraries is available at http://www.dynameomics.org/fragments. © 2014 The Protein Society.
Bioinformatics and variability in drug response: a protein structural perspective
Lahti, Jennifer L.; Tang, Grace W.; Capriotti, Emidio; Liu, Tianyun; Altman, Russ B.
2012-01-01
Marketed drugs frequently perform worse in clinical practice than in the clinical trials on which their approval is based. Many therapeutic compounds are ineffective for a large subpopulation of patients to whom they are prescribed; worse, a significant fraction of patients experience adverse effects more severe than anticipated. The unacceptable risk–benefit profile for many drugs mandates a paradigm shift towards personalized medicine. However, prior to adoption of patient-specific approaches, it is useful to understand the molecular details underlying variable drug response among diverse patient populations. Over the past decade, progress in structural genomics led to an explosion of available three-dimensional structures of drug target proteins while efforts in pharmacogenetics offered insights into polymorphisms correlated with differential therapeutic outcomes. Together these advances provide the opportunity to examine how altered protein structures arising from genetic differences affect protein–drug interactions and, ultimately, drug response. In this review, we first summarize structural characteristics of protein targets and common mechanisms of drug interactions. Next, we describe the impact of coding mutations on protein structures and drug response. Finally, we highlight tools for analysing protein structures and protein–drug interactions and discuss their application for understanding altered drug responses associated with protein structural variants. PMID:22552919
Li de La Sierra-Gallay, Ines; Collinet, Bruno; Graille, Marc; Quevillon-Cheruel, Sophie; Liger, Dominique; Minard, Philippe; Blondeau, Karine; Henckes, Gilles; Aufrère, Robert; Leulliot, Nicolas; Zhou, Cong-Zhao; Sorel, Isabelle; Ferrer, Jean-Luc; Poupon, Anne; Janin, Joël; van Tilbeurgh, Herman
2004-03-01
The protein product of the YGR205w gene of Saccharomyces cerevisiae was targeted as part of our yeast structural genomics project. YGR205w codes for a small (290 amino acids) protein with unknown structure and function. The only recognizable sequence feature is the presence of a Walker A motif (P loop) indicating a possible nucleotide binding/converting function. We determined the three-dimensional crystal structure of Se-methionine substituted protein using multiple anomalous diffraction. The structure revealed a well known mononucleotide fold and strong resemblance to the structure of small metabolite phosphorylating enzymes such as pantothenate and phosphoribulo kinase. Biochemical experiments show that YGR205w binds specifically ATP and, less tightly, ADP. The structure also revealed the presence of two bound sulphate ions, occupying opposite niches in a canyon that corresponds to the active site of the protein. One sulphate is bound to the P-loop in a position that corresponds to the position of beta-phosphate in mononucleotide protein ATP complex, suggesting the protein is indeed a kinase. The nature of the phosphate accepting substrate remains to be determined. Copyright 2004 Wiley-Liss, Inc.
2016-01-01
Many excellent methods exist that incorporate cryo-electron microscopy (cryoEM) data to constrain computational protein structure prediction and refinement. Previously, it was shown that iteration of two such orthogonal sampling and scoring methods – Rosetta and molecular dynamics (MD) simulations – facilitated exploration of conformational space in principle. Here, we go beyond a proof-of-concept study and address significant remaining limitations of the iterative MD–Rosetta protein structure refinement protocol. Specifically, all parts of the iterative refinement protocol are now guided by medium-resolution cryoEM density maps, and previous knowledge about the native structure of the protein is no longer necessary. Models are identified solely based on score or simulation time. All four benchmark proteins showed substantial improvement through three rounds of the iterative refinement protocol. The best-scoring final models of two proteins had sub-Ångstrom RMSD to the native structure over residues in secondary structure elements. Molecular dynamics was most efficient in refining secondary structure elements and was thus highly complementary to the Rosetta refinement which is most powerful in refining side chains and loop regions. PMID:25883538
Ramya, L; Gautham, N; Chaloin, Laurent; Kajava, Andrey V
2015-09-01
Significant progress has been made in the determination of the protein structures with their number today passing over a hundred thousand structures. The next challenge is the understanding and prediction of protein-protein and protein-ligand interactions. In this work we address this problem by analyzing curved solenoid proteins. Many of these proteins are considered as "hub molecules" for their high potential to interact with many different molecules and to be a scaffold for multisubunit protein machineries. Our analysis of these structures through molecular dynamics simulations reveals that the mobility of the side-chains on the concave surfaces of the solenoids is lower than on the convex ones. This result provides an explanation to the observed preferential binding of the ligands, including small and flexible ligands, to the concave surface of the curved solenoid proteins. The relationship between the landscapes and dynamic properties of the protein surfaces can be further generalized to the other types of protein structures and eventually used in the computer algorithms, allowing prediction of protein-ligand interactions by analysis of protein surfaces. © 2015 Wiley Periodicals, Inc.
Yamaguchi, Akihiro; Go, Mitiko
2006-01-01
We have been developing FAMSBASE, a protein homology-modeling database of whole ORFs predicted from genome sequences. The latest update of FAMSBASE (http://daisy.nagahama-i-bio.ac.jp/Famsbase/), which is based on the protein three-dimensional (3D) structures released by November 2003, contains modeled 3D structures for 368,724 open reading frames (ORFs) derived from genomes of 276 species, namely 17 archaebacterial, 130 eubacterial, 18 eukaryotic and 111 phage genomes. Those 276 genomes are predicted to have 734,193 ORFs in total and the current FAMSBASE contains protein 3D structure of approximately 50% of the ORF products. However, cases that a modeled 3D structure covers the whole part of an ORF product are rare. When portion of an ORF with 3D structure is compared in three kingdoms of life, in archaebacteria and eubacteria, approximately 60% of the ORFs have modeled 3D structures covering almost the entire amino acid sequences, however, the percentage falls to about 30% in eukaryotes. When annual differences in the number of ORFs with modeled 3D structure are calculated, the fraction of modeled 3D structures of soluble protein for archaebacteria is increased by 5%, and that for eubacteria by 7% in the last 3 years. Assuming that this rate would be maintained and that determination of 3D structures for predicted disordered regions is unattainable, whole soluble protein model structures of prokaryotes without the putative disordered regions will be in hand within 15 years. For eukaryotic proteins, they will be in hand within 25 years. The 3D structures we will have at those times are not the 3D structure of the entire proteins encoded in single ORFs, but the 3D structures of separate structural domains. Measuring or predicting spatial arrangements of structural domains in an ORF will then be a coming issue of structural genomics. PMID:17146617
Bio-Organic Nanotechnology: Using Proteins and Synthetic Polymers for Nanoscale Devices
NASA Technical Reports Server (NTRS)
Molnar, Linda K.; Xu, Ting; Trent, Jonathan D.; Russell, Thomas P.
2003-01-01
While the ability of proteins to self-assemble makes them powerful tools in nanotechnology, in biological systems protein-based structures ultimately depend on the context in which they form. We combine the self-assembling properties of synthetic diblock copolymers and proteins to construct intricately ordered, three-dimensional polymer protein structures with the ultimate goal of forming nano-scale devices. This hybrid approach takes advantage of the capabilities of organic polymer chemistry to build ordered structures and the capabilities of genetic engineering to create proteins that are selective for inorganic or organic substrates. Here, microphase-separated block copolymers coupled with genetically engineered heat shock proteins are used to produce nano-scale patterning that maximizes the potential for both increased structural complexity and integrity.
Rigid-Docking Approaches to Explore Protein-Protein Interaction Space.
Matsuzaki, Yuri; Uchikoga, Nobuyuki; Ohue, Masahito; Akiyama, Yutaka
Protein-protein interactions play core roles in living cells, especially in the regulatory systems. As information on proteins has rapidly accumulated on publicly available databases, much effort has been made to obtain a better picture of protein-protein interaction networks using protein tertiary structure data. Predicting relevant interacting partners from their tertiary structure is a challenging task and computer science methods have the potential to assist with this. Protein-protein rigid docking has been utilized by several projects, docking-based approaches having the advantages that they can suggest binding poses of predicted binding partners which would help in understanding the interaction mechanisms and that comparing docking results of both non-binders and binders can lead to understanding the specificity of protein-protein interactions from structural viewpoints. In this review we focus on explaining current computational prediction methods to predict pairwise direct protein-protein interactions that form protein complexes.
A new definition and properties of the similarity value between two protein structures.
Saberi Fathi, S M
2016-10-01
Knowledge regarding the 3D structure of a protein provides useful information about the protein's functional properties. Particularly, structural similarity between proteins can be used as a good predictor of functional similarity. One method that uses the 3D geometrical structure of proteins in order to compare them is the similarity value (SV). In this paper, we introduce a new definition of the SV measure for comparing two proteins. To this end, we consider the mass of the protein's atoms and concentrate on the number of protein's atoms to be compared. This defines a new measure, called the weighted similarity value (WSV), adding physical properties to geometrical properties. We also show that our results are in good agreement with the results obtained by TM-SCORE and DALILITE. WSV can be of use in protein classification and in drug discovery.
DNA mimic proteins: functions, structures, and bioinformatic analysis.
Wang, Hao-Ching; Ho, Chun-Han; Hsu, Kai-Cheng; Yang, Jinn-Moon; Wang, Andrew H-J
2014-05-13
DNA mimic proteins have DNA-like negative surface charge distributions, and they function by occupying the DNA binding sites of DNA binding proteins to prevent these sites from being accessed by DNA. DNA mimic proteins control the activities of a variety of DNA binding proteins and are involved in a wide range of cellular mechanisms such as chromatin assembly, DNA repair, transcription regulation, and gene recombination. However, the sequences and structures of DNA mimic proteins are diverse, making them difficult to predict by bioinformatic search. To date, only a few DNA mimic proteins have been reported. These DNA mimics were not found by searching for functional motifs in their sequences but were revealed only by structural analysis of their charge distribution. This review highlights the biological roles and structures of 16 reported DNA mimic proteins. We also discuss approaches that might be used to discover new DNA mimic proteins.
Proteins as sponges: a statistical journey along protein structure organization principles.
Paola, Luisa Di; Paci, Paola; Santoni, Daniele; Ruvo, Micol De; Giuliani, Alessandro
2012-02-27
The analysis of a large database of protein structures by means of topological and shape indexes inspired by complex network and fractal analysis shed light on some organizational principles of proteins. Proteins appear much more similar to "fractal" sponges than to closely packed spheres, casting doubts on the tenability of the hydrophobic core concept. Principal component analysis highlighted three main order parameters shaping the protein universe: (1) "size", with the consequent generation of progressively less dense and more empty structures at an increasing number of residues, (2) "microscopic structuring", linked to the existence of a spectrum going from the prevalence of heterologous (different hydrophobicity) to the prevalence of homologous (similar hydrophobicity) contacts, and (3) "fractal shape", an organizing protein data set along a continuum going from approximately linear to very intermingled structures. Perhaps the time has come for seriously taking into consideration the real relevance of time-honored principles like the hydrophobic core and hydrophobic effect.
Origins of structure in globular proteins.
Chan, H S; Dill, K A
1990-01-01
The principal forces of protein folding--hydrophobicity and conformational entropy--are nonspecific. A long-standing puzzle has, therefore, been: What forces drive the formation of the specific internal architectures in globular proteins? We find that any self-avoiding flexible polymer molecule will develop large amounts of secondary structure, helices and parallel and antiparallel sheets, as it is driven to increasing compactness by any force of attraction among the chain monomers. Thus structure formation arises from the severity of steric constraints in compact polymers. This steric principle of organization can account for why short helices are stable in globular proteins, why there are parallel and anti-parallel sheets in proteins, and why weakly unfolded proteins have some secondary structure. On this basis, it should be possible to construct copolymers, not necessarily using amino acids, that can collapse to maximum compactness in incompatible solvents and that should then have structural organization resembling that of proteins. Images PMID:2385597
Mixing and Matching Detergents for Membrane Protein NMR Structure Determination
DOE Office of Scientific and Technical Information (OSTI.GOV)
Columbus, Linda; Lipfert, Jan; Jambunathan, Kalyani
2009-10-21
One major obstacle to membrane protein structure determination is the selection of a detergent micelle that mimics the native lipid bilayer. Currently, detergents are selected by exhaustive screening because the effects of protein-detergent interactions on protein structure are poorly understood. In this study, the structure and dynamics of an integral membrane protein in different detergents is investigated by nuclear magnetic resonance (NMR) and electron paramagnetic resonance (EPR) spectroscopy and small-angle X-ray scattering (SAXS). The results suggest that matching of the micelle dimensions to the protein's hydrophobic surface avoids exchange processes that reduce the completeness of the NMR observations. Based onmore » these dimensions, several mixed micelles were designed that improved the completeness of NMR observations. These findings provide a basis for the rational design of mixed micelles that may advance membrane protein structure determination by NMR.« less
Detection of functionally important regions in "hypothetical proteins" of known structure.
Nimrod, Guy; Schushan, Maya; Steinberg, David M; Ben-Tal, Nir
2008-12-10
Structural genomics initiatives provide ample structures of "hypothetical proteins" (i.e., proteins of unknown function) at an ever increasing rate. However, without function annotation, this structural goldmine is of little use to biologists who are interested in particular molecular systems. To this end, we used (an improved version of) the PatchFinder algorithm for the detection of functional regions on the protein surface, which could mediate its interactions with, e.g., substrates, ligands, and other proteins. Examination, using a data set of annotated proteins, showed that PatchFinder outperforms similar methods. We collected 757 structures of hypothetical proteins and their predicted functional regions in the N-Func database. Inspection of several of these regions demonstrated that they are useful for function prediction. For example, we suggested an interprotein interface and a putative nucleotide-binding site. A web-server implementation of PatchFinder and the N-Func database are available at http://patchfinder.tau.ac.il/.
Heterogeneous nucleation of hydroxyapatite on protein: structural effect of silk sericin
Takeuchi, Akari; Ohtsuki, Chikara; Miyazaki, Toshiki; Kamitakahara, Masanobu; Ogata, Shin-ichi; Yamazaki, Masao; Furutani, Yoshiaki; Kinoshita, Hisao; Tanihara, Masao
2005-01-01
Acidic proteins play an important role during mineral formation in biological systems, but the mechanism of mineral formation is far from understood. In this paper, we report on the relationship between the structure of a protein and hydroxyapatite deposition under biomimetic conditions. Sericin, a type of silk protein, was adopted as a suitable protein for studying structural effect on hydroxyapatite deposition, since it forms a hydroxyapatite layer on its surface in a metastable calcium phosphate solution, and its structure has been reported. Sericin effectively induced hydroxyapatite nucleation when it has high molecular weight and a β sheet structure. This indicates that the specific structure of a protein can effectively induce heterogeneous nucleation of hydroxyapatite in a biomimetic solution, i.e. a metastable calcium phosphate solution. This finding is useful in understanding biomineralization, as well as for the design of organic polymers that can effectively induce hydroxyapatite nucleation. PMID:16849195
Thermostabilisation of membrane proteins for structural studies
Magnani, Francesca; Serrano-Vega, Maria J.; Shibata, Yoko; Abdul-Hussein, Saba; Lebon, Guillaume; Miller-Gallacher, Jennifer; Singhal, Ankita; Strege, Annette; Thomas, Jennifer A.; Tate, Christopher G.
2017-01-01
The thermostability of an integral membrane protein in detergent solution is a key parameter that dictates the likelihood of obtaining well-diffracting crystals suitable for structure determination. However, many mammalian membrane proteins are too unstable for crystallisation. We developed a thermostabilisation strategy based on systematic mutagenesis coupled to a radioligand-binding thermostability assay that can be applied to receptors, ion channels and transporters. It takes approximately 6-12 months to thermostabilise a G protein-coupled receptor (GPCR) containing 300 amino acid residues. The resulting thermostabilised membrane proteins are more easily crystallised and result in high-quality structures. This methodology has facilitated structure-based drug design applied to GPCRs, because it is possible to determine multiple structures of the thermostabilised receptors bound to low affinity ligands. Protocols and advice are given on how to develop thermostability assays for membrane proteins and how to combine mutations to make an optimally stable mutant suitable for structural studies. PMID:27466713
Identifying DNA-binding proteins using structural motifs and the electrostatic potential
Shanahan, Hugh P.; Garcia, Mario A.; Jones, Susan; Thornton, Janet M.
2004-01-01
Robust methods to detect DNA-binding proteins from structures of unknown function are important for structural biology. This paper describes a method for identifying such proteins that (i) have a solvent accessible structural motif necessary for DNA-binding and (ii) a positive electrostatic potential in the region of the binding region. We focus on three structural motifs: helix–turn-helix (HTH), helix–hairpin–helix (HhH) and helix–loop–helix (HLH). We find that the combination of these variables detect 78% of proteins with an HTH motif, which is a substantial improvement over previous work based purely on structural templates and is comparable to more complex methods of identifying DNA-binding proteins. Similar true positive fractions are achieved for the HhH and HLH motifs. We see evidence of wide evolutionary diversity for DNA-binding proteins with an HTH motif, and much smaller diversity for those with an HhH or HLH motif. PMID:15356290
Structure of synaptophysin: a hexameric MARVEL-domain channel protein.
Arthur, Christopher P; Stowell, Michael H B
2007-06-01
Synaptophysin I (SypI) is an archetypal member of the MARVEL-domain family of integral membrane proteins and one of the first synaptic vesicle proteins to be identified and cloned. Most all MARVEL-domain proteins are involved in membrane apposition and vesicle-trafficking events, but their precise role in these processes is unclear. We have purified mammalian SypI and determined its three-dimensional (3D) structure by using electron microscopy and single-particle 3D reconstruction. The hexameric structure resembles an open basket with a large pore and tenuous interactions within the cytosolic domain. The structure suggests a model for Synaptophysin's role in fusion and recycling that is regulated by known interactions with the SNARE machinery. This 3D structure of a MARVEL-domain protein provides a structural foundation for understanding the role of these important proteins in a variety of biological processes.
Making the Bend: DNA Tertiary Structure and Protein-DNA Interactions
Harteis, Sabrina; Schneider, Sabine
2014-01-01
DNA structure functions as an overlapping code to the DNA sequence. Rapid progress in understanding the role of DNA structure in gene regulation, DNA damage recognition and genome stability has been made. The three dimensional structure of both proteins and DNA plays a crucial role for their specific interaction, and proteins can recognise the chemical signature of DNA sequence (“base readout”) as well as the intrinsic DNA structure (“shape recognition”). These recognition mechanisms do not exist in isolation but, depending on the individual interaction partners, are combined to various extents. Driving force for the interaction between protein and DNA remain the unique thermodynamics of each individual DNA-protein pair. In this review we focus on the structures and conformations adopted by DNA, both influenced by and influencing the specific interaction with the corresponding protein binding partner, as well as their underlying thermodynamics. PMID:25026169
Segmented molecular design of self-healing proteinaceous materials
NASA Astrophysics Data System (ADS)
Sariola, Veikko; Pena-Francesch, Abdon; Jung, Huihun; Çetinkaya, Murat; Pacheco, Carlos; Sitti, Metin; Demirel, Melik C.
2015-09-01
Hierarchical assembly of self-healing adhesive proteins creates strong and robust structural and interfacial materials, but understanding of the molecular design and structure-property relationships of structural proteins remains unclear. Elucidating this relationship would allow rational design of next generation genetically engineered self-healing structural proteins. Here we report a general self-healing and -assembly strategy based on a multiphase recombinant protein based material. Segmented structure of the protein shows soft glycine- and tyrosine-rich segments with self-healing capability and hard beta-sheet segments. The soft segments are strongly plasticized by water, lowering the self-healing temperature close to body temperature. The hard segments self-assemble into nanoconfined domains to reinforce the material. The healing strength scales sublinearly with contact time, which associates with diffusion and wetting of autohesion. The finding suggests that recombinant structural proteins from heterologous expression have potential as strong and repairable engineering materials.
Segmented molecular design of self-healing proteinaceous materials.
Sariola, Veikko; Pena-Francesch, Abdon; Jung, Huihun; Çetinkaya, Murat; Pacheco, Carlos; Sitti, Metin; Demirel, Melik C
2015-09-01
Hierarchical assembly of self-healing adhesive proteins creates strong and robust structural and interfacial materials, but understanding of the molecular design and structure-property relationships of structural proteins remains unclear. Elucidating this relationship would allow rational design of next generation genetically engineered self-healing structural proteins. Here we report a general self-healing and -assembly strategy based on a multiphase recombinant protein based material. Segmented structure of the protein shows soft glycine- and tyrosine-rich segments with self-healing capability and hard beta-sheet segments. The soft segments are strongly plasticized by water, lowering the self-healing temperature close to body temperature. The hard segments self-assemble into nanoconfined domains to reinforce the material. The healing strength scales sublinearly with contact time, which associates with diffusion and wetting of autohesion. The finding suggests that recombinant structural proteins from heterologous expression have potential as strong and repairable engineering materials.
Mapping monomeric threading to protein-protein structure prediction.
Guerler, Aysam; Govindarajoo, Brandon; Zhang, Yang
2013-03-25
The key step of template-based protein-protein structure prediction is the recognition of complexes from experimental structure libraries that have similar quaternary fold. Maintaining two monomer and dimer structure libraries is however laborious, and inappropriate library construction can degrade template recognition coverage. We propose a novel strategy SPRING to identify complexes by mapping monomeric threading alignments to protein-protein interactions based on the original oligomer entries in the PDB, which does not rely on library construction and increases the efficiency and quality of complex template recognitions. SPRING is tested on 1838 nonhomologous protein complexes which can recognize correct quaternary template structures with a TM score >0.5 in 1115 cases after excluding homologous proteins. The average TM score of the first model is 60% and 17% higher than that by HHsearch and COTH, respectively, while the number of targets with an interface RMSD <2.5 Å by SPRING is 134% and 167% higher than these competing methods. SPRING is controlled with ZDOCK on 77 docking benchmark proteins. Although the relative performance of SPRING and ZDOCK depends on the level of homology filters, a combination of the two methods can result in a significantly higher model quality than ZDOCK at all homology thresholds. These data demonstrate a new efficient approach to quaternary structure recognition that is ready to use for genome-scale modeling of protein-protein interactions due to the high speed and accuracy.
Single-Molecule Microscopy and Force Spectroscopy of Membrane Proteins
NASA Astrophysics Data System (ADS)
Engel, Andreas; Janovjak, Harald; Fotiadis, Dimtrios; Kedrov, Alexej; Cisneros, David; Müller, Daniel J.
Single-molecule atomic force microscopy (AFM) provides novel ways to characterize the structure-function relationship of native membrane proteins. High-resolution AFM topographs allow observing the structure of single proteins at sub-nanometer resolution as well as their conformational changes, oligomeric state, molecular dynamics and assembly. We will review these feasibilities illustrating examples of membrane proteins in native and reconstituted membranes. Classification of individual topographs of single proteins allows understanding the principles of motions of their extrinsic domains, to learn about their local structural flexibilities and to find the entropy minima of certain conformations. Combined with the visualization of functionally related conformational changes these insights allow understanding why certain flexibilities are required for the protein to function and how structurally flexible regions allow certain conformational changes. Complementary to AFM imaging, single-molecule force spectroscopy (SMFS) experiments detect molecular interactions established within and between membrane proteins. The sensitivity of this method makes it possible to measure interactions that stabilize secondary structures such as transmembrane α-helices, polypeptide loops and segments within. Changes in temperature or protein-protein assembly do not change the locations of stable structural segments, but influence their stability established by collective molecular interactions. Such changes alter the probability of proteins to choose a certain unfolding pathway. Recent examples have elucidated unfolding and refolding pathways of membrane proteins as well as their energy landscapes.
Leite, Wellington C; Galvão, Carolina W; Saab, Sérgio C; Iulek, Jorge; Etto, Rafael M; Steffens, Maria B R; Chitteni-Pattu, Sindhu; Stanage, Tyler; Keck, James L; Cox, Michael M
2016-01-01
The bacterial RecA protein plays a role in the complex system of DNA damage repair. Here, we report the functional and structural characterization of the Herbaspirillum seropedicae RecA protein (HsRecA). HsRecA protein is more efficient at displacing SSB protein from ssDNA than Escherichia coli RecA protein. HsRecA also promotes DNA strand exchange more efficiently. The three dimensional structure of HsRecA-ADP/ATP complex has been solved to 1.7 Å resolution. HsRecA protein contains a small N-terminal domain, a central core ATPase domain and a large C-terminal domain, that are similar to homologous bacterial RecA proteins. Comparative structural analysis showed that the N-terminal polymerization motif of archaeal and eukaryotic RecA family proteins are also present in bacterial RecAs. Reconstruction of electrostatic potential from the hexameric structure of HsRecA-ADP/ATP revealed a high positive charge along the inner side, where ssDNA is bound inside the filament. The properties of this surface may explain the greater capacity of HsRecA protein to bind ssDNA, forming a contiguous nucleoprotein filament, displace SSB and promote DNA exchange relative to EcRecA. Our functional and structural analyses provide insight into the molecular mechanisms of polymerization of bacterial RecA as a helical nucleoprotein filament.
Ladunga, I
1992-04-01
The markedly nonuniform, even systematic distribution of sequences in the protein "universe" has been analyzed by methods of protein taxonomy. Mapping of the natural hierarchical system of proteins has revealed some dense cores, i.e., well-defined clusterings of proteins that seem to be natural structural groupings, possibly seeds for a future protein taxonomy. The aim was not to force proteins into more or less man-made categories by discriminant analysis, but to find structurally similar groups, possibly of common evolutionary origin. Single-valued distance measures between pairs of superfamilies from the Protein Identification Resource were defined by two chi 2-like methods on tripeptide frequencies and the variable-length subsequence identity method derived from dot-matrix comparisons. Distance matrices were processed by several methods of cluster analysis to detect phylogenetic continuum between highly divergent proteins. Only well-defined clusters characterized by relatively unique structural, intracellular environmental, organismal, and functional attribute states were selected as major protein groups, including subsets of viral and Escherichia coli proteins, hormones, inhibitors, plant, ribosomal, serum and structural proteins, amino acid synthases, and clusters dominated by certain oxidoreductases and apolar and DNA-associated enzymes. The limited repertoire of functional patterns due to small genome size, the high rate of recombination, specific features of the bacterial membranes, or of the virus cycle canalize certain proteins of viruses and Gram-negative bacteria, respectively, to organismal groups.
A scoring function based on solvation thermodynamics for protein structure prediction
Du, Shiqiao; Harano, Yuichi; Kinoshita, Masahiro; Sakurai, Minoru
2012-01-01
We predict protein structure using our recently developed free energy function for describing protein stability, which is focused on solvation thermodynamics. The function is combined with the current most reliable sampling methods, i.e., fragment assembly (FA) and comparative modeling (CM). The prediction is tested using 11 small proteins for which high-resolution crystal structures are available. For 8 of these proteins, sequence similarities are found in the database, and the prediction is performed with CM. Fairly accurate models with average Cα root mean square deviation (RMSD) ∼ 2.0 Å are successfully obtained for all cases. For the rest of the target proteins, we perform the prediction following FA protocols. For 2 cases, we obtain predicted models with an RMSD ∼ 3.0 Å as the best-scored structures. For the other case, the RMSD remains larger than 7 Å. For all the 11 target proteins, our scoring function identifies the experimentally determined native structure as the best structure. Starting from the predicted structure, replica exchange molecular dynamics is performed to further refine the structures. However, we are unable to improve its RMSD toward the experimental structure. The exhaustive sampling by coarse-grained normal mode analysis around the native structures reveals that our function has a linear correlation with RMSDs < 3.0 Å. These results suggest that the function is quite reliable for the protein structure prediction while the sampling method remains one of the major limiting factors in it. The aspects through which the methodology could further be improved are discussed. PMID:27493529
Zemla, Adam T; Lang, Dorothy M; Kostova, Tanya; Andino, Raul; Ecale Zhou, Carol L
2011-06-02
Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory--still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could help overcome these difficulties by facilitating the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV (structure-alignment sequence variability), a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus, and we demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique, or that share structural similarity with proteins that would be considered distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local structural alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position. StralSV is provided as a web service at http://proteinmodel.org/AS2TS/STRALSV/.
A Generative Angular Model of Protein Structure Evolution
Golden, Michael; García-Portugués, Eduardo; Sørensen, Michael; Mardia, Kanti V.; Hamelryck, Thomas; Hein, Jotun
2017-01-01
Abstract Recently described stochastic models of protein evolution have demonstrated that the inclusion of structural information in addition to amino acid sequences leads to a more reliable estimation of evolutionary parameters. We present a generative, evolutionary model of protein structure and sequence that is valid on a local length scale. The model concerns the local dependencies between sequence and structure evolution in a pair of homologous proteins. The evolutionary trajectory between the two structures in the protein pair is treated as a random walk in dihedral angle space, which is modeled using a novel angular diffusion process on the two-dimensional torus. Coupling sequence and structure evolution in our model allows for modeling both “smooth” conformational changes and “catastrophic” conformational jumps, conditioned on the amino acid changes. The model has interpretable parameters and is comparatively more realistic than previous stochastic models, providing new insights into the relationship between sequence and structure evolution. For example, using the trained model we were able to identify an apparent sequence–structure evolutionary motif present in a large number of homologous protein pairs. The generative nature of our model enables us to evaluate its validity and its ability to simulate aspects of protein evolution conditioned on an amino acid sequence, a related amino acid sequence, a related structure or any combination thereof. PMID:28453724
Protein-protein structure prediction by scoring molecular dynamics trajectories of putative poses.
Sarti, Edoardo; Gladich, Ivan; Zamuner, Stefano; Correia, Bruno E; Laio, Alessandro
2016-09-01
The prediction of protein-protein interactions and their structural configuration remains a largely unsolved problem. Most of the algorithms aimed at finding the native conformation of a protein complex starting from the structure of its monomers are based on searching the structure corresponding to the global minimum of a suitable scoring function. However, protein complexes are often highly flexible, with mobile side chains and transient contacts due to thermal fluctuations. Flexibility can be neglected if one aims at finding quickly the approximate structure of the native complex, but may play a role in structure refinement, and in discriminating solutions characterized by similar scores. We here benchmark the capability of some state-of-the-art scoring functions (BACH-SixthSense, PIE/PISA and Rosetta) in discriminating finite-temperature ensembles of structures corresponding to the native state and to non-native configurations. We produce the ensembles by running thousands of molecular dynamics simulations in explicit solvent starting from poses generated by rigid docking and optimized in vacuum. We find that while Rosetta outperformed the other two scoring functions in scoring the structures in vacuum, BACH-SixthSense and PIE/PISA perform better in distinguishing near-native ensembles of structures generated by molecular dynamics in explicit solvent. Proteins 2016; 84:1312-1320. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Houston, Simon; Lithgow, Karen Vivien; Osbak, Kara Krista; Kenyon, Chris Richard; Cameron, Caroline E
2018-05-16
Syphilis continues to be a major global health threat with 11 million new infections each year, and a global burden of 36 million cases. The causative agent of syphilis, Treponema pallidum subspecies pallidum, is a highly virulent bacterium, however the molecular mechanisms underlying T. pallidum pathogenesis remain to be definitively identified. This is due to the fact that T. pallidum is currently uncultivatable, inherently fragile and thus difficult to work with, and phylogenetically distinct with no conventional virulence factor homologs found in other pathogens. In fact, approximately 30% of its predicted protein-coding genes have no known orthologs or assigned functions. Here we employed a structural bioinformatics approach using Phyre2-based tertiary structure modeling to improve our understanding of T. pallidum protein function on a proteome-wide scale. Phyre2-based tertiary structure modeling generated high-confidence predictions for 80% of the T. pallidum proteome (780/978 predicted proteins). Tertiary structure modeling also inferred the same function as primary structure-based annotations from genome sequencing pipelines for 525/605 proteins (87%), which represents 54% (525/978) of all T. pallidum proteins. Of the 175 T. pallidum proteins modeled with high confidence that were not assigned functions in the previously annotated published proteome, 167 (95%) were able to be assigned predicted functions. Twenty-one of the 175 hypothetical proteins modeled with high confidence were also predicted to exhibit significant structural similarity with proteins experimentally confirmed to be required for virulence in other pathogens. Phyre2-based structural modeling is a powerful bioinformatics tool that has provided insight into the potential structure and function of the majority of T. pallidum proteins and helped validate the primary structure-based annotation of more than 50% of all T. pallidum proteins with high confidence. This work represents the first T. pallidum proteome-wide structural modeling study and is one of few studies to apply this approach for the functional annotation of a whole proteome.
Munteanu, Cristian R; Pedreira, Nieves; Dorado, Julián; Pazos, Alejandro; Pérez-Montoto, Lázaro G; Ubeira, Florencio M; González-Díaz, Humberto
2014-04-01
Lectins (Ls) play an important role in many diseases such as different types of cancer, parasitic infections and other diseases. Interestingly, the Protein Data Bank (PDB) contains +3000 protein 3D structures with unknown function. Thus, we can in principle, discover new Ls mining non-annotated structures from PDB or other sources. However, there are no general models to predict new biologically relevant Ls based on 3D chemical structures. We used the MARCH-INSIDE software to calculate the Markov-Shannon 3D electrostatic entropy parameters for the complex networks of protein structure of 2200 different protein 3D structures, including 1200 Ls. We have performed a Linear Discriminant Analysis (LDA) using these parameters as inputs in order to seek a new Quantitative Structure-Activity Relationship (QSAR) model, which is able to discriminate 3D structure of Ls from other proteins. We implemented this predictor in the web server named LECTINPred, freely available at http://bio-aims.udc.es/LECTINPred.php. This web server showed the following goodness-of-fit statistics: Sensitivity=96.7 % (for Ls), Specificity=87.6 % (non-active proteins), and Accuracy=92.5 % (for all proteins), considering altogether both the training and external prediction series. In mode 2, users can carry out an automatic retrieval of protein structures from PDB. We illustrated the use of this server, in operation mode 1, performing a data mining of PDB. We predicted Ls scores for +2000 proteins with unknown function and selected the top-scored ones as possible lectins. In operation mode 2, LECTINPred can also upload 3D structural models generated with structure-prediction tools like LOMETS or PHYRE2. The new Ls are expected to be of relevance as cancer biomarkers or useful in parasite vaccine design. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
A Web-Accessible Protein Structure Prediction Pipeline
2009-06-01
Abstract Proteins are the molecular basis of nearly all structural, catalytic, sensory, and regulatory functions in living organisms. The biological...sensory, and regulatory functions in living organisms. The structure of a protein is essential in understanding its function at the molecular level...Characterizing sequence-structure and structure-function relationships have been the goals of molecular biology for more than three decades
Amyloid formation and inhibition of an all-beta protein: A study on fungal polygalacturonase
NASA Astrophysics Data System (ADS)
Chinisaz, Maryam; Ghasemi, Atiyeh; Larijani, Bagher; Ebrahim-Habibi, Azadeh
2014-02-01
Theoretically, all proteins can adopt the nanofibrillar structures known as amyloid, which contain cross-beta structures. The all-beta folded proteins are particularly interesting in this regard, since they appear to be naturally more predisposed toward this structural arrangement. In this study, methanol has been used to drive the beta-helix protein polygalacturonase (PG), toward amyloid fibril formation. Congo red absorbance, thioflavin T fluorescence, circular dichroism (CD) and transmission electron microscopy have been used to characterize this process. Similar to other all-beta proteins, PG shows a non-cooperative fibrillation mechanism, but the structural changes that are monitored by CD indicate a different pattern. Furthermore, several compounds containing aromatic components were tested as potential inhibitors of amyloid formation. Another protein predominantly composed of alpha-helices (human serum albumin) was also targeted by these ligands, in order to get an insight into their potential anti-aggregation property toward structurally different proteins. Among tested compounds, silibinin and chlorpropamide were able to considerably affect both proteins fibrillation process.
PhyreStorm: A Web Server for Fast Structural Searches Against the PDB.
Mezulis, Stefans; Sternberg, Michael J E; Kelley, Lawrence A
2016-02-22
The identification of structurally similar proteins can provide a range of biological insights, and accordingly, the alignment of a query protein to a database of experimentally determined protein structures is a technique commonly used in the fields of structural and evolutionary biology. The PhyreStorm Web server has been designed to provide comprehensive, up-to-date and rapid structural comparisons against the Protein Data Bank (PDB) combined with a rich and intuitive user interface. It is intended that this facility will enable biologists inexpert in bioinformatics access to a powerful tool for exploring protein structure relationships beyond what can be achieved by sequence analysis alone. By partitioning the PDB into similar structures, PhyreStorm is able to quickly discard the majority of structures that cannot possibly align well to a query protein, reducing the number of alignments required by an order of magnitude. PhyreStorm is capable of finding 93±2% of all highly similar (TM-score>0.7) structures in the PDB for each query structure, usually in less than 60s. PhyreStorm is available at http://www.sbg.bio.ic.ac.uk/phyrestorm/. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.
Gifford, Lida K; Carter, Lester G; Gabanyi, Margaret J; Berman, Helen M; Adams, Paul D
2012-06-01
The Technology Portal of the Protein Structure Initiative Structural Biology Knowledgebase (PSI SBKB; http://technology.sbkb.org/portal/ ) is a web resource providing information about methods and tools that can be used to relieve bottlenecks in many areas of protein production and structural biology research. Several useful features are available on the web site, including multiple ways to search the database of over 250 technological advances, a link to videos of methods on YouTube, and access to a technology forum where scientists can connect, ask questions, get news, and develop collaborations. The Technology Portal is a component of the PSI SBKB ( http://sbkb.org ), which presents integrated genomic, structural, and functional information for all protein sequence targets selected by the Protein Structure Initiative. Created in collaboration with the Nature Publishing Group, the SBKB offers an array of resources for structural biologists, such as a research library, editorials about new research advances, a featured biological system each month, and a functional sleuth for searching protein structures of unknown function. An overview of the various features and examples of user searches highlight the information, tools, and avenues for scientific interaction available through the Technology Portal.
Membrane protein structure determination — The next generation☆☆☆
Moraes, Isabel; Evans, Gwyndaf; Sanchez-Weatherby, Juan; Newstead, Simon; Stewart, Patrick D. Shaw
2014-01-01
The field of Membrane Protein Structural Biology has grown significantly since its first landmark in 1985 with the first three-dimensional atomic resolution structure of a membrane protein. Nearly twenty-six years later, the crystal structure of the beta2 adrenergic receptor in complex with G protein has contributed to another landmark in the field leading to the 2012 Nobel Prize in Chemistry. At present, more than 350 unique membrane protein structures solved by X-ray crystallography (http://blanco.biomol.uci.edu/mpstruc/exp/list, Stephen White Lab at UC Irvine) are available in the Protein Data Bank. The advent of genomics and proteomics initiatives combined with high-throughput technologies, such as automation, miniaturization, integration and third-generation synchrotrons, has enhanced membrane protein structure determination rate. X-ray crystallography is still the only method capable of providing detailed information on how ligands, cofactors, and ions interact with proteins, and is therefore a powerful tool in biochemistry and drug discovery. Yet the growth of membrane protein crystals suitable for X-ray diffraction studies amazingly remains a fine art and a major bottleneck in the field. It is often necessary to apply as many innovative approaches as possible. In this review we draw attention to the latest methods and strategies for the production of suitable crystals for membrane protein structure determination. In addition we also highlight the impact that third-generation synchrotron radiation has made in the field, summarizing the latest strategies used at synchrotron beamlines for screening and data collection from such demanding crystals. This article is part of a Special Issue entitled: Structural and biophysical characterisation of membrane protein-ligand binding. PMID:23860256
D'Antonio, Matteo; Masseroli, Marco
2009-01-01
Background Alternative splicing has been demonstrated to affect most of human genes; different isoforms from the same gene encode for proteins which differ for a limited number of residues, thus yielding similar structures. This suggests possible correlations between alternative splicing and protein structure. In order to support the investigation of such relationships, we have developed the Alternative Splicing and Protein Structure Scrutinizer (PASS), a Web application to automatically extract, integrate and analyze human alternative splicing and protein structure data sparsely available in the Alternative Splicing Database, Ensembl databank and Protein Data Bank. Primary data from these databases have been integrated and analyzed using the Protein Identifier Cross-Reference, BLAST, CLUSTALW and FeatureMap3D software tools. Results A database has been developed to store the considered primary data and the results from their analysis; a system of Perl scripts has been implemented to automatically create and update the database and analyze the integrated data; a Web interface has been implemented to make the analyses easily accessible; a database has been created to manage user accesses to the PASS Web application and store user's data and searches. Conclusion PASS automatically integrates data from the Alternative Splicing Database with protein structure data from the Protein Data Bank. Additionally, it comprehensively analyzes the integrated data with publicly available well-known bioinformatics tools in order to generate structural information of isoform pairs. Further analysis of such valuable information might reveal interesting relationships between alternative splicing and protein structure differences, which may be significantly associated with different functions. PMID:19828075
Why Is There a Glass Ceiling for Threading Based Protein Structure Prediction Methods?
Skolnick, Jeffrey; Zhou, Hongyi
2017-04-20
Despite their different implementations, comparison of the best threading approaches to the prediction of evolutionary distant protein structures reveals that they tend to succeed or fail on the same protein targets. This is true despite the fact that the structural template library has good templates for all cases. Thus, a key question is why are certain protein structures threadable while others are not. Comparison with threading results on a set of artificial sequences selected for stability further argues that the failure of threading is due to the nature of the protein structures themselves. Using a new contact map based alignment algorithm, we demonstrate that certain folds are highly degenerate in that they can have very similar coarse grained fractions of native contacts aligned and yet differ significantly from the native structure. For threadable proteins, this is not the case. Thus, contemporary threading approaches appear to have reached a plateau, and new approaches to structure prediction are required.
G2S: a web-service for annotating genomic variants on 3D protein structures.
Wang, Juexin; Sheridan, Robert; Sumer, S Onur; Schultz, Nikolaus; Xu, Dong; Gao, Jianjiong
2018-06-01
Accurately mapping and annotating genomic locations on 3D protein structures is a key step in structure-based analysis of genomic variants detected by recent large-scale sequencing efforts. There are several mapping resources currently available, but none of them provides a web API (Application Programming Interface) that supports programmatic access. We present G2S, a real-time web API that provides automated mapping of genomic variants on 3D protein structures. G2S can align genomic locations of variants, protein locations, or protein sequences to protein structures and retrieve the mapped residues from structures. G2S API uses REST-inspired design and it can be used by various clients such as web browsers, command terminals, programming languages and other bioinformatics tools for bringing 3D structures into genomic variant analysis. The webserver and source codes are freely available at https://g2s.genomenexus.org. g2s@genomenexus.org. Supplementary data are available at Bioinformatics online.
Jensen, Malene Ringkjøbing; Bernadó, Pau; Houben, Klaartje; Blanchard, Laurence; Marion, Dominque; Ruigrok, Rob W H; Blackledge, Martin
2010-08-01
Intrinsically disordered regions of significant length are present throughout eukaryotic genomes, and are particularly prevalent in viral proteins. Due to their inherent flexibility, these proteins inhabit a conformational landscape that is too complex to be described by classical structural biology. The elucidation of the role that conformational flexibility plays in molecular function will redefine our understanding of the molecular basis of biological function, and the development of appropriate technology to achieve this aim remains one of the major challenges for the future of structural biology. NMR is the technique of choice for studying intrinsically disordered proteins, providing information about structure, flexibility and interactions at atomic resolution even in completely disordered proteins. In particular residual dipolar couplings (RDCs) are sensitive and powerful tools for determining local and long-range structural behaviour in flexible proteins. Here we describe recent applications of the use of RDCs to quantitatively describe the level of local structure in intrinsically disordered proteins involved in replication and transcription in Sendai virus.
Compton, L A; Johnson, W C
1986-05-15
Inverse circular dichroism (CD) spectra are presented for each of the five major secondary structures of proteins: alpha-helix, antiparallel and parallel beta-sheet, beta-turn, and other (random) structures. The fraction of the each secondary structure in a protein is predicted by forming the dot product of the corresponding inverse CD spectrum, expressed as a vector, with the CD spectrum of the protein digitized in the same way. We show how this method is based on the construction of the generalized inverse from the singular value decomposition of a set of CD spectra corresponding to proteins whose secondary structures are known from X-ray crystallography. These inverse spectra compute secondary structure directly from protein CD spectra without resorting to least-squares fitting and standard matrix inversion techniques. In addition, spectra corresponding to the individual secondary structures, analogous to the CD spectra of synthetic polypeptides, are generated from the five most significant CD eigenvectors.
Structural Genomics and Drug Discovery for Infectious Diseases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anderson, W.F.
The application of structural genomics methods and approaches to proteins from organisms causing infectious diseases is making available the three dimensional structures of many proteins that are potential drug targets and laying the groundwork for structure aided drug discovery efforts. There are a number of structural genomics projects with a focus on pathogens that have been initiated worldwide. The Center for Structural Genomics of Infectious Diseases (CSGID) was recently established to apply state-of-the-art high throughput structural biology technologies to the characterization of proteins from the National Institute for Allergy and Infectious Diseases (NIAID) category A-C pathogens and organisms causing emerging,more » or re-emerging infectious diseases. The target selection process emphasizes potential biomedical benefits. Selected proteins include known drug targets and their homologs, essential enzymes, virulence factors and vaccine candidates. The Center also provides a structure determination service for the infectious disease scientific community. The ultimate goal is to generate a library of structures that are available to the scientific community and can serve as a starting point for further research and structure aided drug discovery for infectious diseases. To achieve this goal, the CSGID will determine protein crystal structures of 400 proteins and protein-ligand complexes using proven, rapid, highly integrated, and cost-effective methods for such determination, primarily by X-ray crystallography. High throughput crystallographic structure determination is greatly aided by frequent, convenient access to high-performance beamlines at third-generation synchrotron X-ray sources.« less
An estimated 5% of new protein structures solved today represent a new Pfam family
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mistry, Jaina; Kloppmann, Edda; Rost, Burkhard
2013-11-01
This study uses the Pfam database to show that the sequence redundancy of protein structures deposited in the PDB is increasing. The possible reasons behind this trend are discussed. High-resolution structural knowledge is key to understanding how proteins function at the molecular level. The number of entries in the Protein Data Bank (PDB), the repository of all publicly available protein structures, continues to increase, with more than 8000 structures released in 2012 alone. The authors of this article have studied how structural coverage of the protein-sequence space has changed over time by monitoring the number of Pfam families that acquiredmore » their first representative structure each year from 1976 to 2012. Twenty years ago, for every 100 new PDB entries released, an estimated 20 Pfam families acquired their first structure. By 2012, this decreased to only about five families per 100 structures. The reasons behind the slower pace at which previously uncharacterized families are being structurally covered were investigated. It was found that although more than 50% of current Pfam families are still without a structural representative, this set is enriched in families that are small, functionally uncharacterized or rich in problem features such as intrinsically disordered and transmembrane regions. While these are important constraints, the reasons why it may not yet be time to give up the pursuit of a targeted but more comprehensive structural coverage of the protein-sequence space are discussed.« less
Mining protein loops using a structural alphabet and statistical exceptionality
2010-01-01
Background Protein loops encompass 50% of protein residues in available three-dimensional structures. These regions are often involved in protein functions, e.g. binding site, catalytic pocket... However, the description of protein loops with conventional tools is an uneasy task. Regular secondary structures, helices and strands, have been widely studied whereas loops, because they are highly variable in terms of sequence and structure, are difficult to analyze. Due to data sparsity, long loops have rarely been systematically studied. Results We developed a simple and accurate method that allows the description and analysis of the structures of short and long loops using structural motifs without restriction on loop length. This method is based on the structural alphabet HMM-SA. HMM-SA allows the simplification of a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment, called structural letter. The difficult task of the structural grouping of huge data sets is thus easily accomplished by handling structural letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments in a bank of 93000 protein loops and grouped them according to the structural-letter sequence, named structural word. This approach permits a systematic analysis of loops of all sizes since we consider the structural motifs of seven residues rather than complete loops. We focused the analysis on highly recurrent words of loops (observed more than 30 times). Our study reveals that 73% of loop-lengths are covered by only 3310 highly recurrent structural words out of 28274 observed words). These structural words have low structural variability (mean RMSd of 0.85 Å). As expected, half of these motifs display a flanking-region preference but interestingly, two thirds are shared by short (less than 12 residues) and long loops. Moreover, half of recurrent motifs exhibit a significant level of amino-acid conservation with at least four significant positions and 87% of long loops contain at least one such word. We complement our analysis with the detection of statistically over-represented patterns of structural letters as in conventional DNA sequence analysis. About 30% (930) of structural words are over-represented, and cover about 40% of loop lengths. Interestingly, these words exhibit lower structural variability and higher sequential specificity, suggesting structural or functional constraints. Conclusions We developed a method to systematically decompose and study protein loops using recurrent structural motifs. This method is based on the structural alphabet HMM-SA and not on structural alignment and geometrical parameters. We extracted meaningful structural motifs that are found in both short and long loops. To our knowledge, it is the first time that pattern mining helps to increase the signal-to-noise ratio in protein loops. This finding helps to better describe protein loops and might permit to decrease the complexity of long-loop analysis. Detailed results are available at http://www.mti.univ-paris-diderot.fr/publication/supplementary/2009/ACCLoop/. PMID:20132552
Mining protein loops using a structural alphabet and statistical exceptionality.
Regad, Leslie; Martin, Juliette; Nuel, Gregory; Camproux, Anne-Claude
2010-02-04
Protein loops encompass 50% of protein residues in available three-dimensional structures. These regions are often involved in protein functions, e.g. binding site, catalytic pocket... However, the description of protein loops with conventional tools is an uneasy task. Regular secondary structures, helices and strands, have been widely studied whereas loops, because they are highly variable in terms of sequence and structure, are difficult to analyze. Due to data sparsity, long loops have rarely been systematically studied. We developed a simple and accurate method that allows the description and analysis of the structures of short and long loops using structural motifs without restriction on loop length. This method is based on the structural alphabet HMM-SA. HMM-SA allows the simplification of a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment, called structural letter. The difficult task of the structural grouping of huge data sets is thus easily accomplished by handling structural letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments in a bank of 93000 protein loops and grouped them according to the structural-letter sequence, named structural word. This approach permits a systematic analysis of loops of all sizes since we consider the structural motifs of seven residues rather than complete loops. We focused the analysis on highly recurrent words of loops (observed more than 30 times). Our study reveals that 73% of loop-lengths are covered by only 3310 highly recurrent structural words out of 28274 observed words). These structural words have low structural variability (mean RMSd of 0.85 A). As expected, half of these motifs display a flanking-region preference but interestingly, two thirds are shared by short (less than 12 residues) and long loops. Moreover, half of recurrent motifs exhibit a significant level of amino-acid conservation with at least four significant positions and 87% of long loops contain at least one such word. We complement our analysis with the detection of statistically over-represented patterns of structural letters as in conventional DNA sequence analysis. About 30% (930) of structural words are over-represented, and cover about 40% of loop lengths. Interestingly, these words exhibit lower structural variability and higher sequential specificity, suggesting structural or functional constraints. We developed a method to systematically decompose and study protein loops using recurrent structural motifs. This method is based on the structural alphabet HMM-SA and not on structural alignment and geometrical parameters. We extracted meaningful structural motifs that are found in both short and long loops. To our knowledge, it is the first time that pattern mining helps to increase the signal-to-noise ratio in protein loops. This finding helps to better describe protein loops and might permit to decrease the complexity of long-loop analysis. Detailed results are available at http://www.mti.univ-paris-diderot.fr/publication/supplementary/2009/ACCLoop/.
Preparation of Protein Samples for NMR Structure, Function, and Small Molecule Screening Studies
Acton, Thomas B.; Xiao, Rong; Anderson, Stephen; Aramini, James; Buchwald, William A.; Ciccosanti, Colleen; Conover, Ken; Everett, John; Hamilton, Keith; Huang, Yuanpeng Janet; Janjua, Haleema; Kornhaber, Gregory; Lau, Jessica; Lee, Dong Yup; Liu, Gaohua; Maglaqui, Melissa; Ma, Lichung; Mao, Lei; Patel, Dayaban; Rossi, Paolo; Sahdev, Seema; Shastry, Ritu; Swapna, G.V.T.; Tang, Yeufeng; Tong, Saichiu; Wang, Dongyan; Wang, Huang; Zhao, Li; Montelione, Gaetano T.
2014-01-01
In this chapter, we concentrate on the production of high quality protein samples for NMR studies. In particular, we provide an in-depth description of recent advances in the production of NMR samples and their synergistic use with recent advancements in NMR hardware. We describe the protein production platform of the Northeast Structural Genomics Consortium, and outline our high-throughput strategies for producing high quality protein samples for nuclear magnetic resonance (NMR) studies. Our strategy is based on the cloning, expression and purification of 6X-His-tagged proteins using T7-based Escherichia coli systems and isotope enrichment in minimal media. We describe 96-well ligation-independent cloning and analytical expression systems, parallel preparative scale fermentation, and high-throughput purification protocols. The 6X-His affinity tag allows for a similar two-step purification procedure implemented in a parallel high-throughput fashion that routinely results in purity levels sufficient for NMR studies (> 97% homogeneity). Using this platform, the protein open reading frames of over 17,500 different targeted proteins (or domains) have been cloned as over 28,000 constructs. Nearly 5,000 of these proteins have been purified to homogeneity in tens of milligram quantities (see Summary Statistics, http://nesg.org/statistics.html), resulting in more than 950 new protein structures, including more than 400 NMR structures, deposited in the Protein Data Bank. The Northeast Structural Genomics Consortium pipeline has been effective in producing protein samples of both prokaryotic and eukaryotic origin. Although this paper describes our entire pipeline for producing isotope-enriched protein samples, it focuses on the major updates introduced during the last 5 years (Phase 2 of the National Institute of General Medical Sciences Protein Structure Initiative). Our advanced automated and/or parallel cloning, expression, purification, and biophysical screening technologies are suitable for implementation in a large individual laboratory or by a small group of collaborating investigators for structural biology, functional proteomics, ligand screening and structural genomics research. PMID:21371586
The Purine Bias of Coding Sequences is Determined by Physicochemical Constraints on Proteins.
Ponce de Leon, Miguel; de Miranda, Antonio Basilio; Alvarez-Valin, Fernando; Carels, Nicolas
2014-01-01
For this report, we analyzed protein secondary structures in relation to the statistics of three nucleotide codon positions. The purpose of this investigation was to find which properties of the ribosome, tRNA or protein level, could explain the purine bias (Rrr) as it is observed in coding DNA. We found that the Rrr pattern is the consequence of a regularity (the codon structure) resulting from physicochemical constraints on proteins and thermodynamic constraints on ribosomal machinery. The physicochemical constraints on proteins mainly come from the hydropathy and molecular weight (MW) of secondary structures as well as the energy cost of amino acid synthesis. These constraints appear through a network of statistical correlations, such as (i) the cost of amino acid synthesis, which is in favor of a higher level of guanine in the first codon position, (ii) the constructive contribution of hydropathy alternation in proteins, (iii) the spatial organization of secondary structure in proteins according to solvent accessibility, (iv) the spatial organization of secondary structure according to amino acid hydropathy, (v) the statistical correlation of MW with protein secondary structures and their overall hydropathy, (vi) the statistical correlation of thymine in the second codon position with hydropathy and the energy cost of amino acid synthesis, and (vii) the statistical correlation of adenine in the second codon position with amino acid complexity and the MW of secondary protein structures. Amino acid physicochemical properties and functional constraints on proteins constitute a code that is translated into a purine bias within the coding DNA via tRNAs. In that sense, the Rrr pattern within coding DNA is the effect of information transfer on nucleotide composition from protein to DNA by selection according to the codon positions. Thus, coding DNA structure and ribosomal machinery co-evolved to minimize the energy cost of protein coding given the functional constraints on proteins.
ERIC Educational Resources Information Center
Giron, Maria D.; Salto, Rafael
2011-01-01
Structure-function relationship studies in proteins are essential in modern Cell Biology. Laboratory exercises that allow students to familiarize themselves with basic mutagenesis techniques are essential in all Genetic Engineering courses to teach the relevance of protein structure. We have implemented a laboratory course based on the…
Protein Models Docking Benchmark 2
Anishchenko, Ivan; Kundrotas, Petras J.; Tuzikov, Alexander V.; Vakser, Ilya A.
2015-01-01
Structural characterization of protein-protein interactions is essential for our ability to understand life processes. However, only a fraction of known proteins have experimentally determined structures. Such structures provide templates for modeling of a large part of the proteome, where individual proteins can be docked by template-free or template-based techniques. Still, the sensitivity of the docking methods to the inherent inaccuracies of protein models, as opposed to the experimentally determined high-resolution structures, remains largely untested, primarily due to the absence of appropriate benchmark set(s). Structures in such a set should have pre-defined inaccuracy levels and, at the same time, resemble actual protein models in terms of structural motifs/packing. The set should also be large enough to ensure statistical reliability of the benchmarking results. We present a major update of the previously developed benchmark set of protein models. For each interactor, six models were generated with the model-to-native Cα RMSD in the 1 to 6 Å range. The models in the set were generated by a new approach, which corresponds to the actual modeling of new protein structures in the “real case scenario,” as opposed to the previous set, where a significant number of structures were model-like only. In addition, the larger number of complexes (165 vs. 63 in the previous set) increases the statistical reliability of the benchmarking. We estimated the highest accuracy of the predicted complexes (according to CAPRI criteria), which can be attained using the benchmark structures. The set is available at http://dockground.bioinformatics.ku.edu. PMID:25712716
The Structure and Function of Non-Collagenous Bone Proteins
NASA Technical Reports Server (NTRS)
Hook, Magnus; McQuillan, David J.
1997-01-01
The research done under the cooperative research agreement for the project titled 'The structure and function of non-collagenous bone proteins' represented the first phase of an ongoing program to define the structural and functional relationships of the principal noncollagenous proteins in bone. An ultimate goal of this research is to enable design and execution of useful pharmacological compounds that will have a beneficial effect in treatment of osteoporosis, both land-based and induced by long-duration space travel. The goals of the now complete first phase were as follows: 1. Establish and/or develop powerful recombinant protein expression systems; 2. Develop and refine isolation and purification of recombinant proteins; 3. Express wild-type non-collagenous bone proteins; 4. Express site-specific mutant proteins and domains of wild-type proteins to enhance likelihood of crystal formation for subsequent solution of structure.
Jo, Sunhwan; Lee, Hui Sun; Skolnick, Jeffrey; Im, Wonpil
2013-01-01
Understanding glycan structure and dynamics is central to understanding protein-carbohydrate recognition and its role in protein-protein interactions. Given the difficulties in obtaining the glycan's crystal structure in glycoconjugates due to its flexibility and heterogeneity, computational modeling could play an important role in providing glycosylated protein structure models. To address if glycan structures available in the PDB can be used as templates or fragments for glycan modeling, we present a survey of the N-glycan structures of 35 different sequences in the PDB. Our statistical analysis shows that the N-glycan structures found on homologous glycoproteins are significantly conserved compared to the random background, suggesting that N-glycan chains can be confidently modeled with template glycan structures whose parent glycoproteins share sequence similarity. On the other hand, N-glycan structures found on non-homologous glycoproteins do not show significant global structural similarity. Nonetheless, the internal substructures of these N-glycans, particularly, the substructures that are closer to the protein, show significantly similar structures, suggesting that such substructures can be used as fragments in glycan modeling. Increased interactions with protein might be responsible for the restricted conformational space of N-glycan chains. Our results suggest that structure prediction/modeling of N-glycans of glycoconjugates using structure database could be effective and different modeling approaches would be needed depending on the availability of template structures.
Restricted N-glycan Conformational Space in the PDB and Its Implication in Glycan Structure Modeling
Jo, Sunhwan; Lee, Hui Sun; Skolnick, Jeffrey; Im, Wonpil
2013-01-01
Understanding glycan structure and dynamics is central to understanding protein-carbohydrate recognition and its role in protein-protein interactions. Given the difficulties in obtaining the glycan's crystal structure in glycoconjugates due to its flexibility and heterogeneity, computational modeling could play an important role in providing glycosylated protein structure models. To address if glycan structures available in the PDB can be used as templates or fragments for glycan modeling, we present a survey of the N-glycan structures of 35 different sequences in the PDB. Our statistical analysis shows that the N-glycan structures found on homologous glycoproteins are significantly conserved compared to the random background, suggesting that N-glycan chains can be confidently modeled with template glycan structures whose parent glycoproteins share sequence similarity. On the other hand, N-glycan structures found on non-homologous glycoproteins do not show significant global structural similarity. Nonetheless, the internal substructures of these N-glycans, particularly, the substructures that are closer to the protein, show significantly similar structures, suggesting that such substructures can be used as fragments in glycan modeling. Increased interactions with protein might be responsible for the restricted conformational space of N-glycan chains. Our results suggest that structure prediction/modeling of N-glycans of glycoconjugates using structure database could be effective and different modeling approaches would be needed depending on the availability of template structures. PMID:23516343
Structural analysis of a set of proteins resulting from a bacterial genomics project.
Badger, J; Sauder, J M; Adams, J M; Antonysamy, S; Bain, K; Bergseid, M G; Buchanan, S G; Buchanan, M D; Batiyenko, Y; Christopher, J A; Emtage, S; Eroshkina, A; Feil, I; Furlong, E B; Gajiwala, K S; Gao, X; He, D; Hendle, J; Huber, A; Hoda, K; Kearins, P; Kissinger, C; Laubert, B; Lewis, H A; Lin, J; Loomis, K; Lorimer, D; Louie, G; Maletic, M; Marsh, C D; Miller, I; Molinari, J; Muller-Dieckmann, H J; Newman, J M; Noland, B W; Pagarigan, B; Park, F; Peat, T S; Post, K W; Radojicic, S; Ramos, A; Romero, R; Rutter, M E; Sanderson, W E; Schwinn, K D; Tresser, J; Winhoven, J; Wright, T A; Wu, L; Xu, J; Harris, T J R
2005-09-01
The targets of the Structural GenomiX (SGX) bacterial genomics project were proteins conserved in multiple prokaryotic organisms with no obvious sequence homolog in the Protein Data Bank of known structures. The outcome of this work was 80 structures, covering 60 unique sequences and 49 different genes. Experimental phase determination from proteins incorporating Se-Met was carried out for 45 structures with most of the remainder solved by molecular replacement using members of the experimentally phased set as search models. An automated tool was developed to deposit these structures in the Protein Data Bank, along with the associated X-ray diffraction data (including refined experimental phases) and experimentally confirmed sequences. BLAST comparisons of the SGX structures with structures that had appeared in the Protein Data Bank over the intervening 3.5 years since the SGX target list had been compiled identified homologs for 49 of the 60 unique sequences represented by the SGX structures. This result indicates that, for bacterial structures that are relatively easy to express, purify, and crystallize, the structural coverage of gene space is proceeding rapidly. More distant sequence-structure relationships between the SGX and PDB structures were investigated using PDB-BLAST and Combinatorial Extension (CE). Only one structure, SufD, has a truly unique topology compared to all folds in the PDB. Copyright 2005 Wiley-Liss, Inc.
Probing Protein Structure and Folding in the Gas Phase by Electron Capture Dissociation
NASA Astrophysics Data System (ADS)
Schennach, Moritz; Breuker, Kathrin
2015-07-01
The established methods for the study of atom-detailed protein structure in the condensed phases, X-ray crystallography and nuclear magnetic resonance spectroscopy, have recently been complemented by new techniques by which nearly or fully desolvated protein structures are probed in gas-phase experiments. Electron capture dissociation (ECD) is unique among these as it provides residue-specific, although indirect, structural information. In this Critical Insight article, we discuss the development of ECD for the structural probing of gaseous protein ions, its potential, and limitations.
From non-random molecular structure to life and mind
NASA Technical Reports Server (NTRS)
Fox, S. W.
1989-01-01
The evolutionary hierarchy molecular structure-->macromolecular structure-->protobiological structure-->biological structure-->biological functions has been traced by experiments. The sequence always moves through protein. Extension of the experiments traces the formation of nucleic acids instructed by proteins. The proteins themselves were, in this picture, instructed by the self-sequencing of precursor amino acids. While the sequence indicated explains the thread of the emergence of life, protein in cellular membrane also provides the only known material basis for the emergence of mind in the context of emergence of life.
Solid state NMR: The essential technology for helical membrane protein structural characterization
Cross, Timothy A.; Ekanayake, Vindana; Paulino, Joana; Wright, Anna
2014-01-01
NMR spectroscopy of helical membrane proteins has been very challenging on multiple fronts. The expression and purification of these proteins while maintaining functionality has consumed countless graduate student hours. Sample preparations have depended on whether solution or solid-state NMR spectroscopy was to be performed – neither have been easy. In recent years it has become increasingly apparent that membrane mimic environments influence the structural result. Indeed, in these recent years we have rediscovered that Nobel laureate, Christian Anfinsen, did not say that protein structure was exclusively dictated by the amino acid sequence, but rather by the sequence in a given environment (Anfinsen, 1973) [106]. The environment matters, molecular interactions with the membrane environment are significant and many examples of distorted, non-native membrane protein structures have recently been documented in the literature. However, solid-state NMR structures of helical membrane proteins in proteoliposomes and bilayers are proving to be native structures that permit a high resolution characterization of their functional states. Indeed, solid-state NMR is uniquely able to characterize helical membrane protein structures in lipid environments without detergents. Recent progress in expression, purification, reconstitution, sample preparation and in the solid-state NMR spectroscopy of both oriented samples and magic angle spinning samples has demonstrated that helical membrane protein structures can be achieved in a timely fashion. Indeed, this is a spectacular opportunity for the NMR community to have a major impact on biomedical research through the solid-state NMR spectroscopy of these proteins. PMID:24412099
Solid state NMR: The essential technology for helical membrane protein structural characterization
NASA Astrophysics Data System (ADS)
Cross, Timothy A.; Ekanayake, Vindana; Paulino, Joana; Wright, Anna
2014-02-01
NMR spectroscopy of helical membrane proteins has been very challenging on multiple fronts. The expression and purification of these proteins while maintaining functionality has consumed countless graduate student hours. Sample preparations have depended on whether solution or solid-state NMR spectroscopy was to be performed - neither have been easy. In recent years it has become increasingly apparent that membrane mimic environments influence the structural result. Indeed, in these recent years we have rediscovered that Nobel laureate, Christian Anfinsen, did not say that protein structure was exclusively dictated by the amino acid sequence, but rather by the sequence in a given environment (Anfinsen, 1973) [106]. The environment matters, molecular interactions with the membrane environment are significant and many examples of distorted, non-native membrane protein structures have recently been documented in the literature. However, solid-state NMR structures of helical membrane proteins in proteoliposomes and bilayers are proving to be native structures that permit a high resolution characterization of their functional states. Indeed, solid-state NMR is uniquely able to characterize helical membrane protein structures in lipid environments without detergents. Recent progress in expression, purification, reconstitution, sample preparation and in the solid-state NMR spectroscopy of both oriented samples and magic angle spinning samples has demonstrated that helical membrane protein structures can be achieved in a timely fashion. Indeed, this is a spectacular opportunity for the NMR community to have a major impact on biomedical research through the solid-state NMR spectroscopy of these proteins.
Wustman, Brandon A; Santos, Rudolpho; Zhang, Bo; Evans, John Spencer
2002-12-05
Fracture resistance in biomineralized structures has been linked to the presence of proteins, some of which possess sequences that are associated with elastic behavior. One such protein superfamily, the Pro,Gly-rich sea urchin intracrystalline spicule matrix proteins, form protein-protein supramolecular assemblies that modify the microstructure and fracture-resistant properties of the calcium carbonate mineral phase within embryonic sea urchin spicules and adult sea urchin spines. In this report, we detail the identification of a repetitive keratin-like "glycine-loop"- or coil-like structure within the 34-AA (AA: amino acid) N-terminal domain, (PGMG)(8)PG, of the spicule matrix protein, PM27. The identification of this repetitive structural motif was accomplished using two capped model peptides: a 9-AA sequence, GPGMGPGMG, and a 34-AA peptide representing the entire motif. Using CD, NMR spectrometry, and molecular dynamics simulated annealing/minimization simulations, we have determined that the 9-AA model peptide adopts a loop-like structure at pH 7.4. The structure of the 34-AA polypeptide resembles a coil structure consisting of repeating loop motifs that do not exhibit long-range ordering. Given that loop structures have been associated with protein elastic behavior and protein motion, it is plausible that the 34-AA Pro,Gly,Met repeat sequence motif in PM27 represents a putative elastic or mobile domain. Copyright 2002 Wiley Periodicals, Inc.
The value of protein structure classification information—Surveying the scientific literature
Fox, Naomi K.; Brenner, Steven E.
2015-01-01
ABSTRACT The Structural Classification of Proteins (SCOP) and Class, Architecture, Topology, Homology (CATH) databases have been valuable resources for protein structure classification for over 20 years. Development of SCOP (version 1) concluded in June 2009 with SCOP 1.75. The SCOPe (SCOP–extended) database offers continued development of the classic SCOP hierarchy, adding over 33,000 structures. We have attempted to assess the impact of these two decade old resources and guide future development. To this end, we surveyed recent articles to learn how structure classification data are used. Of 571 articles published in 2012–2013 that cite SCOP, 439 actually use data from the resource. We found that the type of use was fairly evenly distributed among four top categories: A) study protein structure or evolution (27% of articles), B) train and/or benchmark algorithms (28% of articles), C) augment non‐SCOP datasets with SCOP classification (21% of articles), and D) examine the classification of one protein/a small set of proteins (22% of articles). Most articles described computational research, although 11% described purely experimental research, and a further 9% included both. We examined how CATH and SCOP were used in 158 articles that cited both databases: while some studies used only one dataset, the majority used data from both resources. Protein structure classification remains highly relevant for a diverse range of problems and settings. Proteins 2015; 83:2025–2038. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc. PMID:26313554
Dewhurst, Henry M; Choudhury, Shilpa; Torres, Matthew P
2015-08-01
Predicting the biological function potential of post-translational modifications (PTMs) is becoming increasingly important in light of the exponential increase in available PTM data from high-throughput proteomics. We developed structural analysis of PTM hotspots (SAPH-ire)--a quantitative PTM ranking method that integrates experimental PTM observations, sequence conservation, protein structure, and interaction data to allow rank order comparisons within or between protein families. Here, we applied SAPH-ire to the study of PTMs in diverse G protein families, a conserved and ubiquitous class of proteins essential for maintenance of intracellular structure (tubulins) and signal transduction (large and small Ras-like G proteins). A total of 1728 experimentally verified PTMs from eight unique G protein families were clustered into 451 unique hotspots, 51 of which have a known and cited biological function or response. Using customized software, the hotspots were analyzed in the context of 598 unique protein structures. By comparing distributions of hotspots with known versus unknown function, we show that SAPH-ire analysis is predictive for PTM biological function. Notably, SAPH-ire revealed high-ranking hotspots for which a functional impact has not yet been determined, including phosphorylation hotspots in the N-terminal tails of G protein gamma subunits--conserved protein structures never before reported as regulators of G protein coupled receptor signaling. To validate this prediction we used the yeast model system for G protein coupled receptor signaling, revealing that gamma subunit-N-terminal tail phosphorylation is activated in response to G protein coupled receptor stimulation and regulates protein stability in vivo. These results demonstrate the utility of integrating protein structural and sequence features into PTM prioritization schemes that can improve the analysis and functional power of modification-specific proteomics data. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
Protein structure recognition: From eigenvector analysis to structural threading method
NASA Astrophysics Data System (ADS)
Cao, Haibo
In this work, we try to understand the protein folding problem using pair-wise hydrophobic interaction as the dominant interaction for the protein folding process. We found a strong correlation between amino acid sequence and the corresponding native structure of the protein. Some applications of this correlation were discussed in this dissertation include the domain partition and a new structural threading method as well as the performance of this method in the CASP5 competition. In the first part, we give a brief introduction to the protein folding problem. Some essential knowledge and progress from other research groups was discussed. This part include discussions of interactions among amino acids residues, lattice HP model, and the designablity principle. In the second part, we try to establish the correlation between amino acid sequence and the corresponding native structure of the protein. This correlation was observed in our eigenvector study of protein contact matrix. We believe the correlation is universal, thus it can be used in automatic partition of protein structures into folding domains. In the third part, we discuss a threading method based on the correlation between amino acid sequence and ominant eigenvector of the structure contact-matrix. A mathematically straightforward iteration scheme provides a self-consistent optimum global sequence-structure alignment. The computational efficiency of this method makes it possible to search whole protein structure databases for structural homology without relying on sequence similarity. The sensitivity and specificity of this method is discussed, along with a case of blind test prediction. In the appendix, we list the overall performance of this threading method in CASP5 blind test in comparison with other existing approaches.
Agrawal, Neeraj J; Helk, Bernhard; Trout, Bernhardt L
2014-01-21
Identifying hot-spot residues - residues that are critical to protein-protein binding - can help to elucidate a protein's function and assist in designing therapeutic molecules to target those residues. We present a novel computational tool, termed spatial-interaction-map (SIM), to predict the hot-spot residues of an evolutionarily conserved protein-protein interaction from the structure of an unbound protein alone. SIM can predict the protein hot-spot residues with an accuracy of 36-57%. Thus, the SIM tool can be used to predict the yet unknown hot-spot residues for many proteins for which the structure of the protein-protein complexes are not available, thereby providing a clue to their functions and an opportunity to design therapeutic molecules to target these proteins. Copyright © 2013 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.
Brown, Peter; Pullan, Wayne; Yang, Yuedong; Zhou, Yaoqi
2016-02-01
The three dimensional tertiary structure of a protein at near atomic level resolution provides insight alluding to its function and evolution. As protein structure decides its functionality, similarity in structure usually implies similarity in function. As such, structure alignment techniques are often useful in the classifications of protein function. Given the rapidly growing rate of new, experimentally determined structures being made available from repositories such as the Protein Data Bank, fast and accurate computational structure comparison tools are required. This paper presents SPalignNS, a non-sequential protein structure alignment tool using a novel asymmetrical greedy search technique. The performance of SPalignNS was evaluated against existing sequential and non-sequential structure alignment methods by performing trials with commonly used datasets. These benchmark datasets used to gauge alignment accuracy include (i) 9538 pairwise alignments implied by the HOMSTRAD database of homologous proteins; (ii) a subset of 64 difficult alignments from set (i) that have low structure similarity; (iii) 199 pairwise alignments of proteins with similar structure but different topology; and (iv) a subset of 20 pairwise alignments from the RIPC set. SPalignNS is shown to achieve greater alignment accuracy (lower or comparable root-mean squared distance with increased structure overlap coverage) for all datasets, and the highest agreement with reference alignments from the challenging dataset (iv) above, when compared with both sequentially constrained alignments and other non-sequential alignments. SPalignNS was implemented in C++. The source code, binary executable, and a web server version is freely available at: http://sparks-lab.org yaoqi.zhou@griffith.edu.au. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Structural alignment of protein descriptors - a combinatorial model.
Antczak, Maciej; Kasprzak, Marta; Lukasiak, Piotr; Blazewicz, Jacek
2016-09-17
Structural alignment of proteins is one of the most challenging problems in molecular biology. The tertiary structure of a protein strictly correlates with its function and computationally predicted structures are nowadays a main premise for understanding the latter. However, computationally derived 3D models often exhibit deviations from the native structure. A way to confirm a model is a comparison with other structures. The structural alignment of a pair of proteins can be defined with the use of a concept of protein descriptors. The protein descriptors are local substructures of protein molecules, which allow us to divide the original problem into a set of subproblems and, consequently, to propose a more efficient algorithmic solution. In the literature, one can find many applications of the descriptors concept that prove its usefulness for insight into protein 3D structures, but the proposed approaches are presented rather from the biological perspective than from the computational or algorithmic point of view. Efficient algorithms for identification and structural comparison of descriptors can become crucial components of methods for structural quality assessment as well as tertiary structure prediction. In this paper, we propose a new combinatorial model and new polynomial-time algorithms for the structural alignment of descriptors. The model is based on the maximum-size assignment problem, which we define here and prove that it can be solved in polynomial time. We demonstrate suitability of this approach by comparison with an exact backtracking algorithm. Besides a simplification coming from the combinatorial modeling, both on the conceptual and complexity level, we gain with this approach high quality of obtained results, in terms of 3D alignment accuracy and processing efficiency. All the proposed algorithms were developed and integrated in a computationally efficient tool descs-standalone, which allows the user to identify and structurally compare descriptors of biological molecules, such as proteins and RNAs. Both PDB (Protein Data Bank) and mmCIF (macromolecular Crystallographic Information File) formats are supported. The proposed tool is available as an open source project stored on GitHub ( https://github.com/mantczak/descs-standalone ).
Construction of ontology augmented networks for protein complex prediction.
Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian
2013-01-01
Protein complexes are of great importance in understanding the principles of cellular organization and function. The increase in available protein-protein interaction data, gene ontology and other resources make it possible to develop computational methods for protein complex prediction. Most existing methods focus mainly on the topological structure of protein-protein interaction networks, and largely ignore the gene ontology annotation information. In this article, we constructed ontology augmented networks with protein-protein interaction data and gene ontology, which effectively unified the topological structure of protein-protein interaction networks and the similarity of gene ontology annotations into unified distance measures. After constructing ontology augmented networks, a novel method (clustering based on ontology augmented networks) was proposed to predict protein complexes, which was capable of taking into account the topological structure of the protein-protein interaction network, as well as the similarity of gene ontology annotations. Our method was applied to two different yeast protein-protein interaction datasets and predicted many well-known complexes. The experimental results showed that (i) ontology augmented networks and the unified distance measure can effectively combine the structure closeness and gene ontology annotation similarity; (ii) our method is valuable in predicting protein complexes and has higher F1 and accuracy compared to other competing methods.
Sixty-five years of the long march in protein secondary structure prediction: the final stretch?
Yang, Yuedong; Gao, Jianzhao; Wang, Jihua; Heffernan, Rhys; Hanson, Jack; Paliwal, Kuldip; Zhou, Yaoqi
2018-01-01
Abstract Protein secondary structure prediction began in 1951 when Pauling and Corey predicted helical and sheet conformations for protein polypeptide backbone even before the first protein structure was determined. Sixty-five years later, powerful new methods breathe new life into this field. The highest three-state accuracy without relying on structure templates is now at 82–84%, a number unthinkable just a few years ago. These improvements came from increasingly larger databases of protein sequences and structures for training, the use of template secondary structure information and more powerful deep learning techniques. As we are approaching to the theoretical limit of three-state prediction (88–90%), alternative to secondary structure prediction (prediction of backbone torsion angles and Cα-atom-based angles and torsion angles) not only has more room for further improvement but also allows direct prediction of three-dimensional fragment structures with constantly improved accuracy. About 20% of all 40-residue fragments in a database of 1199 non-redundant proteins have <6 Å root-mean-squared distance from the native conformations by SPIDER2. More powerful deep learning methods with improved capability of capturing long-range interactions begin to emerge as the next generation of techniques for secondary structure prediction. The time has come to finish off the final stretch of the long march towards protein secondary structure prediction. PMID:28040746
USDA-ARS?s Scientific Manuscript database
Potato leafroll virus (PLRV) produces a readthrough protein (RTP) via translational readthrough of the coat protein amber stop codon. The RTP functions as a structural component of the virion and as a non-incorporated protein in concert with numerous insect and plant proteins to regulate virus movem...
Tandem Repeat Proteins Inspired By Squid Ring Teeth
NASA Astrophysics Data System (ADS)
Pena-Francesch, Abdon
Proteins are large biomolecules consisting of long chains of amino acids that hierarchically assemble into complex structures, and provide a variety of building blocks for biological materials. The repetition of structural building blocks is a natural evolutionary strategy for increasing the complexity and stability of protein structures. However, the relationship between amino acid sequence, structure, and material properties of protein systems remains unclear due to the lack of control over the protein sequence and the intricacies of the assembly process. In order to investigate the repetition of protein building blocks, a recently discovered protein from squids is examined as an ideal protein system. Squid ring teeth are predatory appendages located inside the suction cups that provide a strong grasp of prey, and are solely composed of a group of proteins with tandem repetition of building blocks. The objective of this thesis is the understanding of sequence, structure and property relationship in repetitive protein materials inspired in squid ring teeth for the first time. Specifically, this work focuses on squid-inspired structural proteins with tandem repeat units in their sequence (i.e., repetition of alternating building blocks) that are physically cross-linked via beta-sheet structures. The research work presented here tests the hypothesis that, in these systems, increasing the number of building blocks in the polypeptide chain decreases the protein network defects and improves the material properties. Hence, the sequence, nanostructure, and properties (thermal, mechanical, and conducting) of tandem repeat squid-inspired protein materials are examined. Spectroscopic structural analysis, advanced materials characterization, and entropic elasticity theory are combined to elucidate the structure and material properties of these repetitive proteins. This approach is applied not only to native squid proteins but also to squid-inspired synthetic polypeptides that allow for a fine control of the sequence and network morphology. The results provided in this work establish a clear dependence between the repetitive building blocks, the network morphology, and the properties of squid-inspired repetitive protein materials. Increasing the number of tandem repeat units in SRT-inspired proteins led to more effective protein networks with superior properties. Through increasing tandem repetition and optimization of network morphology, highly efficient protein materials capable of withstanding deformations up to 400% of their original length, with MPa-GPa modulus, high energy absorption (50 MJ m-3), peak proton conductivity of 3.7 mS cm-1 (at pH 7, highest reported to date for biological materials), and peak thermal conductivity of 1.4 W m-1 K -1 (which exceeds that of most polymer materials) were developed. These findings introduce new design rules in the engineering of proteins based on tandem repetition and morphology control, and provide a novel framework for tailoring and optimizing the properties of protein-based materials.
Keates, Tracy; Cooper, Christopher D O; Savitsky, Pavel; Allerston, Charles K; Phillips, Claire; Hammarström, Martin; Daga, Neha; Berridge, Georgina; Mahajan, Pravin; Burgess-Brown, Nicola A; Müller, Susanne; Gräslund, Susanne; Gileadi, Opher
2012-06-15
The generation of affinity reagents to large numbers of human proteins depends on the ability to express the target proteins as high-quality antigens. The Structural Genomics Consortium (SGC) focuses on the production and structure determination of human proteins. In a 7-year period, the SGC has deposited crystal structures of >800 human protein domains, and has additionally expressed and purified a similar number of protein domains that have not yet been crystallised. The targets include a diversity of protein domains, with an attempt to provide high coverage of protein families. The family approach provides an excellent basis for characterising the selectivity of affinity reagents. We present a summary of the approaches used to generate purified human proteins or protein domains, a test case demonstrating the ability to rapidly generate new proteins, and an optimisation study on the modification of >70 proteins by biotinylation in vivo. These results provide a unique synergy between large-scale structural projects and the recent efforts to produce a wide coverage of affinity reagents to the human proteome. Copyright © 2011 Elsevier B.V. All rights reserved.
Keates, Tracy; Cooper, Christopher D.O.; Savitsky, Pavel; Allerston, Charles K.; Phillips, Claire; Hammarström, Martin; Daga, Neha; Berridge, Georgina; Mahajan, Pravin; Burgess-Brown, Nicola A.; Müller, Susanne; Gräslund, Susanne; Gileadi, Opher
2012-01-01
The generation of affinity reagents to large numbers of human proteins depends on the ability to express the target proteins as high-quality antigens. The Structural Genomics Consortium (SGC) focuses on the production and structure determination of human proteins. In a 7-year period, the SGC has deposited crystal structures of >800 human protein domains, and has additionally expressed and purified a similar number of protein domains that have not yet been crystallised. The targets include a diversity of protein domains, with an attempt to provide high coverage of protein families. The family approach provides an excellent basis for characterising the selectivity of affinity reagents. We present a summary of the approaches used to generate purified human proteins or protein domains, a test case demonstrating the ability to rapidly generate new proteins, and an optimisation study on the modification of >70 proteins by biotinylation in vivo. These results provide a unique synergy between large-scale structural projects and the recent efforts to produce a wide coverage of affinity reagents to the human proteome. PMID:22027370
Matityahu, Avi; Onn, Itay
2018-02-01
The higher-order organization of chromosomes ensures their stability and functionality. However, the molecular mechanism by which higher order structure is established is poorly understood. Dissecting the activity of the relevant proteins provides information essential for achieving a comprehensive understanding of chromosome structure. Proteins of the structural maintenance of chromosome (SMC) family of ATPases are the core of evolutionary conserved complexes. SMC complexes are involved in regulating genome dynamics and in maintaining genome stability. The structure of all SMC proteins resembles an elongated rod that contains a central coiled-coil domain, a common protein structural motif in which two α-helices twist together. In recent years, the imperative role of the coiled-coil domain to SMC protein activity and regulation has become evident. Here, we discuss recent advances in the function of the SMC coiled coils. We describe the structure of the coiled-coil domain of SMC proteins, modifications and interactions that are mediated by it. Furthermore, we assess the role of the coiled-coil domain in conformational switches of SMC proteins, and in determining the architecture of the SMC dimer. Finally, we review the interplay between mutations in the coiled-coil domain and human disorders. We suggest that distinctive properties of coiled coils of different SMC proteins contribute to their distinct functions. The discussion clarifies the mechanisms underlying the activity of SMC proteins, and advocates future studies to elucidate the function of the SMC coiled coil domain.
Krissinel, E; Henrick, K
2004-12-01
The present paper describes the SSM algorithm of protein structure comparison in three dimensions, which includes an original procedure of matching graphs built on the protein's secondary-structure elements, followed by an iterative three-dimensional alignment of protein backbone Calpha atoms. The SSM results are compared with those obtained from other protein comparison servers, and the advantages and disadvantages of different scores that are used for structure recognition are discussed. A new score, balancing the r.m.s.d. and alignment length Nalign, is proposed. It is found that different servers agree reasonably well on the new score, while showing considerable differences in r.m.s.d. and Nalign.
Brodie, Nicholas I.; Popov, Konstantin I.; Petrotchenko, Evgeniy V.; Dokholyan, Nikolay V.; Borchers, Christoph H.
2017-01-01
We present an integrated experimental and computational approach for de novo protein structure determination in which short-distance cross-linking data are incorporated into rapid discrete molecular dynamics (DMD) simulations as constraints, reducing the conformational space and achieving the correct protein folding on practical time scales. We tested our approach on myoglobin and FK506 binding protein—models for α helix–rich and β sheet–rich proteins, respectively—and found that the lowest-energy structures obtained were in agreement with the crystal structure, hydrogen-deuterium exchange, surface modification, and long-distance cross-linking validation data. Our approach is readily applicable to other proteins with unknown structures. PMID:28695211
Schindler, Christina E M; de Vries, Sjoerd J; Zacharias, Martin
2015-02-01
Protein-protein interactions are abundant in the cell but to date structural data for a large number of complexes is lacking. Computational docking methods can complement experiments by providing structural models of complexes based on structures of the individual partners. A major caveat for docking success is accounting for protein flexibility. Especially, interface residues undergo significant conformational changes upon binding. This limits the performance of docking methods that keep partner structures rigid or allow limited flexibility. A new docking refinement approach, iATTRACT, has been developed which combines simultaneous full interface flexibility and rigid body optimizations during docking energy minimization. It employs an atomistic molecular mechanics force field for intermolecular interface interactions and a structure-based force field for intramolecular contributions. The approach was systematically evaluated on a large protein-protein docking benchmark, starting from an enriched decoy set of rigidly docked protein-protein complexes deviating by up to 15 Å from the native structure at the interface. Large improvements in sampling and slight but significant improvements in scoring/discrimination of near native docking solutions were observed. Complexes with initial deviations at the interface of up to 5.5 Å were refined to significantly better agreement with the native structure. Improvements in the fraction of native contacts were especially favorable, yielding increases of up to 70%. © 2014 Wiley Periodicals, Inc.
Villanueva, Josep; Villegas, Virtudes; Querol, Enrique; Avilés, Francesc X; Serrano, Luis
2002-09-01
In the post-genomic era, several projects focused on the massive experimental resolution of the three-dimensional structures of all the proteins of different organisms have been initiated. Simultaneously, significant progress has been made in the ab initio prediction of protein three-dimensional structure. One of the keys to the success of such a prediction is the use of local information (i.e. secondary structure). Here we describe a new limited proteolysis methodology, based on the use of unspecific exoproteases coupled with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS), to map quickly secondary structure elements of a protein from both ends, the N- and C-termini. We show that the proteolytic patterns (mass spectra series) obtained can be interpreted in the light of the conformation and local stability of the analyzed proteins, a direct correlation being observed between the predicted and the experimentally derived protein secondary structure. Further, this methodology can be easily applied to check rapidly the folding state of a protein and characterize mutational effects on protein conformation and stability. Moreover, given global stability information, this methodology allows one to locate the protein regions of increased or decreased conformational stability. All of this can be done with a small fraction of the amount of protein required by most of the other methods for conformational analysis. Thus limited exoproteolysis, together with MALDI-TOF MS, can be a useful tool to achieve quickly the elucidation of protein structure and stability. Copyright 2002 John Wiley & Sons, Ltd.
Li, Zhan-Chao; Zhou, Xi-Bin; Dai, Zong; Zou, Xiao-Yong
2009-07-01
A prior knowledge of protein structural classes can provide useful information about its overall structure, so it is very important for quick and accurate determination of protein structural class with computation method in protein science. One of the key for computation method is accurate protein sample representation. Here, based on the concept of Chou's pseudo-amino acid composition (AAC, Chou, Proteins: structure, function, and genetics, 43:246-255, 2001), a novel method of feature extraction that combined continuous wavelet transform (CWT) with principal component analysis (PCA) was introduced for the prediction of protein structural classes. Firstly, the digital signal was obtained by mapping each amino acid according to various physicochemical properties. Secondly, CWT was utilized to extract new feature vector based on wavelet power spectrum (WPS), which contains more abundant information of sequence order in frequency domain and time domain, and PCA was then used to reorganize the feature vector to decrease information redundancy and computational complexity. Finally, a pseudo-amino acid composition feature vector was further formed to represent primary sequence by coupling AAC vector with a set of new feature vector of WPS in an orthogonal space by PCA. As a showcase, the rigorous jackknife cross-validation test was performed on the working datasets. The results indicated that prediction quality has been improved, and the current approach of protein representation may serve as a useful complementary vehicle in classifying other attributes of proteins, such as enzyme family class, subcellular localization, membrane protein types and protein secondary structure, etc.
A topological approach for protein classification
Cang, Zixuan; Mu, Lin; Wu, Kedi; ...
2015-11-04
Here, protein function and dynamics are closely related to its sequence and structure. However, prediction of protein function and dynamics from its sequence and structure is still a fundamental challenge in molecular biology. Protein classification, which is typically done through measuring the similarity between proteins based on protein sequence or physical information, serves as a crucial step toward the understanding of protein function and dynamics.
The interface of protein structure, protein biophysics, and molecular evolution
Liberles, David A; Teichmann, Sarah A; Bahar, Ivet; Bastolla, Ugo; Bloom, Jesse; Bornberg-Bauer, Erich; Colwell, Lucy J; de Koning, A P Jason; Dokholyan, Nikolay V; Echave, Julian; Elofsson, Arne; Gerloff, Dietlind L; Goldstein, Richard A; Grahnen, Johan A; Holder, Mark T; Lakner, Clemens; Lartillot, Nicholas; Lovell, Simon C; Naylor, Gavin; Perica, Tina; Pollock, David D; Pupko, Tal; Regan, Lynne; Roger, Andrew; Rubinstein, Nimrod; Shakhnovich, Eugene; Sjölander, Kimmen; Sunyaev, Shamil; Teufel, Ashley I; Thorne, Jeffrey L; Thornton, Joseph W; Weinreich, Daniel M; Whelan, Simon
2012-01-01
Abstract The interface of protein structural biology, protein biophysics, molecular evolution, and molecular population genetics forms the foundations for a mechanistic understanding of many aspects of protein biochemistry. Current efforts in interdisciplinary protein modeling are in their infancy and the state-of-the art of such models is described. Beyond the relationship between amino acid substitution and static protein structure, protein function, and corresponding organismal fitness, other considerations are also discussed. More complex mutational processes such as insertion and deletion and domain rearrangements and even circular permutations should be evaluated. The role of intrinsically disordered proteins is still controversial, but may be increasingly important to consider. Protein geometry and protein dynamics as a deviation from static considerations of protein structure are also important. Protein expression level is known to be a major determinant of evolutionary rate and several considerations including selection at the mRNA level and the role of interaction specificity are discussed. Lastly, the relationship between modeling and needed high-throughput experimental data as well as experimental examination of protein evolution using ancestral sequence resurrection and in vitro biochemistry are presented, towards an aim of ultimately generating better models for biological inference and prediction. PMID:22528593
Assessment of Protein Side-Chain Conformation Prediction Methods in Different Residue Environments
Peterson, Lenna X.; Kang, Xuejiao; Kihara, Daisuke
2016-01-01
Computational prediction of side-chain conformation is an important component of protein structure prediction. Accurate side-chain prediction is crucial for practical applications of protein structure models that need atomic detailed resolution such as protein and ligand design. We evaluated the accuracy of eight side-chain prediction methods in reproducing the side-chain conformations of experimentally solved structures deposited to the Protein Data Bank. Prediction accuracy was evaluated for a total of four different structural environments (buried, surface, interface, and membrane-spanning) in three different protein types (monomeric, multimeric, and membrane). Overall, the highest accuracy was observed for buried residues in monomeric and multimeric proteins. Notably, side-chains at protein interfaces and membrane-spanning regions were better predicted than surface residues even though the methods did not all use multimeric and membrane proteins for training. Thus, we conclude that the current methods are as practically useful for modeling protein docking interfaces and membrane-spanning regions as for modeling monomers. PMID:24619909
Topological properties of complex networks in protein structures
NASA Astrophysics Data System (ADS)
Kim, Kyungsik; Jung, Jae-Won; Min, Seungsik
2014-03-01
We study topological properties of networks in structural classification of proteins. We model the native-state protein structure as a network made of its constituent amino-acids and their interactions. We treat four structural classes of proteins composed predominantly of α helices and β sheets and consider several proteins from each of these classes whose sizes range from amino acids of the Protein Data Bank. Particularly, we simulate and analyze the network metrics such as the mean degree, the probability distribution of degree, the clustering coefficient, the characteristic path length, the local efficiency, and the cost. This work was supported by the KMAR and DP under Grant WISE project (153-3100-3133-302-350).
What determines the spectrum of protein native state structures?
Lezon, Timothy R; Banavar, Jayanth R; Lesk, Arthur M; Maritan, Amos
2006-05-01
We present a brief summary of the key factors underlying protein structure, as developed in the investigations of Pauling, Ramachandran, and Rose. We then outline a simplified physical model of proteins that focuses on geometry and symmetry. Although this model superficially appears unrelated to the detailed chemical descriptions commonly applied to proteins, we show that it captures the essential elements of the chemistry and provides a unified framework for understanding the common characteristics of folded proteins. We suggest that the spectrum of protein native state structures is determined by geometry and symmetry and the role of the sequence is to choose its native state structure from this predetermined menu. 2006 Wiley-Liss, Inc.
Bai, Mingmei; Qin, Guixin; Sun, Zewei; Long, Guohui
2016-08-01
The nutritional value of feed proteins and their utilization by livestock are related not only to the chemical composition but also to the structure of feed proteins, but few studies thus far have investigated the relationship between the structure of feed proteins and their solubility as well as digestibility in monogastric animals. To address this question we analyzed soybean meal, fish meal, corn distiller's dried grains with solubles, corn gluten meal, and feather meal by Fourier transform infrared (FTIR) spectroscopy to determine the protein molecular spectral band characteristics for amides I and II as well as α-helices and β-sheets and their ratios. Protein solubility and in vitro digestibility were measured with the Kjeldahl method using 0.2% KOH solution and the pepsin-pancreatin two-step enzymatic method, respectively. We found that all measured spectral band intensities (height and area) of feed proteins were correlated with their the in vitro digestibility and solubility (p≤0.003); moreover, the relatively quantitative amounts of α-helices, random coils, and α-helix to β-sheet ratio in protein secondary structures were positively correlated with protein in vitro digestibility and solubility (p≤0.004). On the other hand, the percentage of β-sheet structures was negatively correlated with protein in vitro digestibility (p<0.001) and solubility (p = 0.002). These results demonstrate that the molecular structure characteristics of feed proteins are closely related to their in vitro digestibility at 28 h and solubility. Furthermore, the α-helix-to-β-sheet ratio can be used to predict the nutritional value of feed proteins.
Bai, Mingmei; Qin, Guixin; Sun, Zewei; Long, Guohui
2016-01-01
The nutritional value of feed proteins and their utilization by livestock are related not only to the chemical composition but also to the structure of feed proteins, but few studies thus far have investigated the relationship between the structure of feed proteins and their solubility as well as digestibility in monogastric animals. To address this question we analyzed soybean meal, fish meal, corn distiller’s dried grains with solubles, corn gluten meal, and feather meal by Fourier transform infrared (FTIR) spectroscopy to determine the protein molecular spectral band characteristics for amides I and II as well as α-helices and β-sheets and their ratios. Protein solubility and in vitro digestibility were measured with the Kjeldahl method using 0.2% KOH solution and the pepsin-pancreatin two-step enzymatic method, respectively. We found that all measured spectral band intensities (height and area) of feed proteins were correlated with their the in vitro digestibility and solubility (p≤0.003); moreover, the relatively quantitative amounts of α-helices, random coils, and α-helix to β-sheet ratio in protein secondary structures were positively correlated with protein in vitro digestibility and solubility (p≤0.004). On the other hand, the percentage of β-sheet structures was negatively correlated with protein in vitro digestibility (p<0.001) and solubility (p = 0.002). These results demonstrate that the molecular structure characteristics of feed proteins are closely related to their in vitro digestibility at 28 h and solubility. Furthermore, the α-helix-to-β-sheet ratio can be used to predict the nutritional value of feed proteins. PMID:26954145
The SARS coronavirus nucleocapsid protein--forms and functions.
Chang, Chung-ke; Hou, Ming-Hon; Chang, Chi-Fon; Hsiao, Chwan-Deng; Huang, Tai-huang
2014-03-01
The nucleocapsid phosphoprotein of the severe acute respiratory syndrome coronavirus (SARS-CoV N protein) packages the viral genome into a helical ribonucleocapsid (RNP) and plays a fundamental role during viral self-assembly. It is a protein with multifarious activities. In this article we will review our current understanding of the N protein structure and its interaction with nucleic acid. Highlights of the progresses include uncovering the modular organization, determining the structures of the structural domains, realizing the roles of protein disorder in protein-protein and protein-nucleic acid interactions, and visualizing the ribonucleoprotein (RNP) structure inside the virions. It was also demonstrated that N-protein binds to nucleic acid at multiple sites with a coupled-allostery manner. We propose a SARS-CoV RNP model that conforms to existing data and bears resemblance to the existing RNP structures of RNA viruses. The model highlights the critical role of modular organization and intrinsic disorder of the N protein in the formation and functions of the dynamic RNP capsid in RNA viruses. This paper forms part of a symposium in Antiviral Research on "From SARS to MERS: 10 years of research on highly pathogenic human coronaviruses." Copyright © 2014 Elsevier B.V. All rights reserved.
Kavianpour, Hamidreza; Vasighi, Mahdi
2017-02-01
Nowadays, having knowledge about cellular attributes of proteins has an important role in pharmacy, medical science and molecular biology. These attributes are closely correlated with the function and three-dimensional structure of proteins. Knowledge of protein structural class is used by various methods for better understanding the protein functionality and folding patterns. Computational methods and intelligence systems can have an important role in performing structural classification of proteins. Most of protein sequences are saved in databanks as characters and strings and a numerical representation is essential for applying machine learning methods. In this work, a binary representation of protein sequences is introduced based on reduced amino acids alphabets according to surrounding hydrophobicity index. Many important features which are hidden in these long binary sequences can be clearly displayed through their cellular automata images. The extracted features from these images are used to build a classification model by support vector machine. Comparing to previous studies on the several benchmark datasets, the promising classification rates obtained by tenfold cross-validation imply that the current approach can help in revealing some inherent features deeply hidden in protein sequences and improve the quality of predicting protein structural class.
Applying graph theory to protein structures: an atlas of coiled coils.
Heal, Jack W; Bartlett, Gail J; Wood, Christopher W; Thomson, Andrew R; Woolfson, Derek N
2018-05-02
To understand protein structure, folding and function fully and to design proteins de novo reliably, we must learn from natural protein structures that have been characterised experimentally. The number of protein structures available is large and growing exponentially, which makes this task challenging. Indeed, computational resources are becoming increasingly important for classifying and analysing this resource. Here, we use tools from graph theory to define an atlas classification scheme for automatically categorising certain protein substructures. Focusing on the α-helical coiled coils, which are ubiquitous protein-structure and protein-protein interaction motifs, we present a suite of computational resources designed for analysing these assemblies. iSOCKET enables interactive analysis of side-chain packing within proteins to identify coiled coils automatically and with considerable user control. Applying a graph theory-based atlas classification scheme to structures identified by iSOCKET gives the Atlas of Coiled Coils, a fully automated, updated overview of extant coiled coils. The utility of this approach is illustrated with the first formal classification of an emerging subclass of coiled coils called α-helical barrels. Furthermore, in the Atlas, the known coiled-coil universe is presented alongside a partial enumeration of the 'dark matter' of coiled-coil structures; i.e., those coiled-coil architectures that are theoretically possible but have not been observed to date, and thus present defined targets for protein design. iSOCKET is available as part of the open-source GitHub repository associated with this work (https://github.com/woolfson-group/isocket). This repository also contains all the data generated when classifying the protein graphs. The Atlas of Coiled Coils is available at: http://coiledcoils.chm.bris.ac.uk/atlas/app.
Text Mining for Protein Docking
Badal, Varsha D.; Kundrotas, Petras J.; Vakser, Ilya A.
2015-01-01
The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to the existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which still can be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied the text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for ~ 25% complexes in the dataset. The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound benchmark set, significantly increasing the docking success rate. PMID:26650466
Automated crystallographic system for high-throughput protein structure determination.
Brunzelle, Joseph S; Shafaee, Padram; Yang, Xiaojing; Weigand, Steve; Ren, Zhong; Anderson, Wayne F
2003-07-01
High-throughput structural genomic efforts require software that is highly automated, distributive and requires minimal user intervention to determine protein structures. Preliminary experiments were set up to test whether automated scripts could utilize a minimum set of input parameters and produce a set of initial protein coordinates. From this starting point, a highly distributive system was developed that could determine macromolecular structures at a high throughput rate, warehouse and harvest the associated data. The system uses a web interface to obtain input data and display results. It utilizes a relational database to store the initial data needed to start the structure-determination process as well as generated data. A distributive program interface administers the crystallographic programs which determine protein structures. Using a test set of 19 protein targets, 79% were determined automatically.
A Novel Method for Sampling Alpha-Helical Protein Backbones
DOE R&D Accomplishments Database
Fain, Boris; Levitt, Michael
2001-01-01
We present a novel technique of sampling the configurations of helical proteins. Assuming knowledge of native secondary structure, we employ assembly rules gathered from a database of existing structures to enumerate the geometrically possible 3-D arrangements of the constituent helices. We produce a library of possible folds for 25 helical protein cores. In each case the method finds significant numbers of conformations close to the native structure. In addition we assign coordinates to all atoms for 4 of the 25 proteins. In the context of database driven exhaustive enumeration our method performs extremely well, yielding significant percentages of structures (0.02%--82%) within 6A of the native structure. The method's speed and efficiency make it a valuable contribution towards the goal of predicting protein structure.
Efficient Relaxation of Protein-Protein Interfaces by Discrete Molecular Dynamics Simulations.
Emperador, Agusti; Solernou, Albert; Sfriso, Pedro; Pons, Carles; Gelpi, Josep Lluis; Fernandez-Recio, Juan; Orozco, Modesto
2013-02-12
Protein-protein interactions are responsible for the transfer of information inside the cell and represent one of the most interesting research fields in structural biology. Unfortunately, after decades of intense research, experimental approaches still have difficulties in providing 3D structures for the hundreds of thousands of interactions formed between the different proteins in a living organism. The use of theoretical approaches like docking aims to complement experimental efforts to represent the structure of the protein interactome. However, we cannot ignore that current methods have limitations due to problems of sampling of the protein-protein conformational space and the lack of accuracy of available force fields. Cases that are especially difficult for prediction are those in which complex formation implies a non-negligible change in the conformation of the interacting proteins, i.e., those cases where protein flexibility plays a key role in protein-protein docking. In this work, we present a new approach to treat flexibility in docking by global structural relaxation based on ultrafast discrete molecular dynamics. On a standard benchmark of protein complexes, the method provides a general improvement over the results obtained by rigid docking. The method is especially efficient in cases with large conformational changes upon binding, in which structure relaxation with discrete molecular dynamics leads to a predictive success rate double that obtained with state-of-the-art rigid-body docking.
Buried and accessible surface area control intrinsic protein flexibility.
Marsh, Joseph A
2013-09-09
Proteins experience a wide variety of conformational dynamics that can be crucial for facilitating their diverse functions. How is the intrinsic flexibility required for these motions encoded in their three-dimensional structures? Here, the overall flexibility of a protein is demonstrated to be tightly coupled to the total amount of surface area buried within its fold. A simple proxy for this, the relative solvent-accessible surface area (Arel), therefore shows excellent agreement with independent measures of global protein flexibility derived from various experimental and computational methods. Application of Arel on a large scale demonstrates its utility by revealing unique sequence and structural properties associated with intrinsic flexibility. In particular, flexibility as measured by Arel shows little correspondence with intrinsic disorder, but instead tends to be associated with multiple domains and increased α-helical structure. Furthermore, the apparent flexibility of monomeric proteins is found to be useful for identifying quaternary-structure errors in published crystal structures. There is also a strong tendency for the crystal structures of more flexible proteins to be solved to lower resolutions. Finally, local solvent accessibility is shown to be a primary determinant of local residue flexibility. Overall, this work provides both fundamental mechanistic insight into the origin of protein flexibility and a simple, practical method for predicting flexibility from protein structures. © 2013 Elsevier Ltd. All rights reserved.
Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions.
Najibi, Seyed Morteza; Maadooliat, Mehdi; Zhou, Lan; Huang, Jianhua Z; Gao, Xin
2017-01-01
Recently, the study of protein structures using angular representations has attracted much attention among structural biologists. The main challenge is how to efficiently model the continuous conformational space of the protein structures based on the differences and similarities between different Ramachandran plots. Despite the presence of statistical methods for modeling angular data of proteins, there is still a substantial need for more sophisticated and faster statistical tools to model the large-scale circular datasets. To address this need, we have developed a nonparametric method for collective estimation of multiple bivariate density functions for a collection of populations of protein backbone angles. The proposed method takes into account the circular nature of the angular data using trigonometric spline which is more efficient compared to existing methods. This collective density estimation approach is widely applicable when there is a need to estimate multiple density functions from different populations with common features. Moreover, the coefficients of adaptive basis expansion for the fitted densities provide a low-dimensional representation that is useful for visualization, clustering, and classification of the densities. The proposed method provides a novel and unique perspective to two important and challenging problems in protein structure research: structure-based protein classification and angular-sampling-based protein loop structure prediction.
Mukherjee, Koel; Pandey, Dev Mani; Vidyarthi, Ambarish Saran
2015-02-06
Gaining access to sequence and structure information of telomere binding proteins helps in understanding the essential biological processes involve in conserved sequence specific interaction between DNA and the proteins. Rice telomere binding protein (RTBP1) and Nicotiana glutinosa telomere repeat binding factor (NgTRF1) are helix turn helix motif type of proteins that plays role in telomeric DNA protection and length regulation. Both the proteins share same type of domain but till now there is very less communication on the in silico studies of these complete proteins.Here we intend to do a comparative study between two proteins through modeling of the complete proteins, physiochemical characterization, MD simulation and DNA-protein docking. I-TASSER and CLC protein work bench was performed to find out the protein 3D structure as well as the different parameters to characterize the proteins. MD simulation was completed by GROMOS forcefield of GROMACS for 10 ns of time stretch. The simulated 3D structures were docked with template DNA (3D DNA modeled through 3D-DART) of TTTAGGG conserved sequence motif using HADDOCK web server.Digging up all the facts about the proteins it was reveled that around 120 amino acids in the tail part was showing a good sequence similarity between the proteins. Molecular modeling, sequence characterization and secondary structure prediction also indicates the similarity between the protein's structure and sequence. The result of MD simulation highlights on the RMSD, RMSF, Rg, PCA and Energy plots which also conveys the similar type of motional behavior between them. The best complex formation for both the proteins in docking result also indicates for the first interaction site which is mainly the helix3 region of the DNA binding domain. The overall computational analysis reveals that RTBP1 and NgTRF1 proteins display good amount of similarity in their physicochemical properties, structure, dynamics and binding mode.
Mukherjee, Koel; Pandey, Dev Mani; Vidyarthi, Ambarish Saran
2015-09-01
Gaining access to sequence and structure information of telomere-binding proteins helps in understanding the essential biological processes involve in conserved sequence-specific interaction between DNA and the proteins. Rice telomere-binding protein (RTBP1) and Nicotiana glutinosa telomere repeat binding factor (NgTRF1) are helix-turn-helix motif type of proteins that plays role in telomeric DNA protection and length regulation. Both the proteins share same type of domain, but till now there is very less communication on the in silico studies of these complete proteins. Here we intend to do a comparative study between two proteins through modeling of the complete proteins, physiochemical characterization, MD simulation and DNA-protein docking. I-TASSER and CLC protein work bench was performed to find out the protein 3D structure as well as the different parameters to characterize the proteins. MD simulation was completed by GROMOS forcefield of GROMACS for 10 ns of time stretch. The simulated 3D structures were docked with template DNA (3D DNA modeled through 3D-DART) of TTTAGGG conserved sequence motif using HADDOCK Web server. By digging up all the facts about the proteins, it was revealed that around 120 amino acids in the tail part were showing a good sequence similarity between the proteins. Molecular modeling, sequence characterization and secondary structure prediction also indicate the similarity between the protein's structure and sequence. The result of MD simulation highlights on the RMSD, RMSF, Rg, PCA and energy plots which also conveys the similar type of motional behavior between them. The best complex formation for both the proteins in docking result also indicates for the first interaction site which is mainly the helix3 region of the DNA-binding domain. The overall computational analysis reveals that RTBP1 and NgTRF1 proteins display good amount of similarity in their physicochemical properties, structure, dynamics and binding mode.
Cooperative structural transitions in amyloid-like aggregation
NASA Astrophysics Data System (ADS)
Steckmann, Timothy; Bhandari, Yuba R.; Chapagain, Prem P.; Gerstman, Bernard S.
2017-04-01
Amyloid fibril aggregation is associated with several horrific diseases such as Alzheimer's, Creutzfeld-Jacob, diabetes, Parkinson's, and others. Although proteins that undergo aggregation vary widely in their primary structure, they all produce a cross-β motif with the proteins in β-strand conformations perpendicular to the fibril axis. The process of amyloid aggregation involves forming myriad different metastable intermediate aggregates. To better understand the molecular basis of the protein structural transitions and aggregation, we report on molecular dynamics (MD) computational studies on the formation of amyloid protofibrillar structures in the small model protein ccβ, which undergoes many of the structural transitions of the larger, naturally occurring amyloid forming proteins. Two different structural transition processes involving hydrogen bonds are observed for aggregation into fibrils: the breaking of intrachain hydrogen bonds to allow β-hairpin proteins to straighten, and the subsequent formation of interchain H-bonds during aggregation into amyloid fibrils. For our MD simulations, we found that the temperature dependence of these two different structural transition processes results in the existence of a temperature window that the ccβ protein experiences during the process of forming protofibrillar structures. This temperature dependence allows us to investigate the dynamics on a molecular level. We report on the thermodynamics and cooperativity of the transformations. The structural transitions that occurred in a specific temperature window for ccβ in our investigations may also occur in other amyloid forming proteins but with biochemical parameters controlling the dynamics rather than temperature.
Neumann, Sindy; Hartmann, Holger; Martin-Galiano, Antonio J; Fuchs, Angelika; Frishman, Dmitrij
2012-03-01
Structural bioinformatics of membrane proteins is still in its infancy, and the picture of their fold space is only beginning to emerge. Because only a handful of three-dimensional structures are available, sequence comparison and structure prediction remain the main tools for investigating sequence-structure relationships in membrane protein families. Here we present a comprehensive analysis of the structural families corresponding to α-helical membrane proteins with at least three transmembrane helices. The new version of our CAMPS database (CAMPS 2.0) covers nearly 1300 eukaryotic, prokaryotic, and viral genomes. Using an advanced classification procedure, which is based on high-order hidden Markov models and considers both sequence similarity as well as the number of transmembrane helices and loop lengths, we identified 1353 structurally homogeneous clusters roughly corresponding to membrane protein folds. Only 53 clusters are associated with experimentally determined three-dimensional structures, and for these clusters CAMPS is in reasonable agreement with structure-based classification approaches such as SCOP and CATH. We therefore estimate that ∼1300 structures would need to be determined to provide a sufficient structural coverage of polytopic membrane proteins. CAMPS 2.0 is available at http://webclu.bio.wzw.tum.de/CAMPS2.0/. Copyright © 2011 Wiley Periodicals, Inc.
Conservation of protein structure over four billion years.
Ingles-Prieto, Alvaro; Ibarra-Molero, Beatriz; Delgado-Delgado, Asuncion; Perez-Jimenez, Raul; Fernandez, Julio M; Gaucher, Eric A; Sanchez-Ruiz, Jose M; Gavira, Jose A
2013-09-03
Little is known about the evolution of protein structures and the degree of protein structure conservation over planetary time scales. Here, we report the X-ray crystal structures of seven laboratory resurrections of Precambrian thioredoxins dating up to approximately four billion years ago. Despite considerable sequence differences compared with extant enzymes, the ancestral proteins display the canonical thioredoxin fold, whereas only small structural changes have occurred over four billion years. This remarkable degree of structure conservation since a time near the last common ancestor of life supports a punctuated-equilibrium model of structure evolution in which the generation of new folds occurs over comparatively short periods and is followed by long periods of structural stasis. Copyright © 2013 Elsevier Ltd. All rights reserved.
Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring
2012-01-01
Background Much progress has been made in understanding the 3D structure of proteins using methods such as NMR and X-ray crystallography. The resulting 3D structures are extremely informative, but do not always reveal which sites and residues within the structure are of special importance. Recently, there are indications that multiple-residue, sub-domain structural relationships within the larger 3D consensus structure of a protein can be inferred from the analysis of the multiple sequence alignment data of a protein family. These intra-dependent clusters of associated sites are used to indicate hierarchical inter-residue relationships within the 3D structure. To reveal the patterns of associations among individual amino acids or sub-domain components within the structure, we apply a k-modes attribute (aligned site) clustering algorithm to the ubiquitin and transthyretin families in order to discover associations among groups of sites within the multiple sequence alignment. We then observe what these associations imply within the 3D structure of these two protein families. Results The k-modes site clustering algorithm we developed maximizes the intra-group interdependencies based on a normalized mutual information measure. The clusters formed correspond to sub-structural components or binding and interface locations. Applying this data-directed method to the ubiquitin and transthyretin protein family multiple sequence alignments as a test bed, we located numerous interesting associations of interdependent sites. These clusters were then arranged into cluster tree diagrams which revealed four structural sub-domains within the single domain structure of ubiquitin and a single large sub-domain within transthyretin associated with the interface among transthyretin monomers. In addition, several clusters of mutually interdependent sites were discovered for each protein family, each of which appear to play an important role in the molecular structure and/or function. Conclusions Our results demonstrate that the method we present here using a k-modes site clustering algorithm based on interdependency evaluation among sites obtained from a sequence alignment of homologous proteins can provide significant insights into the complex, hierarchical inter-residue structural relationships within the 3D structure of a protein family. PMID:22793672
Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring.
Durston, Kirk K; Chiu, David Ky; Wong, Andrew Kc; Li, Gary Cl
2012-07-13
Much progress has been made in understanding the 3D structure of proteins using methods such as NMR and X-ray crystallography. The resulting 3D structures are extremely informative, but do not always reveal which sites and residues within the structure are of special importance. Recently, there are indications that multiple-residue, sub-domain structural relationships within the larger 3D consensus structure of a protein can be inferred from the analysis of the multiple sequence alignment data of a protein family. These intra-dependent clusters of associated sites are used to indicate hierarchical inter-residue relationships within the 3D structure. To reveal the patterns of associations among individual amino acids or sub-domain components within the structure, we apply a k-modes attribute (aligned site) clustering algorithm to the ubiquitin and transthyretin families in order to discover associations among groups of sites within the multiple sequence alignment. We then observe what these associations imply within the 3D structure of these two protein families. The k-modes site clustering algorithm we developed maximizes the intra-group interdependencies based on a normalized mutual information measure. The clusters formed correspond to sub-structural components or binding and interface locations. Applying this data-directed method to the ubiquitin and transthyretin protein family multiple sequence alignments as a test bed, we located numerous interesting associations of interdependent sites. These clusters were then arranged into cluster tree diagrams which revealed four structural sub-domains within the single domain structure of ubiquitin and a single large sub-domain within transthyretin associated with the interface among transthyretin monomers. In addition, several clusters of mutually interdependent sites were discovered for each protein family, each of which appear to play an important role in the molecular structure and/or function. Our results demonstrate that the method we present here using a k-modes site clustering algorithm based on interdependency evaluation among sites obtained from a sequence alignment of homologous proteins can provide significant insights into the complex, hierarchical inter-residue structural relationships within the 3D structure of a protein family.
G23D: Online tool for mapping and visualization of genomic variants on 3D protein structures.
Solomon, Oz; Kunik, Vered; Simon, Amos; Kol, Nitzan; Barel, Ortal; Lev, Atar; Amariglio, Ninette; Somech, Raz; Rechavi, Gidi; Eyal, Eran
2016-08-26
Evaluation of the possible implications of genomic variants is an increasingly important task in the current high throughput sequencing era. Structural information however is still not routinely exploited during this evaluation process. The main reasons can be attributed to the partial structural coverage of the human proteome and the lack of tools which conveniently convert genomic positions, which are the frequent output of genomic pipelines, to proteins and structure coordinates. We present G23D, a tool for conversion of human genomic coordinates to protein coordinates and protein structures. G23D allows mapping of genomic positions/variants on evolutionary related (and not only identical) protein three dimensional (3D) structures as well as on theoretical models. By doing so it significantly extends the space of variants for which structural insight is feasible. To facilitate interpretation of the variant consequence, pathogenic variants, functional sites and polymorphism sites are displayed on protein sequence and structure diagrams alongside the input variants. G23D also provides modeling of the mutant structure, analysis of intra-protein contacts and instant access to functional predictions and predictions of thermo-stability changes. G23D is available at http://www.sheba-cancer.org.il/G23D . G23D extends the fraction of variants for which structural analysis is applicable and provides better and faster accessibility for structural data to biologists and geneticists who routinely work with genomic information.
Regad, Leslie; Martin, Juliette; Camproux, Anne-Claude
2011-06-20
One of the strategies for protein function annotation is to search particular structural motifs that are known to be shared by proteins with a given function. Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.
2011-01-01
Background One of the strategies for protein function annotation is to search particular structural motifs that are known to be shared by proteins with a given function. Results Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Conclusions Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins. PMID:21689388
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tang,X.; Wu, J.; Sivaraman, J.
2007-01-01
White spot syndrome virus (WSSV) is a virulent pathogen known to infect various crustaceans. It has bacilliform morphology with a tail-like appendage at one end. The envelope consists of four major proteins. Envelope structural proteins play a crucial role in viral infection and are believed to be the first molecules to interact with the host. Here, we report the localization and crystal structure of major envelope proteins VP26 and VP28 from WSSV at resolutions of 2.2 and 2.0 {angstrom}, respectively. These two proteins alone account for approximately 60% of the envelope, and their structures represent the first two structural envelopemore » proteins of WSSV. Structural comparisons among VP26, VP28, and other viral proteins reveal an evolutionary relationship between WSSV envelope proteins and structural proteins from other viruses. Both proteins adopt {beta}-barrel architecture with a protruding N-terminal region. We have investigated the localization of VP26 and VP28 using immunoelectron microscopy. This study suggests that VP26 and VP28 are located on the outer surface of the virus and are observed as a surface protrusion in the WSSV envelope, and this is the first convincing observation for VP26. Based on our studies combined with the literature, we speculate that the predicted N-terminal transmembrane region of VP26 and VP28 may anchor on the viral envelope membrane, making the core {beta}-barrel protrude outside the envelope, possibly to interact with the host receptor or to fuse with the host cell membrane for effective transfer of the viral infection. Furthermore, it is tempting to extend this host interaction mode to other structural viral proteins of similar structures. Our finding has the potential to extend further toward drug and vaccine development against WSSV.« less
Perez, Romel B; Tischer, Alexander; Auton, Matthew; Whitten, Steven T
2014-12-01
Molecular transduction of biological signals is understood primarily in terms of the cooperative structural transitions of protein macromolecules, providing a mechanism through which discrete local structure perturbations affect global macromolecular properties. The recognition that proteins lacking tertiary stability, commonly referred to as intrinsically disordered proteins (IDPs), mediate key signaling pathways suggests that protein structures without cooperative intramolecular interactions may also have the ability to couple local and global structure changes. Presented here are results from experiments that measured and tested the ability of disordered proteins to couple local changes in structure to global changes in structure. Using the intrinsically disordered N-terminal region of the p53 protein as an experimental model, a set of proline (PRO) and alanine (ALA) to glycine (GLY) substitution variants were designed to modulate backbone conformational propensities without introducing non-native intramolecular interactions. The hydrodynamic radius (R(h)) was used to monitor changes in global structure. Circular dichroism spectroscopy showed that the GLY substitutions decreased polyproline II (PP(II)) propensities relative to the wild type, as expected, and fluorescence methods indicated that substitution-induced changes in R(h) were not associated with folding. The experiments showed that changes in local PP(II) structure cause changes in R(h) that are variable and that depend on the intrinsic chain propensities of PRO and ALA residues, demonstrating a mechanism for coupling local and global structure changes. Molecular simulations that model our results were used to extend the analysis to other proteins and illustrate the generality of the observed PRO and alanine effects on the structures of IDPs. © 2014 Wiley Periodicals, Inc.
A 'periodic table' for protein structures.
Taylor, William R
2002-04-11
Current structural genomics programs aim systematically to determine the structures of all proteins coded in both human and other genomes, providing a complete picture of the number and variety of protein structures that exist. In the past, estimates have been made on the basis of the incomplete sample of structures currently known. These estimates have varied greatly (between 1,000 and 10,000; see for example refs 1 and 2), partly because of limited sample size but also owing to the difficulties of distinguishing one structure from another. This distinction is usually topological, based on the fold of the protein; however, in strict topological terms (neglecting to consider intra-chain cross-links), protein chains are open strings and hence are all identical. To avoid this trivial result, topologies are determined by considering secondary links in the form of intra-chain hydrogen bonds (secondary structure) and tertiary links formed by the packing of secondary structures. However, small additions to or loss of structure can make large changes to these perceived topologies and such subjective solutions are neither robust nor amenable to automation. Here I formalize both secondary and tertiary links to allow the rigorous and automatic definition of protein topology.
Ou, Horng D.; Deerinck, Thomas J.; Bushong, Eric; Ellisman, Mark H.; O’Shea, Clodagh C.
2015-01-01
Structural studies of viral proteins most often use high-resolution techniques such as X-ray crystallography, nuclear magnetic resonance, single particle negative stain, or cryo-electron microscopy (EM) to reveal atomic interactions of soluble, homogeneous viral proteins or viral protein complexes. Once viral proteins or complexes are separated from their host’s cellular environment, their natural in-situ structure and details of how they interact with other cellular components may be lost. EM has been an invaluable tool in virology since its introduction in the late 1940’s and subsequent application to cells in the 1950’s. EM studies have expanded our knowledge of viral entry, viral replication, alteration of cellular components, and viral lysis. Most of these early studies were focused on conspicuous morphological cellular changes, because classic EM metal stains were designed to highlight classes of cellular structures rather than specific molecular structures. Much later, to identify viral proteins inducing specific structural configurations at the cellular level, immunostaining with a primary antibody followed by colloidal gold secondary antibody was employed to mark the location of specific viral proteins. This technique can suffer from artifacts in cellular ultrastructure due to compromises required to provide access to the immuno-reagents. Immunolocalization methods also require the generation of highly specific antibodies, which may not be available for every viral protein. Here we discuss new methods to visualize viral proteins and structures at high resolutions in-situ using correlated light and electron microscopy (CLEM). We discuss the use of genetically encoded protein fusions that oxidize diaminobenzidine (DAB) into an osmiophilic polymer that can be visualized by EM. Detailed protocols for applying the genetically encoded photo-oxidizing protein MiniSOG to a viral protein, photo-oxidation of the fusion protein to yield DAB polymer staining, and preparation of photo-oxidized samples for TEM and serial block-face scanning EM (SBEM) for large-scale volume EM data acquisition are also presented. As an example, we discuss the recent multi-scale analysis of Adenoviral protein E4-ORF3 that reveals a new type of multi-functional polymer that disrupts multiple cellular proteins. This new capability to visualize unambiguously specific viral protein structures at high resolutions in the native cellular environment is revealing new insights into how they usurp host proteins and functions to drive pathological viral replication. PMID:26066760
Ou, Horng D; Deerinck, Thomas J; Bushong, Eric; Ellisman, Mark H; O'Shea, Clodagh C
2015-11-15
Structural studies of viral proteins most often use high-resolution techniques such as X-ray crystallography, nuclear magnetic resonance, single particle negative stain, or cryo-electron microscopy (EM) to reveal atomic interactions of soluble, homogeneous viral proteins or viral protein complexes. Once viral proteins or complexes are separated from their host's cellular environment, their natural in situ structure and details of how they interact with other cellular components may be lost. EM has been an invaluable tool in virology since its introduction in the late 1940's and subsequent application to cells in the 1950's. EM studies have expanded our knowledge of viral entry, viral replication, alteration of cellular components, and viral lysis. Most of these early studies were focused on conspicuous morphological cellular changes, because classic EM metal stains were designed to highlight classes of cellular structures rather than specific molecular structures. Much later, to identify viral proteins inducing specific structural configurations at the cellular level, immunostaining with a primary antibody followed by colloidal gold secondary antibody was employed to mark the location of specific viral proteins. This technique can suffer from artifacts in cellular ultrastructure due to compromises required to provide access to the immuno-reagents. Immunolocalization methods also require the generation of highly specific antibodies, which may not be available for every viral protein. Here we discuss new methods to visualize viral proteins and structures at high resolutions in situ using correlated light and electron microscopy (CLEM). We discuss the use of genetically encoded protein fusions that oxidize diaminobenzidine (DAB) into an osmiophilic polymer that can be visualized by EM. Detailed protocols for applying the genetically encoded photo-oxidizing protein MiniSOG to a viral protein, photo-oxidation of the fusion protein to yield DAB polymer staining, and preparation of photo-oxidized samples for TEM and serial block-face scanning EM (SBEM) for large-scale volume EM data acquisition are also presented. As an example, we discuss the recent multi-scale analysis of Adenoviral protein E4-ORF3 that reveals a new type of multi-functional polymer that disrupts multiple cellular proteins. This new capability to visualize unambiguously specific viral protein structures at high resolutions in the native cellular environment is revealing new insights into how they usurp host proteins and functions to drive pathological viral replication. Copyright © 2015 Elsevier Inc. All rights reserved.
Structural Mass Spectrometry of Proteins Using Hydroxyl Radical Based Protein Footprinting
Wang, Liwen; Chance, Mark R.
2011-01-01
Structural MS is a rapidly growing field with many applications in basic research and pharmaceutical drug development. In this feature article the overall technology is described and several examples of how hydroxyl radical based footprinting MS can be used to map interfaces, evaluate protein structure, and identify ligand dependent conformational changes in proteins are described. PMID:21770468
How Many Protein Sequences Fold to a Given Structure? A Coevolutionary Analysis.
Tian, Pengfei; Best, Robert B
2017-10-17
Quantifying the relationship between protein sequence and structure is key to understanding the protein universe. A fundamental measure of this relationship is the total number of amino acid sequences that can fold to a target protein structure, known as the "sequence capacity," which has been suggested as a proxy for how designable a given protein fold is. Although sequence capacity has been extensively studied using lattice models and theory, numerical estimates for real protein structures are currently lacking. In this work, we have quantitatively estimated the sequence capacity of 10 proteins with a variety of different structures using a statistical model based on residue-residue co-evolution to capture the variation of sequences from the same protein family. Remarkably, we find that even for the smallest protein folds, such as the WW domain, the number of foldable sequences is extremely large, exceeding the Avogadro constant. In agreement with earlier theoretical work, the calculated sequence capacity is positively correlated with the size of the protein, or better, the density of contacts. This allows the absolute sequence capacity of a given protein to be approximately predicted from its structure. On the other hand, the relative sequence capacity, i.e., normalized by the total number of possible sequences, is an extremely tiny number and is strongly anti-correlated with the protein length. Thus, although there may be more foldable sequences for larger proteins, it will be much harder to find them. Lastly, we have correlated the evolutionary age of proteins in the CATH database with their sequence capacity as predicted by our model. The results suggest a trade-off between the opposing requirements of high designability and the likelihood of a novel fold emerging by chance. Published by Elsevier Inc.
From laptop to benchtop to bedside: Structure-based Drug Design on Protein Targets
Chen, Lu; Morrow, John K.; Tran, Hoang T.; Phatak, Sharangdhar S.; Du-Cuny, Lei; Zhang, Shuxing
2013-01-01
As an important aspect of computer-aided drug design, structure-based drug design brought a new horizon to pharmaceutical development. This in silico method permeates all aspects of drug discovery today, including lead identification, lead optimization, ADMET prediction and drug repurposing. Structure-based drug design has resulted in fruitful successes drug discovery targeting protein-ligand and protein-protein interactions. Meanwhile, challenges, noted by low accuracy and combinatoric issues, may also cause failures. In this review, state-of-the-art techniques for protein modeling (e.g. structure prediction, modeling protein flexibility, etc.), hit identification/optimization (e.g. molecular docking, focused library design, fragment-based design, molecular dynamic, etc.), and polypharmacology design will be discussed. We will explore how structure-based techniques can facilitate the drug discovery process and interplay with other experimental approaches. PMID:22316152
Using linear algebra for protein structural comparison and classification
2009-01-01
In this article, we describe a novel methodology to extract semantic characteristics from protein structures using linear algebra in order to compose structural signature vectors which may be used efficiently to compare and classify protein structures into fold families. These signatures are built from the pattern of hydrophobic intrachain interactions using Singular Value Decomposition (SVD) and Latent Semantic Indexing (LSI) techniques. Considering proteins as documents and contacts as terms, we have built a retrieval system which is able to find conserved contacts in samples of myoglobin fold family and to retrieve these proteins among proteins of varied folds with precision of up to 80%. The classifier is a web tool available at our laboratory website. Users can search for similar chains from a specific PDB, view and compare their contact maps and browse their structures using a JMol plug-in. PMID:21637532
Using linear algebra for protein structural comparison and classification.
Gomide, Janaína; Melo-Minardi, Raquel; Dos Santos, Marcos Augusto; Neshich, Goran; Meira, Wagner; Lopes, Júlio César; Santoro, Marcelo
2009-07-01
In this article, we describe a novel methodology to extract semantic characteristics from protein structures using linear algebra in order to compose structural signature vectors which may be used efficiently to compare and classify protein structures into fold families. These signatures are built from the pattern of hydrophobic intrachain interactions using Singular Value Decomposition (SVD) and Latent Semantic Indexing (LSI) techniques. Considering proteins as documents and contacts as terms, we have built a retrieval system which is able to find conserved contacts in samples of myoglobin fold family and to retrieve these proteins among proteins of varied folds with precision of up to 80%. The classifier is a web tool available at our laboratory website. Users can search for similar chains from a specific PDB, view and compare their contact maps and browse their structures using a JMol plug-in.
Yesselman, Joseph D; Horowitz, Scott; Brooks, Charles L; Trievel, Raymond C
2015-03-01
The propensity of backbone Cα atoms to engage in carbon-oxygen (CH · · · O) hydrogen bonding is well-appreciated in protein structure, but side chain CH · · · O hydrogen bonding remains largely uncharacterized. The extent to which side chain methyl groups in proteins participate in CH · · · O hydrogen bonding is examined through a survey of neutron crystal structures, quantum chemistry calculations, and molecular dynamics simulations. Using these approaches, methyl groups were observed to form stabilizing CH · · · O hydrogen bonds within protein structure that are maintained through protein dynamics and participate in correlated motion. Collectively, these findings illustrate that side chain methyl CH · · · O hydrogen bonding contributes to the energetics of protein structure and folding. © 2014 Wiley Periodicals, Inc.
Christensen, Anders S.; Linnet, Troels E.; Borg, Mikael; Boomsma, Wouter; Lindorff-Larsen, Kresten; Hamelryck, Thomas; Jensen, Jan H.
2013-01-01
We present the ProCS method for the rapid and accurate prediction of protein backbone amide proton chemical shifts - sensitive probes of the geometry of key hydrogen bonds that determine protein structure. ProCS is parameterized against quantum mechanical (QM) calculations and reproduces high level QM results obtained for a small protein with an RMSD of 0.25 ppm (r = 0.94). ProCS is interfaced with the PHAISTOS protein simulation program and is used to infer statistical protein ensembles that reflect experimentally measured amide proton chemical shift values. Such chemical shift-based structural refinements, starting from high-resolution X-ray structures of Protein G, ubiquitin, and SMN Tudor Domain, result in average chemical shifts, hydrogen bond geometries, and trans-hydrogen bond (h3 JNC') spin-spin coupling constants that are in excellent agreement with experiment. We show that the structural sensitivity of the QM-based amide proton chemical shift predictions is needed to obtain this agreement. The ProCS method thus offers a powerful new tool for refining the structures of hydrogen bonding networks to high accuracy with many potential applications such as protein flexibility in ligand binding. PMID:24391900
RNA chaperoning and intrinsic disorder in the core proteins of Flaviviridae.
Ivanyi-Nagy, Roland; Lavergne, Jean-Pierre; Gabus, Caroline; Ficheux, Damien; Darlix, Jean-Luc
2008-02-01
RNA chaperone proteins are essential partners of RNA in living organisms and viruses. They are thought to assist in the correct folding and structural rearrangements of RNA molecules by resolving misfolded RNA species in an ATP-independent manner. RNA chaperoning is probably an entropy-driven process, mediated by the coupled binding and folding of intrinsically disordered protein regions and the kinetically trapped RNA. Previously, we have shown that the core protein of hepatitis C virus (HCV) is a potent RNA chaperone that can drive profound structural modifications of HCV RNA in vitro. We now examined the RNA chaperone activity and the disordered nature of core proteins from different Flaviviridae genera, namely that of HCV, GBV-B (GB virus B), WNV (West Nile virus) and BVDV (bovine viral diarrhoea virus). Despite low-sequence similarities, all four proteins demonstrated general nucleic acid annealing and RNA chaperone activities. Furthermore, heat resistance of core proteins, as well as far-UV circular dichroism spectroscopy suggested that a well-defined 3D protein structure is not necessary for core-induced RNA structural rearrangements. These data provide evidence that RNA chaperoning-possibly mediated by intrinsically disordered protein segments-is conserved in Flaviviridae core proteins. Thus, besides nucleocapsid formation, core proteins may function in RNA structural rearrangements taking place during virus replication.
RNA chaperoning and intrinsic disorder in the core proteins of Flaviviridae
Ivanyi-Nagy, Roland; Lavergne, Jean-Pierre; Gabus, Caroline; Ficheux, Damien; Darlix, Jean-Luc
2008-01-01
RNA chaperone proteins are essential partners of RNA in living organisms and viruses. They are thought to assist in the correct folding and structural rearrangements of RNA molecules by resolving misfolded RNA species in an ATP-independent manner. RNA chaperoning is probably an entropy-driven process, mediated by the coupled binding and folding of intrinsically disordered protein regions and the kinetically trapped RNA. Previously, we have shown that the core protein of hepatitis C virus (HCV) is a potent RNA chaperone that can drive profound structural modifications of HCV RNA in vitro. We now examined the RNA chaperone activity and the disordered nature of core proteins from different Flaviviridae genera, namely that of HCV, GBV-B (GB virus B), WNV (West Nile virus) and BVDV (bovine viral diarrhoea virus). Despite low-sequence similarities, all four proteins demonstrated general nucleic acid annealing and RNA chaperone activities. Furthermore, heat resistance of core proteins, as well as far-UV circular dichroism spectroscopy suggested that a well-defined 3D protein structure is not necessary for core-induced RNA structural rearrangements. These data provide evidence that RNA chaperoning—possibly mediated by intrinsically disordered protein segments—is conserved in Flaviviridae core proteins. Thus, besides nucleocapsid formation, core proteins may function in RNA structural rearrangements taking place during virus replication. PMID:18033802
My 65 years in protein chemistry
Scheraga, Harold A.
2015-01-01
This is a tour of a physical chemist through 65 years of protein chemistry from the time when emphasis was placed on the determination of the size and shape of the protein molecule as a colloidal particle, with an early breakthrough by James Sumner, followed by Linus Pauling and Fred Sanger, that a protein was a real molecule, albeit a macromolecule. It deals with the recognition of the nature and importance of hydrogen bonds and hydrophobic interactions in determining the structure, properties, and biological function of proteins until the present acquisition of an understanding of the structure, thermodynamics, and folding pathways from a linear array of amino acids to a biological entity. Along the way, with a combination of experiment and theoretical interpretation, a mechanism was elucidated for the thrombin-induced conversion of fibrinogen to a fibrin blood clot and for the oxidative-folding pathways of ribonuclease A. Before the atomic structure of a protein molecule was determined by x-ray diffraction or nuclear magnetic resonance spectroscopy, experimental studies of the fundamental interactions underlying protein structure led to several distance constraints which motivated the theoretical approach to determine protein structure, and culminated in the Empirical Conformational Energy Program for Peptides (ECEPP), an all-atom force field, with which the structures of fibrous collagen-like proteins and the 46-residue globular staphylococcal protein A were determined. To undertake the study of larger globular proteins, a physics-based coarse-grained UNited-RESidue (UNRES) force field was developed, and applied to the protein-folding problem in terms of structure, thermodynamics, dynamics, and folding pathways. Initially, single-chain and, ultimately, multiple-chain proteins were examined, and the methodology was extended to protein–protein interactions and to nucleic acids and to protein–nucleic acid interactions. The ultimate results led to an understanding of a variety of biological processes underlying natural and disease phenomena. PMID:25850343
Li, Congmin; Lim, Sunghyuk; Braunewell, Karl H; Ames, James B
2016-01-01
Visinin-like protein 3 (VILIP-3) belongs to a family of Ca2+-myristoyl switch proteins that regulate signal transduction in the brain and retina. Here we analyze Ca2+ binding, characterize Ca2+-induced conformational changes, and determine the NMR structure of myristoylated VILIP-3. Three Ca2+ bind cooperatively to VILIP-3 at EF2, EF3 and EF4 (KD = 0.52 μM and Hill slope of 1.8). NMR assignments, mutagenesis and structural analysis indicate that the covalently attached myristoyl group is solvent exposed in Ca2+-bound VILIP-3, whereas Ca2+-free VILIP-3 contains a sequestered myristoyl group that interacts with protein residues (E26, Y64, V68), which are distinct from myristate contacts seen in other Ca2+-myristoyl switch proteins. The myristoyl group in VILIP-3 forms an unusual L-shaped structure that places the C14 methyl group inside a shallow protein groove, in contrast to the much deeper myristoyl binding pockets observed for recoverin, NCS-1 and GCAP1. Thus, the myristoylated VILIP-3 protein structure determined in this study is quite different from those of other known myristoyl switch proteins (recoverin, NCS-1, and GCAP1). We propose that myristoylation serves to fine tune the three-dimensional structures of neuronal calcium sensor proteins as a means of generating functional diversity.
Campeotto, Ivan; Zhang, Yong; Mladenov, Miroslav G.; Freemont, Paul S.; Gründling, Angelika
2015-01-01
Signaling nucleotides are integral parts of signal transduction systems allowing bacteria to cope with and rapidly respond to changes in the environment. The Staphylococcus aureus PII-like signal transduction protein PstA was recently identified as a cyclic diadenylate monophosphate (c-di-AMP)-binding protein. Here, we present the crystal structures of the apo- and c-di-AMP-bound PstA protein, which is trimeric in solution as well as in the crystals. The structures combined with detailed bioinformatics analysis revealed that the protein belongs to a new family of proteins with a similar core fold but with distinct features to classical PII proteins, which usually function in nitrogen metabolism pathways in bacteria. The complex structure revealed three identical c-di-AMP-binding sites per trimer with each binding site at a monomer-monomer interface. Although distinctly different from other cyclic-di-nucleotide-binding sites, as the half-binding sites are not symmetrical, the complex structure also highlighted common features for c-di-AMP-binding sites. A comparison between the apo and complex structures revealed a series of conformational changes that result in the ordering of two anti-parallel β-strands that protrude from each monomer and allowed us to propose a mechanism on how the PstA protein functions as a signaling transduction protein. PMID:25505271
Structural Integrity of Proteins under Applied Bias during Solid-State Nanopore Translocation
NASA Astrophysics Data System (ADS)
Hasan, Mohammad R.; Khanzada, Raja Raheel; Mahmood, Mohammed A. I.; Ashfaq, Adnan; Iqbal, Samir M.
2015-03-01
The translocation behavior of proteins through solid-state nanopores can be used as a new way to detect and identify proteins. The ionic current through a nanopore that flows under applied bias gets perturbed when a biomolecule traverses the Nanopore. It is important for a protein detection scheme to know of any changes in the three-dimensional structure of the molecule during the process. Here we report the data on structural integrity of protein during translocation through nanopore under different applied biases. Nanoscale Molecular Dynamic was used to establish a framework to study the changes in protein structures as these travelled across the nanopore. The analysis revealed the contributions of structural changes of protein to its ionic current signature. As a model, thrombin protein crystalline structure was imported and positioned inside a 6 nm diameter pore in a 6 nm thick silicon nitride membrane. The protein was solvated in 1 M KCl at 295 K and the system was equilibrated for 20 ns to attain its minimum energy state. The simulation was performed at different electric fields from 0 to 1 kCal/(mol.Å.e). RMSD, radial distribution function, movement of the center of mass and velocity of the protein were calculated. The results showed linear increments in the velocity and perturbations in ionic current profile with increasing electric potential. Support Acknowledged from NSF through ECCS-1201878.
Rebelling for a Reason: Protein Structural “Outliers”
Arumugam, Gandhimathi; Nair, Anu G.; Hariharaputran, Sridhar; Ramanathan, Sowdhamini
2013-01-01
Analysis of structural variation in domain superfamilies can reveal constraints in protein evolution which aids protein structure prediction and classification. Structure-based sequence alignment of distantly related proteins, organized in PASS2 database, provides clues about structurally conserved regions among different functional families. Some superfamily members show large structural differences which are functionally relevant. This paper analyses the impact of structural divergence on function for multi-member superfamilies, selected from the PASS2 superfamily alignment database. Functional annotations within superfamilies, with structural outliers or ‘rebels’, are discussed in the context of structural variations. Overall, these data reinforce the idea that functional similarities cannot be extrapolated from mere structural conservation. The implication for fold-function prediction is that the functional annotations can only be inherited with very careful consideration, especially at low sequence identities. PMID:24073209
Czaplewski, Cezary; Karczynska, Agnieszka; Sieradzan, Adam K; Liwo, Adam
2018-04-30
A server implementation of the UNRES package (http://www.unres.pl) for coarse-grained simulations of protein structures with the physics-based UNRES model, coined a name UNRES server, is presented. In contrast to most of the protein coarse-grained models, owing to its physics-based origin, the UNRES force field can be used in simulations, including those aimed at protein-structure prediction, without ancillary information from structural databases; however, the implementation includes the possibility of using restraints. Local energy minimization, canonical molecular dynamics simulations, replica exchange and multiplexed replica exchange molecular dynamics simulations can be run with the current UNRES server; the latter are suitable for protein-structure prediction. The user-supplied input includes protein sequence and, optionally, restraints from secondary-structure prediction or small x-ray scattering data, and simulation type and parameters which are selected or typed in. Oligomeric proteins, as well as those containing D-amino-acid residues and disulfide links can be treated. The output is displayed graphically (minimized structures, trajectories, final models, analysis of trajectory/ensembles); however, all output files can be downloaded by the user. The UNRES server can be freely accessed at http://unres-server.chem.ug.edu.pl.
De Novo Proteins with Life-Sustaining Functions Are Structurally Dynamic.
Murphy, Grant S; Greisman, Jack B; Hecht, Michael H
2016-01-29
Designing and producing novel proteins that fold into stable structures and provide essential biological functions are key goals in synthetic biology. In initial steps toward achieving these goals, we constructed a combinatorial library of de novo proteins designed to fold into 4-helix bundles. As described previously, screening this library for sequences that function in vivo to rescue conditionally lethal mutants of Escherichia coli (auxotrophs) yielded several de novo sequences, termed SynRescue proteins, which rescued four different E. coli auxotrophs. In an effort to understand the structural requirements necessary for auxotroph rescue, we investigated the biophysical properties of the SynRescue proteins, using both computational and experimental approaches. Results from circular dichroism, size-exclusion chromatography, and NMR demonstrate that the SynRescue proteins are α-helical and relatively stable. Surprisingly, however, they do not form well-ordered structures. Instead, they form dynamic structures that fluctuate between monomeric and dimeric states. These findings show that a well-ordered structure is not a prerequisite for life-sustaining functions, and suggests that dynamic structures may have been important in the early evolution of protein function. Copyright © 2015 Elsevier Ltd. All rights reserved.
Structural features that predict real-value fluctuations of globular proteins.
Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke
2012-05-01
It is crucial to consider dynamics for understanding the biological function of proteins. We used a large number of molecular dynamics (MD) trajectories of nonhomologous proteins as references and examined static structural features of proteins that are most relevant to fluctuations. We examined correlation of individual structural features with fluctuations and further investigated effective combinations of features for predicting the real value of residue fluctuations using the support vector regression (SVR). It was found that some structural features have higher correlation than crystallographic B-factors with fluctuations observed in MD trajectories. Moreover, SVR that uses combinations of static structural features showed accurate prediction of fluctuations with an average Pearson's correlation coefficient of 0.669 and a root mean square error of 1.04 Å. This correlation coefficient is higher than the one observed in predictions by the Gaussian network model (GNM). An advantage of the developed method over the GNMs is that the former predicts the real value of fluctuation. The results help improve our understanding of relationships between protein structure and fluctuation. Furthermore, the developed method provides a convienient practial way to predict fluctuations of proteins using easily computed static structural features of proteins. Copyright © 2012 Wiley Periodicals, Inc.
Structural features that predict real-value fluctuations of globular proteins
Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke
2012-01-01
It is crucial to consider dynamics for understanding the biological function of proteins. We used a large number of molecular dynamics trajectories of non-homologous proteins as references and examined static structural features of proteins that are most relevant to fluctuations. We examined correlation of individual structural features with fluctuations and further investigated effective combinations of features for predicting the real-value of residue fluctuations using the support vector regression. It was found that some structural features have higher correlation than crystallographic B-factors with fluctuations observed in molecular dynamics trajectories. Moreover, support vector regression that uses combinations of static structural features showed accurate prediction of fluctuations with an average Pearson’s correlation coefficient of 0.669 and a root mean square error of 1.04 Å. This correlation coefficient is higher than the one observed for the prediction by the Gaussian network model. An advantage of the developed method over the Gaussian network models is that the former predicts the real-value of fluctuation. The results help improve our understanding of relationships between protein structure and fluctuation. Furthermore, the developed method provides a convienient practial way to predict fluctuations of proteins using easily computed static structural features of proteins. PMID:22328193
A Practical Approach to Protein Crystallography.
Ilari, Andrea; Savino, Carmelinda
2017-01-01
Macromolecular crystallography is a powerful tool for structural biology. The resolution of a protein crystal structure is becoming much easier than in the past, thanks to developments in computing, automation of crystallization techniques and high-flux synchrotron sources to collect diffraction datasets. The aim of this chapter is to provide practical procedures to determine a protein crystal structure, illustrating the new techniques, experimental methods, and software that have made protein crystallography a tool accessible to a larger scientific community.It is impossible to give more than a taste of what the X-ray crystallographic technique entails in one brief chapter and there are different ways to solve a protein structure. Since the number of structures available in the Protein Data Bank (PDB) is becoming ever larger (the protein data bank now contains more than 100,000 entries) and therefore the probability to find a good model to solve the structure is ever increasing, we focus our attention on the Molecular Replacement method. Indeed, whenever applicable, this method allows the resolution of macromolecular structures starting from a single data set and a search model downloaded from the PDB, with the aid only of computer work.
An Algorithm for Protein Helix Assignment Using Helix Geometry
Cao, Chen; Xu, Shutan; Wang, Lincong
2015-01-01
Helices are one of the most common and were among the earliest recognized secondary structure elements in proteins. The assignment of helices in a protein underlies the analysis of its structure and function. Though the mathematical expression for a helical curve is simple, no previous assignment programs have used a genuine helical curve as a model for helix assignment. In this paper we present a two-step assignment algorithm. The first step searches for a series of bona fide helical curves each one best fits the coordinates of four successive backbone Cα atoms. The second step uses the best fit helical curves as input to make helix assignment. The application to the protein structures in the PDB (protein data bank) proves that the algorithm is able to assign accurately not only regular α-helix but also 310 and π helices as well as their left-handed versions. One salient feature of the algorithm is that the assigned helices are structurally more uniform than those by the previous programs. The structural uniformity should be useful for protein structure classification and prediction while the accurate assignment of a helix to a particular type underlies structure-function relationship in proteins. PMID:26132394
NASA Astrophysics Data System (ADS)
Illing, Gerd; Saenger, Wolfram; Heinemann, Udo
2000-06-01
The Protein Structure Factory will be established to characterize proteins encoded by human genes or cDNAs, which will be selected by criteria of potential structural novelty or medical or biotechnological usefulness. It represents an integrative approach to structure analysis combining bioinformatics techniques, automated gene expression and purification of gene products, generation of a biophysical fingerprint of the proteins and the determination of their three-dimensional structures either by NMR spectroscopy or by X-ray diffraction. The use of synchrotron radiation will be crucial to the Protein Structure Factory: high brilliance and tunable wavelengths are prerequisites for fast data collection, the use of small crystals and multiwavelength anomalous diffraction (MAD) phasing. With the opening of BESSY II, direct access to a third-generation XUV storage ring source with excellent conditions is available nearby. An insertion device with two MAD beamlines and one constant energy station will be set up until 2001.
Recognition of coarse-grained protein tertiary structure.
Lezon, Timothy; Banavar, Jayanth R; Maritan, Amos
2004-05-15
A model of the protein backbone is considered in which each residue is characterized by the location of its C(alpha) atom and one of a discrete set of conformal (phi, psi) states. We investigate the key differences between a description that offers a locally precise fit to known backbone structures and one that provides a globally accurate fit to protein structures. Using a statistical scoring scheme and threading, a protein's local best-fit conformation is highly recognizable, but its global structure cannot be directly determined from an amino acid sequence. The incorporation of information about the conformal states of neighboring residues along the chain allows one to accurately translate the local structure into a global structure. We present a two-step algorithm, which recognizes up to 95% of the tested protein native-state structures to within a 2.5 A root mean square deviation. Copyright 2004 Wiley-Liss, Inc.
Jefferson, Emily R.; Walsh, Thomas P.; Roberts, Timothy J.; Barton, Geoffrey J.
2007-01-01
SNAPPI-DB, a high performance database of Structures, iNterfaces and Alignments of Protein–Protein Interactions, and its associated Java Application Programming Interface (API) is described. SNAPPI-DB contains structural data, down to the level of atom co-ordinates, for each structure in the Protein Data Bank (PDB) together with associated data including SCOP, CATH, Pfam, SWISSPROT, InterPro, GO terms, Protein Quaternary Structures (PQS) and secondary structure information. Domain–domain interactions are stored for multiple domain definitions and are classified by their Superfamily/Family pair and interaction interface. Each set of classified domain–domain interactions has an associated multiple structure alignment for each partner. The API facilitates data access via PDB entries, domains and domain–domain interactions. Rapid development, fast database access and the ability to perform advanced queries without the requirement for complex SQL statements are provided via an object oriented database and the Java Data Objects (JDO) API. SNAPPI-DB contains many features which are not available in other databases of structural protein–protein interactions. It has been applied in three studies on the properties of protein–protein interactions and is currently being employed to train a protein–protein interaction predictor and a functional residue predictor. The database, API and manual are available for download at: . PMID:17202171
Thompson, Jared J; Tabatabaei Ghomi, Hamed; Lill, Markus A
2014-12-01
Knowledge-based methods for analyzing protein structures, such as statistical potentials, primarily consider the distances between pairs of bodies (atoms or groups of atoms). Considerations of several bodies simultaneously are generally used to characterize bonded structural elements or those in close contact with each other, but historically do not consider atoms that are not in direct contact with each other. In this report, we introduce an information-theoretic method for detecting and quantifying distance-dependent through-space multibody relationships between the sidechains of three residues. The technique introduced is capable of producing convergent and consistent results when applied to a sufficiently large database of randomly chosen, experimentally solved protein structures. The results of our study can be shown to reproduce established physico-chemical properties of residues as well as more recently discovered properties and interactions. These results offer insight into the numerous roles that residues play in protein structure, as well as relationships between residue function, protein structure, and evolution. The techniques and insights presented in this work should be useful in the future development of novel knowledge-based tools for the evaluation of protein structure. © 2014 Wiley Periodicals, Inc.
Structure prediction of polyglutamine disease proteins: comparison of methods
2014-01-01
Background The expansion of polyglutamine (poly-Q) repeats in several unrelated proteins is associated with at least ten neurodegenerative diseases. The length of the poly-Q regions plays an important role in the progression of the diseases. The number of glutamines (Q) is inversely related to the onset age of these polyglutamine diseases, and the expansion of poly-Q repeats has been associated with protein misfolding. However, very little is known about the structural changes induced by the expansion of the repeats. Computational methods can provide an alternative to determine the structure of these poly-Q proteins, but it is important to evaluate their performance before large scale prediction work is done. Results In this paper, two popular protein structure prediction programs, I-TASSER and Rosetta, have been used to predict the structure of the N-terminal fragment of a protein associated with Huntington's disease with 17 glutamines. Results show that both programs have the ability to find the native structures, but I-TASSER performs better for the overall task. Conclusions Both I-TASSER and Rosetta can be used for structure prediction of proteins with poly-Q repeats. Knowledge of poly-Q structure may significantly contribute to development of therapeutic strategies for poly-Q diseases. PMID:25080018
Data-assisted protein structure modeling by global optimization in CASP12.
Joo, Keehyoung; Heo, Seungryong; Joung, InSuk; Hong, Seung Hwan; Lee, Sung Jong; Lee, Jooyoung
2018-03-01
In CASP12, 2 types of data-assisted protein structure modeling were experimented. Either SAXS experimental data or cross-linking experimental data was provided for a selected number of CASP12 targets that the CASP12 predictor could utilize for better protein structure modeling. We devised 2 separate energy terms for SAXS data and cross-linking data to drive the model structures into more native-like structures that satisfied the given experimental data as much as possible. In CASP11, we successfully performed protein structure modeling using simulated sparse and ambiguously assigned NOE data and/or correct residue-residue contact information, where the only energy term that folded the protein into its native structure was the term which was originated from the given experimental data. However, the 2 types of experimental data provided in CASP12 were far from being sufficient enough to fold the target protein into its native structure because SAXS data provides only the overall shape of the molecule and the cross-linking contact information provides only very low-resolution distance information. For this reason, we combined the SAXS or cross-linking energy term with our regular modeling energy function that includes both the template energy term and the de novo energy terms. By optimizing the newly formulated energy function, we obtained protein models that fit better with provided SAXS data than the X-ray structure of the target. However, the improvement of the model relative to the 1 modeled without the SAXS data, was not significant. Consistent structural improvement was achieved by incorporating cross-linking data into the protein structure modeling. © 2018 Wiley Periodicals, Inc.
Protein structural similarity search by Ramachandran codes
Lo, Wei-Cheng; Huang, Po-Jung; Chang, Chih-Hung; Lyu, Ping-Chiang
2007-01-01
Background Protein structural data has increased exponentially, such that fast and accurate tools are necessary to access structure similarity search. To improve the search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, the accuracy is usually sacrificed and the speed is still unable to match sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases. Results We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation). SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering and uses a regenerative approach to produce substitution matrices. Then, classical sequence similarity search methods can be applied to the structural similarity search. Its accuracy is similar to Combinatorial Extension (CE) and works over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values to assess the retrieved information. It has been implemented into a web service and a stand-alone Java program that is able to run on many different platforms. Conclusion As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools are supposed applicable to automated and high-throughput functional annotations or predictions for the ever increasing number of published protein structures in this post-genomic era. PMID:17716377
Wang, Nanyi; Wang, Lirong; Xie, Xiang-Qun
2017-11-27
Molecular docking is widely applied to computer-aided drug design and has become relatively mature in the recent decades. Application of docking in modeling varies from single lead compound optimization to large-scale virtual screening. The performance of molecular docking is highly dependent on the protein structures selected. It is especially challenging for large-scale target prediction research when multiple structures are available for a single target. Therefore, we have established ProSelection, a docking preferred-protein selection algorithm, in order to generate the proper structure subset(s). By the ProSelection algorithm, protein structures of "weak selectors" are filtered out whereas structures of "strong selectors" are kept. Specifically, the structure which has a good statistical performance of distinguishing active ligands from inactive ligands is defined as a strong selector. In this study, 249 protein structures of 14 autophagy-related targets are investigated. Surflex-dock was used as the docking engine to distinguish active and inactive compounds against these protein structures. Both t test and Mann-Whitney U test were used to distinguish the strong from the weak selectors based on the normality of the docking score distribution. The suggested docking score threshold for active ligands (SDA) was generated for each strong selector structure according to the receiver operating characteristic (ROC) curve. The performance of ProSelection was further validated by predicting the potential off-targets of 43 U.S. Federal Drug Administration approved small molecule antineoplastic drugs. Overall, ProSelection will accelerate the computational work in protein structure selection and could be a useful tool for molecular docking, target prediction, and protein-chemical database establishment research.
Deciphering Cryptic Binding Sites on Proteins by Mixed-Solvent Molecular Dynamics.
Kimura, S Roy; Hu, Hai Peng; Ruvinsky, Anatoly M; Sherman, Woody; Favia, Angelo D
2017-06-26
In recent years, molecular dynamics simulations of proteins in explicit mixed solvents have been applied to various problems in protein biophysics and drug discovery, including protein folding, protein surface characterization, fragment screening, allostery, and druggability assessment. In this study, we perform a systematic study on how mixtures of organic solvent probes in water can reveal cryptic ligand binding pockets that are not evident in crystal structures of apo proteins. We examine a diverse set of eight PDB proteins that show pocket opening induced by ligand binding and investigate whether solvent MD simulations on the apo structures can induce the binding site observed in the holo structures. The cosolvent simulations were found to induce conformational changes on the protein surface, which were characterized and compared with the holo structures. Analyses of the biological systems, choice of probes and concentrations, druggability of the resulting induced pockets, and application to drug discovery are discussed here.
Protein Interaction Profile Sequencing (PIP-seq).
Foley, Shawn W; Gregory, Brian D
2016-10-10
Every eukaryotic RNA transcript undergoes extensive post-transcriptional processing from the moment of transcription up through degradation. This regulation is performed by a distinct cohort of RNA-binding proteins which recognize their target transcript by both its primary sequence and secondary structure. Here, we describe protein interaction profile sequencing (PIP-seq), a technique that uses ribonuclease-based footprinting followed by high-throughput sequencing to globally assess both protein-bound RNA sequences and RNA secondary structure. PIP-seq utilizes single- and double-stranded RNA-specific nucleases in the absence of proteins to infer RNA secondary structure. These libraries are also compared to samples that undergo nuclease digestion in the presence of proteins in order to find enriched protein-bound sequences. Combined, these four libraries provide a comprehensive, transcriptome-wide view of RNA secondary structure and RNA protein interaction sites from a single experimental technique. © 2016 by John Wiley & Sons, Inc. Copyright © 2016 John Wiley & Sons, Inc.
Qin, Nan; Zhang, Shaoqing; Jiang, Jianjuan; Corder, Stephanie Gilbert; Qian, Zhigang; Zhou, Zhitao; Lee, Woonsoo; Liu, Keyin; Wang, Xiaohan; Li, Xinxin; Shi, Zhifeng; Mao, Ying; Bechtel, Hans A.; Martin, Michael C.; Xia, Xiaoxia; Marelli, Benedetto; Kaplan, David L.; Omenetto, Fiorenzo G.; Liu, Mengkun; Tao, Tiger H.
2016-01-01
Silk protein fibres produced by silkworms and spiders are renowned for their unparalleled mechanical strength and extensibility arising from their high-β-sheet crystal contents as natural materials. Investigation of β-sheet-oriented conformational transitions in silk proteins at the nanoscale remains a challenge using conventional imaging techniques given their limitations in chemical sensitivity or limited spatial resolution. Here, we report on electron-regulated nanoscale polymorphic transitions in silk proteins revealed by near-field infrared imaging and nano-spectroscopy at resolutions approaching the molecular level. The ability to locally probe nanoscale protein structural transitions combined with nanometre-precision electron-beam lithography offers us the capability to finely control the structure of silk proteins in two and three dimensions. Our work paves the way for unlocking essential nanoscopic protein structures and critical conditions for electron-induced conformational transitions, offering new rules to design protein-based nanoarchitectures. PMID:27713412
Crystal Structure of a Plant Multidrug and Toxic Compound Extrusion Family Protein.
Tanaka, Yoshiki; Iwaki, Shigehiro; Tsukazaki, Tomoya
2017-09-05
The multidrug and toxic compound extrusion (MATE) family of proteins consists of transporters responsible for multidrug resistance in prokaryotes. In plants, a number of MATE proteins were identified by recent genomic and functional studies, which imply that the proteins have substrate-specific transport functions instead of multidrug extrusion. The three-dimensional structure of eukaryotic MATE proteins, including those of plants, has not been reported, preventing a better understanding of the molecular mechanism of these proteins. Here, we describe the crystal structure of a MATE protein from the plant Camelina sativa at 2.9 Å resolution. Two sets of six transmembrane α helices, assembled pseudo-symmetrically, possess a negatively charged internal pocket with an outward-facing shape. The crystal structure provides insight into the diversity of plant MATE proteins and their substrate recognition and transport through the membrane. Copyright © 2017 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cang, Zixuan; Mu, Lin; Wu, Kedi
Here, protein function and dynamics are closely related to its sequence and structure. However, prediction of protein function and dynamics from its sequence and structure is still a fundamental challenge in molecular biology. Protein classification, which is typically done through measuring the similarity between proteins based on protein sequence or physical information, serves as a crucial step toward the understanding of protein function and dynamics.
A web interface for easy flexible protein-protein docking with ATTRACT.
de Vries, Sjoerd J; Schindler, Christina E M; Chauvot de Beauchêne, Isaure; Zacharias, Martin
2015-02-03
Protein-protein docking programs can give valuable insights into the structure of protein complexes in the absence of an experimental complex structure. Web interfaces can facilitate the use of docking programs by structural biologists. Here, we present an easy web interface for protein-protein docking with the ATTRACT program. While aimed at nonexpert users, the web interface still covers a considerable range of docking applications. The web interface supports systematic rigid-body protein docking with the ATTRACT coarse-grained force field, as well as various kinds of protein flexibility. The execution of a docking protocol takes up to a few hours on a standard desktop computer. Copyright © 2015 Biophysical Society. Published by Elsevier Inc. All rights reserved.
Transmembrane proteins in the Protein Data Bank: identification and classification.
Tusnády, Gábor E; Dosztányi, Zsuzsanna; Simon, István
2004-11-22
Integral membrane proteins play important roles in living cells. Although these proteins are estimated to constitute 25% of proteins at a genomic scale, the Protein Data Bank (PDB) contains only a few hundred membrane proteins due to the difficulties with experimental techniques. The presence of transmembrane proteins in the structure data bank, however, is quite invisible, as the annotation of these entries is rather poor. Even if a protein is identified as a transmembrane one, the possible location of the lipid bilayer is not indicated in the PDB because these proteins are crystallized without their natural lipid bilayer, and currently no method is publicly available to detect the possible membrane plane using the atomic coordinates of membrane proteins. Here, we present a new geometrical approach to distinguish between transmembrane and globular proteins using structural information only and to locate the most likely position of the lipid bilayer. An automated algorithm (TMDET) is given to determine the membrane planes relative to the position of atomic coordinates, together with a discrimination function which is able to separate transmembrane and globular proteins even in cases of low resolution or incomplete structures such as fragments or parts of large multi chain complexes. This method can be used for the proper annotation of protein structures containing transmembrane segments and paves the way to an up-to-date database containing the structure of all known transmembrane proteins and fragments (PDB_TM) which can be automatically updated. The algorithm is equally important for the purpose of constructing databases purely of globular proteins.
Miyakawa, Takuya; Sawano, Yoriko; Miyazono, Ken-ichi; Miyauchi, Yumiko; Hatano, Ken-ichi
2013-01-01
STK_08120 is a member of the thermoacidophile-specific DUF3211 protein family from Sulfolobus tokodaii strain 7. Its molecular function remains obscure, and sequence similarities for obtaining functional remarks are not available. In this study, the crystal structure of STK_08120 was determined at 1.79-Å resolution to predict its probable function using structure similarity searches. The structure adopts an α/β structure of a helix-grip fold, which is found in the START domain proteins with cavities for hydrophobic substrates or ligands. The detailed structural features implied that fatty acids are the primary ligand candidates for STK_08120, and binding assays revealed that the protein bound long-chain saturated fatty acids (>C14) and their trans-unsaturated types with an affinity equal to that for major fatty acid binding proteins in mammals and plants. Moreover, the structure of an STK_08120-myristic acid complex revealed a unique binding mode among fatty acid binding proteins. These results suggest that the thermoacidophile-specific protein family DUF3211 functions as a fatty acid carrier with a novel binding mode. PMID:23836863
Blind test of physics-based prediction of protein structures.
Shell, M Scott; Ozkan, S Banu; Voelz, Vincent; Wu, Guohong Albert; Dill, Ken A
2009-02-01
We report here a multiprotein blind test of a computer method to predict native protein structures based solely on an all-atom physics-based force field. We use the AMBER 96 potential function with an implicit (GB/SA) model of solvation, combined with replica-exchange molecular-dynamics simulations. Coarse conformational sampling is performed using the zipping and assembly method (ZAM), an approach that is designed to mimic the putative physical routes of protein folding. ZAM was applied to the folding of six proteins, from 76 to 112 monomers in length, in CASP7, a community-wide blind test of protein structure prediction. Because these predictions have about the same level of accuracy as typical bioinformatics methods, and do not utilize information from databases of known native structures, this work opens up the possibility of predicting the structures of membrane proteins, synthetic peptides, or other foldable polymers, for which there is little prior knowledge of native structures. This approach may also be useful for predicting physical protein folding routes, non-native conformations, and other physical properties from amino acid sequences.
Blind Test of Physics-Based Prediction of Protein Structures
Shell, M. Scott; Ozkan, S. Banu; Voelz, Vincent; Wu, Guohong Albert; Dill, Ken A.
2009-01-01
We report here a multiprotein blind test of a computer method to predict native protein structures based solely on an all-atom physics-based force field. We use the AMBER 96 potential function with an implicit (GB/SA) model of solvation, combined with replica-exchange molecular-dynamics simulations. Coarse conformational sampling is performed using the zipping and assembly method (ZAM), an approach that is designed to mimic the putative physical routes of protein folding. ZAM was applied to the folding of six proteins, from 76 to 112 monomers in length, in CASP7, a community-wide blind test of protein structure prediction. Because these predictions have about the same level of accuracy as typical bioinformatics methods, and do not utilize information from databases of known native structures, this work opens up the possibility of predicting the structures of membrane proteins, synthetic peptides, or other foldable polymers, for which there is little prior knowledge of native structures. This approach may also be useful for predicting physical protein folding routes, non-native conformations, and other physical properties from amino acid sequences. PMID:19186130
Karkute, Suhas G; Easwaran, Murugesh; Gujjar, Ranjit Singh; Piramanayagam, Shanmughavel; Singh, Major
2015-10-01
WRKY genes are members of one of the largest families of plant transcription factors and play an important role in response to biotic and abiotic stresses, and overall growth and development. Understanding the interaction of WRKY proteins with other proteins/ligands in plant cells is of utmost importance to develop plants having tolerance to biotic and abiotic stresses. The SlWRKY4 gene was cloned from a drought tolerant wild species of tomato (Solanum habrochaites) and the secondary structure and 3D modeling of this protein were predicted using Schrödinger Suite-Prime. Predicted structures were also subjected to plot against Ramachandran's conformation, and the modeled structure was minimized using Macromodel. Finally, the minimized structure was simulated in the water environment to check the protein stability. The behavior of the modeled structure was well-simulated and analyzed through RMSD and RMSF of the protein. The present work provides the modeled 3D structure of SlWRKY4 that will help in understanding the mechanism of gene regulation by further in silico interaction studies.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Leite, Wellington C.; Galvão, Carolina W.; Saab, Sérgio C.
The bacterial RecA protein plays a role in the complex system of DNA damage repair. Here, we report the functional and structural characterization of the Herbaspirillum seropedicae RecA protein (HsRecA). HsRecA protein is more efficient at displacing SSB protein from ssDNA than Escherichia coli RecA protein. HsRecA also promotes DNA strand exchange more efficiently. The three dimensional structure of HsRecA-ADP/ATP complex has been solved to 1.7 Å resolution. HsRecA protein contains a small N-terminal domain, a central core ATPase domain and a large C-terminal domain, that are similar to homologous bacterial RecA proteins. Comparative structural analysis showed that the N-terminalmore » polymerization motif of archaeal and eukaryotic RecA family proteins are also present in bacterial RecAs. Reconstruction of electrostatic potential from the hexameric structure of HsRecA-ADP/ATP revealed a high positive charge along the inner side, where ssDNA is bound inside the filament. The properties of this surface may explain the greater capacity of HsRecA protein to bind ssDNA, forming a contiguous nucleoprotein filament, displace SSB and promote DNA exchange relative to EcRecA. In conclusion, our functional and structural analyses provide insight into the molecular mechanisms of polymerization of bacterial RecA as a helical nucleoprotein filament.« less
Galvão, Carolina W.; Saab, Sérgio C.; Iulek, Jorge; Etto, Rafael M.; Steffens, Maria B. R.; Chitteni-Pattu, Sindhu; Stanage, Tyler; Keck, James L.; Cox, Michael M.
2016-01-01
The bacterial RecA protein plays a role in the complex system of DNA damage repair. Here, we report the functional and structural characterization of the Herbaspirillum seropedicae RecA protein (HsRecA). HsRecA protein is more efficient at displacing SSB protein from ssDNA than Escherichia coli RecA protein. HsRecA also promotes DNA strand exchange more efficiently. The three dimensional structure of HsRecA-ADP/ATP complex has been solved to 1.7 Å resolution. HsRecA protein contains a small N-terminal domain, a central core ATPase domain and a large C-terminal domain, that are similar to homologous bacterial RecA proteins. Comparative structural analysis showed that the N-terminal polymerization motif of archaeal and eukaryotic RecA family proteins are also present in bacterial RecAs. Reconstruction of electrostatic potential from the hexameric structure of HsRecA-ADP/ATP revealed a high positive charge along the inner side, where ssDNA is bound inside the filament. The properties of this surface may explain the greater capacity of HsRecA protein to bind ssDNA, forming a contiguous nucleoprotein filament, displace SSB and promote DNA exchange relative to EcRecA. Our functional and structural analyses provide insight into the molecular mechanisms of polymerization of bacterial RecA as a helical nucleoprotein filament. PMID:27447485
DOE Office of Scientific and Technical Information (OSTI.GOV)
Raymond, Amy; Lovell, Scott; Lorimer, Don
2009-12-01
With the goal of improving yield and success rates of heterologous protein production for structural studies we have developed the database and algorithm software package Gene Composer. This freely available electronic tool facilitates the information-rich design of protein constructs and their engineered synthetic gene sequences, as detailed in the accompanying manuscript. In this report, we compare heterologous protein expression levels from native sequences to that of codon engineered synthetic gene constructs designed by Gene Composer. A test set of proteins including a human kinase (P38{alpha}), viral polymerase (HCV NS5B), and bacterial structural protein (FtsZ) were expressed in both E. colimore » and a cell-free wheat germ translation system. We also compare the protein expression levels in E. coli for a set of 11 different proteins with greatly varied G:C content and codon bias. The results consistently demonstrate that protein yields from codon engineered Gene Composer designs are as good as or better than those achieved from the synonymous native genes. Moreover, structure guided N- and C-terminal deletion constructs designed with the aid of Gene Composer can lead to greater success in gene to structure work as exemplified by the X-ray crystallographic structure determination of FtsZ from Bacillus subtilis. These results validate the Gene Composer algorithms, and suggest that using a combination of synthetic gene and protein construct engineering tools can improve the economics of gene to structure research.« less
NASA Technical Reports Server (NTRS)
Weaver, D. L.
1982-01-01
Theoretical methods and solutions of the dynamics of protein folding, protein aggregation, protein structure, and the origin of life are discussed. The elements of a dynamic model representing the initial stages of protein folding are presented. The calculation and experimental determination of the model parameters are discussed. The use of computer simulation for modeling protein folding is considered.
2015-06-03
demonstrating its immunogenicity in humans. PdSP15 sequence and structure show no homol- ogy to mammalian proteins, further demonstrating its potential...sequence or structure homology to known human proteins The protective salivary antigen PdSP15 shares sequence homology only to the small odorant binding...salivary proteins PpSP15 and PsSP15, respectively (Fig. 4B). To exclude any structural similarities to human pro teins, the crystal structure of PdPS15
Structural basis for host membrane remodeling induced by protein 2B of hepatitis A virus.
Vives-Adrián, Laia; Garriga, Damià; Buxaderas, Mònica; Fraga, Joana; Pereira, Pedro José Barbosa; Macedo-Ribeiro, Sandra; Verdaguer, Núria
2015-04-01
The complexity of viral RNA synthesis and the numerous participating factors require a mechanism to topologically coordinate and concentrate these multiple viral and cellular components, ensuring a concerted function. Similarly to all other positive-strand RNA viruses, picornaviruses induce rearrangements of host intracellular membranes to create structures that act as functional scaffolds for genome replication. The membrane-targeting proteins 2B and 2C, their precursor 2BC, and protein 3A appear to be primarily involved in membrane remodeling. Little is known about the structure of these proteins and the mechanisms by which they induce massive membrane remodeling. Here we report the crystal structure of the soluble region of hepatitis A virus (HAV) protein 2B, consisting of two domains: a C-terminal helical bundle preceded by an N-terminally curved five-stranded antiparallel β-sheet that displays striking structural similarity to the β-barrel domain of enteroviral 2A proteins. Moreover, the helicoidal arrangement of the protein molecules in the crystal provides a model for 2B-induced host membrane remodeling during HAV infection. No structural information is currently available for the 2B protein of any picornavirus despite it being involved in a critical process in viral factory formation: the rearrangement of host intracellular membranes. Here we present the structure of the soluble domain of the 2B protein of hepatitis A virus (HAV). Its arrangement, both in crystals and in solution under physiological conditions, can help to understand its function and sheds some light on the membrane rearrangement process, a putative target of future antiviral drugs. Moreover, this first structure of a picornaviral 2B protein also unveils a closer evolutionary relationship between the hepatovirus and enterovirus genera within the Picornaviridae family. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Rational design of alpha-helical tandem repeat proteins with closed architectures
Doyle, Lindsey; Hallinan, Jazmine; Bolduc, Jill; Parmeggiani, Fabio; Baker, David; Stoddard, Barry L.; Bradley, Philip
2015-01-01
Tandem repeat proteins, which are formed by repetition of modular units of protein sequence and structure, play important biological roles as macromolecular binding and scaffolding domains, enzymes, and building blocks for the assembly of fibrous materials1,2. The modular nature of repeat proteins enables the rapid construction and diversification of extended binding surfaces by duplication and recombination of simple building blocks3,4. The overall architecture of tandem repeat protein structures – which is dictated by the internal geometry and local packing of the repeat building blocks – is highly diverse, ranging from extended, super-helical folds that bind peptide, DNA, and RNA partners5–9, to closed and compact conformations with internal cavities suitable for small molecule binding and catalysis10. Here we report the development and validation of computational methods for de novo design of tandem repeat protein architectures driven purely by geometric criteria defining the inter-repeat geometry, without reference to the sequences and structures of existing repeat protein families. We have applied these methods to design a series of closed alpha-solenoid11 repeat structures (alpha-toroids) in which the inter-repeat packing geometry is constrained so as to juxtapose the N- and C-termini; several of these designed structures have been validated by X-ray crystallography. Unlike previous approaches to tandem repeat protein engineering12–20, our design procedure does not rely on template sequence or structural information taken from natural repeat proteins and hence can produce structures unlike those seen in nature. As an example, we have successfully designed and validated closed alpha-solenoid repeats with a left-handed helical architecture that – to our knowledge – is not yet present in the protein structure database21. PMID:26675735
Structural Basis for Host Membrane Remodeling Induced by Protein 2B of Hepatitis A Virus
Vives-Adrián, Laia; Garriga, Damià; Buxaderas, Mònica; Fraga, Joana; Pereira, Pedro José Barbosa
2015-01-01
ABSTRACT The complexity of viral RNA synthesis and the numerous participating factors require a mechanism to topologically coordinate and concentrate these multiple viral and cellular components, ensuring a concerted function. Similarly to all other positive-strand RNA viruses, picornaviruses induce rearrangements of host intracellular membranes to create structures that act as functional scaffolds for genome replication. The membrane-targeting proteins 2B and 2C, their precursor 2BC, and protein 3A appear to be primarily involved in membrane remodeling. Little is known about the structure of these proteins and the mechanisms by which they induce massive membrane remodeling. Here we report the crystal structure of the soluble region of hepatitis A virus (HAV) protein 2B, consisting of two domains: a C-terminal helical bundle preceded by an N-terminally curved five-stranded antiparallel β-sheet that displays striking structural similarity to the β-barrel domain of enteroviral 2A proteins. Moreover, the helicoidal arrangement of the protein molecules in the crystal provides a model for 2B-induced host membrane remodeling during HAV infection. IMPORTANCE No structural information is currently available for the 2B protein of any picornavirus despite it being involved in a critical process in viral factory formation: the rearrangement of host intracellular membranes. Here we present the structure of the soluble domain of the 2B protein of hepatitis A virus (HAV). Its arrangement, both in crystals and in solution under physiological conditions, can help to understand its function and sheds some light on the membrane rearrangement process, a putative target of future antiviral drugs. Moreover, this first structure of a picornaviral 2B protein also unveils a closer evolutionary relationship between the hepatovirus and enterovirus genera within the Picornaviridae family. PMID:25589659
Proteins with Novel Structure, Function and Dynamics
NASA Technical Reports Server (NTRS)
Pohorille, Andrew
2014-01-01
Recently, a small enzyme that ligates two RNA fragments with the rate of 10(exp 6) above background was evolved in vitro (Seelig and Szostak, Nature 448:828-831, 2007). This enzyme does not resemble any contemporary protein (Chao et al., Nature Chem. Biol. 9:81-83, 2013). It consists of a dynamic, catalytic loop, a small, rigid core containing two zinc ions coordinated by neighboring amino acids, and two highly flexible tails that might be unimportant for protein function. In contrast to other proteins, this enzyme does not contain ordered secondary structure elements, such as alpha-helix or beta-sheet. The loop is kept together by just two interactions of a charged residue and a histidine with a zinc ion, which they coordinate on the opposite side of the loop. Such structure appears to be very fragile. Surprisingly, computer simulations indicate otherwise. As the coordinating, charged residue is mutated to alanine, another, nearby charged residue takes its place, thus keeping the structure nearly intact. If this residue is also substituted by alanine a salt bridge involving two other, charged residues on the opposite sides of the loop keeps the loop in place. These adjustments are facilitated by high flexibility of the protein. Computational predictions have been confirmed experimentally, as both mutants retain full activity and overall structure. These results challenge our notions about what is required for protein activity and about the relationship between protein dynamics, stability and robustness. We hypothesize that small, highly dynamic proteins could be both active and fault tolerant in ways that many other proteins are not, i.e. they can adjust to retain their structure and activity even if subjected to mutations in structurally critical regions. This opens the doors for designing proteins with novel functions, structures and dynamics that have not been yet considered.
High-throughput crystallization screening.
Skarina, Tatiana; Xu, Xiaohui; Evdokimova, Elena; Savchenko, Alexei
2014-01-01
Protein structure determination by X-ray crystallography is dependent on obtaining a single protein crystal suitable for diffraction data collection. Due to this requirement, protein crystallization represents a key step in protein structure determination. The conditions for protein crystallization have to be determined empirically for each protein, making this step also a bottleneck in the structure determination process. Typical protein crystallization practice involves parallel setup and monitoring of a considerable number of individual protein crystallization experiments (also called crystallization trials). In these trials the aliquots of purified protein are mixed with a range of solutions composed of a precipitating agent, buffer, and sometimes an additive that have been previously successful in prompting protein crystallization. The individual chemical conditions in which a particular protein shows signs of crystallization are used as a starting point for further crystallization experiments. The goal is optimizing the formation of individual protein crystals of sufficient size and quality to make them suitable for diffraction data collection. Thus the composition of the primary crystallization screen is critical for successful crystallization.Systematic analysis of crystallization experiments carried out on several hundred proteins as part of large-scale structural genomics efforts allowed the optimization of the protein crystallization protocol and identification of a minimal set of 96 crystallization solutions (the "TRAP" screen) that, in our experience, led to crystallization of the maximum number of proteins.
Esque, Jérémy; Urbain, Aurélie; Etchebest, Catherine; de Brevern, Alexandre G
2015-11-01
Transmembrane proteins (TMPs) are major drug targets, but the knowledge of their precise topology structure remains highly limited compared with globular proteins. In spite of the difficulties in obtaining their structures, an important effort has been made these last years to increase their number from an experimental and computational point of view. In view of this emerging challenge, the development of computational methods to extract knowledge from these data is crucial for the better understanding of their functions and in improving the quality of structural models. Here, we revisit an efficient unsupervised learning procedure, called Hybrid Protein Model (HPM), which is applied to the analysis of transmembrane proteins belonging to the all-α structural class. HPM method is an original classification procedure that efficiently combines sequence and structure learning. The procedure was initially applied to the analysis of globular proteins. In the present case, HPM classifies a set of overlapping protein fragments, extracted from a non-redundant databank of TMP 3D structure. After fine-tuning of the learning parameters, the optimal classification results in 65 clusters. They represent at best similar relationships between sequence and local structure properties of TMPs. Interestingly, HPM distinguishes among the resulting clusters two helical regions with distinct hydrophobic patterns. This underlines the complexity of the topology of these proteins. The HPM classification enlightens unusual relationship between amino acids in TMP fragments, which can be useful to elaborate new amino acids substitution matrices. Finally, two challenging applications are described: the first one aims at annotating protein functions (channel or not), the second one intends to assess the quality of the structures (X-ray or models) via a new scoring function deduced from the HPM classification.
Lapkouski, Mikalai; Hofbauerova, Katerina; Sovova, Zofie; Ettrichova, Olga; González-Pérez, Sergio; Dulebo, Alexander; Kaftan, David; Kuta Smatanova, Ivana; Revuelta, Jose L.; Arellano, Juan B.; Carey, Jannette; Ettrich, Rüdiger
2012-01-01
Raman microscopy permits structural analysis of protein crystals in situ in hanging drops, allowing for comparison with Raman measurements in solution. Nevertheless, the two methods sometimes reveal subtle differences in structure that are often ascribed to the water layer surrounding the protein. The novel method of drop-coating deposition Raman spectropscopy (DCDR) exploits an intermediate phase that, although nominally “dry,” has been shown to preserve protein structural features present in solution. The potential of this new approach to bridge the structural gap between proteins in solution and in crystals is explored here with extrinsic protein PsbP of photosystem II from Spinacia oleracea. In the high-resolution (1.98 Å) x-ray crystal structure of PsbP reported here, several segments of the protein chain are present but unresolved. Analysis of the three kinds of Raman spectra of PsbP suggests that most of the subtle differences can indeed be attributed to the water envelope, which is shown here to have a similar Raman intensity in glassy and crystal states. Using molecular dynamics simulations cross-validated by Raman solution data, two unresolved segments of the PsbP crystal structure were modeled as loops, and the amino terminus was inferred to contain an additional beta segment. The complete PsbP structure was compared with that of the PsbP-like protein CyanoP, which plays a more peripheral role in photosystem II function. The comparison suggests possible interaction surfaces of PsbP with higher-plant photosystem II. This work provides the first complete structural picture of this key protein, and it represents the first systematic comparison of Raman data from solution, glassy, and crystalline states of a protein. PMID:23071614
Yamamoto, Norifumi
2014-08-21
The conformational conversion of proteins into an aggregation-prone form is a common feature of various neurodegenerative disorders including Alzheimer's, Huntington's, Parkinson's, and prion diseases. In the early stage of prion diseases, secondary structure conversion in prion protein (PrP) causing β-sheet expansion facilitates the formation of a pathogenic isoform with a high content of β-sheets and strong aggregation tendency to form amyloid fibrils. Herein, we propose a straightforward method to extract essential information regarding the secondary structure conversion of proteins from molecular simulations, named secondary structure principal component analysis (SSPCA). The definite existence of a PrP isoform with an increased β-sheet structure was confirmed in a free-energy landscape constructed by mapping protein structural data into a reduced space according to the principal components determined by the SSPCA. We suggest a "spot" of structural ambivalence in PrP-the C-terminal part of helix 2-that lacks a strong intrinsic secondary structure, thus promoting a partial α-helix-to-β-sheet conversion. This result is important to understand how the pathogenic conformational conversion of PrP is initiated in prion diseases. The SSPCA has great potential to solve various challenges in studying highly flexible molecular systems, such as intrinsically disordered proteins, structurally ambivalent peptides, and chameleon sequences.
Caetano-Anollés, Gustavo; Kim, Kyung Mo; Caetano-Anollés, Derek
2012-02-01
The complexity of modern biochemistry developed gradually on early Earth as new molecules and structures populated the emerging cellular systems. Here, we generate a historical account of the gradual discovery of primordial proteins, cofactors, and molecular functions using phylogenomic information in the sequence of 420 genomes. We focus on structural and functional annotations of the 54 most ancient protein domains. We show how primordial functions are linked to folded structures and how their interaction with cofactors expanded the functional repertoire. We also reveal protocell membranes played a crucial role in early protein evolution and show translation started with RNA and thioester cofactor-mediated aminoacylation. Our findings allow elaboration of an evolutionary model of early biochemistry that is firmly grounded in phylogenomic information and biochemical, biophysical, and structural knowledge. The model describes how primordial α-helical bundles stabilized membranes, how these were decorated by layered arrangements of β-sheets and α-helices, and how these arrangements became globular. Ancient forms of aminoacyl-tRNA synthetase (aaRS) catalytic domains and ancient non-ribosomal protein synthetase (NRPS) modules gave rise to primordial protein synthesis and the ability to generate a code for specificity in their active sites. These structures diversified producing cofactor-binding molecular switches and barrel structures. Accretion of domains and molecules gave rise to modern aaRSs, NRPS, and ribosomal ensembles, first organized around novel emerging cofactors (tRNA and carrier proteins) and then more complex cofactor structures (rRNA). The model explains how the generation of protein structures acted as scaffold for nucleic acids and resulted in crystallization of modern translation.
Predicting protein interactions by Brownian dynamics simulations.
Meng, Xuan-Yu; Xu, Yu; Zhang, Hong-Xing; Mezei, Mihaly; Cui, Meng
2012-01-01
We present a newly adapted Brownian-Dynamics (BD)-based protein docking method for predicting native protein complexes. The approach includes global BD conformational sampling, compact complex selection, and local energy minimization. In order to reduce the computational costs for energy evaluations, a shell-based grid force field was developed to represent the receptor protein and solvation effects. The performance of this BD protein docking approach has been evaluated on a test set of 24 crystal protein complexes. Reproduction of experimental structures in the test set indicates the adequate conformational sampling and accurate scoring of this BD protein docking approach. Furthermore, we have developed an approach to account for the flexibility of proteins, which has been successfully applied to reproduce the experimental complex structure from the structure of two unbounded proteins. These results indicate that this adapted BD protein docking approach can be useful for the prediction of protein-protein interactions.
G Protein-Coupled Receptor Rhodopsin: A Prospectus
Filipek, Sławomir; Stenkamp, Ronald E.; Teller, David C.; Palczewski, Krzysztof
2006-01-01
Rhodopsin is a retinal photoreceptor protein of bipartite structure consisting of the transmembrane protein opsin and a light-sensitive chromophore 11-cis-retinal, linked to opsin via a protonated Schiff base. Studies on rhodopsin have unveiled many structural and functional features that are common to a large and pharmacologically important group of proteins from the G protein-coupled receptor (GPCR) superfamily, of which rhodopsin is the best-studied member. In this work, we focus on structural features of rhodopsin as revealed by many biochemical and structural investigations. In particular, the high-resolution structure of bovine rhodopsin provides a template for understanding how GPCRs work. We describe the sensitivity and complexity of rhodopsin that lead to its important role in vision. PMID:12471166
Crystal structure of AFV1-102, a protein from the acidianus filamentous virus 1
Keller, Jenny; Leulliot, Nicolas; Collinet, Bruno; Campanacci, Valerie; Cambillau, Christian; Pranghisvilli, David; van Tilbeurgh, Herman
2009-01-01
Viruses infecting hyperthermophilic archaea have intriguing morphologies and genomic properties. The vast majority of their genes do not have homologs other than in other hyperthermophilic viruses, and the biology of these viruses is poorly understood. As part of a structural genomics project on the proteins of these viruses, we present here the structure of a 102 amino acid protein from acidianus filamentous virus 1 (AFV1-102). The structure shows that it is made of two identical motifs that have poor sequence similarity. Although no function can be proposed from structural analysis, tight binding of the gateway tag peptide in a groove between the two motifs suggests AFV1-102 is involved in protein protein interactions. PMID:19319936
Accurate protein structure modeling using sparse NMR data and homologous structure information.
Thompson, James M; Sgourakis, Nikolaos G; Liu, Gaohua; Rossi, Paolo; Tang, Yuefeng; Mills, Jeffrey L; Szyperski, Thomas; Montelione, Gaetano T; Baker, David
2012-06-19
While information from homologous structures plays a central role in X-ray structure determination by molecular replacement, such information is rarely used in NMR structure determination because it can be incorrect, both locally and globally, when evolutionary relationships are inferred incorrectly or there has been considerable evolutionary structural divergence. Here we describe a method that allows robust modeling of protein structures of up to 225 residues by combining (1)H(N), (13)C, and (15)N backbone and (13)Cβ chemical shift data, distance restraints derived from homologous structures, and a physically realistic all-atom energy function. Accurate models are distinguished from inaccurate models generated using incorrect sequence alignments by requiring that (i) the all-atom energies of models generated using the restraints are lower than models generated in unrestrained calculations and (ii) the low-energy structures converge to within 2.0 Å backbone rmsd over 75% of the protein. Benchmark calculations on known structures and blind targets show that the method can accurately model protein structures, even with very remote homology information, to a backbone rmsd of 1.2-1.9 Å relative to the conventional determined NMR ensembles and of 0.9-1.6 Å relative to X-ray structures for well-defined regions of the protein structures. This approach facilitates the accurate modeling of protein structures using backbone chemical shift data without need for side-chain resonance assignments and extensive analysis of NOESY cross-peak assignments.
Conformational Transitions upon Ligand Binding: Holo-Structure Prediction from Apo Conformations
Seeliger, Daniel; de Groot, Bert L.
2010-01-01
Biological function of proteins is frequently associated with the formation of complexes with small-molecule ligands. Experimental structure determination of such complexes at atomic resolution, however, can be time-consuming and costly. Computational methods for structure prediction of protein/ligand complexes, particularly docking, are as yet restricted by their limited consideration of receptor flexibility, rendering them not applicable for predicting protein/ligand complexes if large conformational changes of the receptor upon ligand binding are involved. Accurate receptor models in the ligand-bound state (holo structures), however, are a prerequisite for successful structure-based drug design. Hence, if only an unbound (apo) structure is available distinct from the ligand-bound conformation, structure-based drug design is severely limited. We present a method to predict the structure of protein/ligand complexes based solely on the apo structure, the ligand and the radius of gyration of the holo structure. The method is applied to ten cases in which proteins undergo structural rearrangements of up to 7.1 Å backbone RMSD upon ligand binding. In all cases, receptor models within 1.6 Å backbone RMSD to the target were predicted and close-to-native ligand binding poses were obtained for 8 of 10 cases in the top-ranked complex models. A protocol is presented that is expected to enable structure modeling of protein/ligand complexes and structure-based drug design for cases where crystal structures of ligand-bound conformations are not available. PMID:20066034
Nealon, John Oliver; Philomina, Limcy Seby
2017-01-01
The elucidation of protein–protein interactions is vital for determining the function and action of quaternary protein structures. Here, we discuss the difficulty and importance of establishing protein quaternary structure and review in vitro and in silico methods for doing so. Determining the interacting partner proteins of predicted protein structures is very time-consuming when using in vitro methods, this can be somewhat alleviated by use of predictive methods. However, developing reliably accurate predictive tools has proved to be difficult. We review the current state of the art in predictive protein interaction software and discuss the problem of scoring and therefore ranking predictions. Current community-based predictive exercises are discussed in relation to the growth of protein interaction prediction as an area within these exercises. We suggest a fusion of experimental and predictive methods that make use of sparse experimental data to determine higher resolution predicted protein interactions as being necessary to drive forward development. PMID:29206185
A series of PDB related databases for everyday needs.
Joosten, Robbie P; te Beek, Tim A H; Krieger, Elmar; Hekkelman, Maarten L; Hooft, Rob W W; Schneider, Reinhard; Sander, Chris; Vriend, Gert
2011-01-01
The Protein Data Bank (PDB) is the world-wide repository of macromolecular structure information. We present a series of databases that run parallel to the PDB. Each database holds one entry, if possible, for each PDB entry. DSSP holds the secondary structure of the proteins. PDBREPORT holds reports on the structure quality and lists errors. HSSP holds a multiple sequence alignment for all proteins. The PDBFINDER holds easy to parse summaries of the PDB file content, augmented with essentials from the other systems. PDB_REDO holds re-refined, and often improved, copies of all structures solved by X-ray. WHY_NOT summarizes why certain files could not be produced. All these systems are updated weekly. The data sets can be used for the analysis of properties of protein structures in areas ranging from structural genomics, to cancer biology and protein design.
Non-Uniform Sampling and J-UNIO Automation for Efficient Protein NMR Structure Determination.
Didenko, Tatiana; Proudfoot, Andrew; Dutta, Samit Kumar; Serrano, Pedro; Wüthrich, Kurt
2015-08-24
High-resolution structure determination of small proteins in solution is one of the big assets of NMR spectroscopy in structural biology. Improvements in the efficiency of NMR structure determination by advances in NMR experiments and automation of data handling therefore attracts continued interest. Here, non-uniform sampling (NUS) of 3D heteronuclear-resolved [(1)H,(1)H]-NOESY data yielded two- to three-fold savings of instrument time for structure determinations of soluble proteins. With the 152-residue protein NP_372339.1 from Staphylococcus aureus and the 71-residue protein NP_346341.1 from Streptococcus pneumonia we show that high-quality structures can be obtained with NUS NMR data, which are equally well amenable to robust automated analysis as the corresponding uniformly sampled data. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.