Sample records for protein structural features

  1. Structural features that predict real-value fluctuations of globular proteins.

    PubMed

    Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke

    2012-05-01

    It is crucial to consider dynamics for understanding the biological function of proteins. We used a large number of molecular dynamics (MD) trajectories of nonhomologous proteins as references and examined static structural features of proteins that are most relevant to fluctuations. We examined correlation of individual structural features with fluctuations and further investigated effective combinations of features for predicting the real value of residue fluctuations using the support vector regression (SVR). It was found that some structural features have higher correlation than crystallographic B-factors with fluctuations observed in MD trajectories. Moreover, SVR that uses combinations of static structural features showed accurate prediction of fluctuations with an average Pearson's correlation coefficient of 0.669 and a root mean square error of 1.04 Å. This correlation coefficient is higher than the one observed in predictions by the Gaussian network model (GNM). An advantage of the developed method over the GNMs is that the former predicts the real value of fluctuation. The results help improve our understanding of relationships between protein structure and fluctuation. Furthermore, the developed method provides a convenient, practical way to predict fluctuations of proteins using easily computed static structural features of proteins. Copyright © 2012 Wiley Periodicals, Inc.
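
    The two figures of merit quoted in this abstract, Pearson's correlation coefficient and the root mean square error, can be computed from per-residue values as in the minimal sketch below. The function names and the sample numbers are illustrative assumptions, not taken from the paper, and the sketch does not reproduce the authors' SVR model.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def rmse(predicted, reference):
    """Root mean square error, in the same units as the inputs (e.g. angstroms)."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(reference))


# Hypothetical per-residue fluctuations (angstroms): reference vs. prediction.
md_fluctuation = [0.8, 1.1, 2.4, 3.0, 1.5]
predicted_fluctuation = [1.0, 1.0, 2.0, 2.6, 1.9]
print(pearson_r(predicted_fluctuation, md_fluctuation))
print(rmse(predicted_fluctuation, md_fluctuation))
```

    Because the correlation coefficient is insensitive to scale and offset, a method that only ranks residues can score well on it, whereas the error measure rewards predictions of the actual magnitude.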

  2. Structural features that predict real-value fluctuations of globular proteins

    PubMed Central

    Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke

    2012-01-01

    It is crucial to consider dynamics for understanding the biological function of proteins. We used a large number of molecular dynamics trajectories of non-homologous proteins as references and examined static structural features of proteins that are most relevant to fluctuations. We examined correlation of individual structural features with fluctuations and further investigated effective combinations of features for predicting the real-value of residue fluctuations using the support vector regression. It was found that some structural features have higher correlation than crystallographic B-factors with fluctuations observed in molecular dynamics trajectories. Moreover, support vector regression that uses combinations of static structural features showed accurate prediction of fluctuations with an average Pearson’s correlation coefficient of 0.669 and a root mean square error of 1.04 Å. This correlation coefficient is higher than the one observed for the prediction by the Gaussian network model. An advantage of the developed method over the Gaussian network models is that the former predicts the real-value of fluctuation. The results help improve our understanding of relationships between protein structure and fluctuation. Furthermore, the developed method provides a convenient, practical way to predict fluctuations of proteins using easily computed static structural features of proteins. PMID:22328193

  3. What are the structural features that drive partitioning of proteins in aqueous two-phase systems?

    PubMed

    Wu, Zhonghua; Hu, Gang; Wang, Kui; Zaslavsky, Boris Yu; Kurgan, Lukasz; Uversky, Vladimir N

    2017-01-01

    Protein partitioning in aqueous two-phase systems (ATPSs) represents a convenient, inexpensive, and easy to scale-up protein separation technique. Since partition behavior of a protein dramatically depends on an ATPS composition, it would be highly beneficial to have reliable means for (even qualitative) prediction of partitioning of a target protein under different conditions. Our aim was to understand which structural features of proteins contribute to partitioning of a query protein in a given ATPS. We undertook a systematic empirical analysis of relations between 57 numerical structural descriptors derived from the corresponding amino acid sequences and crystal structures of 10 well-characterized proteins and the partition behavior of these proteins in 29 different ATPSs. This analysis revealed that just a few structural characteristics of proteins can accurately determine behavior of these proteins in a given ATPS. However, partition behavior of proteins in different ATPSs relies on different structural features. In other words, we could not find a unique set of protein structural features derived from their crystal structures that could be used for the description of the protein partition behavior of all proteins in all ATPSs analyzed in this study. We likely need to gain better insight into relationships between protein-solvent interactions and protein structure peculiarities, particularly given the limitations of the crystal structures used here, to be able to construct a model that accurately predicts protein partition behavior across all ATPSs. Copyright © 2016 Elsevier B.V. All rights reserved.

  4. PSSP-RFE: accurate prediction of protein structural class by recursive feature extraction from PSI-BLAST profile, physical-chemical property and functional annotations.

    PubMed

    Li, Liqi; Cui, Xiang; Yu, Sanjiu; Zhang, Yuan; Luo, Zhong; Yang, Hua; Zhou, Yue; Zheng, Xiaoqi

    2014-01-01

    Protein structure prediction is critical to functional annotation of the massively accumulated biological sequences, which prompts an imperative need for the development of high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction becomes an increasingly challenging task. For most homology-based approaches, the accuracies of protein structural class prediction are sufficiently high for high similarity datasets, but still far from being satisfactory for low similarity datasets, i.e., below 40% in pairwise sequence similarity. Therefore, we present a novel method for accurate and reliable protein structural class prediction for both high and low similarity datasets. This method is based on Support Vector Machine (SVM) in conjunction with integrated features from position-specific scoring matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is also used to rank the integrated feature vectors through recursively removing the feature with the lowest ranking score. The definitive top features selected by SVM-RFE are input into the SVM engines to predict the structural class of a query protein. To validate our method, jackknife tests were applied to seven widely used benchmark datasets, reaching overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods in protein structural classification, especially for low similarity datasets.

  5. Prediction of Protein-Protein Interaction Sites by Random Forest Algorithm with mRMR and IFS

    PubMed Central

    Li, Bi-Qing; Feng, Kai-Yan; Chen, Lei; Huang, Tao; Cai, Yu-Dong

    2012-01-01

    Prediction of protein-protein interaction (PPI) sites is one of the most challenging problems in computational biology. Although great progress has been made by employing various machine learning approaches with numerous characteristic features, the problem is still far from being solved. In this study, we developed a novel predictor based on Random Forest (RF) algorithm with the Minimum Redundancy Maximal Relevance (mRMR) method followed by incremental feature selection (IFS). We incorporated features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility. We also included five 3D structural features to predict protein-protein interaction sites and achieved an overall accuracy of 0.672997 and MCC of 0.347977. Feature analysis showed that 3D structural features such as Depth Index (DPX) and surface curvature (SC) contributed most to the prediction of protein-protein interaction sites. It was also shown via site-specific feature analysis that the features of individual residues from PPI sites contribute most to the determination of protein-protein interaction sites. It is anticipated that our prediction method will become a useful tool for identifying PPI sites, and that the feature analysis described in this paper will provide useful insights into the mechanisms of interaction. PMID:22937126
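
    The abstract reports overall accuracy together with the Matthews correlation coefficient (MCC). A minimal sketch of how both are derived from binary per-residue labels follows; the function names and the example labels are hypothetical and are not the authors' data or code.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 marks an interface residue."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1
        elif not truth and not pred:
            tn += 1
        elif pred:
            fp += 1  # predicted interface, actually non-interface
        else:
            fn += 1  # missed interface residue
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of residues classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical labels for eight residues.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
counts = confusion_counts(truth, predicted)
print(accuracy(*counts), mcc(*counts))
```

    MCC is commonly reported alongside accuracy for interface prediction because non-interface residues usually outnumber interface residues, and accuracy alone can look favorable for a predictor that rarely predicts the minority class.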

  6. Proteins without unique 3D structures: biotechnological applications of intrinsically unstable/disordered proteins.

    PubMed

    Uversky, Vladimir N

    2015-03-01

    Intrinsically disordered proteins (IDPs) and intrinsically disordered protein regions (IDPRs) are functional proteins or regions that do not have unique 3D structures under functional conditions. Therefore, from the viewpoint of their lack of stable 3D structure, IDPs/IDPRs are inherently unstable. As much as structure and function of normal ordered globular proteins are determined by their amino acid sequences, the lack of unique 3D structure in IDPs/IDPRs and their disorder-based functionality are also encoded in the amino acid sequences. Because of their specific sequence features and distinctive conformational behavior, these intrinsically unstable proteins or regions have several applications in biotechnology. This review introduces some of the most characteristic features of IDPs/IDPRs (such as peculiarities of amino acid sequences of these proteins and regions, their major structural features, and peculiar responses to changes in their environment) and describes how these features can be used in the biotechnology, for example for the proteome-wide analysis of the abundance of extended IDPs, for recombinant protein isolation and purification, as polypeptide nanoparticles for drug delivery, as solubilization tools, and as thermally sensitive carriers of active peptides and proteins. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  7. UbSRD: The Ubiquitin Structural Relational Database.

    PubMed

    Harrison, Joseph S; Jacobs, Tim M; Houlihan, Kevin; Van Doorslaer, Koenraad; Kuhlman, Brian

    2016-02-22

    The structurally defined ubiquitin-like homology fold (UBL) can engage in several unique protein-protein interactions and many of these complexes have been characterized with high-resolution techniques. Using Rosetta's structural classification tools, we have created the Ubiquitin Structural Relational Database (UbSRD), an SQL database of features for all 509 UBL-containing structures in the PDB, allowing users to browse these structures by protein-protein interaction and providing a platform for quantitative analysis of structural features. We used UbSRD to define the recognition features of ubiquitin (UBQ) and SUMO observed in the PDB and the orientation of the UBQ tail while interacting with certain types of proteins. While some of the interaction surfaces on UBQ and SUMO overlap, each molecule has distinct features that aid in molecular discrimination. Additionally, we find that the UBQ tail is malleable and can adopt a variety of conformations upon binding. UbSRD is accessible as an online resource at rosettadesign.med.unc.edu/ubsrd. Copyright © 2015 Elsevier Ltd. All rights reserved.

  8. Prediction of protein structural classes by Chou's pseudo amino acid composition: approached using continuous wavelet transform and principal component analysis.

    PubMed

    Li, Zhan-Chao; Zhou, Xi-Bin; Dai, Zong; Zou, Xiao-Yong

    2009-07-01

    Prior knowledge of a protein's structural class can provide useful information about its overall structure, so quick and accurate determination of protein structural class by computational methods is very important in protein science. One of the keys to a computational method is accurate representation of the protein sample. Here, based on the concept of Chou's pseudo-amino acid composition (AAC, Chou, Proteins: structure, function, and genetics, 43:246-255, 2001), a novel method of feature extraction that combined continuous wavelet transform (CWT) with principal component analysis (PCA) was introduced for the prediction of protein structural classes. Firstly, the digital signal was obtained by mapping each amino acid according to various physicochemical properties. Secondly, CWT was utilized to extract a new feature vector based on the wavelet power spectrum (WPS), which contains more abundant information on sequence order in the frequency domain and time domain, and PCA was then used to reorganize the feature vector to decrease information redundancy and computational complexity. Finally, a pseudo-amino acid composition feature vector was further formed to represent the primary sequence by coupling the AAC vector with a set of new WPS feature vectors in an orthogonal space by PCA. As a showcase, the rigorous jackknife cross-validation test was performed on the working datasets. The results indicated that prediction quality has been improved, and the current approach of protein representation may serve as a useful complementary vehicle in classifying other attributes of proteins, such as enzyme family class, subcellular localization, membrane protein types and protein secondary structure, etc.

  9. regSNPs-splicing: a tool for prioritizing synonymous single-nucleotide substitution.

    PubMed

    Zhang, Xinjun; Li, Meng; Lin, Hai; Rao, Xi; Feng, Weixing; Yang, Yuedong; Mort, Matthew; Cooper, David N; Wang, Yue; Wang, Yadong; Wells, Clark; Zhou, Yaoqi; Liu, Yunlong

    2017-09-01

    While synonymous single-nucleotide variants (sSNVs) have largely been unstudied, since they do not alter protein sequence, mounting evidence suggests that they may affect RNA conformation, splicing, and the stability of nascent-mRNAs to promote various diseases. Accurately prioritizing deleterious sSNVs from a pool of neutral ones can significantly improve our ability to select functional genetic variants identified from various genome-sequencing projects, and, therefore, advance our understanding of disease etiology. In this study, we develop a computational algorithm to prioritize sSNVs based on their impact on mRNA splicing and protein function. In addition to genomic features that potentially affect splicing regulation, our proposed algorithm also includes dozens of structural features that characterize the functions of alternatively spliced exons on protein function. Our systematic evaluation on thousands of sSNVs suggests that several structural features, including intrinsic disorder protein scores, solvent accessible surface areas, protein secondary structures, and known and predicted protein family domains, show significant differences between disease-causing and neutral sSNVs. Our result suggests that the protein structure features offer an added dimension of information while distinguishing disease-causing and neutral synonymous variants. The inclusion of structural features increases the predictive accuracy for functional sSNV prioritization.

  10. Therapeutic approaches against common structural features of toxic oligomers shared by multiple amyloidogenic proteins.

    PubMed

    Guerrero-Muñoz, Marcos J; Castillo-Carranza, Diana L; Kayed, Rakez

    2014-04-15

    Impaired proteostasis is one of the main features of all amyloid diseases, which are associated with the formation of insoluble aggregates from amyloidogenic proteins. The aggregation process can be caused by overproduction or poor clearance of these proteins. However, numerous reports suggest that amyloid oligomers are the most toxic species, rather than insoluble fibrillar material, in Alzheimer's, Parkinson's, and Prion diseases, among others. Although the exact protein that aggregates varies between amyloid disorders, they all share common structural features that can be used as therapeutic targets. In this review, we focus on therapeutic approaches against shared features of toxic oligomeric structures and future directions. Copyright © 2014 Elsevier Inc. All rights reserved.

  11. Protein functional features are reflected in the patterns of mRNA translation speed.

    PubMed

    López, Daniel; Pazos, Florencio

    2015-07-09

    The degeneracy of the genetic code makes it possible for the same amino acid string to be coded by different messenger RNA (mRNA) sequences. These "synonymous mRNAs" may differ largely in a number of aspects related to their overall translational efficiency, such as secondary structure content and availability of the encoded transfer RNAs (tRNAs). Consequently, they may render different yields of the translated polypeptides. These mRNA features related to translation efficiency are also playing a role locally, resulting in a non-uniform translation speed along the mRNA, which has been previously related to some protein structural features and also used to explain some dramatic effects of "silent" single-nucleotide-polymorphisms (SNPs). In this work we perform the first large scale analysis of the relationship between three experimental proxies of mRNA local translation efficiency and the local features of the corresponding encoded proteins. We found that a number of protein functional and structural features are reflected in the patterns of ribosome occupancy, secondary structure and tRNA availability along the mRNA. One or more of these proxies of translation speed have distinctive patterns around the mRNA regions coding for certain protein local features. In some cases the three patterns follow a similar trend. We also show specific examples where these patterns of translation speed point to the protein's important structural and functional features. This supports the idea that the genome not only codes the protein functional features as sequences of amino acids, but also as subtle patterns of mRNA properties which, probably through local effects on the translation speed, have some consequence on the final polypeptide. These results open the possibility of predicting a protein's functional regions based on a single genomic sequence, and have implications for heterologous protein expression and fine-tuning protein function.

  12. How Structure Defines Affinity in Protein-Protein Interactions

    PubMed Central

    Erijman, Ariel; Rosenthal, Eran; Shifman, Julia M.

    2014-01-01

    Protein-protein interactions (PPI) in nature are conveyed by a multitude of binding modes involving various surfaces, secondary structure elements and intermolecular interactions. This diversity results in PPI binding affinities that span more than nine orders of magnitude. Several early studies attempted to correlate PPI binding affinities to various structure-derived features with limited success. The growing number of high-resolution structures, the appearance of more precise methods for measuring binding affinities and the development of new computational algorithms enable more thorough investigations in this direction. Here, we use a large dataset of PPI structures with the documented binding affinities to calculate a number of structure-based features that could potentially define binding energetics. We explore how well each calculated biophysical feature alone correlates with binding affinity and determine the features that could be used to distinguish between high-, medium- and low- affinity PPIs. Furthermore, we test how various combinations of features could be applied to predict binding affinity and observe a slow improvement in correlation as more features are incorporated into the equation. In addition, we observe a considerable improvement in predictions if we exclude from our analysis low-resolution and NMR structures, revealing the importance of capturing exact intermolecular interactions in our calculations. Our analysis should facilitate prediction of new interactions on the genome scale, better characterization of signaling networks and design of novel binding partners for various target proteins. PMID:25329579
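
    To put the "nine orders of magnitude" in energetic terms, a dissociation constant can be converted to a standard binding free energy with the textbook relation ΔG° = RT ln(Kd / 1 M). The sketch below assumes a temperature of 298.15 K and a 1 M standard state; these choices and the function name are assumptions for illustration and are not stated in the abstract.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol * K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant in mol/L.

    Uses dG = R * T * ln(Kd / 1 M); tighter binding (smaller Kd) gives a more
    negative value.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_molar)


# A 1 nM complex versus a hypothetical 1 M reference: nine orders of magnitude apart.
span = binding_free_energy(1.0) - binding_free_energy(1e-9)
print(round(span, 2))
```

    Under these assumptions each tenfold change in Kd corresponds to roughly 1.36 kcal/mol, so a nine-order span in affinity covers a little over 12 kcal/mol of binding free energy.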

  13. Investigating Molecular Structures of Bio-Fuel and Bio-Oil Seeds as Predictors To Estimate Protein Bioavailability for Ruminants by Advanced Nondestructive Vibrational Molecular Spectroscopy.

    PubMed

    Ban, Yajing; L Prates, Luciana; Yu, Peiqiang

    2017-10-18

    This study was conducted to (1) determine protein and carbohydrate molecular structure profiles and (2) quantify the relationship between structural features and protein bioavailability of newly developed carinata and canola seeds for dairy cows by using Fourier transform infrared molecular spectroscopy. Results showed similarity in protein structural makeup within the entire protein structural region between carinata and canola seeds. The highest area ratios related to structural CHO, total CHO, and cellulosic compounds were obtained for carinata seeds. Carinata and canola seeds showed similar carbohydrate and protein molecular structures by multivariate analyses. Carbohydrate molecular structure profiles were highly correlated to protein rumen degradation and intestinal digestion characteristics. In conclusion, the molecular spectroscopy can detect inherent structural characteristics in carinata and canola seeds in which carbohydrate-relative structural features are related to protein metabolism and utilization. Protein and carbohydrate spectral profiles could be used as predictors of rumen protein bioavailability in cows.

  14. A novel Multi-Agent Ada-Boost algorithm for predicting protein structural class with the information of protein secondary structure.

    PubMed

    Fan, Ming; Zheng, Bin; Li, Lihua

    2015-10-01

    Knowledge of the structural class of a given protein is important for understanding its folding patterns. Although much effort has been made, predicting protein structural class solely from protein sequences remains a challenging problem. The feature extraction and classification of proteins are the main problems in prediction. In this research, we extended our earlier work regarding these two aspects. In protein feature extraction, we proposed a scheme by calculating the word frequency and word position from sequences of amino acid, reduced amino acid, and secondary structure. For an accurate classification of the structural class of protein, we developed a novel Multi-Agent Ada-Boost (MA-Ada) method by integrating the features of Multi-Agent system into Ada-Boost algorithm. Extensive experiments were conducted to test and compare the proposed method using four benchmark datasets in low homology. The results showed classification accuracies of 88.5%, 96.0%, 88.4%, and 85.5%, respectively, which are much better compared with the existing methods. The source code and dataset are available on request.
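
    The word-frequency part of the feature scheme described above can be illustrated with a short sketch. The three-group reduced alphabet, the word length, and the function names below are assumptions chosen for illustration; the abstract does not specify the grouping the authors used, and the word-position features are not shown.

```python
from collections import Counter

# Hypothetical reduced alphabet: h = hydrophobic, p = polar, c = charged.
REDUCED_ALPHABET = {}
for residues, symbol in (("AVLIMFWC", "h"), ("GSTPNQYH", "p"), ("DEKR", "c")):
    for residue in residues:
        REDUCED_ALPHABET[residue] = symbol


def reduce_sequence(sequence):
    """Map a sequence of the 20 standard amino acids onto the reduced alphabet."""
    return "".join(REDUCED_ALPHABET[residue] for residue in sequence.upper())


def word_frequencies(sequence, k=2):
    """Relative frequency of every overlapping word of length k in the sequence."""
    windows = len(sequence) - k + 1
    if windows <= 0:
        return {}
    counts = Counter(sequence[i:i + k] for i in range(windows))
    return {word: count / windows for word, count in counts.items()}


reduced = reduce_sequence("MKVLAAGDE")
print(reduced)
print(word_frequencies(reduced, k=2))
```

    The same counting function applies unchanged to a raw amino acid sequence or to a secondary structure string, which is what allows one scheme to produce features from all three kinds of sequence.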

  15. Some of the most interesting CASP11 targets through the eyes of their authors.

    PubMed

    Kryshtafovych, Andriy; Moult, John; Baslé, Arnaud; Burgin, Alex; Craig, Timothy K; Edwards, Robert A; Fass, Deborah; Hartmann, Marcus D; Korycinski, Mateusz; Lewis, Richard J; Lorimer, Donald; Lupas, Andrei N; Newman, Janet; Peat, Thomas S; Piepenbrink, Kurt H; Prahlad, Janani; van Raaij, Mark J; Rohwer, Forest; Segall, Anca M; Seguritan, Victor; Sundberg, Eric J; Singh, Abhimanyu K; Wilson, Mark A; Schwede, Torsten

    2016-09-01

    The Critical Assessment of protein Structure Prediction (CASP) experiment would not have been possible without the prediction targets provided by the experimental structural biology community. In this article, selected crystallographers providing targets for the CASP11 experiment discuss the functional and biological significance of the target proteins, highlight their most interesting structural features, and assess whether these features were correctly reproduced in the predictions submitted to CASP11. Proteins 2016; 84(Suppl 1):34-50. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.

  16. The Protein Structure Initiative Structural Biology Knowledgebase Technology Portal: a structural biology web resource.

    PubMed

    Gifford, Lida K; Carter, Lester G; Gabanyi, Margaret J; Berman, Helen M; Adams, Paul D

    2012-06-01

    The Technology Portal of the Protein Structure Initiative Structural Biology Knowledgebase (PSI SBKB; http://technology.sbkb.org/portal/) is a web resource providing information about methods and tools that can be used to relieve bottlenecks in many areas of protein production and structural biology research. Several useful features are available on the web site, including multiple ways to search the database of over 250 technological advances, a link to videos of methods on YouTube, and access to a technology forum where scientists can connect, ask questions, get news, and develop collaborations. The Technology Portal is a component of the PSI SBKB (http://sbkb.org), which presents integrated genomic, structural, and functional information for all protein sequence targets selected by the Protein Structure Initiative. Created in collaboration with the Nature Publishing Group, the SBKB offers an array of resources for structural biologists, such as a research library, editorials about new research advances, a featured biological system each month, and a functional sleuth for searching protein structures of unknown function. An overview of the various features and examples of user searches highlight the information, tools, and avenues for scientific interaction available through the Technology Portal.

  17. Structural classification of proteins using texture descriptors extracted from the cellular automata image.

    PubMed

    Kavianpour, Hamidreza; Vasighi, Mahdi

    2017-02-01

    Nowadays, having knowledge about cellular attributes of proteins has an important role in pharmacy, medical science and molecular biology. These attributes are closely correlated with the function and three-dimensional structure of proteins. Knowledge of protein structural class is used by various methods for better understanding the protein functionality and folding patterns. Computational methods and intelligence systems can have an important role in performing structural classification of proteins. Most protein sequences are saved in databanks as characters and strings, and a numerical representation is essential for applying machine learning methods. In this work, a binary representation of protein sequences is introduced based on reduced amino acid alphabets according to surrounding hydrophobicity index. Many important features which are hidden in these long binary sequences can be clearly displayed through their cellular automata images. The extracted features from these images are used to build a classification model by support vector machine. Compared with previous studies on several benchmark datasets, the promising classification rates obtained by tenfold cross-validation imply that the current approach can help in revealing some inherent features deeply hidden in protein sequences and improve the quality of predicting protein structural class.

  18. DNAproDB: an interactive tool for structural analysis of DNA–protein complexes

    PubMed Central

    Sagendorf, Jared M.

    2017-01-01

    Many biological processes are mediated by complex interactions between DNA and proteins. Transcription factors, various polymerases, nucleases and histones recognize and bind DNA with different levels of binding specificity. To understand the physical mechanisms that allow proteins to recognize DNA and achieve their biological functions, it is important to analyze structures of DNA–protein complexes in detail. DNAproDB is a web-based interactive tool designed to help researchers study these complexes. DNAproDB provides an automated structure-processing pipeline that extracts structural features from DNA–protein complexes. The extracted features are organized in structured data files, which are easily parsed with any programming language or viewed in a browser. We processed a large number of DNA–protein complexes retrieved from the Protein Data Bank and created the DNAproDB database to store this data. Users can search the database by combining features of the DNA, protein or DNA–protein interactions at the interface. Additionally, users can upload their own structures for processing privately and securely. DNAproDB provides several interactive and customizable tools for creating visualizations of the DNA–protein interface at different levels of abstraction that can be exported as high quality figures. All functionality is documented and freely accessible at http://dnaprodb.usc.edu. PMID:28431131

  19. A Feature and Algorithm Selection Method for Improving the Prediction of Protein Structural Class.

    PubMed

    Ni, Qianwu; Chen, Lei

    2017-01-01

    Correct prediction of protein structural class is beneficial to investigation on protein functions, regulations and interactions. In recent years, several computational methods have been proposed in this regard. However, based on various features, it is still a great challenge to select a proper classification algorithm and extract essential features to participate in classification. In this study, a feature and algorithm selection method was presented for improving the accuracy of protein structural class prediction. The amino acid compositions and physicochemical features were adopted to represent features and thirty-eight machine learning algorithms collected in Weka were employed. All features were first analyzed by a feature selection method, minimum redundancy maximum relevance (mRMR), producing a feature list. Then, several feature sets were constructed by adding features in the list one by one. For each feature set, thirty-eight algorithms were executed on a dataset, in which proteins were represented by features in the set. The predicted classes yielded by these algorithms and true class of each protein were collected to construct a dataset, which was analyzed by the mRMR method, yielding an algorithm list. From the algorithm list, algorithms were taken one by one to build an ensemble prediction model. Finally, we selected the ensemble prediction model with the best performance as the optimal ensemble prediction model. Experimental results indicate that the constructed model is much superior to models using a single algorithm and other models that only adopt feature selection procedure or algorithm selection procedure. The feature selection and algorithm selection procedures are both helpful for building an ensemble prediction model that can yield a better performance. Copyright © Bentham Science Publishers.

  20. Protein sectors: evolutionary units of three-dimensional structure

    PubMed Central

    Halabi, Najeeb; Rivoire, Olivier; Leibler, Stanislas; Ranganathan, Rama

    2011-01-01

    Proteins display a hierarchy of structural features at primary, secondary, tertiary, and higher-order levels, an organization that guides our current understanding of their biological properties and evolutionary origins. Here, we reveal a structural organization distinct from this traditional hierarchy by statistical analysis of correlated evolution between amino acids. Applied to the S1A serine proteases, the analysis indicates a decomposition of the protein into three quasi-independent groups of correlated amino acids that we term “protein sectors”. Each sector is physically connected in the tertiary structure, has a distinct functional role, and constitutes an independent mode of sequence divergence in the protein family. Functionally relevant sectors are evident in other protein families as well, suggesting that they may be general features of proteins. We propose that sectors represent a structural organization of proteins that reflects their evolutionary histories. PMID:19703402

  1. Some of the most interesting CASP11 targets through the eyes of their authors

    PubMed Central

    Kryshtafovych, Andriy; Moult, John; Baslé, Arnaud; Burgin, Alex; Craig, Timothy K.; Edwards, Robert A.; Fass, Deborah; Hartmann, Marcus D.; Korycinski, Mateusz; Lewis, Richard J.; Lorimer, Donald; Lupas, Andrei N.; Newman, Janet; Peat, Thomas S.; Piepenbrink, Kurt H.; Prahlad, Janani; van Raaij, Mark J.; Rohwer, Forest; Segall, Anca M.; Seguritan, Victor; Sundberg, Eric J.; Singh, Abhimanyu K.; Wilson, Mark A.

    2015-01-01

    The Critical Assessment of protein Structure Prediction (CASP) experiment would not have been possible without the prediction targets provided by the experimental structural biology community. In this article, selected crystallographers providing targets for the CASP11 experiment discuss the functional and biological significance of the target proteins, highlight their most interesting structural features, and assess whether these features were correctly reproduced in the predictions submitted to CASP11. Proteins 2016; 84(Suppl 1):34–50. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc. PMID:26473983

  2. Indel PDB: a database of structural insertions and deletions derived from sequence alignments of closely related proteins.

    PubMed

    Hsing, Michael; Cherkasov, Artem

    2008-06-25

    Insertions and deletions (indels) represent a common type of sequence variations, which are less studied and pose many important biological questions. Recent research has shown that the presence of sizable indels in protein sequences may be indicative of protein essentiality and their role in protein interaction networks. Examples of utilization of indels for structure-based drug design have also been recently demonstrated. Nonetheless many structural and functional characteristics of indels remain less researched or unknown. We have created a web-based resource, Indel PDB, representing a structural database of insertions/deletions identified from the sequence alignments of highly similar proteins found in the Protein Data Bank (PDB). Indel PDB utilized large amounts of available structural information to characterize 1-, 2- and 3-dimensional features of indel sites. Indel PDB contains 117,266 non-redundant indel sites extracted from 11,294 indel-containing proteins. Unlike loop databases, Indel PDB features more indel sequences with secondary structures including alpha-helices and beta-sheets in addition to loops. The insertion fragments have been characterized by their sequences, lengths, locations, secondary structure composition, solvent accessibility, protein domain association and three dimensional structures. By utilizing the data available in Indel PDB, we have studied and presented here several sequence and structural features of indels. We anticipate that Indel PDB will not only enable future functional studies of indels, but will also assist protein modeling efforts and identification of indel-directed drug binding sites.

  3. Common structural features of cholesterol binding sites in crystallized soluble proteins

    PubMed Central

    Bukiya, Anna N.; Dopico, Alejandro M.

    2017-01-01

    Cholesterol-protein interactions are essential for the architectural organization of cell membranes and for lipid metabolism. While cholesterol-sensing motifs in transmembrane proteins have been identified, little is known about cholesterol recognition by soluble proteins. We reviewed the structural characteristics of binding sites for cholesterol and cholesterol sulfate from crystallographic structures available in the Protein Data Bank. This analysis unveiled key features of cholesterol-binding sites that are present in either all or the majority of sites: i) the cholesterol molecule is generally positioned between protein domains that have an organized secondary structure; ii) the cholesterol hydroxyl/sulfo group is often partnered by Asn, Gln, and/or Tyr, while the hydrophobic part of cholesterol interacts with Leu, Ile, Val, and/or Phe; iii) cholesterol hydrogen-bonding partners are often found on α-helices, while amino acids that interact with cholesterol’s hydrophobic core have a slight preference for β-strands and secondary structure-lacking protein areas; iv) the steroid’s C21 and C26 constitute the “hot spots” most often seen for steroid-protein hydrophobic interactions; v) common “cold spots” are C8–C10, C13, and C17, at which contacts with the proteins were not detected. Several common features we identified for soluble protein-steroid interaction appear evolutionarily conserved. PMID:28420706

  4. SCPRED: accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences.

    PubMed

    Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

    2008-05-01

    Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods.

  5. SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences

    PubMed Central

    Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

    2008-01-01

    Background: Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. Results: SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. Conclusion: SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods. PMID:18452616

  6. G Protein-Coupled Receptor Rhodopsin: A Prospectus

    PubMed Central

    Filipek, Sławomir; Stenkamp, Ronald E.; Teller, David C.; Palczewski, Krzysztof

    2006-01-01

    Rhodopsin is a retinal photoreceptor protein of bipartite structure consisting of the transmembrane protein opsin and a light-sensitive chromophore 11-cis-retinal, linked to opsin via a protonated Schiff base. Studies on rhodopsin have unveiled many structural and functional features that are common to a large and pharmacologically important group of proteins from the G protein-coupled receptor (GPCR) superfamily, of which rhodopsin is the best-studied member. In this work, we focus on structural features of rhodopsin as revealed by many biochemical and structural investigations. In particular, the high-resolution structure of bovine rhodopsin provides a template for understanding how GPCRs work. We describe the sensitivity and complexity of rhodopsin that lead to its important role in vision. PMID:12471166

  7. A combination of feature extraction methods with an ensemble of different classifiers for protein structural class prediction problem.

    PubMed

    Dehzangi, Abdollah; Paliwal, Kuldip; Sharma, Alok; Dehzangi, Omid; Sattar, Abdul

    2013-01-01

    Better understanding of structural class of a given protein reveals important information about its overall folding type and its domain. It can also be directly used to provide critical information on general tertiary structure of a protein which has a profound impact on protein function determination and drug design. Despite tremendous enhancements made by pattern recognition-based approaches to solve this problem, it still remains as an unsolved issue for bioinformatics that demands more attention and exploration. In this study, we propose a novel feature extraction model that incorporates physicochemical and evolutionary-based information simultaneously. We also propose overlapped segmented distribution and autocorrelation-based feature extraction methods to provide more local and global discriminatory information. The proposed feature extraction methods are explored for 15 most promising attributes that are selected from a wide range of physicochemical-based attributes. Finally, by applying an ensemble of different classifiers namely, Adaboost.M1, LogitBoost, naive Bayes, multilayer perceptron (MLP), and support vector machine (SVM) we show enhancement of the protein structural class prediction accuracy for four popular benchmarks.

  8. Voroprot: an interactive tool for the analysis and visualization of complex geometric features of protein structure.

    PubMed

    Olechnovic, Kliment; Margelevicius, Mindaugas; Venclovas, Ceslovas

    2011-03-01

    We present Voroprot, an interactive cross-platform software tool that provides a unique set of capabilities for exploring geometric features of protein structure. Voroprot allows the construction and visualization of the Apollonius diagram (also known as the additively weighted Voronoi diagram), the Apollonius graph, protein alpha shapes, interatomic contact surfaces, solvent accessible surfaces, pockets and cavities inside protein structure. Voroprot is available for Windows, Linux and Mac OS X operating systems and can be downloaded from http://www.ibt.lt/bioinformatics/voroprot/.

  9. A sampling-based method for ranking protein structural models by integrating multiple scores and features.

    PubMed

    Shi, Xiaohu; Zhang, Jingfen; He, Zhiquan; Shang, Yi; Xu, Dong

    2011-09-01

    One of the major challenges in protein tertiary structure prediction is structure quality assessment. In many cases, protein structure prediction tools generate good structural models, but fail to select the best models from a huge number of candidates as the final output. In this study, we developed a sampling-based machine-learning method to rank protein structural models by integrating multiple scores and features. First, features such as predicted secondary structure, solvent accessibility and residue-residue contact information are integrated by two Radial Basis Function (RBF) models trained from different datasets. Then, the two RBF scores and five selected scoring functions developed by others, i.e., Opus-CA, Opus-PSP, DFIRE, RAPDF, and Cheng Score are synthesized by a sampling method. At last, another integrated RBF model ranks the structural models according to the features of sampling distribution. We tested the proposed method by using two different datasets, including the CASP server prediction models of all CASP8 targets and a set of models generated by our in-house software MUFOLD. The test result shows that our method outperforms any individual scoring function on both best model selection, and overall correlation between the predicted ranking and the actual ranking of structural quality.

  10. Accurate prediction of RNA-binding protein residues with two discriminative structural descriptors.

    PubMed

    Sun, Meijian; Wang, Xia; Zou, Chuanxin; He, Zenghui; Liu, Wei; Li, Honglin

    2016-06-07

    RNA-binding proteins participate in many important biological processes concerning RNA-mediated gene regulation, and several computational methods have been recently developed to predict the protein-RNA interactions of RNA-binding proteins. Newly developed discriminative descriptors will help to improve the prediction accuracy of these prediction methods and provide further meaningful information for researchers. In this work, we designed two structural features (residue electrostatic surface potential and triplet interface propensity) and according to the statistical and structural analysis of protein-RNA complexes, the two features were powerful for identifying RNA-binding protein residues. Using these two features and other excellent structure- and sequence-based features, a random forest classifier was constructed to predict RNA-binding residues. The area under the receiver operating characteristic curve (AUC) of five-fold cross-validation for our method on training set RBP195 was 0.900, and when applied to the test set RBP68, the prediction accuracy (ACC) was 0.868, and the F-score was 0.631. The good prediction performance of our method revealed that the two newly designed descriptors could be discriminative for inferring protein residues interacting with RNAs. To facilitate the use of our method, a web-server called RNAProSite, which implements the proposed method, was constructed and is freely available at http://lilab.ecust.edu.cn/NABind .

  11. Analysis of Structural Features Contributing to Weak Affinities of Ubiquitin/Protein Interactions.

    PubMed

    Cohen, Ariel; Rosenthal, Eran; Shifman, Julia M

    2017-11-10

    Ubiquitin is a small protein that enables one of the most common post-translational modifications, where the whole ubiquitin molecule is attached to various target proteins, forming mono- or polyubiquitin conjugations. As a prototypical multispecific protein, ubiquitin interacts non-covalently with a variety of proteins in the cell, including ubiquitin-modifying enzymes and ubiquitin receptors that recognize signals from ubiquitin-conjugated substrates. To enable recognition of multiple targets and to support fast dissociation from the ubiquitin modifying enzymes, ubiquitin/protein interactions are characterized with low affinities, frequently in the higher μM and lower mM range. To determine how structure encodes low binding affinity of ubiquitin/protein complexes, we analyzed structures of more than a hundred such complexes compiled in the Ubiquitin Structural Relational Database. We calculated various structure-based features of ubiquitin/protein binding interfaces and compared them to the same features of general protein-protein interactions (PPIs) with various functions and generally higher affinities. Our analysis shows that ubiquitin/protein binding interfaces on average do not differ in size and shape complementarity from interfaces of higher-affinity PPIs. However, they contain fewer favorable hydrogen bonds and more unfavorable hydrophobic/charge interactions. We further analyzed how binding interfaces change upon affinity maturation of ubiquitin toward its target proteins. We demonstrate that while different features are improved in different experiments, the majority of the evolved complexes exhibit better shape complementarity and hydrogen bond pattern compared to wild-type complexes. Our analysis helps to understand how low-affinity PPIs have evolved and how they could be converted into high-affinity PPIs. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. Comprehensive comparative analysis and identification of RNA-binding protein domains: multi-class classification and feature selection.

    PubMed

    Jahandideh, Samad; Srinivasasainagendra, Vinodh; Zhi, Degui

    2012-11-07

    RNA-protein interaction plays an important role in various cellular processes, such as protein synthesis, gene regulation, post-transcriptional gene regulation, alternative splicing, and infections by RNA viruses. In this study, using Gene Ontology Annotated (GOA) and Structural Classification of Proteins (SCOP) databases an automatic procedure was designed to capture structurally solved RNA-binding protein domains in different subclasses. Subsequently, we applied tuned multi-class SVM (TMCSVM), Random Forest (RF), and multi-class ℓ1/ℓq-regularized logistic regression (MCRLR) for analysis and classifying RNA-binding protein domains based on a comprehensive set of sequence and structural features. In this study, we compared prediction accuracy of three different state-of-the-art predictor methods. From our results, TMCSVM outperforms the other methods and suggests the potential of TMCSVM as a useful tool for facilitating the multi-class prediction of RNA-binding protein domains. On the other hand, MCRLR by elucidating importance of features for their contribution in predictive accuracy of RNA-binding protein domains subclasses, helps us to provide some biological insights into the roles of sequences and structures in protein-RNA interactions.

  13. Improving protein fold recognition by extracting fold-specific features from predicted residue-residue contacts.

    PubMed

    Zhu, Jianwei; Zhang, Haicang; Li, Shuai Cheng; Wang, Chao; Kong, Lupeng; Sun, Shiwei; Zheng, Wei-Mou; Bu, Dongbo

    2017-12-01

    Accurate recognition of protein fold types is a key step for template-based prediction of protein structures. The existing approaches to fold recognition mainly exploit the features derived from alignments of query protein against templates. These approaches have been shown to be successful for fold recognition at family level, but usually fail at superfamily/fold levels. To overcome this limitation, one of the key points is to explore more structurally informative features of proteins. Although residue-residue contacts carry abundant structural information, how to thoroughly exploit this information for fold recognition still remains a challenge. In this study, we present an approach (called DeepFR) to improve fold recognition at superfamily/fold levels. The basic idea of our approach is to extract fold-specific features from predicted residue-residue contacts of proteins using the deep convolutional neural network (DCNN) technique. Based on these fold-specific features, we calculated similarity between query protein and templates, and then assigned query protein with fold type of the most similar template. DCNN has shown excellent performance in image feature extraction and image recognition; the rationale underlying the application of DCNN for fold recognition is that contact likelihood maps are essentially analogous to images, as they both display compositional hierarchy. Experimental results on the LINDAHL dataset suggest that even using the extracted fold-specific features alone, our approach achieved a success rate comparable to the state-of-the-art approaches. When further combining these features with traditional alignment-related features, the success rate of our approach increased to 92.3%, 82.5% and 78.8% at family, superfamily and fold levels, respectively, which is about 18% higher than the state-of-the-art approach at fold level, 6% higher at superfamily level and 1% higher at family level. An independent assessment on the SCOP_TEST dataset showed consistent performance improvement, indicating robustness of our approach. Furthermore, bi-clustering results of the extracted features are compatible with fold hierarchy of proteins, implying that these features are fold-specific. Together, these results suggest that the features extracted from predicted contacts are orthogonal to alignment-related features, and the combination of them could greatly facilitate fold recognition at superfamily/fold levels and template-based prediction of protein structures. Source code of DeepFR is freely available through https://github.com/zhujianwei31415/deepfr, and a web server is available through http://protein.ict.ac.cn/deepfr. Contact: zheng@itp.ac.cn or dbu@ict.ac.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved.
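    The final assignment step (give the query the fold type of its most similar template) might look like the sketch below. The abstract does not state which similarity measure is used, so cosine similarity is an assumption here, and the function names, fold labels and feature vectors are hypothetical placeholders rather than DeepFR outputs.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def assign_fold(query_vec, templates):
    """Return the fold label of the template most similar to the query.

    templates: list of (fold_label, feature_vector) pairs.
    """
    best_label, _ = max(
        templates, key=lambda t: cosine_similarity(query_vec, t[1])
    )
    return best_label


# Hypothetical fold-specific feature vectors for two templates and one query.
templates = [
    ("fold_a", [1.0, 0.0, 0.0]),
    ("fold_b", [0.0, 1.0, 0.2]),
]
query = [0.1, 0.9, 0.1]
predicted_fold = assign_fold(query, templates)
print(predicted_fold)  # fold_b
```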

  14. Effective Moment Feature Vectors for Protein Domain Structures

    PubMed Central

    Shi, Jian-Yu; Yiu, Siu-Ming; Zhang, Yan-Ning; Chin, Francis Yuk-Lun

    2013-01-01

    Image processing techniques have been shown to be useful in studying protein domain structures. The idea is to represent the pairwise distances of any two residues of the structure in a 2D distance matrix (DM). Features and/or submatrices are extracted from this DM to represent a domain. Existing approaches, however, may involve a large number of features (100–400) or complicated mathematical operations. Finding fewer but more effective features is always desirable. In this paper, based on some key observations on DMs, we are able to decompose a DM image into four basic binary images, each representing the structural characteristics of a fundamental secondary structure element (SSE) or a motif in the domain. Using the concept of moments in image processing, we further derive 45 structural features based on the four binary images. Together with 4 features extracted from the basic images, we represent the structure of a domain using 49 features. We show that our feature vectors can represent domain structures effectively in terms of the following. (1) We show a higher accuracy for domain classification. (2) We show a clear and consistent distribution of domains using our proposed structural vector space. (3) We are able to cluster the domains according to our moment features and demonstrate a relationship between structural variation and functional diversity. PMID:24391828
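    The two building blocks named above, a residue-residue distance matrix and image moments, can be illustrated with a short sketch. The abstract does not describe how the DM is decomposed into its four binary images, so a single distance cutoff is used here purely as an assumption; the coordinates and function names are also hypothetical.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between residue coordinates."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def binary_image(dm, cutoff):
    """Binary image with 1 where the distance is within the cutoff."""
    return [[1 if d <= cutoff else 0 for d in row] for row in dm]


def raw_moment(image, p, q):
    """Raw image moment M_pq = sum over pixels of x^p * y^q * I(x, y)."""
    return sum(
        (x ** p) * (y ** q) * value
        for y, row in enumerate(image)
        for x, value in enumerate(row)
    )


# Four hypothetical residues placed 3 units apart along a line.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
dm = distance_matrix(coords)
image = binary_image(dm, cutoff=4.0)  # keeps the diagonal and nearest neighbours

m00 = raw_moment(image, 0, 0)  # number of "on" pixels
m10 = raw_moment(image, 1, 0)
m01 = raw_moment(image, 0, 1)
centroid = (m10 / m00, m01 / m00)
print(m00, centroid)  # 10 (1.5, 1.5)
```

    Higher-order and central moments follow the same pattern and yield compact descriptors of the shape of each binary image.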

  15. Identification of DNA-Binding Proteins Using Structural, Electrostatic and Evolutionary Features

    PubMed Central

    Nimrod, Guy; Szilágyi, András; Leslie, Christina; Ben-Tal, Nir

    2009-01-01

    DNA binding proteins (DBPs) often take part in various crucial processes of the cell's life cycle. Therefore, the identification and characterization of these proteins are of great importance. We present here a random forests classifier for identifying DBPs among proteins with known three-dimensional structures. First, clusters of evolutionarily conserved regions (patches) on the protein's surface are detected using the PatchFinder algorithm; previous studies showed that these regions are typically the proteins' functionally important regions. Next, we train a classifier using features like the electrostatic potential, cluster-based amino acid conservation patterns and the secondary structure content of the patches, as well as features of the whole protein including its dipole moment. Using 10-fold cross validation on a dataset of 138 DNA-binding proteins and 110 proteins which do not bind DNA, the classifier achieved a sensitivity and a specificity of 0.90, which is overall better than the performance of previously published methods. Furthermore, when we tested 5 different methods on 11 new DBPs which did not appear in the original dataset, only our method annotated all correctly. The resulting classifier was applied to a collection of 757 proteins of known structure and unknown function. Of these proteins, 218 were predicted to bind DNA, and we anticipate that some of them interact with DNA using new structural motifs. The use of complementary computational tools supports the notion that at least some of them do bind DNA. PMID:19233205

  16. Extracting physicochemical features to predict protein secondary structure.

    PubMed

    Huang, Yin-Fu; Chen, Shu-Ying

    2013-01-01

    We propose a protein secondary structure prediction method based on position-specific scoring matrix (PSSM) profiles and four physicochemical features including conformation parameters, net charges, hydrophobicity, and side chain mass. First, the SVM with the optimal window size and the optimal parameters of the kernel function is found. Then, we train the SVM using the PSSM profiles generated from PSI-BLAST and the physicochemical features extracted from the CB513 data set. Finally, we use the filter to refine the predicted results from the trained SVM. For all the performance measures of our method, Q3 reaches 79.52, SOV94 reaches 86.10, and SOV99 reaches 74.60; all the measures are higher than those of the SVMpsi method and the SVMfreq method. This validates that considering these physicochemical features in predicting protein secondary structure yields better performance.
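    Window-based input of this kind is commonly built by concatenating the per-residue feature rows around each position. The sketch below shows one such encoding; the abstract only mentions an "optimal window size", so the zero padding at the chain termini, the layout of the vector, and the two-column toy profile (standing in for PSSM columns plus physicochemical values) are assumptions.

```python
def window_features(profile, half_window):
    """Encode each residue as the concatenated rows of a centered window.

    profile: list of per-residue feature rows (all of the same width).
    Positions that fall outside the chain are zero-padded (an assumption).
    """
    width = len(profile[0])
    padding = [0.0] * width
    encoded = []
    for i in range(len(profile)):
        vec = []
        for j in range(i - half_window, i + half_window + 1):
            vec.extend(profile[j] if 0 <= j < len(profile) else padding)
        encoded.append(vec)
    return encoded


# Toy profile: three residues with two feature columns each.
profile = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
encoded = window_features(profile, half_window=1)
print(encoded[0])  # [0.0, 0.0, 1.0, 2.0, 3.0, 4.0]
print(encoded[2])  # [3.0, 4.0, 5.0, 6.0, 0.0, 0.0]
```

    Each encoded vector has (2 * half_window + 1) * width entries and would serve as one training example for the classifier, labeled with the secondary structure state of the central residue.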

  17. Extracting Physicochemical Features to Predict Protein Secondary Structure

    PubMed Central

    Chen, Shu-Ying

    2013-01-01

    We propose a protein secondary structure prediction method based on position-specific scoring matrix (PSSM) profiles and four physicochemical features including conformation parameters, net charges, hydrophobicity, and side chain mass. First, the SVM with the optimal window size and the optimal parameters of the kernel function is found. Then, we train the SVM using the PSSM profiles generated from PSI-BLAST and the physicochemical features extracted from the CB513 data set. Finally, we use the filter to refine the predicted results from the trained SVM. For all the performance measures of our method, Q3 reaches 79.52, SOV94 reaches 86.10, and SOV99 reaches 74.60; all the measures are higher than those of the SVMpsi method and the SVMfreq method. This validates that considering these physicochemical features in predicting protein secondary structure yields better performance. PMID:23766688

  18. Structural Determination of Functional Domains in Early B-cell Factor (EBF) Family of Transcription Factors Reveals Similarities to Rel DNA-binding Proteins and a Novel Dimerization Motif

    PubMed Central

    Siponen, Marina I.; Wisniewska, Magdalena; Lehtiö, Lari; Johansson, Ida; Svensson, Linda; Raszewski, Grzegorz; Nilsson, Lennart; Sigvardsson, Mikael; Berglund, Helena

    2010-01-01

    The early B-cell factor (EBF) transcription factors are central regulators of development in several organs and tissues. This protein family shows low sequence similarity to other protein families, which is why structural information for the functional domains of these proteins is crucial to understand their biochemical features. We have used a modular approach to determine the crystal structures of the structured domains in the EBF family. The DNA binding domain reveals a striking resemblance to the DNA binding domains of the Rel homology superfamily of transcription factors but contains a unique zinc binding structure, termed zinc knuckle. Further the EBF proteins contain an IPT/TIG domain and an atypical helix-loop-helix domain with a novel type of dimerization motif. The data presented here provide insights into unique structural features of the EBF proteins and open possibilities for detailed molecular investigations of this important transcription factor family. PMID:20592035

  19. Protein structure based prediction of catalytic residues.

    PubMed

    Fajardo, J Eduardo; Fiser, Andras

    2013-02-22

    Worldwide structural genomics projects continue to release new protein structures at an unprecedented pace, so far nearly 6000, but only about 60% of these proteins have any sort of functional annotation. We explored a range of features that can be used for the prediction of functional residues given a known three-dimensional structure. These features include various centrality measures of nodes in graphs of interacting residues: closeness, betweenness and page-rank centrality. We also analyzed the distance of functional amino acids to the general center of mass (GCM) of the structure, relative solvent accessibility (RSA), and the use of relative entropy as a measure of sequence conservation. From the selected features, neural networks were trained to identify catalytic residues. We found that using distance to the GCM together with amino acid type provides a good discriminant function, when combined independently with sequence conservation. Using an independent test set of 29 annotated protein structures, the method returned 411 of the initial 9262 residues as the most likely to be involved in function. The output 411 residues contain 70 of the annotated 111 catalytic residues. This represents an approximately 14-fold enrichment of catalytic residues on the entire input set (corresponding to a sensitivity of 63% and a precision of 17%), a performance competitive with that of other state-of-the-art methods. We found that several of the graph based measures utilize the same underlying feature of protein structures, which can be simply and more effectively captured with the distance to GCM definition. This also has the added advantage of simplicity and easy implementation. Meanwhile sequence conservation remains by far the most influential feature in identifying functional residues. We also found that due to the rapid changes in size and composition of sequence databases, conservation calculations must be recalibrated for specific reference databases.
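    The reported sensitivity, precision and fold enrichment follow directly from the four counts given in the abstract. The short calculation below reproduces them; the variable names are illustrative, and enrichment is taken here as precision divided by the background rate of catalytic residues, which is an assumed (though conventional) definition.

```python
# Counts quoted in the abstract.
total_residues = 9262   # residues in the 29 test structures
catalytic_total = 111   # annotated catalytic residues
predicted = 411         # residues returned by the method
true_positives = 70     # annotated catalytic residues among those returned

sensitivity = true_positives / catalytic_total      # fraction of catalytic residues recovered
precision = true_positives / predicted              # fraction of predictions that are catalytic
background_rate = catalytic_total / total_residues  # chance level over the whole input set
enrichment = precision / background_rate

print(f"sensitivity = {sensitivity:.2f}")  # sensitivity = 0.63
print(f"precision   = {precision:.2f}")    # precision   = 0.17
print(f"enrichment  = {enrichment:.1f}")   # enrichment  = 14.2
```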

  20. Identification of DNA-binding proteins using structural, electrostatic and evolutionary features.

    PubMed

    Nimrod, Guy; Szilágyi, András; Leslie, Christina; Ben-Tal, Nir

    2009-04-10

    DNA-binding proteins (DBPs) participate in various crucial processes in the life-cycle of the cells, and the identification and characterization of these proteins is of great importance. We present here a random forests classifier for identifying DBPs among proteins with known 3D structures. First, clusters of evolutionarily conserved regions (patches) on the surface of proteins were detected using the PatchFinder algorithm; earlier studies showed that these regions are typically the functionally important regions of proteins. Next, we trained a classifier using features like the electrostatic potential, cluster-based amino acid conservation patterns and the secondary structure content of the patches, as well as features of the whole protein, including its dipole moment. Using 10-fold cross-validation on a dataset of 138 DBPs and 110 proteins that do not bind DNA, the classifier achieved a sensitivity and a specificity of 0.90, which is overall better than the performance of published methods. Furthermore, when we tested five different methods on 11 new DBPs that did not appear in the original dataset, only our method annotated all correctly. The resulting classifier was applied to a collection of 757 proteins of known structure and unknown function. Of these proteins, 218 were predicted to bind DNA, and we anticipate that some of them interact with DNA using new structural motifs. The use of complementary computational tools supports the notion that at least some of them do bind DNA.

  1. Predicting Cell Association of Surface-Modified Nanoparticles Using Protein Corona Structure - Activity Relationships (PCSAR).

    PubMed

    Kamath, Padmaja; Fernandez, Alberto; Giralt, Francesc; Rallo, Robert

    2015-01-01

    Nanoparticles are likely to interact in real-case application scenarios with mixtures of proteins and biomolecules that will adsorb onto their surface, forming the so-called protein corona. Information related to the composition of the protein corona and net cell association was collected from the literature for a library of surface-modified gold and silver nanoparticles. For each protein in the corona, sequence information was extracted and used to calculate physicochemical properties and statistical descriptors. Data cleaning and preprocessing techniques including statistical analysis and feature selection methods were applied to remove highly correlated, redundant and non-significant features. A weighting technique was applied to construct specific signatures that represent the corona composition for each nanoparticle. Using this basic set of protein descriptors, a new Protein Corona Structure-Activity Relationship (PCSAR) that relates net cell association with the physicochemical descriptors of the proteins that form the corona was developed and validated. The features that resulted from the feature selection were in line with already published literature, and the computational model constructed on these features had a good accuracy (R²_LOO = 0.76 and R²_LMO(25%) = 0.72) and stability, with the advantage that the fingerprints based on physicochemical descriptors were independent of the specific proteins that form the corona.

  2. Purely Structural Protein Scoring Functions Using Support Vector Machine and Ensemble Learning.

    PubMed

    Mirzaei, Shokoufeh; Sidi, Tomer; Keasar, Chen; Crivelli, Silvia

    2016-08-24

    The function of a protein is determined by its structure, which creates a need for efficient methods of protein structure determination to advance scientific and medical research. Because current experimental structure determination methods carry a high price tag, computational predictions are highly desirable. Given a protein sequence, computational methods produce numerous 3D structures known as decoys. However, selection of the best quality decoys is challenging as the end users can handle only a few. Therefore, scoring functions are central to decoy selection. They combine measurable features into a single number indicator of decoy quality. Unfortunately, current scoring functions do not consistently select the best decoys. Machine learning techniques offer great potential to improve decoy scoring. This paper presents two machine-learning-based scoring functions to predict the quality of protein structures, i.e., the similarity between the predicted structure and the experimental one without knowing the latter. We use different metrics to compare these scoring functions against three state-of-the-art scores. This is a first attempt at comparing different scoring functions using the same non-redundant dataset for training and testing and the same features. The results show that adding informative features may be more significant than the method used.

  3. Hierarchical learning architecture with automatic feature selection for multiclass protein fold classification.

    PubMed

    Huang, Chuen-Der; Lin, Chin-Teng; Pal, Nikhil Ranjan

    2003-12-01

    The structure classification of proteins plays a very important role in bioinformatics, since the relationships and characteristics among those known proteins can be exploited to predict the structure of new proteins. The success of a classification system depends heavily on two things: the tools being used and the features considered. In bioinformatics applications, the role of appropriate features has not been given adequate attention. In this investigation we use three novel ideas for multiclass protein fold classification. First, we use the gating neural network, where each input node is associated with a gate. This network can select important features in an online manner as learning proceeds. At the beginning of the training, all gates are almost closed, i.e., no feature is allowed to enter the network. Through the training, gates corresponding to good features are completely opened while gates corresponding to bad features are closed more tightly, and some gates may be partially open. The second novel idea is to use a hierarchical learning architecture (HLA). The classifier in the first level of HLA classifies the protein features into four major classes: all alpha, all beta, alpha + beta, and alpha/beta. In the next level we have another set of classifiers, which further classify the protein features into 27 folds. The third novel idea is to induce the indirect coding features from the amino-acid composition sequence of proteins based on the N-gram concept. This provides us with more representative and discriminative new local features of protein sequences for multiclass protein fold classification. The proposed HLA with new indirect coding features increases the protein fold classification accuracy by about 12%. Moreover, the gating neural network is found to reduce the number of features drastically. Using only half of the original features, as selected by the gating neural network, achieves test accuracy comparable to that obtained using all the original features. The gating mechanism also helps us to get a better insight into the folding process of proteins. For example, by tracking the evolution of different gates we can find which characteristics (features) of the data are more important for the folding process. And, of course, it also reduces the computation time.
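
    The input-gating idea described in this abstract can be illustrated with a few lines of code. This is a minimal sketch, not the authors' implementation: the sigmoid gate function, the 0.5 selection threshold and all numeric values are assumptions chosen for illustration, and the training step that would update the gate parameters is omitted.

```python
import math


def gate(gamma):
    """Map an unconstrained gate parameter to an opening in (0, 1).

    A sigmoid is assumed here; the abstract does not name the gate function.
    """
    return 1.0 / (1.0 + math.exp(-gamma))


def gated_inputs(features, gammas):
    """Attenuate each input feature by its own gate before it enters the network."""
    return [x * gate(g) for x, g in zip(features, gammas)]


def selected_features(gammas, threshold=0.5):
    """Indices of features whose gates are open past the (assumed) threshold."""
    return [i for i, g in enumerate(gammas) if gate(g) > threshold]


features = [0.8, -1.2, 0.5]
initial_gammas = [-6.0, -6.0, -6.0]   # start of training: all gates almost closed
trained_gammas = [5.0, -9.0, 0.3]     # hypothetical values after training

print(gated_inputs(features, initial_gammas))   # every value is close to zero
print(selected_features(trained_gammas))        # [0, 2]
```

    In this toy state the first gate is essentially open, the second is closed more tightly than at the start, and the third is partially open, which mirrors the three outcomes the abstract describes.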

  4. Guiding exploration in conformational feature space with Lipschitz underestimation for ab-initio protein structure prediction.

    PubMed

    Hao, Xiaohu; Zhang, Guijun; Zhou, Xiaogen

    2018-04-01

    Computing conformations, which is essential for associating structural and functional information with gene sequences, is challenging due to the high dimensionality and rugged energy surface of the protein conformational space. Consequently, the dimension of the protein conformational space should be reduced to a proper level, and an effective exploration algorithm should be proposed. In this paper, a plug-in method for guiding exploration in conformational feature space with Lipschitz underestimation (LUE) for ab-initio protein structure prediction is proposed. The conformational space is first converted into ultrafast shape recognition (USR) feature space. Based on the USR feature space, the conformational space can be further converted into underestimation space according to Lipschitz estimation theory for guiding exploration. As a consequence of the use of the underestimation model, the tight lower bound estimate information can be used for exploration guidance, the invalid sampling areas can be eliminated in advance, and the number of energy function evaluations can be reduced. The proposed method provides a novel technique to solve the exploration problem of protein conformational space. LUE is applied to the differential evolution (DE) algorithm and to the Metropolis Monte Carlo (MMC) algorithm, which is available in Rosetta. When LUE is applied to DE and MMC, candidate conformations are screened by the underestimation method prior to energy calculation and selection. Further, LUE is compared with DE and MMC by testing on 15 small-to-medium structurally diverse proteins. Test results show that near-native protein structures with higher accuracy can be obtained more rapidly and efficiently with the use of LUE. Copyright © 2018 Elsevier Ltd. All rights reserved.
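
    The screening step described in this abstract rests on a standard property of Lipschitz functions: if the energy changes by at most K per unit distance in feature space, every evaluated sample implies a lower bound on the energy anywhere else. The sketch below shows that generic bound only. It is not the LUE implementation; the function names, the two-dimensional feature vectors, the value of K and the energies are all hypothetical.

```python
import math


def lipschitz_lower_bound(candidate, samples, k):
    """Tightest lower bound on the energy at `candidate` implied by the samples.

    samples: list of (feature_vector, energy) pairs already evaluated.
    k: assumed Lipschitz constant of the energy in feature space.
    """
    return max(energy - k * math.dist(candidate, point) for point, energy in samples)


def worth_evaluating(candidate, samples, k, best_energy):
    """Skip the costly energy call when the bound cannot beat the best energy so far."""
    return lipschitz_lower_bound(candidate, samples, k) < best_energy


# Hypothetical evaluated conformations in a 2-D feature space.
samples = [((0.0, 0.0), 5.0), ((4.0, 0.0), 1.0)]
best = min(energy for _, energy in samples)

print(lipschitz_lower_bound((1.0, 0.0), samples, k=1.0))           # 4.0
print(worth_evaluating((1.0, 0.0), samples, k=1.0, best_energy=best))  # False
print(worth_evaluating((4.0, 3.0), samples, k=1.0, best_energy=best))  # True
```

    The first candidate sits next to a high-energy sample, so its energy cannot drop below 4.0 and it can be discarded without an energy evaluation; the second candidate is not excluded by any sample and would be passed on for evaluation.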

  5. Knowledge-based fragment binding prediction.

    PubMed

    Tang, Grace W; Altman, Russ B

    2014-04-01

    Target-based drug discovery must assess many drug-like compounds for potential activity. Focusing on low-molecular-weight compounds (fragments) can dramatically reduce the chemical search space. However, approaches for determining protein-fragment interactions have limitations. Experimental assays are time-consuming, expensive, and not always applicable. At the same time, computational approaches using physics-based methods have limited accuracy. With increasing high-resolution structural data for protein-ligand complexes, there is now an opportunity for data-driven approaches to fragment binding prediction. We present FragFEATURE, a machine learning approach to predict small molecule fragments preferred by a target protein structure. We first create a knowledge base of protein structural environments annotated with the small molecule substructures they bind. These substructures have low-molecular weight and serve as a proxy for fragments. FragFEATURE then compares the structural environments within a target protein to those in the knowledge base to retrieve statistically preferred fragments. It merges information across diverse ligands with shared substructures to generate predictions. Our results demonstrate FragFEATURE's ability to rediscover fragments corresponding to the ligand bound with 74% precision and 82% recall on average. For many protein targets, it identifies high scoring fragments that are substructures of known inhibitors. FragFEATURE thus predicts fragments that can serve as inputs to fragment-based drug design or serve as refinement criteria for creating target-specific compound libraries for experimental or computational screening.

  6. Knowledge-based Fragment Binding Prediction

    PubMed Central

    Tang, Grace W.; Altman, Russ B.

    2014-01-01

    Target-based drug discovery must assess many drug-like compounds for potential activity. Focusing on low-molecular-weight compounds (fragments) can dramatically reduce the chemical search space. However, approaches for determining protein-fragment interactions have limitations. Experimental assays are time-consuming, expensive, and not always applicable. At the same time, computational approaches using physics-based methods have limited accuracy. With increasing high-resolution structural data for protein-ligand complexes, there is now an opportunity for data-driven approaches to fragment binding prediction. We present FragFEATURE, a machine learning approach to predict small molecule fragments preferred by a target protein structure. We first create a knowledge base of protein structural environments annotated with the small molecule substructures they bind. These substructures have low-molecular weight and serve as a proxy for fragments. FragFEATURE then compares the structural environments within a target protein to those in the knowledge base to retrieve statistically preferred fragments. It merges information across diverse ligands with shared substructures to generate predictions. Our results demonstrate FragFEATURE's ability to rediscover fragments corresponding to the ligand bound with 74% precision and 82% recall on average. For many protein targets, it identifies high scoring fragments that are substructures of known inhibitors. FragFEATURE thus predicts fragments that can serve as inputs to fragment-based drug design or serve as refinement criteria for creating target-specific compound libraries for experimental or computational screening. PMID:24762971

  7. Defining and predicting structurally conserved regions in protein superfamilies

    PubMed Central

    Huang, Ivan K.; Grishin, Nick V.

    2013-01-01

    Motivation: The structures of homologous proteins are generally better conserved than their sequences. This phenomenon is demonstrated by the prevalence of structurally conserved regions (SCRs) even in highly divergent protein families. Defining SCRs requires the comparison of two or more homologous structures and is affected by their availability and divergence, and our ability to deduce structurally equivalent positions among them. In the absence of multiple homologous structures, it is necessary to predict SCRs of a protein using information from only a set of homologous sequences and (if available) a single structure. Accurate SCR predictions can benefit homology modelling and sequence alignment. Results: Using pairwise DaliLite alignments among a set of homologous structures, we devised a simple measure of structural conservation, termed structural conservation index (SCI). SCI was used to distinguish SCRs from non-SCRs. A database of SCRs was compiled from 386 SCOP superfamilies containing 6489 protein domains. Artificial neural networks were then trained to predict SCRs with various features deduced from a single structure and homologous sequences. Assessment of the predictions via a 5-fold cross-validation method revealed that predictions based on features derived from a single structure perform similarly to ones based on homologous sequences, while combining sequence and structural features was optimal in terms of accuracy (0.755) and Matthews correlation coefficient (0.476). These results suggest that even without information from multiple structures, it is still possible to effectively predict SCRs for a protein. Finally, inspection of the structures with the worst predictions pinpoints difficulties in SCR definitions. Availability: The SCR database and the prediction server can be found at http://prodata.swmed.edu/SCR. 
    Contact: 91huangi@gmail.com or grishin@chop.swmed.edu. Supplementary information: Supplementary data are available at Bioinformatics Online. PMID:23193223
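
    The record above reports both accuracy and the Matthews correlation coefficient (MCC) for a two-class prediction (SCR versus non-SCR). As a reminder of how the two differ, the sketch below computes both from a confusion matrix. The counts are made up for illustration and are not taken from the paper.

```python
import math


def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def matthews_corrcoef(tp, fp, fn, tn):
    """MCC in [-1, 1]; returns 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical confusion matrix for 20 residues.
tp, fp, fn, tn = 6, 1, 2, 11
print(f"accuracy = {accuracy(tp, fp, fn, tn):.2f}")
print(f"MCC      = {matthews_corrcoef(tp, fp, fn, tn):.3f}")
```

    Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when one class is much larger than the other.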

  8. Johann Deisenhofer, Crystallography, and Proteins

    Science.gov Websites

    research using X-ray crystallography to elucidate for the first time the three-dimensional structure of a large membrane-bound protein molecule. This structure helped explain the process of photosynthesis, by a protein structure determination that relied on complementary features of two different beam lines

  9. iFeature: a python package and web server for features extraction and selection from protein and peptide sequences.

    PubMed

    Chen, Zhen; Zhao, Pei; Li, Fuyi; Leier, André; Marquez-Lago, Tatiana T; Wang, Yanan; Webb, Geoffrey I; Smith, A Ian; Daly, Roger J; Chou, Kuo-Chen; Song, Jiangning

    2018-03-08

    Structural and physicochemical descriptors extracted from sequence data have been widely used to represent sequences and predict structural, functional, expression and interaction profiles of proteins and peptides as well as DNAs/RNAs. Here, we present iFeature, a versatile Python-based toolkit for generating various numerical feature representation schemes for both protein and peptide sequences. iFeature is capable of calculating and extracting a comprehensive spectrum of 18 major sequence encoding schemes that encompass 53 different types of feature descriptors. It also allows users to extract specific amino acid properties from the AAindex database. Furthermore, iFeature integrates 12 different types of commonly used feature clustering, selection, and dimensionality reduction algorithms, greatly facilitating training, analysis, and benchmarking of machine-learning models. The functionality of iFeature is made freely available via an online web server and a stand-alone toolkit. Availability: http://iFeature.erc.monash.edu/; https://github.com/Superzchen/iFeature/. Contact: jiangning.song@monash.edu; kcchou@gordonlifescience.org; roger.daly@monash.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
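
    To make "numerical feature representation" concrete, the sketch below computes amino acid composition, a simple and widely used sequence descriptor: a 20-element vector of residue frequencies. It is plain Python written for illustration and does not use or reproduce iFeature's own interface; the function name and the example peptide are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(sequence):
    """Return the frequency of each standard amino acid in `sequence`.

    Non-standard symbols (e.g. X or gaps) are ignored when normalising.
    """
    counts = Counter(sequence.upper())
    length = sum(counts[aa] for aa in AMINO_ACIDS)
    if length == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / length for aa in AMINO_ACIDS]


vector = amino_acid_composition("AACD")  # hypothetical four-residue peptide
print(dict(zip(AMINO_ACIDS, vector)))
```

    Every sequence, whatever its length, maps to a vector of the same size, which is what allows such descriptors to feed directly into clustering, feature selection or machine-learning models.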

  10. Prediction of phenotypes of missense mutations in human proteins from biological assemblies.

    PubMed

    Wei, Qiong; Xu, Qifang; Dunbrack, Roland L

    2013-02-01

    Single nucleotide polymorphisms (SNPs) are the most frequent variation in the human genome. Nonsynonymous SNPs that lead to missense mutations can be neutral or deleterious, and several computational methods have been presented that predict the phenotype of human missense mutations. These methods use sequence-based and structure-based features in various combinations, relying on different statistical distributions of these features for deleterious and neutral mutations. One structure-based feature that has not been studied significantly is the accessible surface area within biologically relevant oligomeric assemblies. These assemblies are different from the crystallographic asymmetric unit for more than half of X-ray crystal structures. We find that mutations in the core of proteins or in the interfaces in biological assemblies are significantly more likely to be disease-associated than those on the surface of the biological assemblies. For structures with more than one protein in the biological assembly (whether the same sequence or different), we find the accessible surface area from biological assemblies provides a statistically significant improvement in prediction over the accessible surface area of monomers from protein crystal structures (P = 6e-5). When adding this information to sequence-based features such as the difference between wildtype and mutant position-specific profile scores, the improvement from biological assemblies is statistically significant but much smaller (P = 0.018). Combining this information with sequence-based features in a support vector machine leads to 82% accuracy on a balanced dataset of 50% disease-associated mutations from SwissVar and 50% neutral mutations from human/primate sequence differences in orthologous proteins. Copyright © 2012 Wiley Periodicals, Inc.

  11. Structural Mass Spectrometry of Proteins Using Hydroxyl Radical Based Protein Footprinting

    PubMed Central

    Wang, Liwen; Chance, Mark R.

    2011-01-01

    Structural MS is a rapidly growing field with many applications in basic research and pharmaceutical drug development. This feature article describes the overall technology and presents several examples of how hydroxyl radical based footprinting MS can be used to map interfaces, evaluate protein structure, and identify ligand-dependent conformational changes in proteins. PMID:21770468

  12. PredictProtein—an open resource for online prediction of protein structural and functional features

    PubMed Central

    Yachdav, Guy; Kloppmann, Edda; Kajan, Laszlo; Hecht, Maximilian; Goldberg, Tatyana; Hamp, Tobias; Hönigschmid, Peter; Schafferhans, Andrea; Roos, Manfred; Bernhofer, Michael; Richter, Lothar; Ashkenazy, Haim; Punta, Marco; Schlessinger, Avner; Bromberg, Yana; Schneider, Reinhard; Vriend, Gerrit; Sander, Chris; Ben-Tal, Nir; Rost, Burkhard

    2014-01-01

    PredictProtein is a meta-service for sequence analysis that has been predicting structural and functional features of proteins since 1992. Queried with a protein sequence it returns: multiple sequence alignments, predicted aspects of structure (secondary structure, solvent accessibility, transmembrane helices (TMSEG) and strands, coiled-coil regions, disulfide bonds and disordered regions) and function. The service incorporates analysis methods for the identification of functional regions (ConSurf), homology-based inference of Gene Ontology terms (metastudent), comprehensive subcellular localization prediction (LocTree3), protein–protein binding sites (ISIS2), protein–polynucleotide binding sites (SomeNA) and predictions of the effect of point mutations (non-synonymous SNPs) on protein function (SNAP2). Our goal has always been to develop a system optimized to meet the demands of experimentalists not highly experienced in bioinformatics. To this end, the PredictProtein results are presented as both text and a series of intuitive, interactive and visually appealing figures. The web server and sources are available at http://ppopen.rostlab.org. PMID:24799431

  13. Challenging the state-of-the-art in protein structure prediction: Highlights of experimental target structures for the 10th Critical Assessment of Techniques for Protein Structure Prediction Experiment CASP10

    PubMed Central

    Kryshtafovych, Andriy; Moult, John; Bales, Patrick; Bazan, J. Fernando; Biasini, Marco; Burgin, Alex; Chen, Chen; Cochran, Frank V.; Craig, Timothy K.; Das, Rhiju; Fass, Deborah; Garcia-Doval, Carmela; Herzberg, Osnat; Lorimer, Donald; Luecke, Hartmut; Ma, Xiaolei; Nelson, Daniel C.; van Raaij, Mark J.; Rohwer, Forest; Segall, Anca; Seguritan, Victor; Zeth, Kornelius; Schwede, Torsten

    2014-01-01

    For the last two decades, CASP has assessed the state of the art in techniques for protein structure prediction and identified areas which required further development. CASP would not have been possible without the prediction targets provided by the experimental structural biology community. In the latest experiment, CASP10, over 100 structures were suggested as prediction targets, some of which appeared to be extraordinarily difficult for modeling. In this paper, authors of some of the most challenging targets discuss which specific scientific question motivated the experimental structure determination of the target protein, which structural features were especially interesting from a structural or functional perspective, and to what extent these features were correctly reproduced in the predictions submitted to CASP10. Specifically, the following targets will be presented: the acid-gated urea channel, a difficult to predict trans-membrane protein from the important human pathogen Helicobacter pylori; the structure of human interleukin IL-34, a recently discovered helical cytokine; the structure of a functionally uncharacterized enzyme OrfY from Thermoproteus tenax formed by a gene duplication and a novel fold; an ORFan domain of mimivirus sulfhydryl oxidase R596; the fibre protein gp17 from bacteriophage T7; the Bacteriophage CBA-120 tailspike protein; a virus coat protein from metagenomic samples of the marine environment; and finally an unprecedented class of structure prediction targets based on engineered disulfide-rich small proteins. PMID:24318984

  14. CCProf: exploring conformational change profile of proteins

    PubMed Central

    Chang, Che-Wei; Chou, Chai-Wei; Chang, Darby Tien-Hao

    2016-01-01

    In many biological processes, proteins have important interactions with various molecules such as proteins, ions or ligands. Many proteins undergo conformational changes upon these interactions, where regions with large conformational changes are critical to the interactions. This work presents the CCProf platform, which provides conformational changes of entire proteins, termed the conformational change profile (CCP). CCProf aims to be a platform where users can study potential causes of novel conformational changes. It provides 10 biological features, including conformational change, potential binding target site, secondary structure, conservation, disorder propensity, hydropathy propensity, sequence domain, structural domain, phosphorylation site and catalytic site. All this information is integrated into a well-aligned view, so that researchers can visually identify important relationships between different biological features. CCProf contains 986 187 protein structure pairs for 3123 proteins. In addition, CCProf provides a 3D view in which users can see the protein structures before and after conformational changes as well as binding targets that induce conformational changes. All information (e.g. CCP, binding targets and protein structures) shown in CCProf, including intermediate data, is available for download to expedite further analyses. Database URL: http://zoro.ee.ncku.edu.tw/ccprof/ PMID:27016699

  15. Protein single-model quality assessment by feature-based probability density functions.

    PubMed

    Cao, Renzhi; Cheng, Jianlin

    2016-04-04

    Protein quality assessment (QA) has played an important role in protein structure prediction. We developed a novel single-model quality assessment method, Qprob. Qprob calculates the absolute error for each protein feature value against the true quality scores (i.e. GDT-TS scores) of protein structural models, and uses them to estimate its probability density distribution for quality assessment. Qprob has been blindly tested in the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) as the MULTICOM-NOVEL server. The official CASP result shows that Qprob ranks as one of the top single-model QA methods. In addition, Qprob makes contributions to our protein tertiary structure predictor MULTICOM, which is officially ranked 3rd out of 143 predictors. The good performance shows that Qprob is good at assessing the quality of models of hard targets. These results demonstrate that this new probability density distribution based method is effective for protein single-model quality assessment and is useful for protein structure prediction. The webserver of Qprob is available at: http://calla.rnet.missouri.edu/qprob/. The software is now freely available in the web server of Qprob.

  16. Algorithm, applications and evaluation for protein comparison by Ramanujan Fourier transform.

    PubMed

    Zhao, Jian; Wang, Jiasong; Hua, Wei; Ouyang, Pingkai

    2015-12-01

    The amino acid sequence of a protein determines its chemical properties, chain conformation and biological functions. Protein sequence comparison is of great importance to identify similarities of protein structures and infer their functions. Many properties of a protein correspond to the low-frequency signals within the sequence. Low-frequency modes in protein sequences are linked to the secondary structures, membrane protein types, and sub-cellular localizations of the proteins. In this paper, we present the Ramanujan Fourier transform (RFT) with a fast algorithm to analyze the low-frequency signals of protein sequences. The RFT method is applied to similarity analysis of protein sequences with the Resonant Recognition Model (RRM). The results show that the proposed fast RFT method on protein comparison is more efficient than the commonly used discrete Fourier transform (DFT). RFT can detect common frequencies as significant features for specific protein families, and the RFT spectrum heat-map of protein sequences demonstrates the information conservation in the sequence comparison. The proposed method offers a new tool for pattern recognition, feature extraction and structural analysis on protein sequences. Copyright © 2015 Elsevier Ltd. All rights reserved.
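
    The Ramanujan Fourier transform expands a signal in Ramanujan sums c_q(n) rather than in complex exponentials. The sketch below evaluates c_q(n) directly from its textbook definition, the sum of cos(2πkn/q) over all k between 1 and q that are coprime to q. It illustrates the basis functions only; the fast algorithm and the RRM encoding used in the paper are not given in the abstract and are not reproduced here.

```python
import math


def ramanujan_sum(q, n):
    """Ramanujan sum c_q(n), computed from the definition.

    The result is always an integer, so the floating-point sum is rounded.
    """
    total = sum(
        math.cos(2 * math.pi * k * n / q)
        for k in range(1, q + 1)
        if math.gcd(k, q) == 1
    )
    return round(total)


# One period of the first few basis sequences.
print([ramanujan_sum(3, n) for n in range(3)])   # [2, -1, -1]
print([ramanujan_sum(4, n) for n in range(4)])   # [2, 0, -2, 0]
```

    Each c_q(n) is integer valued and periodic in n with period q, which is one reason these sums are attractive for analysing periodic patterns in symbolic sequences.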

  17. Antibody-protein interactions: benchmark datasets and prediction tools evaluation

    PubMed Central

    Ponomarenko, Julia V; Bourne, Philip E

    2007-01-01

    Background: The ability to predict antibody binding sites (aka antigenic determinants or B-cell epitopes) for a given protein is a precursor to new vaccine design and diagnostics. Among the various methods of B-cell epitope identification, X-ray crystallography is one of the most reliable. Using these experimental data, computational methods exist for B-cell epitope prediction. As the number of structures of antibody-protein complexes grows, further interest in prediction methods using 3D structure is anticipated. This work aims to establish a benchmark for 3D structure-based epitope prediction methods. Results: Two B-cell epitope benchmark datasets inferred from the 3D structures of antibody-protein complexes were defined. The first is a dataset of 62 representative 3D structures of protein antigens with inferred structural epitopes. The second is a dataset of 82 structures of antibody-protein complexes containing different structural epitopes. Using these datasets, eight web-servers developed for antibody and protein binding sites prediction have been evaluated. No method exceeded 40% precision and 46% recall. The values of the area under the receiver operating characteristic curve for the evaluated methods were about 0.6 for ConSurf, DiscoTope, and PPI-PRED methods and above 0.65 but not exceeding 0.70 for protein-protein docking methods when the best of the top ten models for the bound docking were considered; the remaining methods performed close to random. The benchmark datasets are included as a supplement to this paper. Conclusion: It may be possible to improve epitope prediction methods through training on datasets which include only immune epitopes and through utilizing more features characterizing epitopes, for example, the evolutionary conservation score. Notwithstanding, overall poor performance may reflect the generality of antigenicity and hence the inability to decipher B-cell epitopes as an intrinsic feature of the protein. It is an open question as to whether ultimately discriminatory features can be found. PMID:17910770

  18. Visualization of protein sequence features using JavaScript and SVG with pViz.js.

    PubMed

    Mukhyala, Kiran; Masselot, Alexandre

    2014-12-01

    pViz.js is a visualization library for displaying protein sequence features in a Web browser. By simply providing a sequence and the locations of its features, this lightweight, yet versatile, JavaScript library renders an interactive view of the protein features. Interactive exploration of protein sequence features over the Web is a common need in Bioinformatics. Although many Web sites have developed viewers to display these features, their implementations are usually focused on data from a specific source or use case. Some of these viewers can be adapted to fit other use cases but are not designed to be reusable. pViz makes it easy to display features as boxes aligned to a protein sequence with zooming functionality but also includes predefined renderings for secondary structure and post-translational modifications. The library is designed to further customize this view. We demonstrate such applications of pViz using two examples: a proteomic data visualization tool with an embedded viewer for displaying features on protein structure, and a tool to visualize the results of the variant_effect_predictor tool from Ensembl. pViz.js is a JavaScript library, available on GitHub at https://github.com/Genentech/pviz. This site includes examples and functional applications, installation instructions and usage documentation. A Readme file, which explains how to use pViz with examples, is available as Supplementary Material A. © The Author 2014. Published by Oxford University Press. All rights reserved.

  19. Characterization of protein and carbohydrate mid-IR spectral features in crop residues

    NASA Astrophysics Data System (ADS)

    Xin, Hangshu; Zhang, Yonggen; Wang, Mingjun; Li, Zhongyu; Wang, Zhibo; Yu, Peiqiang

    2014-08-01

    To the best of our knowledge, few studies have been conducted on inherent structure spectral traits related to biopolymers of crop residues. The objective of this study was to characterize protein and carbohydrate structure spectral features of three field crop residues (rice straw, wheat straw and millet straw) in comparison with two crop vines (peanut vine and pea vine) by using Fourier transform infrared spectroscopy (FTIR) technique with attenuated total reflectance (ATR). Also, multivariate analyses were performed on spectral data sets within the regions mainly related to protein and carbohydrate in this study. The results showed that spectral differences existed in mid-IR peak intensities that are mainly related to protein and carbohydrate among these crop residue samples. With regard to protein spectral profile, peanut vine showed the greatest mid-IR band intensities that are related to protein amide and protein secondary structures, followed by pea vine and the remaining three field crop straws. The crop vines had 48-134% higher spectral band intensity than the grain straws in spectral features associated with protein. Similar trends were also found in the bands that are mainly related to structural carbohydrates (such as cellulosic compounds). However, the field crop residues had higher peak intensity in total carbohydrates region than the crop vines. Furthermore, spectral ratios varied among the residue samples, indicating that these five crop residues had different internal structural conformation. However, multivariate spectral analyses showed that structural similarities still existed among crop residues in the regions associated with protein biopolymers and carbohydrate. Further study is needed to find out whether there is any relationship between spectroscopic information and nutrition supply in various kinds of crop residue when fed to animals.

  20. Characterization of protein and carbohydrate mid-IR spectral features in crop residues.

    PubMed

    Xin, Hangshu; Zhang, Yonggen; Wang, Mingjun; Li, Zhongyu; Wang, Zhibo; Yu, Peiqiang

    2014-08-14

    To the best of our knowledge, few studies have been conducted on inherent structure spectral traits related to biopolymers of crop residues. The objective of this study was to characterize protein and carbohydrate structure spectral features of three field crop residues (rice straw, wheat straw and millet straw) in comparison with two crop vines (peanut vine and pea vine) by using Fourier transform infrared spectroscopy (FTIR) technique with attenuated total reflectance (ATR). Also, multivariate analyses were performed on spectral data sets within the regions mainly related to protein and carbohydrate in this study. The results showed that spectral differences existed in mid-IR peak intensities that are mainly related to protein and carbohydrate among these crop residue samples. With regard to protein spectral profile, peanut vine showed the greatest mid-IR band intensities that are related to protein amide and protein secondary structures, followed by pea vine and the remaining three field crop straws. The crop vines had 48-134% higher spectral band intensity than the grain straws in spectral features associated with protein. Similar trends were also found in the bands that are mainly related to structural carbohydrates (such as cellulosic compounds). However, the field crop residues had higher peak intensity in total carbohydrates region than the crop vines. Furthermore, spectral ratios varied among the residue samples, indicating that these five crop residues had different internal structural conformation. However, multivariate spectral analyses showed that structural similarities still existed among crop residues in the regions associated with protein biopolymers and carbohydrate. Further study is needed to find out whether there is any relationship between spectroscopic information and nutrition supply in various kinds of crop residue when fed to animals. Copyright © 2014 Elsevier B.V. All rights reserved.

  1. SMOQ: a tool for predicting the absolute residue-specific quality of a single protein model with support vector machines

    PubMed Central

    2014-01-01

    Background: It is important to predict the quality of a protein structural model before its native structure is known. The method that can predict the absolute local quality of individual residues in a single protein model is rare, yet particularly needed for using, ranking and refining protein models. Results: We developed a machine learning tool (SMOQ) that can predict the distance deviation of each residue in a single protein model. SMOQ uses support vector machines (SVM) with protein sequence and structural features (i.e. basic feature set), including amino acid sequence, secondary structures, solvent accessibilities, and residue-residue contacts to make predictions. We also trained a SVM model with two new additional features (profiles and SOV scores) on 20 CASP8 targets and found that including them can only improve the performance when real deviations between native and model are higher than 5Å. The SMOQ tool finally released uses the basic feature set trained on 85 CASP8 targets. Moreover, SMOQ implemented a way to convert predicted local quality scores into a global quality score. SMOQ was tested on the 84 CASP9 single-domain targets. The average difference between the residue-specific distance deviation predicted by our method and the actual distance deviation on the test data is 2.637Å. The global quality prediction accuracy of the tool is comparable to other good tools on the same benchmark. Conclusion: SMOQ is a useful tool for protein single model quality assessment. Its source code and executable are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/. PMID:24776231

  2. SMOQ: a tool for predicting the absolute residue-specific quality of a single protein model with support vector machines.

    PubMed

    Cao, Renzhi; Wang, Zheng; Wang, Yiheng; Cheng, Jianlin

    2014-04-28

    It is important to predict the quality of a protein structural model before its native structure is known. The method that can predict the absolute local quality of individual residues in a single protein model is rare, yet particularly needed for using, ranking and refining protein models. We developed a machine learning tool (SMOQ) that can predict the distance deviation of each residue in a single protein model. SMOQ uses support vector machines (SVM) with protein sequence and structural features (i.e. basic feature set), including amino acid sequence, secondary structures, solvent accessibilities, and residue-residue contacts to make predictions. We also trained a SVM model with two new additional features (profiles and SOV scores) on 20 CASP8 targets and found that including them can only improve the performance when real deviations between native and model are higher than 5Å. The SMOQ tool finally released uses the basic feature set trained on 85 CASP8 targets. Moreover, SMOQ implemented a way to convert predicted local quality scores into a global quality score. SMOQ was tested on the 84 CASP9 single-domain targets. The average difference between the residue-specific distance deviation predicted by our method and the actual distance deviation on the test data is 2.637Å. The global quality prediction accuracy of the tool is comparable to other good tools on the same benchmark. SMOQ is a useful tool for protein single model quality assessment. Its source code and executable are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/.

  3. Structural analysis of oligomeric and protofibrillar Aβ amyloid pair structures considering F20L mutation effects using molecular dynamics simulations.

    PubMed

    Lee, Myeongsang; Chang, Hyun Joon; Baek, Inchul; Na, Sungsoo

    2017-04-01

    Aβ amyloid proteins are involved in neuro-degenerative diseases such as Alzheimer's, Parkinson's, and so forth. Because of its structurally stable feature under physiological conditions, Aβ amyloid protein disrupts the normal cell function. Because of these concerns, understanding the structural feature of Aβ amyloid protein in detail is crucial. There have been some efforts on lowering the structural stabilities of Aβ amyloid fibrils by decreasing the aromatic residues characteristic and hydrophobic effect. Yet, there is a lack of understanding of Aβ amyloid pair structures considering those effects. In this study, we provide the structural characteristics of wildtype (WT) and phenylalanine residue mutation to leucine (F20L) Aβ amyloid pair structures using molecular dynamics simulation in detail. We also considered the polymorphic feature of F20L and WT Aβ pair amyloids based on the facing β-strand directions between the amyloid pairs. As a result, we were able to observe the varying effects of mutation, polymorphism, and protofibril lengths on the structural stability of pair amyloids. Furthermore, we have also found that opposite structural stability exists on a certain polymorphic Aβ pair amyloids depending on its oligomeric or protofibrillar state, which can be helpful for understanding the amyloid growth mechanism via repetitive fragmentation and elongation mechanism. Proteins 2017; 85:580-592. © 2016 Wiley Periodicals, Inc.

  4. Structures composing protein domains.

    PubMed

    Kubrycht, Jaroslav; Sigler, Karel; Souček, Pavel; Hudeček, Jiří

    2013-08-01

    This review summarizes available data concerning intradomain structures (IS) such as functionally important amino acid residues, short linear motifs, conserved or disordered regions, peptide repeats, broadly occurring secondary structures or folds, etc. IS form structural features (units or elements) necessary for interactions with proteins or non-peptidic ligands, enzyme reactions and some structural properties of proteins. These features have often been related to a single structural level (e.g. primary structure) mostly requiring certain structural context of other levels (e.g. secondary structures or supersecondary folds) as follows also from some examples reported or demonstrated here. In addition, we deal with some functionally important dynamic properties of IS (e.g. flexibility and different forms of accessibility), and more special dynamic changes of IS during enzyme reactions and allosteric regulation. Selected notes concern also some experimental methods, still more necessary tools of bioinformatic processing and clinically interesting relationships. Copyright © 2013 Elsevier Masson SAS. All rights reserved.

  5. The Structures of Life

    ERIC Educational Resources Information Center

    National Institute of General Medical Sciences (NIGMS), 2007

    2007-01-01

    This booklet reveals how structural biology provides insight into health and disease and is useful in developing new medications. It contains a general introduction to proteins, coverage of the techniques used to determine protein structures, and a chapter on structure-based drug design. The booklet features "Student Snapshots," designed to…

  6. Protein structure based prediction of catalytic residues

    PubMed Central

    2013-01-01

    Background: Worldwide structural genomics projects continue to release new protein structures at an unprecedented pace, so far nearly 6000, but only about 60% of these proteins have any sort of functional annotation. Results: We explored a range of features that can be used for the prediction of functional residues given a known three-dimensional structure. These features include various centrality measures of nodes in graphs of interacting residues: closeness, betweenness and page-rank centrality. We also analyzed the distance of functional amino acids to the general center of mass (GCM) of the structure, relative solvent accessibility (RSA), and the use of relative entropy as a measure of sequence conservation. From the selected features, neural networks were trained to identify catalytic residues. We found that using distance to the GCM together with amino acid type provides a good discriminant function, when combined independently with sequence conservation. Using an independent test set of 29 annotated protein structures, the method returned 411 of the initial 9262 residues as the most likely to be involved in function. The output 411 residues contain 70 of the annotated 111 catalytic residues. This represents an approximately 14-fold enrichment of catalytic residues on the entire input set (corresponding to a sensitivity of 63% and a precision of 17%), a performance competitive with that of other state-of-the-art methods. Conclusions: We found that several of the graph based measures utilize the same underlying feature of protein structures, which can be simply and more effectively captured with the distance to GCM definition. This also has the added advantage of simplicity and easy implementation. Meanwhile sequence conservation remains by far the most influential feature in identifying functional residues. We also found that due to the rapid changes in size and composition of sequence databases, conservation calculations must be recalibrated for specific reference databases. PMID:23433045
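
    The sensitivity, precision and enrichment figures quoted in this abstract follow directly from the four counts it reports. The short Python check below is a sketch added for illustration and is not part of the cited work; the function and variable names are hypothetical, and "enrichment" is assumed here to mean precision divided by the background fraction of catalytic residues.

```python
# Counts quoted in the abstract above.
TOTAL_RESIDUES = 9262    # residues in the 29-structure test set
TOTAL_CATALYTIC = 111    # annotated catalytic residues
PREDICTED = 411          # residues returned by the method
TRUE_POSITIVES = 70      # catalytic residues among the predictions


def enrichment_summary(tp, predicted, positives, total):
    """Return (sensitivity, precision, fold enrichment over random picking).

    Fold enrichment is taken as precision / background rate, which is an
    assumption about how the abstract defines the term.
    """
    sensitivity = tp / positives
    precision = tp / predicted
    background = positives / total  # precision expected from random picks
    return sensitivity, precision, precision / background


sens, prec, fold = enrichment_summary(
    TRUE_POSITIVES, PREDICTED, TOTAL_CATALYTIC, TOTAL_RESIDUES
)
print(f"sensitivity = {sens:.1%}")
print(f"precision   = {prec:.1%}")
print(f"enrichment  = {fold:.1f}-fold")
```

    Under that reading, the counts give roughly 63% sensitivity, 17% precision and a little over 14-fold enrichment, in line with the rounded values in the abstract.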

  7. Unexpected features of the dark proteome.

    PubMed

    Perdigão, Nelson; Heinrich, Julian; Stolte, Christian; Sabir, Kenneth S; Buckley, Michael J; Tabor, Bruce; Signal, Beth; Gloss, Brian S; Hammang, Christopher J; Rost, Burkhard; Schafferhans, Andrea; O'Donoghue, Seán I

    2015-12-29

    We surveyed the "dark" proteome, that is, regions of proteins never observed by experimental structure determination and inaccessible to homology modeling. For 546,000 Swiss-Prot proteins, we found that 44-54% of the proteome in eukaryotes and viruses was dark, compared with only ∼14% in archaea and bacteria. Surprisingly, most of the dark proteome could not be accounted for by conventional explanations, such as intrinsic disorder or transmembrane regions. Nearly half of the dark proteome comprised dark proteins, in which the entire sequence lacked similarity to any known structure. Dark proteins fulfill a wide variety of functions, but a subset showed distinct and largely unexpected features, such as association with secretion, specific tissues, the endoplasmic reticulum, disulfide bonding, and proteolytic cleavage. Dark proteins also had short sequence length, low evolutionary reuse, and few known interactions with other proteins. These results suggest new research directions in structural and computational biology.

  8. Surface Proteins of Gram-Positive Pathogens: Using Crystallography to Uncover Novel Features in Drug and Vaccine Candidates

    NASA Astrophysics Data System (ADS)

    Baker, Edward N.; Proft, Thomas; Kang, Haejoo

    Proteins displayed on the cell surfaces of pathogenic organisms are the front-line troops of bacterial attack, playing critical roles in colonization, infection and virulence. Although such proteins can often be recognized from genome sequence data, through characteristic sequence motifs, their functions are often unknown. One such group of surface proteins is attached to the cell surface of Gram-positive pathogens through the action of sortase enzymes. Some of these proteins are now known to form pili: long filamentous structures that mediate attachment to human cells. Crystallographic analyses of these and other cell surface proteins have uncovered novel features in their structure, assembly and stability, including the presence of inter- and intramolecular isopeptide crosslinks. This improved understanding of structures on the bacterial cell surface offers opportunities for the development of some new drug targets and for novel approaches to vaccine design.

  9. Unexpected features of the dark proteome

    PubMed Central

    Perdigão, Nelson; Heinrich, Julian; Stolte, Christian; Sabir, Kenneth S.; Buckley, Michael J.; Tabor, Bruce; Signal, Beth; Gloss, Brian S.; Hammang, Christopher J.; Rost, Burkhard; Schafferhans, Andrea

    2015-01-01

    We surveyed the “dark” proteome–that is, regions of proteins never observed by experimental structure determination and inaccessible to homology modeling. For 546,000 Swiss-Prot proteins, we found that 44–54% of the proteome in eukaryotes and viruses was dark, compared with only ∼14% in archaea and bacteria. Surprisingly, most of the dark proteome could not be accounted for by conventional explanations, such as intrinsic disorder or transmembrane regions. Nearly half of the dark proteome comprised dark proteins, in which the entire sequence lacked similarity to any known structure. Dark proteins fulfill a wide variety of functions, but a subset showed distinct and largely unexpected features, such as association with secretion, specific tissues, the endoplasmic reticulum, disulfide bonding, and proteolytic cleavage. Dark proteins also had short sequence length, low evolutionary reuse, and few known interactions with other proteins. These results suggest new research directions in structural and computational biology. PMID:26578815

  10. RStrucFam: a web server to associate structure and cognate RNA for RNA-binding proteins from sequence information.

    PubMed

    Ghosh, Pritha; Mathew, Oommen K; Sowdhamini, Ramanathan

    2016-10-07

    RNA-binding proteins (RBPs) interact with their cognate RNA(s) to form large biomolecular assemblies. They are versatile in their functionality and are involved in a myriad of processes inside the cell. RBPs with similar structural features and common biological functions are grouped together into families and superfamilies. It will be useful to obtain an early understanding and association of RNA-binding property of sequences of gene products. Here, we report a web server, RStrucFam, to predict the structure, type of cognate RNA(s) and function(s) of proteins, where possible, from mere sequence information. The web server employs Hidden Markov Model scan (hmmscan) to enable association to a back-end database of structural and sequence families. The database (HMMRBP) comprises 437 HMMs of RBP families of known structure that have been generated using structure-based sequence alignments and 746 sequence-centric RBP family HMMs. The input protein sequence is associated with structural or sequence domain families, if structure or sequence signatures exist. In case of association of the protein with a family of known structures, output features such as a multiple structure-based sequence alignment (MSSA) of the query with all other members of that family are provided. Further, cognate RNA partner(s) for that protein, Gene Ontology (GO) annotations, if any, and a homology model of the protein can be obtained. The users can also browse through the database for details pertaining to each family, protein or RNA and their related information based on keyword search or RNA motif search. RStrucFam is a web server that exploits structurally conserved features of RBPs, derived from known family members and imprinted in mathematical profiles, to predict putative RBPs from sequence information. Proteins that fail to associate with such structure-centric families are further queried against the sequence-centric RBP family HMMs in the HMMRBP database. Further, all other essential information pertaining to an RBP, like overall function annotations, is provided. The web server can be accessed at the following link: http://caps.ncbs.res.in/rstrucfam.

  11. Challenging the state of the art in protein structure prediction: Highlights of experimental target structures for the 10th Critical Assessment of Techniques for Protein Structure Prediction Experiment CASP10.

    PubMed

    Kryshtafovych, Andriy; Moult, John; Bales, Patrick; Bazan, J Fernando; Biasini, Marco; Burgin, Alex; Chen, Chen; Cochran, Frank V; Craig, Timothy K; Das, Rhiju; Fass, Deborah; Garcia-Doval, Carmela; Herzberg, Osnat; Lorimer, Donald; Luecke, Hartmut; Ma, Xiaolei; Nelson, Daniel C; van Raaij, Mark J; Rohwer, Forest; Segall, Anca; Seguritan, Victor; Zeth, Kornelius; Schwede, Torsten

    2014-02-01

    For the last two decades, CASP has assessed the state of the art in techniques for protein structure prediction and identified areas which required further development. CASP would not have been possible without the prediction targets provided by the experimental structural biology community. In the latest experiment, CASP10, more than 100 structures were suggested as prediction targets, some of which appeared to be extraordinarily difficult for modeling. In this article, authors of some of the most challenging targets discuss which specific scientific question motivated the experimental structure determination of the target protein, which structural features were especially interesting from a structural or functional perspective, and to what extent these features were correctly reproduced in the predictions submitted to CASP10. Specifically, the following targets will be presented: the acid-gated urea channel, a difficult to predict transmembrane protein from the important human pathogen Helicobacter pylori; the structure of human interleukin (IL)-34, a recently discovered helical cytokine; the structure of a functionally uncharacterized enzyme OrfY from Thermoproteus tenax formed by a gene duplication and a novel fold; an ORFan domain of mimivirus sulfhydryl oxidase R596; the fiber protein gene product 17 from bacteriophage T7; the bacteriophage CBA-120 tailspike protein; a virus coat protein from metagenomic samples of the marine environment; and finally, an unprecedented class of structure prediction targets based on engineered disulfide-rich small proteins. Copyright © 2013 The Authors. Wiley Periodicals, Inc.

  12. Extracting features from protein sequences to improve deep extreme learning machine for protein fold recognition.

    PubMed

    Ibrahim, Wisam; Abadeh, Mohammad Saniee

    2017-05-21

    Protein fold recognition is an important problem in bioinformatics to predict three-dimensional structure of a protein. One of the most challenging tasks in protein fold recognition problem is the extraction of efficient features from the amino-acid sequences to obtain better classifiers. In this paper, we have proposed six descriptors to extract features from protein sequences. These descriptors are applied in the first stage of a three-stage framework, PCA-DELM-LDA, to extract feature vectors from the amino-acid sequences. Principal Component Analysis (PCA) has been implemented to reduce the number of extracted features. The extracted feature vectors have been used with original features to improve the performance of the Deep Extreme Learning Machine (DELM) in the second stage. Four new features have been extracted from the second stage and used in the third stage by Linear Discriminant Analysis (LDA) to classify the instances into 27 folds. The proposed framework is implemented on the independent and combined feature sets in SCOP datasets. The experimental results show that extracted feature vectors in the first stage could improve the performance of DELM in extracting new useful features in second stage. Copyright © 2017 Elsevier Ltd. All rights reserved.

  13. Complex Structure and Biochemical Characterization of the Staphylococcus aureus Cyclic Diadenylate Monophosphate (c-di-AMP)-binding Protein PstA, the Founding Member of a New Signal Transduction Protein Family

    PubMed Central

    Campeotto, Ivan; Zhang, Yong; Mladenov, Miroslav G.; Freemont, Paul S.; Gründling, Angelika

    2015-01-01

    Signaling nucleotides are integral parts of signal transduction systems allowing bacteria to cope with and rapidly respond to changes in the environment. The Staphylococcus aureus PII-like signal transduction protein PstA was recently identified as a cyclic diadenylate monophosphate (c-di-AMP)-binding protein. Here, we present the crystal structures of the apo- and c-di-AMP-bound PstA protein, which is trimeric in solution as well as in the crystals. The structures combined with detailed bioinformatics analysis revealed that the protein belongs to a new family of proteins with a similar core fold but with distinct features to classical PII proteins, which usually function in nitrogen metabolism pathways in bacteria. The complex structure revealed three identical c-di-AMP-binding sites per trimer with each binding site at a monomer-monomer interface. Although distinctly different from other cyclic-di-nucleotide-binding sites, as the half-binding sites are not symmetrical, the complex structure also highlighted common features for c-di-AMP-binding sites. A comparison between the apo and complex structures revealed a series of conformational changes that result in the ordering of two anti-parallel β-strands that protrude from each monomer and allowed us to propose a mechanism on how the PstA protein functions as a signaling transduction protein. PMID:25505271

  14. Automated identification of protein-ligand interaction features using Inductive Logic Programming: a hexose binding case study.

    PubMed

    A Santos, Jose C; Nassif, Houssam; Page, David; Muggleton, Stephen H; E Sternberg, Michael J

    2012-07-11

    There is a need for automated methods to learn general features of the interactions of a ligand class with its diverse set of protein receptors. An appropriate machine learning approach is Inductive Logic Programming (ILP), which automatically generates comprehensible rules in addition to prediction. The development of ILP systems which can learn rules of the complexity required for studies on protein structure remains a challenge. In this work we use a new ILP system, ProGolem, and demonstrate its performance on learning features of hexose-protein interactions. The rules induced by ProGolem detect interactions mediated by aromatics and by planar-polar residues, in addition to less common features such as the aromatic sandwich. The rules also reveal a previously unreported dependency for residues cys and leu. They also specify interactions involving aromatic and hydrogen bonding residues. This paper shows that Inductive Logic Programming implemented in ProGolem can derive rules giving structural features of protein/ligand interactions. Several of these rules are consistent with descriptions in the literature. In addition to confirming literature results, ProGolem's model has a 10-fold cross-validated predictive accuracy that is superior, at the 95% confidence level, to another ILP system previously used to study protein/hexose interactions and is comparable with state-of-the-art statistical learners.

  15. Predicting disulfide connectivity from protein sequence using multiple sequence feature vectors and secondary structure.

    PubMed

    Song, Jiangning; Yuan, Zheng; Tan, Hao; Huber, Thomas; Burrage, Kevin

    2007-12-01

    Disulfide bonds are primary covalent crosslinks between two cysteine residues in proteins that play critical roles in stabilizing the protein structures and are commonly found in extracytoplasmic or secreted proteins. In protein folding prediction, the localization of disulfide bonds can greatly reduce the search in conformational space. Therefore, there is a great need to develop computational methods capable of accurately predicting disulfide connectivity patterns in proteins that could have potentially important applications. We have developed a novel method to predict disulfide connectivity patterns from protein primary sequence, using a support vector regression (SVR) approach based on multiple sequence feature vectors and predicted secondary structure by the PSIPRED program. The results indicate that our method could achieve a prediction accuracy of 74.4% and 77.9%, respectively, when averaged on proteins with two to five disulfide bridges using 4-fold cross-validation, measured on the protein and cysteine pair on a well-defined non-homologous dataset. We assessed the effects of different sequence encoding schemes on the prediction performance of disulfide connectivity. It has been shown that the sequence encoding scheme based on multiple sequence feature vectors coupled with predicted secondary structure can significantly improve the prediction accuracy, thus enabling our method to outperform most other currently available predictors. Our work provides a complementary approach to the current algorithms that should be useful in computationally assigning disulfide connectivity patterns and helps in the annotation of protein sequences generated by large-scale whole-genome projects. The prediction web server and Supplementary Material are accessible at http://foo.maths.uq.edu.au/~huber/disulfide
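
    To give a sense of the size of the prediction problem for the two-to-five-bridge proteins mentioned in this abstract, the number of possible connectivity patterns can be counted directly. The Python sketch below is an illustration added here, not taken from the cited paper; it assumes that every cysteine takes part in exactly one disulfide bond, in which case 2n cysteines can be paired in 1 x 3 x ... x (2n - 1) ways. The function name is hypothetical.

```python
def connectivity_patterns(n_bonds):
    """Count the ways to pair 2 * n_bonds cysteines into n_bonds disulfides.

    Assumes all cysteines are bonded, so the count is the number of perfect
    matchings on 2n items: the double factorial (2n - 1)!!.
    """
    count = 1
    for odd in range(1, 2 * n_bonds, 2):  # 1 * 3 * ... * (2n - 1)
        count *= odd
    return count


# Pattern counts for the two to five bridges covered by the abstract.
for n in range(2, 6):
    print(f"{n} bonds: {connectivity_patterns(n)} possible patterns")
```

    Under that assumption the counts are 3, 15, 105 and 945 for two, three, four and five bonds, so a random guess becomes rapidly less likely to be correct as the number of bridges grows.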

  16. Modeling repetitive, non‐globular proteins

    PubMed Central

    Basu, Koli; Campbell, Robert L.; Guo, Shuaiqi; Sun, Tianjun

    2016-01-01

    While ab initio modeling of protein structures is not routine, certain types of proteins are more straightforward to model than others. Proteins with short repetitive sequences typically exhibit repetitive structures. These repetitive sequences can be more amenable to modeling if some information is known about the predominant secondary structure or other key features of the protein sequence. We have successfully built models of a number of repetitive structures with novel folds using knowledge of the consensus sequence within the sequence repeat and an understanding of the likely secondary structures that these may adopt. Our methods for achieving this success are reviewed here. PMID:26914323

  17. Prediction of redox-sensitive cysteines using sequential distance and other sequence-based features.

    PubMed

    Sun, Ming-An; Zhang, Qing; Wang, Yejun; Ge, Wei; Guo, Dianjing

    2016-08-24

    Reactive oxygen species can modify the structure and function of proteins and may also act as important signaling molecules in various cellular processes. Cysteine thiol groups of proteins are particularly susceptible to oxidation, and their reversible oxidation plays critical roles in redox regulation and signaling. Recently, several computational tools have been developed for predicting redox-sensitive cysteines; however, those methods either focus only on catalytic redox-sensitive cysteines in thiol oxidoreductases or depend heavily on protein structural data, and thus cannot be widely used. In this study, we analyzed various sequence-based features potentially related to cysteine redox-sensitivity, and identified three types of features for efficient computational prediction of redox-sensitive cysteines. These features are: sequential distance to the nearby cysteines, PSSM profile and predicted secondary structure of flanking residues. After further feature selection using SVM-RFE, we developed Redox-Sensitive Cysteine Predictor (RSCP), an SVM-based classifier for redox-sensitive cysteine prediction using primary sequence only. Using 10-fold cross-validation on the RSC758 dataset, the accuracy, sensitivity, specificity, MCC and AUC were estimated as 0.679, 0.602, 0.756, 0.362 and 0.727, respectively. When evaluated using 10-fold cross-validation with the BALOSCTdb dataset, which has structure information, the model achieved performance comparable to the current structure-based method. Further validation using an independent dataset indicates it is robust and of relatively better accuracy for predicting redox-sensitive cysteines from non-enzyme proteins. In this study, we developed a sequence-based classifier for predicting redox-sensitive cysteines. The major advantage of this method is that it does not rely on protein structure data, which ensures more extensive application compared to other current implementations. Accurate prediction of redox-sensitive cysteines not only enhances our understanding of the redox sensitivity of cysteine, but may also complement the proteomics approach and facilitate further experimental investigation of important redox-sensitive cysteines.
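
    The first feature type named in this abstract, sequential distance to the nearby cysteines, can be illustrated with a minimal sketch. The abstract does not give the exact definition used by RSCP, so the function below assumes the simplest reading (distance in residues to the closest other cysteine in the same chain); the function name and the 0-based indexing are illustrative choices, not taken from the paper.

```python
from typing import Dict, Optional


def nearest_cysteine_distances(sequence: str) -> Dict[int, Optional[int]]:
    """Map each cysteine position (0-based) to the distance, in residues,
    to the closest other cysteine. The value is None when the sequence
    contains only one cysteine. This is an assumed, simplified definition."""
    positions = [i for i, aa in enumerate(sequence.upper()) if aa == "C"]
    distances: Dict[int, Optional[int]] = {}
    for idx, pos in enumerate(positions):
        gaps = []
        if idx > 0:  # nearest cysteine on the N-terminal side
            gaps.append(pos - positions[idx - 1])
        if idx + 1 < len(positions):  # nearest cysteine on the C-terminal side
            gaps.append(positions[idx + 1] - pos)
        distances[pos] = min(gaps) if gaps else None
    return distances


if __name__ == "__main__":
    print(nearest_cysteine_distances("ACDCEEEEC"))
```

    In a real pipeline this value would be one column of the feature vector alongside PSSM and predicted secondary-structure columns, which need external tools and are not sketched here.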

  18. Identification of DNA-Binding Proteins Using Mixed Feature Representation Methods.

    PubMed

    Qu, Kaiyang; Han, Ke; Wu, Song; Wang, Guohua; Wei, Leyi

    2017-09-22

    DNA-binding proteins play vital roles in cellular processes, such as DNA packaging, replication, transcription, regulation, and other DNA-associated activities. The current main prediction method is based on machine learning, and its accuracy mainly depends on the feature extraction method. Therefore, using an efficient feature representation method is important to enhance the classification accuracy. However, existing feature representation methods cannot efficiently distinguish DNA-binding proteins from non-DNA-binding proteins. In this paper, a multi-feature representation method, which combines three feature representation methods, namely, K-Skip-N-Grams, Information theory, and Sequential and structural features (SSF), is used to represent the protein sequences and improve feature representation ability. In addition, the classifier is a support vector machine. The mixed-feature representation method is evaluated using 10-fold cross-validation and a test set. Feature vectors, which are obtained from a combination of three feature extractions, show the best performance in 10-fold cross-validation both under non-dimensional reduction and dimensional reduction by max-relevance-max-distance. Moreover, the reduced mixed feature method performs better than the non-reduced mixed feature technique. The feature vectors, which are a combination of SSF and K-Skip-N-Grams, show the best performance in the test set. Among these methods, mixed features exhibit superiority over the single features.
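
    As a rough illustration of the K-Skip-N-Grams representation mentioned in this abstract, the sketch below counts residue pairs (n = 2) separated by at most k skipped residues. The abstract does not specify n, k, or any normalization, so those choices, and the function name, are assumptions made only for this example.

```python
from collections import Counter


def k_skip_bigrams(sequence: str, k: int) -> Counter:
    """Count ordered residue pairs (a, b) where b follows a with at most
    k residues skipped in between. k = 0 gives ordinary bigrams."""
    counts: Counter = Counter()
    for i in range(len(sequence)):
        # gap = 1 means adjacent residues; gap = k + 1 skips k residues
        for gap in range(1, k + 2):
            j = i + gap
            if j >= len(sequence):
                break
            counts[sequence[i] + sequence[j]] += 1
    return counts


if __name__ == "__main__":
    print(k_skip_bigrams("ACDA", 1))
```

    For a 20-letter amino acid alphabet this yields at most 400 distinct pair features; dividing each count by the total number of pairs is one common way to make sequences of different lengths comparable.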

  19. Electrophoretic mobility shift in native gels indicates calcium-dependent structural changes of neuronal calcium sensor proteins.

    PubMed

    Viviano, Jeffrey; Krishnan, Anuradha; Wu, Hao; Venkataraman, Venkat

    2016-02-01

    In proteins of the neuronal calcium sensor (NCS) family, changes in structure as well as function are brought about by the binding of calcium. In this article, we demonstrate that these structural changes, solely due to calcium binding, can be assessed through electrophoresis in native gels. The results demonstrate that the NCS proteins undergo ligand-dependent conformational changes that are detectable in native gels as a gradual decrease in mobility with increasing calcium but not other tested divalent cations such as magnesium, strontium, and barium. Surprisingly, such a gradual change over the entire tested range is exhibited only by the NCS proteins but not by other tested calcium-binding proteins such as calmodulin and S100B, indicating that the change in mobility may be linked to a unique NCS family feature--the calcium-myristoyl switch. Even within the NCS family, the changes in mobility are characteristic of the protein, indicating that the technique is sensitive to the individual features of the protein. Thus, electrophoretic mobility on native gels provides a simple and elegant method to investigate calcium (small ligand)-induced structural changes at least in the superfamily of NCS proteins. Copyright © 2015 Elsevier Inc. All rights reserved.

  20. PDB2CD: a web-based application for the generation of circular dichroism spectra from protein atomic coordinates.

    PubMed

    Mavridis, Lazaros; Janes, Robert W

    2017-01-01

    Circular dichroism (CD) spectroscopy is extensively utilized for determining the percentages of secondary structure content present in proteins. However, although a large contributor, secondary structure is not the only factor that influences the shape and magnitude of the CD spectrum produced. Other structural features can make contributions so an entire protein structural conformation can give rise to a CD spectrum. There is a need for an application capable of generating protein CD spectra from atomic coordinates. However, no empirically derived method to do this currently exists. PDB2CD has been created as an empirical-based approach to the generation of protein CD spectra from atomic coordinates. The method utilizes a combination of structural features within the conformation of a protein; not only its percentage secondary structure content, but also the juxtaposition of these structural components relative to one another, and the overall structure similarity of the query protein to proteins in our dataset, the SP175 dataset, the 'gold standard' set obtained from the Protein Circular Dichroism Data Bank (PCDDB). A significant number of the CD spectra associated with the 71 proteins in this dataset have been produced with excellent accuracy using a leave-one-out cross-validation process. The method also creates spectra in good agreement with those of a test set of 14 proteins from the PCDDB. The PDB2CD package provides a web-based, user friendly approach to enable researchers to produce CD spectra from protein atomic coordinates. It is available at http://pdb2cd.cryst.bbk.ac.uk. Contact: r.w.janes@qmul.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  1. iPHLoc-ES: Identification of bacteriophage protein locations using evolutionary and structural features.

    PubMed

    Shatabda, Swakkhar; Saha, Sanjay; Sharma, Alok; Dehzangi, Abdollah

    2017-12-21

    Bacteriophages are viruses that can significantly impact the functioning of bacteria and can be used in phage-based therapy. The functioning of bacteriophage proteins in the host bacteria depends on their location in those host cells. It is very important to know the subcellular location of the phage proteins in a host cell in order to understand their working mechanism. In this paper, we propose iPHLoc-ES, a prediction method for subcellular localization of bacteriophage proteins. We aim to solve two problems: discriminating between host located and non-host located phage proteins and discriminating between the locations of host located protein in a host cell (membrane or cytoplasm). To do this, we extract sets of evolutionary and structural features of phage protein and employ Support Vector Machine (SVM) as our classifier. We also use recursive feature elimination (RFE) to reduce the number of features for effective prediction. On a standard dataset using standard evaluation criteria, our method significantly outperforms the state-of-the-art predictor. iPHLoc-ES is readily available to use as a standalone tool from: https://github.com/swakkhar/iPHLoc-ES/ and as a web application from: http://brl.uiu.ac.bd/iPHLoc-ES/. Copyright © 2017 Elsevier Ltd. All rights reserved.

  2. PredPPCrys: accurate prediction of sequence cloning, protein production, purification and crystallization propensity from protein sequences using multi-step heterogeneous feature fusion and selection.

    PubMed

    Wang, Huilin; Wang, Mingjun; Tan, Hao; Li, Yuan; Zhang, Ziding; Song, Jiangning

    2014-01-01

    X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed 'PredPPCrys' using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. 
In addition, the predicted crystallization targets of currently non-crystallizable proteins were provided as compendium data, which are anticipated to facilitate target selection and design for the worldwide structural genomics consortium. PredPPCrys is freely available at http://www.structbioinfor.org/PredPPCrys.

  3. PDBsum new things.

    PubMed

    Laskowski, Roman A

    2009-01-01

    PDBsum (http://www.ebi.ac.uk/pdbsum) provides summary information about each experimentally determined structural model in the Protein Data Bank (PDB). Here we describe some of its most recent features, including figures from the structure's key reference, citation data, Pfam domain diagrams, topology diagrams and protein-protein interactions. Furthermore, it now accepts users' own PDB format files and generates a private set of analyses for each uploaded structure.

  4. WONKA: objective novel complex analysis for ensembles of protein-ligand structures.

    PubMed

    Bradley, A R; Wall, I D; von Delft, F; Green, D V S; Deane, C M; Marsden, B D

    2015-10-01

    WONKA is a tool for the systematic analysis of an ensemble of protein-ligand structures. It makes the identification of conserved and unusual features within such an ensemble straightforward. WONKA uses an intuitive workflow to process structural co-ordinates. Ligand and protein features are summarised and then presented within an interactive web application. WONKA's power in consolidating and summarising large amounts of data is described through the analysis of three bromodomain datasets. Furthermore, and in contrast to many current methods, WONKA relates analysis to individual ligands, from which we find unusual and erroneous binding modes. Finally the use of WONKA as an annotation tool to share observations about structures is demonstrated. WONKA is freely available to download and install locally or can be used online at http://wonka.sgc.ox.ac.uk.

  5. Prediction of Protein Modification Sites of Pyrrolidone Carboxylic Acid Using mRMR Feature Selection and Analysis

    PubMed Central

    Zheng, Lu-Lu; Niu, Shen; Hao, Pei; Feng, KaiYan; Cai, Yu-Dong; Li, Yixue

    2011-01-01

    Pyrrolidone carboxylic acid (PCA) is formed during a common post-translational modification (PTM) of extracellular and multi-pass membrane proteins. In this study, we developed a new predictor to predict the modification sites of PCA based on maximum relevance minimum redundancy (mRMR) and incremental feature selection (IFS). We incorporated 727 features that belonged to 7 kinds of protein properties to predict the modification sites, including sequence conservation, residual disorder, amino acid factor, secondary structure and solvent accessibility, gain/loss of amino acid during evolution, propensity of amino acid to be conserved at protein-protein interface and protein surface, and deviation of side chain carbon atom number. Among these 727 features, 244 features were selected by mRMR and IFS as the optimized features for the prediction, with which the prediction model achieved a maximum of MCC of 0.7812. Feature analysis showed that all feature types contributed to the modification process. Further site-specific feature analysis showed that the features derived from PCA's surrounding sites contributed more to the determination of PCA sites than other sites. The detailed feature analysis in this paper might provide important clues for understanding the mechanism of the PCA formation and guide relevant experimental validations. PMID:22174779
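
    The maximum relevance minimum redundancy idea used in this record can be sketched as a greedy ranking: at each step, pick the feature with the highest mutual information with the class label minus its average mutual information with the features already chosen. The code below is only an illustration under stated assumptions: features are discrete, the "difference" form of the criterion is used, and all names are hypothetical. It is not the authors' program, and the incremental feature selection step (training a model on the top 1, top 2, ... ranked features and keeping the best-scoring subset) is not shown.

```python
from collections import Counter
from math import log2
from typing import Dict, List, Sequence


def mutual_information(xs: Sequence, ys: Sequence) -> float:
    """Mutual information (in bits) between two discrete variables."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def mrmr_rank(features: Dict[str, Sequence], labels: Sequence, n_select: int) -> List[str]:
    """Greedy mRMR ranking: relevance to labels minus mean redundancy
    with already selected features (difference criterion)."""
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    remaining = list(features)  # keeps insertion order for deterministic ties
    selected: List[str] = []

    def score(name: str) -> float:
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < n_select:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    labels = [0, 0, 0, 0, 1, 1, 1, 0]
    features = {
        "a": [0, 0, 0, 0, 1, 1, 1, 1],  # strongly related to the labels
        "b": [0, 0, 0, 0, 1, 1, 1, 1],  # exact copy of "a": fully redundant
        "c": [0, 0, 1, 1, 0, 0, 1, 1],  # weakly related, independent of "a"
    }
    print(mrmr_rank(features, labels, 3))
```

    In the toy data, the redundant copy "b" is ranked last even though it is as relevant as "a" on its own, which is the behavior the redundancy term is meant to produce.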

  6. A novel structural tree for wrap-proteins, a subclass of (α+β)-proteins.

    PubMed

    Boshkova, Eugenia A; Gordeev, Alexey B; Efimov, Alexander V

    2014-01-01

    In this paper, a novel structural subclass of (α+β)-proteins is presented. A characteristic feature of these proteins and domains is that they consist of strongly twisted and coiled β-sheets wrapped around one or two α-helices, so they are referred to here as wrap-proteins. It is shown that overall folds of the wrap-proteins can be obtained by stepwise addition of α-helices and/or β-strands to the strongly twisted and coiled β-hairpin taken as the starting structure in modeling. As a result of modeling, a structural tree for the wrap-proteins was constructed that includes 201 folds of which 49 occur in known nonhomologous proteins.

  7. Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences.

    PubMed

    Mizianty, Marcin J; Kurgan, Lukasz

    2009-12-13

    Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes including conservation of residues and arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozen modern competing predictors show that MODAS provides the best overall accuracy that ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reduction when compared against the best performing competing method on the two largest datasets. The proposed predictor provides accurate predictions at 58% accuracy for the membrane proteins class, which is not considered by the majority of existing methods, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes. 
The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/.

  8. Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences

    PubMed Central

    2009-01-01

    Background: Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. Results: The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes including conservation of residues and arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozen modern competing predictors show that MODAS provides the best overall accuracy that ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reduction when compared against the best performing competing method on the two largest datasets. The proposed predictor provides accurate predictions at 58% accuracy for the membrane proteins class, which is not considered by the majority of existing methods, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes. Conclusions: The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/. PMID:20003388

  9. Molecular Dynamics Simulations and Structural Analysis of Giardia duodenalis 14-3-3 Protein-Protein Interactions.

    PubMed

    Cau, Ylenia; Fiorillo, Annarita; Mori, Mattia; Ilari, Andrea; Botta, Maurizo; Lalle, Marco

    2015-12-28

    Giardiasis is a gastrointestinal diarrheal illness caused by the protozoan parasite Giardia duodenalis, which affects annually over 200 million people worldwide. The limited antigiardial drug arsenal and the emergence of clinical cases refractory to standard treatments dictate the need for new chemotherapeutics. The 14-3-3 family of regulatory proteins, extensively involved in protein-protein interactions (PPIs) with pSer/pThr clients, represents a highly promising target. Despite homology with human counterparts, the single 14-3-3 of G. duodenalis (g14-3-3) is characterized by a constitutive phosphorylation in a region critical for target binding, thus affecting the function and the conformation of g14-3-3/clients interaction. However, to approach the design of specific small molecule modulators of g14-3-3 PPIs, structural elucidations are required. Here, we present a detailed computational and crystallographic study exploring the implications of g14-3-3 phosphorylation on protein structure and target binding. Self-Guided Langevin Dynamics and classical molecular dynamics simulations show that phosphorylation affects locally and globally g14-3-3 conformation, inducing a structural rearrangement more suitable for target binding. Profitable features for g14-3-3/clients interaction were highlighted using a hydrophobicity-based descriptor to characterize g14-3-3 client peptides. Finally, the X-ray structure of g14-3-3 in complex with a mode-1 prototype phosphopeptide was solved and combined with structure-based simulations to identify molecular features relevant for clients binding to g14-3-3. The data presented herein provide a further and structural understanding of g14-3-3 features and set the basis for drug design studies.

  10. Automated identification of protein-ligand interaction features using Inductive Logic Programming: a hexose binding case study

    PubMed Central

    2012-01-01

    Background: There is a need for automated methods to learn general features of the interactions of a ligand class with its diverse set of protein receptors. An appropriate machine learning approach is Inductive Logic Programming (ILP), which automatically generates comprehensible rules in addition to prediction. The development of ILP systems which can learn rules of the complexity required for studies on protein structure remains a challenge. In this work we use a new ILP system, ProGolem, and demonstrate its performance on learning features of hexose-protein interactions. Results: The rules induced by ProGolem detect interactions mediated by aromatics and by planar-polar residues, in addition to less common features such as the aromatic sandwich. The rules also reveal a previously unreported dependency for residues cys and leu. They also specify interactions involving aromatic and hydrogen bonding residues. This paper shows that Inductive Logic Programming implemented in ProGolem can derive rules giving structural features of protein/ligand interactions. Several of these rules are consistent with descriptions in the literature. Conclusions: In addition to confirming literature results, ProGolem’s model has a 10-fold cross-validated predictive accuracy that is superior, at the 95% confidence level, to another ILP system previously used to study protein/hexose interactions and is comparable with state-of-the-art statistical learners. PMID:22783946

  11. A Method for WD40 Repeat Detection and Secondary Structure Prediction

    PubMed Central

    Wang, Yang; Jiang, Fan; Zhuo, Zhu; Wu, Xian-Hui; Wu, Yun-Dong

    2013-01-01

    WD40-repeat proteins (WD40s), as one of the largest protein families in eukaryotes, play vital roles in assembling protein-protein/DNA/RNA complexes. WD40s fold into similar β-propeller structures despite diversified sequences. A program, WDSP (WD40 repeat protein Structure Predictor), has been developed to accurately identify WD40 repeats and predict their secondary structures. The method is designed specifically for WD40 proteins by incorporating both local residue information and non-local family-specific structural features. It overcomes the problem of highly diversified protein sequences and variable loops. In addition, WDSP achieves a better prediction in identifying multiple WD40-domain proteins by taking the global combination of repeats into consideration. In secondary structure prediction, the average Q3 accuracy of WDSP in a jack-knife test reaches 93.7%. A disease-related protein, LRRK2, was used as a representative example to demonstrate the structure prediction. PMID:23776530
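
    The Q3 accuracy quoted in this abstract is the percentage of residues whose predicted three-state secondary structure (helix, strand, coil) matches the reference assignment. A minimal sketch follows; the one-letter state codes H/E/C and the function name are conventions assumed for the example, and the strings are made up rather than taken from WDSP output.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of positions where the predicted three-state label
    (for example H, E or C) equals the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("cannot compute Q3 for an empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    # Five of the six positions agree in this made-up example.
    print(round(q3_accuracy("HHHEEC", "HHHECC"), 1))
```

    An average Q3 over a dataset is usually the mean of the per-protein values or the pooled per-residue value; the abstract does not say which one was used.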

  12. Prediction of residue-residue contact matrix for protein-protein interaction with Fisher score features and deep learning.

    PubMed

    Du, Tianchuan; Liao, Li; Wu, Cathy H; Sun, Bilin

    2016-11-01

    Protein-protein interactions play essential roles in many biological processes. Acquiring knowledge of the residue-residue contact information of two interacting proteins is not only helpful in annotating functions for proteins, but also critical for structure-based drug design. The prediction of the protein residue-residue contact matrix of the interfacial regions is challenging. In this work, we introduced deep learning techniques (specifically, stacked autoencoders) to build deep neural network models to tackle the residue-residue contact prediction problem. In tandem with interaction profile Hidden Markov Models, which were used first to extract Fisher score features from protein sequences, stacked autoencoders were deployed to extract and learn hidden abstract features. The deep learning model showed significant improvement over the traditional machine learning model, Support Vector Machines (SVM), with the overall accuracy increased by about 15 percentage points, from 65.40% to 80.82%. We showed that the stacked autoencoders could extract novel features, which can be utilized by deep neural networks and other classifiers to enhance learning, out of the Fisher score features. It is further shown that deep neural networks have significant advantages over SVM in making use of the newly extracted features. Copyright © 2016. Published by Elsevier Inc.

  13. NMRDSP: an accurate prediction of protein shape strings from NMR chemical shifts and sequence data.

    PubMed

    Mao, Wusong; Cong, Peisheng; Wang, Zhiheng; Lu, Longjian; Zhu, Zhongliang; Li, Tonghua

    2013-01-01

    Shape string is structural sequence and is an extremely important structure representation of protein backbone conformations. Nuclear magnetic resonance chemical shifts give a strong correlation with the local protein structure, and are exploited to predict protein structures in conjunction with computational approaches. Here we demonstrate a novel approach, NMRDSP, which can accurately predict the protein shape string based on nuclear magnetic resonance chemical shifts and structural profiles obtained from sequence data. The NMRDSP uses six chemical shifts (HA, H, N, CA, CB and C) and eight elements of structure profiles as features, a non-redundant set (1,003 entries) as the training set, and a conditional random field as a classification algorithm. For an independent testing set (203 entries), we achieved an accuracy of 75.8% for S8 (the eight states accuracy) and 87.8% for S3 (the three states accuracy). This is higher than only using chemical shifts or sequence data, and confirms that the chemical shift and the structure profile are significant features for shape string prediction and their combination prominently improves the accuracy of the predictor. We have constructed the NMRDSP web server and believe it could be employed to provide a solid platform to predict other protein structures and functions. The NMRDSP web server is freely available at http://cal.tongji.edu.cn/NMRDSP/index.jsp.

  14. Computational and Statistical Analyses of Amino Acid Usage and Physico-Chemical Properties of the Twelve Late Embryogenesis Abundant Protein Classes

    PubMed Central

    Jaspard, Emmanuel; Macherel, David; Hunault, Gilles

    2012-01-01

    Late Embryogenesis Abundant Proteins (LEAPs) are ubiquitous proteins expected to play major roles in desiccation tolerance. Little is known about their structure - function relationships because of the scarcity of 3-D structures for LEAPs. The previous building of LEAPdb, a database dedicated to LEAPs from plants and other organisms, led to the classification of 710 LEAPs into 12 non-overlapping classes with distinct properties. Using this resource, numerous physico-chemical properties of LEAPs and amino acid usage by LEAPs have been computed and statistically analyzed, revealing distinctive features for each class. This unprecedented analysis allowed a rigorous characterization of the 12 LEAP classes, which differed also in multiple structural and physico-chemical features. Although most LEAPs can be predicted as intrinsically disordered proteins, the analysis indicates that LEAP class 7 (PF03168) and probably LEAP class 11 (PF04927) are natively folded proteins. This study thus provides a detailed description of the structural properties of this protein family opening the path toward further LEAP structure - function analysis. Finally, since each LEAP class can be clearly characterized by a unique set of physico-chemical properties, this will allow development of software to predict proteins as LEAPs. PMID:22615859

  15. MUFOLD-SS: New deep inception-inside-inception networks for protein secondary structure prediction.

    PubMed

    Fang, Chao; Shang, Yi; Xu, Dong

    2018-05-01

    Protein secondary structure prediction can provide important information for protein 3D structure prediction and protein functions. Deep learning offers a new opportunity to significantly improve prediction accuracy. In this article, a new deep neural network architecture, named the Deep inception-inside-inception (Deep3I) network, is proposed for protein secondary structure prediction and implemented as a software tool, MUFOLD-SS. The input to MUFOLD-SS is a carefully designed feature matrix corresponding to the primary amino acid sequence of a protein, which consists of a rich set of information derived from individual amino acids, as well as the context of the protein sequence. Specifically, the feature matrix is a composition of physico-chemical properties of amino acids, PSI-BLAST profile, and HHBlits profile. MUFOLD-SS is composed of a sequence of nested inception modules and maps the input matrix to either eight states or three states of secondary structures. The architecture of MUFOLD-SS enables effective processing of local and global interactions between amino acids in making accurate predictions. In extensive experiments on multiple datasets, MUFOLD-SS outperformed the best existing methods and other deep neural networks significantly. MUFOLD-SS can be downloaded from http://dslsrv8.cs.missouri.edu/~cf797/MUFoldSS/download.html. © 2018 Wiley Periodicals, Inc.

  16. A comparative study of family-specific protein-ligand complex affinity prediction based on random forest approach

    NASA Astrophysics Data System (ADS)

    Wang, Yu; Guo, Yanzhi; Kuang, Qifan; Pu, Xuemei; Ji, Yue; Zhang, Zhihang; Li, Menglong

    2015-04-01

    The assessment of binding affinity between ligands and the target proteins plays an essential role in the drug discovery and design process. As an alternative to widely used scoring approaches, machine learning methods have also been proposed for fast prediction of the binding affinity with promising results, but most of them were developed as all-purpose models regardless of the specific functions of different protein families, even though proteins from different functional families have different structures and physicochemical features. In this study, we proposed a random forest method to predict the protein-ligand binding affinity based on a comprehensive feature set covering protein sequence, binding pocket, ligand structure and intermolecular interaction. Feature processing and compression were carried out separately for each protein family dataset; the results indicate that different features contribute to different models, so individual representation for each protein family is necessary. Three family-specific models were constructed for three important protein target families: HIV-1 protease, trypsin and carbonic anhydrase. As a comparison, two generic models including diverse protein families were also built. The evaluation results show that models built on family-specific datasets perform better than those built on the generic datasets, and the Pearson and Spearman correlation coefficients (Rp and Rs) on the test sets are 0.740, 0.874 and 0.735 (Rp) and 0.697, 0.853 and 0.723 (Rs) for HIV-1 protease, trypsin and carbonic anhydrase, respectively. Comparisons with the other methods further demonstrate that individual representation and model construction for each protein family is a more reasonable way of predicting the affinity of one particular protein family.
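
    The two evaluation measures reported in this abstract, the Pearson and Spearman correlation coefficients between predicted and measured affinities, can be computed as in the following sketch. It is a plain-Python illustration with made-up numbers, not the authors' evaluation code; ties are handled with average ranks, which is the usual convention but is an assumption here.

```python
from math import sqrt
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    measured = [1.0, 2.0, 3.0, 4.0]     # made-up affinities
    predicted = [1.0, 4.0, 9.0, 16.0]   # monotonic but not linear
    print(pearson(measured, predicted), spearman(measured, predicted))
```

    The toy data show why both numbers are reported: a prediction that preserves the ordering of affinities but not their linear spacing has a Spearman coefficient of 1 and a Pearson coefficient below 1.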

  17. Improved method for predicting protein fold patterns with ensemble classifiers.

    PubMed

    Chen, W; Liu, X; Huang, Y; Jiang, Y; Zou, Q; Lin, C

    2012-01-27

    Protein folding is recognized as a critical problem in the field of biophysics in the 21st century. Predicting protein-folding patterns is challenging due to the complex structure of proteins. In an attempt to solve this problem, we employed ensemble classifiers to improve prediction accuracy. In our experiments, 188-dimensional features were extracted based on the composition and physical-chemical property of proteins and 20-dimensional features were selected using a coupled position-specific scoring matrix. Compared with traditional prediction methods, these methods were superior in terms of prediction accuracy. The 188-dimensional feature-based method achieved 71.2% accuracy in five cross-validations. The accuracy rose to 77% when we used a 20-dimensional feature vector. These methods were used on recent data, with 54.2% accuracy. Source codes and dataset, together with web server and software tools for prediction, are available at: http://datamining.xmu.edu.cn/main/~cwc/ProteinPredict.html.

  18. X-Ray Crystal Structure of the passenger domain of Plasmid encoded toxin (Pet), an Autotransporter Enterotoxin from enteroaggregative Escherichia coli (EAEC)

    PubMed Central

    Meza-Aguilar, J. Domingo; Fromme, Petra; Torres-Larios, Alfredo; Mendoza-Hernández, Guillermo; Hernandez-Chiñas, Ulises; Monteros, Roberto A. Arreguin-Espinosa de los; Campos, Carlos A. Eslava; Fromme, Raimund

    2014-01-01

    Autotransporters (ATs) represent a superfamily of proteins produced by a variety of pathogenic bacteria, which include the pathogenic groups of Escherichia coli (E. coli) associated with gastrointestinal and urinary tract infections. We present the first X-ray structure, at 2.3 Å resolution, of the passenger domain from the Plasmid-encoded toxin (Pet), a 100 kDa protein which is a cause of acute diarrhea in both developing and industrialized countries. Pet is a cytoskeleton-altering toxin that induces loss of actin stress fibers. While Pet (pdb code: 4OM9) shows only a sequence identity of 50% compared to the closest related protein sequence, extracellular serine protease plasmid (EspP), the structural features of both proteins are conserved. A closer structural look reveals that Pet contains a β-pleated sheet at the sequence region of residues 181-190, whereas the corresponding structural domain in EspP consists of a coiled loop. Second, the Pet passenger domain features a more pronounced beta sheet between residues 135-143 compared to the structure of EspP. PMID:24530907

  19. Automated prediction of protein function and detection of functional sites from structure.

    PubMed

    Pazos, Florencio; Sternberg, Michael J E

    2004-10-12

    Current structural genomics projects are yielding structures for proteins whose functions are unknown. Accordingly, there is a pressing requirement for computational methods for function prediction. Here we present PHUNCTIONER, an automatic method for structure-based function prediction using automatically extracted functional sites (residues associated to functions). The method relates proteins with the same function through structural alignments and extracts 3D profiles of conserved residues. Functional features to train the method are extracted from the Gene Ontology (GO) database. The method extracts these features from the entire GO hierarchy and hence is applicable across the whole range of function specificity. 3D profiles associated with 121 GO annotations were extracted. We tested the power of the method both for the prediction of function and for the extraction of functional sites. The success of function prediction by our method was compared with the standard homology-based method. In the zone of low sequence similarity (approximately 15%), our method assigns the correct GO annotation in 90% of the protein structures considered, approximately 20% higher than inheritance of function from the closest homologue.

  20. Structure-guided wavelength tuning in far-red fluorescent proteins

    PubMed Central

    Ng, Ho-Leung; Lin, Michael Z.

    2017-01-01

    In recent years, protein engineers have succeeded in tuning the excitation spectra of natural fluorescent proteins from green wavelengths into orange and red wavelengths, resulting in the creation of a series of fluorescent proteins with emission in the far-red portions of the optical spectrum. These results have arisen from the synergistic combination of structural knowledge of fluorescent proteins, chemical intuition, and high-throughput screening methods. Here we review structural features found in autocatalytic far-red fluorescent proteins, and discuss how they add to our understanding of the biophysical mechanisms of wavelength tuning in biological chromophores. PMID:27468111

  1. Candida albicans Agglutinin-Like Sequence (Als) Family Vignettes: A Review of Als Protein Structure and Function

    PubMed Central

    Hoyer, Lois L.; Cota, Ernesto

    2016-01-01

    Approximately two decades have passed since the description of the first gene in the Candida albicans ALS (agglutinin-like sequence) family. Since that time, much has been learned about the composition of the family and the function of its encoded cell-surface glycoproteins. Solution of the structure of the Als adhesive domain provides the opportunity to evaluate the molecular basis for protein function. This review article is formatted as a series of fundamental questions and explores the diversity of the Als proteins, as well as their role in ligand binding, aggregative effects, and attachment to abiotic surfaces. Interaction of Als proteins with each other, their functional equivalence, and the effects of protein abundance on phenotypic conclusions are also examined. Structural features of Als proteins that may facilitate invasive function are considered. Conclusions that are firmly supported by the literature are presented while highlighting areas that require additional investigation to reveal basic features of the Als proteins, their relatedness to each other, and their roles in C. albicans biology. PMID:27014205

  2. Polymorphism of Lysozyme Condensates.

    PubMed

    Safari, Mohammad S; Byington, Michael C; Conrad, Jacinta C; Vekilov, Peter G

    2017-10-05

    Protein condensates play essential roles in physiological processes and pathological conditions. Recently discovered mesoscopic protein-rich clusters may act as crucial precursors for the nucleation of ordered protein solids, such as crystals, sickle hemoglobin polymers, and amyloid fibrils. These clusters challenge settled paradigms of protein condensation as the constituent protein molecules present features characteristic of both partially misfolded and native proteins. Here we employ the antimicrobial enzyme lysozyme and examine the similarities between mesoscopic clusters, amyloid structures, and disordered aggregates consisting of chemically modified protein. We show that the mesoscopic clusters are distinct from the other two classes of aggregates. Whereas cluster formation and amyloid oligomerization are both reversible, aggregation triggered by reduction of the intramolecular S-S bonds is permanent. In contrast to the amyloid structures, protein molecules in the clusters retain their enzymatic activity. Furthermore, an essential feature of the mesoscopic clusters is their constant radius of less than 50 nm. The amyloid and disordered aggregates are significantly larger and rapidly grow. These findings demonstrate that the clusters are a product of limited protein structural flexibility. In view of the role of the clusters in the nucleation of ordered protein solids, our results suggest that fine-tuning the degree of protein conformational stability is a powerful tool to control and direct the pathways of protein condensation.

  3. Predicting protein complexes using a supervised learning method combined with local structural information.

    PubMed

    Dong, Yadong; Sun, Yongqi; Qin, Chao

    2018-01-01

    The existing protein complex detection methods can be broadly divided into two categories: unsupervised and supervised learning methods. Most of the unsupervised learning methods assume that protein complexes are in dense regions of protein-protein interaction (PPI) networks even though many true complexes are not dense subgraphs. Supervised learning methods utilize the informative properties of known complexes; they often extract features from existing complexes and then use the features to train a classification model. The trained model is used to guide the search process for new complexes. However, insufficient extracted features, noise in the PPI data and the incompleteness of complex data make the classification model imprecise. Consequently, the classification model is not sufficient for guiding the detection of complexes. Therefore, we propose a new robust score function that combines the classification model with local structural information. Based on the score function, we provide a search method that works both forwards and backwards. The results from experiments on six benchmark PPI datasets and three protein complex datasets show that our approach can achieve better performance compared with the state-of-the-art supervised, semi-supervised and unsupervised methods for protein complex detection, occasionally significantly outperforming such methods.

  4. Complete fold annotation of the human proteome using a novel structural feature space.

    PubMed

    Middleton, Sarah A; Illuminati, Joseph; Kim, Junhyong

    2017-04-13

    Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families.

  5. Complete fold annotation of the human proteome using a novel structural feature space

    PubMed Central

    Middleton, Sarah A.; Illuminati, Joseph; Kim, Junhyong

    2017-01-01

    Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families. PMID:28406174

  6. Design of structurally distinct proteins using strategies inspired by evolution

    DOE PAGES

    Jacobs, T. M.; Williams, B.; Williams, T.; ...

    2016-05-06

    Natural recombination combines pieces of preexisting proteins to create new tertiary structures and functions. In this paper, we describe a computational protocol, called SEWING, which is inspired by this process and builds new proteins from connected or disconnected pieces of existing structures. Helical proteins designed with SEWING contain structural features absent from other de novo designed proteins and, in some cases, remain folded at more than 100°C. High-resolution structures of the designed proteins CA01 and DA05R1 were solved by x-ray crystallography (2.2 angstrom resolution) and nuclear magnetic resonance, respectively, and there was excellent agreement with the design models. Finally, this method provides a new strategy to rapidly create large numbers of diverse and designable protein scaffolds.

  7. Molecular chaperones: functional mechanisms and nanotechnological applications

    NASA Astrophysics Data System (ADS)

    Rosario Fernández-Fernández, M.; Sot, Begoña; María Valpuesta, José

    2016-08-01

    Molecular chaperones are a group of proteins that assist in protein homeostasis. They not only prevent protein misfolding and aggregation, but also target misfolded proteins for degradation. Despite differences in structure, all types of chaperones share a common general feature, a surface that recognizes and interacts with the misfolded protein. This and other, more specialized properties can be adapted for various nanotechnological purposes, by modification of the original biomolecules or by de novo design based on artificial structures.

  8. Extant fold-switching proteins are widespread.

    PubMed

    Porter, Lauren L; Looger, Loren L

    2018-06-05

    A central tenet of biology is that globular proteins have a unique 3D structure under physiological conditions. Recent work has challenged this notion by demonstrating that some proteins switch folds, a process that involves remodeling of secondary structure in response to a few mutations (evolved fold switchers) or cellular stimuli (extant fold switchers). To date, extant fold switchers have been viewed as rare byproducts of evolution, but their frequency has been neither quantified nor estimated. By systematically and exhaustively searching the Protein Data Bank (PDB), we found ∼100 extant fold-switching proteins. Furthermore, we gathered multiple lines of evidence suggesting that these proteins are widespread in nature. Based on these lines of evidence, we hypothesized that the frequency of extant fold-switching proteins may be underrepresented by the structures in the PDB. Thus, we sought to identify other putative extant fold switchers with only one solved conformation. To do this, we identified two characteristic features of our ∼100 extant fold-switching proteins, incorrect secondary structure predictions and likely independent folding cooperativity, and searched the PDB for other proteins with similar features. Reassuringly, this method identified dozens of other proteins in the literature with indication of a structural change but only one solved conformation in the PDB. Thus, we used it to estimate that 0.5-4% of PDB proteins switch folds. These results demonstrate that extant fold-switching proteins are likely more common than the PDB reflects, which has implications for cell biology, genomics, and human health. Copyright © 2018 the Author(s). Published by PNAS.

  9. A deep learning framework for modeling structural features of RNA-binding protein targets

    PubMed Central

    Zhang, Sai; Zhou, Jingtian; Hu, Hailin; Gong, Haipeng; Chen, Ligong; Cheng, Chao; Zeng, Jianyang

    2016-01-01

    RNA-binding proteins (RBPs) play important roles in the post-transcriptional control of RNAs. Identifying RBP binding sites and characterizing RBP binding preferences are key steps toward understanding the basic mechanisms of post-transcriptional gene regulation. Though numerous computational methods have been developed for modeling RBP binding preferences, discovering a complete structural representation of the RBP targets by integrating their available structural features in all three dimensions is still a challenging task. In this paper, we develop a general and flexible deep learning framework for modeling structural binding preferences and predicting binding sites of RBPs, which takes (predicted) RNA tertiary structural information into account for the first time. Our framework constructs a unified representation that characterizes the structural specificities of RBP targets in all three dimensions, which can be further used to predict novel candidate binding sites and discover potential binding motifs. Through testing on real CLIP-seq datasets, we have demonstrated that our deep learning framework can automatically extract effective hidden structural features from the encoded raw sequence and structural profiles, and predict accurate RBP binding sites. In addition, we have conducted the first study to show that integrating the additional RNA tertiary structural features can improve the model performance in predicting RBP binding sites, especially for the polypyrimidine tract-binding protein (PTB), which also provides new evidence to support the view that RBPs may have specific tertiary structural binding preferences. In particular, the tests on the internal ribosome entry site (IRES) segments yield satisfactory results with experimental support from the literature and further demonstrate the necessity of incorporating RNA tertiary structural information into the prediction model. The source code of our approach can be found at https://github.com/thucombio/deepnet-rbp. PMID:26467480

  10. Differential role of molten globule and protein folding in distinguishing unique features of botulinum neurotoxin.

    PubMed

    Kumar, Raj; Kukreja, Roshan V; Cai, Shuowei; Singh, Bal R

    2014-06-01

    Botulinum neurotoxins (BoNTs) are proteins of great interest not only because of their extreme toxicity but also, paradoxically, for their therapeutic applications. All the known serotypes (A-G) have varying degrees of longevity and potency inside the neuronal cell. Differential chemical modifications such as phosphorylation and ubiquitination have been suggested as possible mechanisms for their longevity, but the molecular basis of the longevity remains unclear. Since the endopeptidase domain (light chain; LC) of the toxin apparently survives inside the neuronal cells for months, it is important to examine the structural features of this domain to understand its resistance to intracellular degradation. Published crystal structures (of both botulinum neurotoxins and the endopeptidase domain) have not provided an adequate explanation for the intracellular longevity of the domain. Structural features obtained from spectroscopic analysis of LCA and LCB were similar, and a PRIME (PReImminent Molten Globule Enzyme) conformation appears to be responsible for their optimal enzymatic activity at 37°C. LCE, on the other hand, although optimally active at 37°C, had an active conformation that differed from the PRIME conformation of LCA and LCB. This study establishes and confirms our earlier finding that an optimally active conformation of these proteins in the form of PRIME exists for the most poisonous poison, botulinum neurotoxin. There are substantial variations in the structural and functional characteristics of these active molten globule related structures among the three BoNT endopeptidases examined. These differential conformations of LCs are important in understanding the fundamental structural features of proteins, and their possible connection to intracellular longevity could provide significant clues for devising new countermeasures and effective therapeutics. Copyright © 2014 Elsevier B.V. All rights reserved.

  11. X-ray crystal structure of the passenger domain of plasmid encoded toxin (Pet), an autotransporter enterotoxin from enteroaggregative Escherichia coli (EAEC)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Domingo Meza-Aguilar, J.; Laboratorio de Patogenicidad Bacteriana, Unidad de Hemato Oncología e Investigación, Hospital Infantil de México Federico Gómez 06720, D.F.; Fromme, Petra

    Highlights: • X-ray crystal structure of the passenger domain of Plasmid encoded toxin at 2.3 Å. • Structural differences between Pet passenger domain and EspP protein are described. • High flexibility of the C-terminal beta helix is structurally assigned. - Abstract: Autotransporters (ATs) represent a superfamily of proteins produced by a variety of pathogenic bacteria, which include the pathogenic groups of Escherichia coli (E. coli) associated with gastrointestinal and urinary tract infections. We present the first X-ray structure, at 2.3 Å resolution, of the passenger domain from the Plasmid-encoded toxin (Pet), a 100 kDa protein which is a cause of acute diarrhea in both developing and industrialized countries. Pet is a cytoskeleton-altering toxin that induces loss of actin stress fibers. While Pet (pdb code: 4OM9) shows only a sequence identity of 50% compared to the closest related protein sequence, extracellular serine protease plasmid (EspP), the structural features of both proteins are conserved. A closer structural look reveals that Pet contains a β-pleated sheet at the sequence region of residues 181–190, whereas the corresponding structural domain in EspP consists of a coiled loop. Second, the Pet passenger domain features a more pronounced beta sheet between residues 135 and 143 compared to the structure of EspP.

  12. 3D-SURFER 2.0: web platform for real-time search and characterization of protein surfaces.

    PubMed

    Xiong, Yi; Esquivel-Rodriguez, Juan; Sael, Lee; Kihara, Daisuke

    2014-01-01

    The increasing number of uncharacterized protein structures necessitates the development of computational approaches for function annotation using the protein tertiary structures. Protein structure database search is the basis of any structure-based functional elucidation of proteins. 3D-SURFER is a web platform for real-time protein surface comparison of a given protein structure against the entire PDB using 3D Zernike descriptors. It can smoothly navigate the protein structure space in real-time from one query structure to another. A major new feature of Release 2.0 is the ability to compare the protein surface of a single chain, a single domain, or a single complex against databases of protein chains, domains, complexes, or a combination of all three in the latest PDB. Additionally, two types of protein structures can now be compared: all-atom-surface and backbone-atom-surface. The server can also accept a batch job for a large number of database searches. Pockets in protein surfaces can be identified by VisGrid and LIGSITE (csc). The server is available at http://kiharalab.org/3d-surfer/.

  13. GALT protein database: querying structural and functional features of GALT enzyme.

    PubMed

    d'Acierno, Antonio; Facchiano, Angelo; Marabotti, Anna

    2014-09-01

    Knowledge of the impact of variations on protein structure can enhance the comprehension of the mechanisms of genetic diseases related to that protein. Here, we present a new version of GALT Protein Database, a Web-accessible data repository for the storage and interrogation of structural effects of variations of the enzyme galactose-1-phosphate uridylyltransferase (GALT), the impairment of which leads to classic Galactosemia, a rare genetic disease. This new version of this database now contains the models of 201 missense variants of GALT enzyme, including heterozygous variants, and it allows users not only to retrieve information about the missense variations affecting this protein, but also to investigate their impact on substrate binding, intersubunit interactions, stability, and other structural features. In addition, it allows the interactive visualization of the models of variants collected into the database. We have developed additional tools to improve the use of the database by nonspecialized users. This Web-accessible database (http://bioinformatica.isa.cnr.it/GALT/GALT2.0) represents a model of tools potentially suitable for application to other proteins that are involved in human pathologies and that are subjected to genetic variations. © 2014 WILEY PERIODICALS, INC.

  14. mpMoRFsDB: a database of molecular recognition features in membrane proteins.

    PubMed

    Gypas, Foivos; Tsaousis, Georgios N; Hamodrakas, Stavros J

    2013-10-01

    Molecular recognition features (MoRFs) are small, intrinsically disordered regions in proteins that undergo a disorder-to-order transition on binding to their partners. MoRFs are involved in protein-protein interactions and may function as the initial step in molecular recognition. The aim of this work was to collect, organize and store all membrane proteins that contain MoRFs. Membrane proteins constitute ∼30% of fully sequenced proteomes and are responsible for a wide variety of cellular functions. MoRFs were classified according to their secondary structure, after interacting with their partners. We identified MoRFs in transmembrane and peripheral membrane proteins. The position of transmembrane protein MoRFs was determined in relation to a protein's topology. All information was stored in a publicly available mySQL database with a user-friendly web interface. A Jmol applet is integrated for visualization of the structures. mpMoRFsDB provides valuable information related to disorder-based protein-protein interactions in membrane proteins. http://bioinformatics.biol.uoa.gr/mpMoRFsDB

  15. Conformational diversity analysis reveals three functional mechanisms in proteins

    PubMed Central

    Fornasari, María Silvina

    2017-01-01

    Protein motions are a key feature to understand biological function. Recently, a large-scale analysis of protein conformational diversity showed a positively skewed distribution with a peak at 0.5 Å C-alpha root-mean-square-deviation (RMSD). To understand this distribution in terms of structure-function relationships, we studied a well curated and large dataset of ~5,000 proteins with experimentally determined conformational diversity. We searched for global behaviour patterns studying how structure-based features change among the available conformer population for each protein. This procedure allowed us to describe the RMSD distribution in terms of three main protein classes sharing given properties. The largest of these protein subsets (~60%), which we call “rigid” (average RMSD = 0.83 Å), has no disordered regions, shows low conformational diversity, the largest tunnels and smaller and buried cavities. The two additional subsets contain disordered regions, but with differential sequence composition and behaviour. Partially disordered proteins have on average 67% of their conformers with disordered regions, average RMSD = 1.1 Å, the highest number of hinges and the longest disordered regions. In contrast, malleable proteins have on average only 25% of disordered conformers and average RMSD = 1.3 Å, flexible cavities affected in size by the presence of disordered regions and show the highest diversity of cognate ligands. Proteins in each set are mostly non-homologous to each other, share no given fold class, nor functional similarity but do share features derived from their conformer population. These shared features could represent conformational mechanisms related with biological functions. PMID:28192432
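
    The abstract above quantifies conformational diversity through the C-alpha RMSD between conformers of the same protein. A minimal sketch of that calculation follows. It assumes the conformers are already optimally superimposed and have matching residues in the same order, and it takes the maximum pairwise RMSD as the diversity of a conformer set; the abstract does not specify either point, so both are assumptions, as are the function names.

```python
from itertools import combinations
from math import sqrt


def ca_rmsd(coords_a, coords_b):
    """RMSD in the input units (e.g. Å) between two lists of (x, y, z) C-alpha positions.

    Assumes both conformers are already superimposed and list the same
    residues in the same order; no fitting is performed here.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("conformers must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return sqrt(squared / len(coords_a))


def max_pairwise_rmsd(conformers):
    """Largest RMSD over all conformer pairs (assumed diversity measure)."""
    if len(conformers) < 2:
        raise ValueError("need at least two conformers")
    return max(ca_rmsd(a, b) for a, b in combinations(conformers, 2))


if __name__ == "__main__":
    # Toy three-residue conformers, invented for illustration.
    conf1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    conf2 = [(0.0, 0.0, 0.5), (3.8, 0.0, 0.5), (7.6, 0.0, 0.5)]
    conf3 = [(0.0, 0.0, 1.2), (3.8, 0.0, 1.2), (7.6, 0.0, 1.2)]
    print(round(max_pairwise_rmsd([conf1, conf2, conf3]), 2))
```

    In practice a structural superposition step (for example the Kabsch algorithm) would precede the RMSD calculation; it is omitted here to keep the sketch short.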

  16. Determining crystal structures through crowdsourcing and coursework

    NASA Astrophysics Data System (ADS)

    Horowitz, Scott; Koepnick, Brian; Martin, Raoul; Tymieniecki, Agnes; Winburn, Amanda A.; Cooper, Seth; Flatten, Jeff; Rogawski, David S.; Koropatkin, Nicole M.; Hailu, Tsinatkeab T.; Jain, Neha; Koldewey, Philipp; Ahlstrom, Logan S.; Chapman, Matthew R.; Sikkema, Andrew P.; Skiba, Meredith A.; Maloney, Finn P.; Beinlich, Felix R. M.; Caglar, Ahmet; Coral, Alan; Jensen, Alice Elizabeth; Lubow, Allen; Boitano, Amanda; Lisle, Amy Elizabeth; Maxwell, Andrew T.; Failer, Barb; Kaszubowski, Bartosz; Hrytsiv, Bohdan; Vincenzo, Brancaccio; de Melo Cruz, Breno Renan; McManus, Brian Joseph; Kestemont, Bruno; Vardeman, Carl; Comisky, Casey; Neilson, Catherine; Landers, Catherine R.; Ince, Christopher; Buske, Daniel Jon; Totonjian, Daniel; Copeland, David Marshall; Murray, David; Jagieła, Dawid; Janz, Dietmar; Wheeler, Douglas C.; Cali, Elie; Croze, Emmanuel; Rezae, Farah; Martin, Floyd Orville; Beecher, Gil; de Jong, Guido Alexander; Ykman, Guy; Feldmann, Harald; Chan, Hugo Paul Perez; Kovanecz, Istvan; Vasilchenko, Ivan; Connellan, James C.; Borman, Jami Lynne; Norrgard, Jane; Kanfer, Jebbie; Canfield, Jeffrey M.; Slone, Jesse David; Oh, Jimmy; Mitchell, Joanne; Bishop, John; Kroeger, John Douglas; Schinkler, Jonas; McLaughlin, Joseph; Brownlee, June M.; Bell, Justin; Fellbaum, Karl Willem; Harper, Kathleen; Abbey, Kirk J.; Isaksson, Lennart E.; Wei, Linda; Cummins, Lisa N.; Miller, Lori Anne; Bain, Lyn; Carpenter, Lynn; Desnouck, Maarten; Sharma, Manasa G.; Belcastro, Marcus; Szew, Martin; Szew, Martin; Britton, Matthew; Gaebel, Matthias; Power, Max; Cassidy, Michael; Pfützenreuter, Michael; Minett, Michele; Wesselingh, Michiel; Yi, Minjune; Cameron, Neil Haydn Tormey; Bolibruch, Nicholas I.; Benevides, Noah; Kathleen Kerr, Norah; Barlow, Nova; Crevits, Nykole Krystyne; Dunn, Paul; Silveira Belo Nascimento Roque, Paulo Sergio; Riber, Peter; Pikkanen, Petri; Shehzad, Raafay; Viosca, Randy; James Fraser, Robert; Leduc, Robert; Madala, Roman; Shnider, Scott; de Boisblanc, 
Sharon; Butkovich, Slava; Bliven, Spencer; Hettler, Stephen; Telehany, Stephen; Schwegmann, Steven A.; Parkes, Steven; Kleinfelter, Susan C.; Michael Holst, Sven; van der Laan, T. J. A.; Bausewein, Thomas; Simon, Vera; Pulley, Warwick; Hull, William; Kim, Annes Yukyung; Lawton, Alexis; Ruesch, Amanda; Sundar, Anjali; Lawrence, Anna-Lisa; Afrin, Antara; Maheshwer, Bhargavi; Turfe, Bilal; Huebner, Christian; Killeen, Courtney Elizabeth; Antebi-Lerrman, Dalia; Luan, Danny; Wolfe, Derek; Pham, Duc; Michewicz, Elaina; Hull, Elizabeth; Pardington, Emily; Galal, Galal Osama; Sun, Grace; Chen, Grace; Anderson, Halie E.; Chang, Jane; Hewlett, Jeffrey Thomas; Sterbenz, Jennifer; Lim, Jiho; Morof, Joshua; Lee, Junho; Inn, Juyoung Samuel; Hahm, Kaitlin; Roth, Kaitlin; Nair, Karun; Markin, Katherine; Schramm, Katie; Toni Eid, Kevin; Gam, Kristina; Murphy, Lisha; Yuan, Lucy; Kana, Lulia; Daboul, Lynn; Shammas, Mario Karam; Chason, Max; Sinan, Moaz; Andrew Tooley, Nicholas; Korakavi, Nisha; Comer, Patrick; Magur, Pragya; Savliwala, Quresh; Davison, Reid Michael; Sankaran, Roshun Rajiv; Lewe, Sam; Tamkus, Saule; Chen, Shirley; Harvey, Sho; Hwang, Sin Ye; Vatsia, Sohrab; Withrow, Stefan; Luther, Tahra K.; Manett, Taylor; Johnson, Thomas James; Ryan Brash, Timothy; Kuhlman, Wyatt; Park, Yeonjung; Popović, Zoran; Baker, David; Khatib, Firas; Bardwell, James C. A.

    2016-09-01

    We show here that computer game players can build high-quality crystal structures. Introduction of a new feature into the computer game Foldit allows players to build and real-space refine structures into electron density maps. To assess the usefulness of this feature, we held a crystallographic model-building competition between trained crystallographers, undergraduate students, Foldit players and automatic model-building algorithms. After removal of disordered residues, a team of Foldit players achieved the most accurate structure. Analysing the target protein of the competition, YPL067C, uncovered a new family of histidine triad proteins apparently involved in the prevention of amyloid toxicity. From this study, we conclude that crystallographers can utilize crowdsourcing to interpret electron density information and to produce structure solutions of the highest quality.

  17. Determining crystal structures through crowdsourcing and coursework.

    PubMed

    Horowitz, Scott; Koepnick, Brian; Martin, Raoul; Tymieniecki, Agnes; Winburn, Amanda A; Cooper, Seth; Flatten, Jeff; Rogawski, David S; Koropatkin, Nicole M; Hailu, Tsinatkeab T; Jain, Neha; Koldewey, Philipp; Ahlstrom, Logan S; Chapman, Matthew R; Sikkema, Andrew P; Skiba, Meredith A; Maloney, Finn P; Beinlich, Felix R M; Popović, Zoran; Baker, David; Khatib, Firas; Bardwell, James C A

    2016-09-16

    We show here that computer game players can build high-quality crystal structures. Introduction of a new feature into the computer game Foldit allows players to build and real-space refine structures into electron density maps. To assess the usefulness of this feature, we held a crystallographic model-building competition between trained crystallographers, undergraduate students, Foldit players and automatic model-building algorithms. After removal of disordered residues, a team of Foldit players achieved the most accurate structure. Analysing the target protein of the competition, YPL067C, uncovered a new family of histidine triad proteins apparently involved in the prevention of amyloid toxicity. From this study, we conclude that crystallographers can utilize crowdsourcing to interpret electron density information and to produce structure solutions of the highest quality.

  18. Crystal structures of native and xylosaccharide-bound alkali thermostable xylanase from an alkalophilic Bacillus sp. NG-27: Structural insights into alkalophilicity and implications for adaptation to polyextreme conditions

    PubMed Central

    Manikandan, Karuppasamy; Bhardwaj, Amit; Gupta, Naveen; Lokanath, Neratur K.; Ghosh, Amit; Reddy, Vanga Siva; Ramakumar, Suryanarayanarao

    2006-01-01

    Crystal structures are known for several glycosyl hydrolase family 10 (GH10) xylanases. However, none of them is from an alkalophilic organism that can grow in alkaline conditions. We have determined the crystal structures at 2.2 Å of a GH10 extracellular endoxylanase (BSX) from an alkalophilic Bacillus sp. NG-27, for the native and the complex enzyme with xylosaccharides. The industrially important enzyme is optimally active and stable at 343 K and at a pH of 8.4. Comparison of the structure of BSX with those of other thermostable GH10 xylanases optimally active at acidic or close to neutral pH showed that the solvent-exposed acidic amino acids, Asp and Glu, are markedly enhanced in BSX, while solvent-exposed Asn was noticeably depleted. The BSX crystal structure when compared with putative three-dimensional homology models of other extracellular alkalophilic GH10 xylanases from alkalophilic organisms suggests that a protein surface rich in acidic residues may be an important feature common to these alkali thermostable enzymes. A comparison of the surface features of BSX and of halophilic proteins allowed us to predict the activity of BSX at high salt concentrations, which we verified through experiments. This offered us important lessons in the polyextremophilicity of proteins, where understanding the structural features of a protein stable in one set of extreme conditions provided clues about the activity of the protein in other extreme conditions. The work brings to the fore the role of the nature and composition of solvent-exposed residues in the adaptation of enzymes to polyextreme conditions, as in BSX. PMID:16823036

  19. Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM.

    PubMed

    Liang, Yunyun; Liu, Sanyang; Zhang, Shengli

    2015-01-01

    Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class and it mainly uses protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM). Currently, prediction solely based on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted in this paper. Then a 700-dimensional (700D) feature vector is constructed and the dimension is decreased to 224D by using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves favorable and competitive performance. This offers an important complement to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.

  20. Structural basis of viral invasion: lessons from paramyxovirus F

    PubMed Central

    Lamb, Robert A.; Jardetzky, Theodore S.

    2007-01-01

    Summary: The structures of glycoproteins that mediate enveloped virus entry into cells have revealed dramatic structural changes that accompany membrane fusion and provided mechanistic insights into this process. The group of class I viral fusion proteins includes the influenza hemagglutinin, paramyxovirus F, HIV env and other mechanistically related fusogens, but these proteins are unrelated in sequence and exhibit clearly distinct structural features. Recently determined crystal structures of the paramyxovirus F protein in two conformations, representing prefusion and postfusion states, reveal a novel protein architecture that undergoes large-scale, irreversible refolding during membrane fusion, extending our understanding of this diverse group of membrane fusion machines. PMID:17870467

  1. Structural organization of G-protein-coupled receptors

    NASA Astrophysics Data System (ADS)

    Lomize, Andrei L.; Pogozheva, Irina D.; Mosberg, Henry I.

    1999-07-01

    Atomic-resolution structures of the transmembrane 7-α-helical domains of 26 G-protein-coupled receptors (GPCRs) (including opsins, cationic amine, melatonin, purine, chemokine, opioid, and glycoprotein hormone receptors and two related proteins, retinochrome and Duffy erythrocyte antigen) were calculated by distance geometry using interhelical hydrogen bonds formed by various proteins from the family and collectively applied as distance constraints, as described previously [Pogozheva et al., Biophys. J., 70 (1997) 1963]. The main structural features of the calculated GPCR models are described and illustrated by examples. Some of the features reflect physical interactions that are responsible for the structural stability of the transmembrane α-bundle: the formation of extensive networks of interhelical H-bonds and sulfur-aromatic clusters that are spatially organized as 'polarity gradients'; the close packing of side-chains throughout the transmembrane domain; and the formation of interhelical disulfide bonds in some receptors and a plausible Zn2+ binding center in retinochrome. Other features of the models are related to biological function and evolution of GPCRs: the formation of a common 'minicore' of 43 evolutionarily conserved residues; a multitude of correlated replacements throughout the transmembrane domain; an Na+-binding site in some receptors; and excellent complementarity of receptor binding pockets to many structurally dissimilar, conformationally constrained ligands, such as retinal, cyclic opioid peptides, and cationic amine ligands. The calculated models are in good agreement with numerous experimental data.

  2. Thermodynamic database for proteins: features and applications.

    PubMed

    Gromiha, M Michael; Sarai, Akinori

    2010-01-01

    We have developed a thermodynamic database for proteins and mutants, ProTherm, which is a collection of a large number of thermodynamic data on protein stability along with the sequence and structure information, experimental methods and conditions, and literature information. This is a valuable resource for understanding/predicting the stability of proteins, and it is accessible at http://www.gibk26.bse.kyutech.ac.jp/jouhou/Protherm/protherm.html. ProTherm has several features including various search, display, and sorting options and visualization tools. We have analyzed the data in ProTherm to examine the relationship among thermodynamics, structure, and function of proteins. We describe the progress on the development of methods for understanding/predicting protein stability, such as (i) relationship between the stability of protein mutants and amino acid properties, (ii) average assignment method, (iii) empirical energy functions, (iv) torsion, distance, and contact potentials, and (v) machine learning techniques. The list of online resources for predicting protein stability has also been provided.

  3. Protein control of true, gated, and coupled electron transfer reactions.

    PubMed

    Davidson, Victor L

    2008-06-01

    Electron transfer (ET) through and between proteins is a fundamental biological process. The rates of ET depend upon the thermodynamic driving force, the reorganization energy, and the degree of electronic coupling between the reactant and product states. The analysis of protein ET reactions is complicated by the fact that non-ET processes might influence the observed ET rate in kinetically complex biological systems. This Account describes studies of the methylamine dehydrogenase-amicyanin-cytochrome c-551i protein ET complex that have revealed the influence of several features of the protein structure on the magnitudes of the physical parameters for true ET reactions and how they dictate the kinetic mechanisms of non-ET processes that sometimes influence protein ET reactions. Kinetic and thermodynamic studies, coupled with structural information and biochemical data, are necessary to fully describe the ET reactions of proteins. Site-directed mutagenesis can be used to elucidate specific structure-function relationships. When mutations selectively alter the electronic coupling, reorganization energy, or driving force for the ET reaction, it becomes possible to use the parameters of the ET process to determine how specific amino acid residues and other features of the protein structure influence the ET rates. When mutations alter the kinetic mechanism for ET, one can determine the mechanisms by which non-ET processes, such as protein conformational changes or proton transfers, control the rates of ET reactions and how specific amino acid residues and certain features of the protein structure influence these non-ET reactions. A complete description of the mechanism of regulation of biological ET reactions enhances our understanding of metabolism, respiration, and photosynthesis at the molecular level. Such information has important medical relevance. 
    Defective protein ET leads to production of the reactive oxygen species and free radicals that are associated with aging and many disease states. Defective ET within the respiratory chain also causes certain mitochondrial myopathies. An understanding of the mechanisms of regulation of protein ET is also of practical value because it provides a logical basis for the design of applications utilizing redox enzymes, such as enzyme-based electrode sensors and fuel cells.
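
    The abstract above names three quantities that set the rate of a true ET reaction (driving force, reorganization energy, electronic coupling) but does not write out a rate law. One common way to relate them is the nonadiabatic Marcus expression, given here as an assumed framework rather than a formula quoted from this record. The symbols are illustrative labels: k_ET is the ET rate constant, H_AB the electronic coupling, λ the reorganization energy, ΔG° the standard free-energy change (driving force), k_B the Boltzmann constant, and T the temperature.

```latex
k_{ET} = \frac{2\pi}{\hbar}\,\frac{H_{AB}^{2}}{\sqrt{4\pi\lambda k_{B}T}}\,
\exp\!\left[-\frac{\left(\Delta G^{\circ}+\lambda\right)^{2}}{4\lambda k_{B}T}\right]
```

    Under this assumption, a mutation that changes only H_AB rescales the prefactor, whereas a change in λ or ΔG° acts through the exponential term, which is one way to read the abstract's statement that mutations can selectively alter individual parameters.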

  4. GFP-like proteins as ubiquitous metazoan superfamily: evolution of functional features and structural complexity.

    PubMed

    Shagin, Dmitry A; Barsova, Ekaterina V; Yanushevich, Yurii G; Fradkov, Arkady F; Lukyanov, Konstantin A; Labas, Yulii A; Semenova, Tatiana N; Ugalde, Juan A; Meyers, Ann; Nunez, Jose M; Widder, Edith A; Lukyanov, Sergey A; Matz, Mikhail V

    2004-05-01

    Homologs of the green fluorescent protein (GFP), including the recently described GFP-like domains of certain extracellular matrix proteins in Bilaterian organisms, are remarkably similar at the protein structure level, yet they often perform totally unrelated functions, thereby warranting recognition as a superfamily. Here we describe diverse GFP-like proteins from previously undersampled and completely new sources, including hydromedusae and planktonic Copepoda. In hydromedusae, yellow and nonfluorescent purple proteins were found in addition to greens. Notably, the new yellow protein seems to follow exactly the same structural solution to achieving the yellow color of fluorescence as YFP, an engineered yellow-emitting mutant variant of GFP. The addition of these new sequences made it possible to resolve deep-level phylogenetic relationships within the superfamily. Fluorescence (most likely green) must have already existed in the common ancestor of Cnidaria and Bilateria, and therefore GFP-like proteins may be responsible for fluorescence and/or coloration in virtually any animal. At least 15 color diversification events can be inferred following the maximum parsimony principle in Cnidaria. Origination of red fluorescence and nonfluorescent purple-blue colors on several independent occasions provides a remarkable example of convergent evolution of complex features at the molecular level.

  5. Physics of protein folding

    NASA Astrophysics Data System (ADS)

    Finkelstein, A. V.; Galzitskaya, O. V.

    2004-04-01

    Protein physics is grounded on three fundamental experimental facts: a protein, a long heteropolymer, has a well-defined compact three-dimensional structure; this structure can spontaneously arise from the unfolded protein chain in an appropriate environment; and this structure is separated from the unfolded state of the chain by the “all-or-none” phase transition, which ensures robustness of protein structure and therefore of its action. The aim of this review is to consider the modern understanding of the physical principles of self-organization of protein structures and to overview such important features of this process as finding the unique protein structure among zillions of alternatives, nucleation of the folding process, and metastable folding intermediates. Towards this end we will consider the main experimental facts and simple, mostly phenomenological theoretical models. We will concentrate on relatively small (single-domain) water-soluble globular proteins (whose structure and especially folding are much better studied and understood than those of large or membrane and fibrous proteins) and consider kinetic and structural aspects of transition of initially unfolded protein chains into their final solid (“native”) 3D structures.

  6. Understanding and Manipulating Electrostatic Fields at the Protein-Protein Interface Using Vibrational Spectroscopy and Continuum Electrostatics Calculations.

    PubMed

    Ritchie, Andrew W; Webb, Lauren J

    2015-11-05

    Biological function emerges in large part from the interactions of biomacromolecules in the complex and dynamic environment of the living cell. For this reason, macromolecular interactions in biological systems are now a major focus of interest throughout the biochemical and biophysical communities. The affinity and specificity of macromolecular interactions are the result of both structural and electrostatic factors. Significant advances have been made in characterizing structural features of stable protein-protein interfaces through the techniques of modern structural biology, but much less is understood about how electrostatic factors promote and stabilize specific functional macromolecular interactions over all possible choices presented to a given molecule in a crowded environment. In this Feature Article, we describe how vibrational Stark effect (VSE) spectroscopy is being applied to measure electrostatic fields at protein-protein interfaces, focusing on measurements of guanosine triphosphate (GTP)-binding proteins of the Ras superfamily binding with structurally related but functionally distinct downstream effector proteins. In VSE spectroscopy, spectral shifts of a probe oscillator's energy are related directly to that probe's local electrostatic environment. By performing this experiment repeatedly throughout a protein-protein interface, an experimental map of measured electrostatic fields generated at that interface is determined. These data can be used to rationalize selective binding of similarly structured proteins in both in vitro and in vivo environments. Furthermore, these data can be used to compare to computational predictions of electrostatic fields to explore the level of simulation detail that is necessary to accurately predict our experimental findings.
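
    The abstract above states that a probe oscillator's spectral shift is related directly to its local electrostatic environment. That relation is commonly written as the linear Stark equation; the form below is a sketch under that assumption, not an equation quoted from this record. The symbols are illustrative labels: Δν̃ is the observed shift of the probe's absorption energy in wavenumbers, Δμ the probe's difference dipole moment (its Stark tuning rate), ΔF the change in the local electric field, h the Planck constant, and c the speed of light.

```latex
h c\,\Delta\tilde{\nu} = -\,\Delta\vec{\mu}\cdot\Delta\vec{F}
```

    Read this way, each probe reports the component of the field change along its own difference dipole, which is why repeating the measurement at many positions across an interface yields a map of fields.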

  7. The Conformational Stability and Biophysical Properties of the Eukaryotic Thioredoxins of Pisum Sativum Are Not Family-Conserved

    PubMed Central

    Aguado-Llera, David; Martínez-Gómez, Ana Isabel; Prieto, Jesús; Marenchino, Marco; Traverso, José Angel; Gómez, Javier; Chueca, Ana; Neira, José L.

    2011-01-01

    Thioredoxins (TRXs) are ubiquitous proteins involved in redox processes. About forty genes encode TRX or TRX-related proteins in plants, grouped in different families according to their subcellular localization. For instance, the h-type TRXs are located in cytoplasm or mitochondria, whereas f-type TRXs have a plastidial origin, although both types of proteins have a eukaryotic origin as opposed to other TRXs. Herein, we study the conformational and the biophysical features of TRXh1, TRXh2 and TRXf from Pisum sativum. The modelled structures of the three proteins show the well-known TRX fold. While the proteins share similar pH-denaturation features, their chemical and thermal stabilities differ, with PsTRXh1 (Pisum sativum thioredoxin h1) being the most stable isoform; moreover, the three proteins follow a three-state denaturation model during chemical denaturation. These differences in thermal and chemical denaturation result from changes, in a broad sense, in the several ASAs (accessible surface areas) of the proteins. Thus, although a strong relationship can be found between the primary amino acid sequence and the structure among TRXs, no such relationship is found between the residue sequence and the conformational stability and biophysical properties. We discuss how these differences in the biophysical properties of TRXs determine their unique functions in pea, and we show how residues involved in the biophysical features described (pH-titrations, dimerizations and chemical-denaturations) belong to regions involved in interaction with other proteins. Our results suggest that the sequence demands of protein-protein function are relatively rigid, with different protein-binding pockets (some in common) for each of the three proteins, but the demands of structure and conformational stability per se (as long as there is a maintained core), are less so. PMID:21364950

  8. Prediction and Dissection of Protein-RNA Interactions by Molecular Descriptors.

    PubMed

    Liu, Zhi-Ping; Chen, Luonan

    2016-01-01

    Protein-RNA interactions play crucial roles in numerous biological processes. However, detecting the interactions and binding sites between protein and RNA by traditional experiments is still time-consuming and labor-intensive. Thus, it is important to develop bioinformatics methods for predicting protein-RNA interactions and binding sites. Accurate prediction of protein-RNA interactions and recognition will greatly help to decipher the interaction mechanisms between protein and RNA, as well as to improve RNA-related protein engineering and drug design. In this work, we summarize the current bioinformatics strategies for predicting protein-RNA interactions and dissecting protein-RNA interaction mechanisms from local structure binding motifs. In particular, we focus on the feature-based machine learning methods, in which the molecular descriptors of protein and RNA are extracted and integrated as feature vectors representing the interaction events and recognition residues. In addition, the available methods are classified and compared comprehensively. The molecular descriptors are expected to elucidate the binding mechanisms of protein-RNA interaction and reveal the functional implications from the perspective of structural complementarity.

  9. PredPPCrys: Accurate Prediction of Sequence Cloning, Protein Production, Purification and Crystallization Propensity from Protein Sequences Using Multi-Step Heterogeneous Feature Fusion and Selection

    PubMed Central

    Wang, Huilin; Wang, Mingjun; Tan, Hao; Li, Yuan; Zhang, Ziding; Song, Jiangning

    2014-01-01

    X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed ‘PredPPCrys’ using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. 
    In addition, the predicted crystallization targets of currently non-crystallizable proteins were provided as compendium data, which are anticipated to facilitate target selection and design for the worldwide structural genomics consortium. PredPPCrys is freely available at http://www.structbioinfor.org/PredPPCrys. PMID:25148528

  10. The calcium binding properties and structure prediction of the Hax-1 protein.

    PubMed

    Balcerak, Anna; Rowinski, Sebastian; Szafron, Lukasz M; Grzybowska, Ewa A

    2017-01-01

    Hax-1 is a protein involved in regulation of different cellular processes, but its properties and exact mechanisms of action remain unknown. In this work, using purified, recombinant Hax-1 and by applying an in vitro autoradiography assay we have shown that this protein binds Ca2+. Additionally, we performed structure prediction analysis which shows that Hax-1 displays definitive structural features, such as two α-helices, short β-strands and four disordered segments.

  11. Structural Biology of Non-Ribosomal Peptide Synthetases

    PubMed Central

    Miller, Bradley R.; Gulick, Andrew M.

    2016-01-01

    Summary: The non-ribosomal peptide synthetases are modular enzymes that catalyze synthesis of important peptide products from a variety of standard and non-proteinogenic amino acid substrates. Within a single module are multiple catalytic domains that are responsible for incorporation of a single residue. After the amino acid is activated and covalently attached to an integrated carrier protein domain, the substrates and intermediates are delivered to neighboring catalytic domains for peptide bond formation or, in some modules, chemical modification. In the final module, the peptide is delivered to a terminal thioesterase domain that catalyzes release of the peptide product. This multi-domain modular architecture raises questions about the structural features that enable this assembly line synthesis in an efficient manner. The structures of the core component domains have been determined and provide insights into the catalytic activity. More recently, multi-domain structures have been determined and are providing clues to the features of these enzyme systems that govern the functional interaction between multiple domains. This chapter describes the structures of NRPS proteins and the strategies that are being used to assist structural studies of these dynamic proteins, including careful consideration of domain boundaries for generation of truncated proteins and the use of mechanism-based inhibitors that trap interactions between the catalytic and carrier protein domains. PMID:26831698

  12. Computational Identification of Genomic Features That Influence 3D Chromatin Domain Formation.

    PubMed

    Mourad, Raphaël; Cuvier, Olivier

    2016-05-01

    Recent advances in long-range Hi-C contact mapping have revealed the importance of the 3D structure of chromosomes in gene expression. A current challenge is to identify the key molecular drivers of this 3D structure. Several genomic features, such as architectural proteins and functional elements, were shown to be enriched at topological domain borders using classical enrichment tests. Here we propose multiple logistic regression to identify those genomic features that positively or negatively influence domain border establishment or maintenance. The model is flexible, and can account for statistical interactions among multiple genomic features. Using both simulated and real data, we show that our model outperforms enrichment test and non-parametric models, such as random forests, for the identification of genomic features that influence domain borders. Using Drosophila Hi-C data at a very high resolution of 1 kb, our model suggests that, among architectural proteins, BEAF-32 and CP190 are the main positive drivers of 3D domain borders. In humans, our model identifies well-known architectural proteins CTCF and cohesin, as well as ZNF143 and Polycomb group proteins as positive drivers of domain borders. The model also reveals the existence of several negative drivers that counteract the presence of domain borders including P300, RXRA, BCL11A and ELK1.
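
    The abstract above proposes multiple logistic regression to find genomic features that positively or negatively influence domain borders. The following is a minimal sketch of that idea on simulated data, not the authors' implementation: the three binary features (stand-ins for the presence of hypothetical proteins A, B and C in a genomic bin), the simulated coefficients, and the function names are all assumptions made for illustration. A positive fitted coefficient marks a candidate positive driver of borders, a negative one a candidate negative driver.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=1.0, epochs=300):
    """Fit an intercept plus one coefficient per feature by batch
    gradient ascent on the mean log-likelihood."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)  # beta[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
            err = yi - sigmoid(z)
            grad[0] += err
            for j, x in enumerate(xi):
                grad[j + 1] += err * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def simulate_bins(n=2000, seed=0):
    """Simulate genomic bins. Hypothetical ground truth: protein A
    favours borders, protein B disfavours them, protein C is neutral."""
    rng = random.Random(seed)
    true_beta = [-1.0, 2.0, -1.5, 0.0]  # intercept, A, B, C (illustrative)
    X, y = [], []
    for _ in range(n):
        xi = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(3)]
        z = true_beta[0] + sum(b * x for b, x in zip(true_beta[1:], xi))
        X.append(xi)
        y.append(1 if rng.random() < sigmoid(z) else 0)  # 1 = bin is a border
    return X, y


X, y = simulate_bins()
beta = fit_logistic(X, y)
for name, b in zip(["intercept", "protein_A", "protein_B", "protein_C"], beta):
    print(f"{name:>10s}: {b:+.2f}")
```

    Because all features enter one model, each coefficient is estimated conditionally on the others, which is the property the abstract contrasts with feature-by-feature enrichment tests. Interaction terms could be added as extra product columns in X.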

  13. Computational Identification of Genomic Features That Influence 3D Chromatin Domain Formation

    PubMed Central

    Mourad, Raphaël; Cuvier, Olivier

    2016-01-01

    Recent advances in long-range Hi-C contact mapping have revealed the importance of the 3D structure of chromosomes in gene expression. A current challenge is to identify the key molecular drivers of this 3D structure. Several genomic features, such as architectural proteins and functional elements, were shown to be enriched at topological domain borders using classical enrichment tests. Here we propose multiple logistic regression to identify those genomic features that positively or negatively influence domain border establishment or maintenance. The model is flexible, and can account for statistical interactions among multiple genomic features. Using both simulated and real data, we show that our model outperforms enrichment test and non-parametric models, such as random forests, for the identification of genomic features that influence domain borders. Using Drosophila Hi-C data at a very high resolution of 1 kb, our model suggests that, among architectural proteins, BEAF-32 and CP190 are the main positive drivers of 3D domain borders. In humans, our model identifies well-known architectural proteins CTCF and cohesin, as well as ZNF143 and Polycomb group proteins as positive drivers of domain borders. The model also reveals the existence of several negative drivers that counteract the presence of domain borders including P300, RXRA, BCL11A and ELK1. PMID:27203237

  14. Template-based modeling and ab initio refinement of protein oligomer structures using GALAXY in CAPRI round 30.

    PubMed

    Lee, Hasup; Baek, Minkyung; Lee, Gyu Rie; Park, Sangwoo; Seok, Chaok

    2017-03-01

    Many proteins function as homo- or hetero-oligomers; therefore, attempts to understand and regulate protein functions require knowledge of protein oligomer structures. The number of available experimental protein structures is increasing, and oligomer structures can be predicted using the experimental structures of related proteins as templates. However, template-based models may have errors due to sequence differences between the target and template proteins, which can lead to functional differences. Such structural differences may be predicted by loop modeling of local regions or refinement of the overall structure. In CAPRI (Critical Assessment of PRotein Interactions) round 30, we used recently developed features of the GALAXY protein modeling package, including template-based structure prediction, loop modeling, model refinement, and protein-protein docking to predict protein complex structures from amino acid sequences. Out of the 25 CAPRI targets, medium and acceptable quality models were obtained for 14 and 1 target(s), respectively, for which proper oligomer or monomer templates could be detected. Symmetric interface loop modeling on oligomer model structures successfully improved model quality, while loop modeling on monomer model structures failed. Overall refinement of the predicted oligomer structures consistently improved the model quality, in particular in interface contacts. Proteins 2017; 85:399-407. © 2016 Wiley Periodicals, Inc.

  15. Efficient Feature Selection and Classification of Protein Sequence Data in Bioinformatics

    PubMed Central

    Faye, Ibrahima; Samir, Brahim Belhaouari; Md Said, Abas

    2014-01-01

    Bioinformatics has been an emerging area of research for the last three decades. The ultimate aims of bioinformatics were to store and manage the biological data, and develop and analyze computational tools to enhance their understanding. The size of data accumulated under various sequencing projects is increasing exponentially, which presents difficulties for the experimental methods. To reduce the gap between newly sequenced proteins and proteins with known functions, many computational techniques involving classification and clustering algorithms were proposed in the past. The classification of protein sequences into existing superfamilies is helpful in predicting the structure and function of the large number of newly discovered proteins. The existing classification results are unsatisfactory due to the huge number of features obtained through various feature encoding methods. In this work, a statistical metric-based feature selection technique has been proposed in order to reduce the size of the extracted feature vector. The proposed method of protein classification shows significant improvement in terms of performance measure metrics: accuracy, sensitivity, specificity, recall, F-measure, and so forth. PMID:25045727
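
    The performance measures listed at the end of the abstract above can all be derived from a binary confusion matrix. The sketch below shows the standard definitions; the function name and the example counts are hypothetical and are not results from the paper. Note that sensitivity and recall are two names for the same quantity, so the code computes it once.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. Ratios with an empty denominator are reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # also called recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "recall": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
    }


# Illustrative counts only (not taken from the paper).
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name:>11s}: {value:.3f}")
```

    For a multi-class superfamily problem, the same function could be applied once per class in a one-versus-rest fashion and the results averaged; which averaging scheme the authors used is not stated in the abstract.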

  16. Predicting domain-domain interaction based on domain profiles with feature selection and support vector machines

    PubMed Central

    2010-01-01

    Background: Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles. Results: In this paper, we propose a computational method to predict DDI using support vector machines (SVMs), based on domains represented as interaction profile hidden Markov models (ipHMM) where interacting residues in domains are explicitly modeled according to the three dimensional structural information available at the Protein Data Bank (PDB). Features about the domains are extracted first as the Fisher scores derived from the ipHMM and then selected using singular value decomposition (SVD). Domain pairs are represented by concatenating their selected feature vectors, and classified by a support vector machine trained on these feature vectors. The method is tested by leave-one-out cross validation experiments with a set of interacting protein pairs adopted from the 3DID database. The prediction accuracy has shown significant improvement as compared to InterPreTS (Interaction Prediction through Tertiary Structure), an existing method for PPI prediction that also uses the sequences and complexes of known 3D structure. Conclusions: We show that domain-domain interaction prediction can be significantly enhanced by exploiting information inherent in the domain profiles via feature selection based on Fisher scores, singular value decomposition and supervised learning based on support vector machines.
Datasets and source code are freely available on the web at http://liao.cis.udel.edu/pub/svdsvm. Implemented in Matlab and supported on Linux and MS Windows. PMID:21034480
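
    The pair representation and leave-one-out protocol described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' Matlab implementation: the Fisher-score and SVD steps are omitted, a nearest-centroid rule stands in for the SVM, and the function names and toy vectors are hypothetical.

```python
from math import dist

def pair_vector(dom_a, dom_b):
    """Represent a domain pair by concatenating the two domains' feature vectors."""
    return list(dom_a) + list(dom_b)

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def predict(train_x, train_y, query):
    """Stand-in classifier (NOT an SVM): label of the nearest class centroid."""
    centroids = {}
    for label in set(train_y):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = centroid(members)
    return min(centroids, key=lambda label: dist(centroids[label], query))

def leave_one_out_accuracy(xs, ys):
    """Hold out each pair in turn, train on the rest, and score the held-out pair."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)

# Toy, made-up feature vectors; 1 = interacting pair, 0 = non-interacting pair.
xs = [
    pair_vector([0.9, 0.1], [0.8, 0.2]),
    pair_vector([0.8, 0.2], [0.9, 0.1]),
    pair_vector([1.0, 0.0], [0.9, 0.1]),
    pair_vector([0.1, 0.9], [0.2, 0.8]),
    pair_vector([0.2, 0.8], [0.1, 0.9]),
    pair_vector([0.0, 1.0], [0.1, 0.9]),
]
ys = [1, 1, 1, 0, 0, 0]
accuracy = leave_one_out_accuracy(xs, ys)
```

    Swapping `predict` for a real SVM would leave the concatenation and hold-out loop unchanged.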

  17. Chemical cross-linking and native mass spectrometry: A fruitful combination for structural biology

    PubMed Central

    Sinz, Andrea; Arlt, Christian; Chorev, Dror; Sharon, Michal

    2015-01-01

    Mass spectrometry (MS) is becoming increasingly popular in the field of structural biology for analyzing protein three-dimensional structures and for mapping protein–protein interactions. In this review, the specific contributions of chemical crosslinking and native MS are outlined to reveal the structural features of proteins and protein assemblies. Both strategies are illustrated based on the examples of the tetrameric tumor suppressor protein p53 and multisubunit vinculin-Arp2/3 hybrid complexes. We describe the distinct advantages and limitations of each technique and highlight synergistic effects when both techniques are combined. Integrating both methods is especially useful for characterizing large protein assemblies and for capturing transient interactions. We also point out the future directions we foresee for a combination of in vivo crosslinking and native MS for structural investigation of intact protein assemblies. PMID:25970732

  18. Biological and functional relevance of CASP predictions.

    PubMed

    Liu, Tianyun; Ish-Shalom, Shirbi; Torng, Wen; Lafita, Aleix; Bock, Christian; Mort, Matthew; Cooper, David N; Bliven, Spencer; Capitani, Guido; Mooney, Sean D; Altman, Russ B

    2018-03-01

    Our goal is to answer the question: compared with experimental structures, how useful are predicted models for functional annotation? We assessed the functional utility of predicted models by comparing the performances of a suite of methods for functional characterization on the predictions and the experimental structures. We identified 28 sites in 25 protein targets to perform functional assessment. These 28 sites included nine sites with known ligand binding (holo-sites), nine sites that are expected or suggested by experimental authors for small molecule binding (apo-sites), and ten sites containing important motifs, loops, or key residues with important disease-associated mutations. We evaluated the utility of the predictions by comparing their microenvironments to the experimental structures. Overall structural quality correlates with functional utility. However, the best-ranked predictions (global) may not have the best functional quality (local). Our assessment provides an ability to discriminate between predictions with high structural quality. When assessing ligand-binding sites, most prediction methods have higher performance on apo-sites than holo-sites. Some servers show consistently high performance for certain types of functional sites. Finally, many functional sites are associated with protein-protein interaction. We also analyzed biologically relevant features from the protein assemblies of two targets where the active site spanned the protein-protein interface. For the assembly targets, we find that the features in the models are mainly determined by the choice of template. © 2017 The Authors Proteins: Structure, Function and Bioinformatics Published by Wiley Periodicals, Inc.

  19. A practical teaching course in directed protein evolution using the green fluorescent protein as a model.

    PubMed

    Ruller, Roberto; Silva-Rocha, Rafael; Silva, Artur; Cruz Schneider, Maria Paula; Ward, Richard John

    2011-01-01

    Protein engineering is a powerful tool, which correlates protein structure with specific functions, both in applied biotechnology and in basic research. Here, we present a practical teaching course for engineering the green fluorescent protein (GFP) from Aequorea victoria by a random mutagenesis strategy using error-prone polymerase chain reaction. Screening of bacterial colonies transformed with random mutant libraries identified GFP variants with increased fluorescence yields. Mapping the three-dimensional structure of these mutants demonstrated how alterations in structural features such as the environment around the fluorophore and properties of the protein surface can influence functional properties such as the intensity of fluorescence and protein solubility. Copyright © 2011 Wiley Periodicals, Inc.

  20. Atomic interaction networks in the core of protein domains and their native folds.

    PubMed

    Soundararajan, Venkataramanan; Raman, Rahul; Raguram, S; Sasisekharan, V; Sasisekharan, Ram

    2010-02-23

    Vastly divergent sequences populate a majority of protein folds. In the quest to identify features that are conserved within protein domains belonging to the same fold, we set out to examine the entire protein universe on a fold-by-fold basis. We report that the atomic interaction network in the solvent-unexposed core of protein domains is fold-conserved, extraordinary sequence divergence notwithstanding. Further, we find that this feature, termed protein core atomic interaction network (or PCAIN), is significantly distinguishable across different folds, thus appearing to be a "signature" of a domain's native fold. As part of this study, we computed the PCAINs for 8698 representative protein domains from families across the 1018 known protein folds to construct our seed database and an automated framework was developed for PCAIN-based characterization of the protein fold universe. A test set of randomly selected domains that are not in the seed database was classified with over 97% accuracy, independent of sequence divergence. As an application of this novel fold signature, a PCAIN-based scoring scheme was developed for comparative (homology-based) structure prediction, with 1-2 angstroms (mean 1.61 Å) C(alpha) RMSD generally observed between computed structures and reference crystal structures. Our results are consistent across the full spectrum of test domains including those from recent CASP experiments and most notably in the 'twilight' and 'midnight' zones wherein <30% and <10% target-template sequence identity prevails (mean twilight RMSD of 1.69 Å). We further demonstrate the utility of the PCAIN protocol to derive biological insight into protein structure-function relationships, by modeling the structure of the YopM effector novel E3 ligase (NEL) domain from plague-causative bacterium Yersinia pestis and discussing its implications for host adaptive and innate immune modulation by the pathogen.
Considering the several high-throughput, sequence-identity-independent applications demonstrated in this work, we suggest that the PCAIN is a fundamental fold feature that could be a valuable addition to the arsenal of protein modeling and analysis tools.
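
    To make the idea of a core atomic interaction network concrete, the following Python sketch builds a distance-based contact graph over atoms flagged as buried. It is an illustration only, not the PCAIN protocol: the 4.0 Å cutoff, the precomputed burial flag, and the atom labels are all assumptions introduced here.

```python
from itertools import combinations
from math import dist

def core_interaction_network(atoms, cutoff=4.0):
    """Return contact edges between buried atoms.

    atoms:  list of (label, (x, y, z), is_buried) tuples; the burial flag is
            assumed to come from a separate solvent-accessibility calculation.
    cutoff: contact distance in angstroms (hypothetical value).
    """
    core = [(label, xyz) for label, xyz, buried in atoms if buried]
    edges = set()
    for (a, pos_a), (b, pos_b) in combinations(core, 2):
        if dist(pos_a, pos_b) <= cutoff:
            edges.add(frozenset((a, b)))  # undirected edge
    return edges

# Toy coordinates, not taken from any real structure.
atoms = [
    ("A:CB", (0.0, 0.0, 0.0), True),
    ("B:CG", (3.0, 0.0, 0.0), True),
    ("C:CD", (10.0, 0.0, 0.0), True),   # buried but too far from the others
    ("D:OH", (1.0, 0.0, 0.0), False),   # close but solvent-exposed, so excluded
]
network = core_interaction_network(atoms)
```

    Networks built this way for two domains could then be compared edge by edge, which is the kind of comparison a fold "signature" would rest on.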

  1. Atomic Interaction Networks in the Core of Protein Domains and Their Native Folds

    PubMed Central

    Soundararajan, Venkataramanan; Raman, Rahul; Raguram, S.; Sasisekharan, V.; Sasisekharan, Ram

    2010-01-01

    Vastly divergent sequences populate a majority of protein folds. In the quest to identify features that are conserved within protein domains belonging to the same fold, we set out to examine the entire protein universe on a fold-by-fold basis. We report that the atomic interaction network in the solvent-unexposed core of protein domains is fold-conserved, extraordinary sequence divergence notwithstanding. Further, we find that this feature, termed protein core atomic interaction network (or PCAIN), is significantly distinguishable across different folds, thus appearing to be a “signature” of a domain's native fold. As part of this study, we computed the PCAINs for 8698 representative protein domains from families across the 1018 known protein folds to construct our seed database and an automated framework was developed for PCAIN-based characterization of the protein fold universe. A test set of randomly selected domains that are not in the seed database was classified with over 97% accuracy, independent of sequence divergence. As an application of this novel fold signature, a PCAIN-based scoring scheme was developed for comparative (homology-based) structure prediction, with 1–2 angstroms (mean 1.61 Å) Cα RMSD generally observed between computed structures and reference crystal structures. Our results are consistent across the full spectrum of test domains including those from recent CASP experiments and most notably in the ‘twilight’ and ‘midnight’ zones wherein <30% and <10% target-template sequence identity prevails (mean twilight RMSD of 1.69 Å). We further demonstrate the utility of the PCAIN protocol to derive biological insight into protein structure-function relationships, by modeling the structure of the YopM effector novel E3 ligase (NEL) domain from plague-causative bacterium Yersinia pestis and discussing its implications for host adaptive and innate immune modulation by the pathogen.
Considering the several high-throughput, sequence-identity-independent applications demonstrated in this work, we suggest that the PCAIN is a fundamental fold feature that could be a valuable addition to the arsenal of protein modeling and analysis tools. PMID:20186337

  2. A Novel Method Using Abstract Convex Underestimation in Ab-Initio Protein Structure Prediction for Guiding Search in Conformational Feature Space.

    PubMed

    Hao, Xiao-Hu; Zhang, Gui-Jun; Zhou, Xiao-Gen; Yu, Xu-Feng

    2016-01-01

    To address the searching problem of protein conformational space in ab-initio protein structure prediction, a novel method using abstract convex underestimation (ACUE) based on the framework of an evolutionary algorithm was proposed. Computing such conformations, essential to associate structural and functional information with gene sequences, is challenging due to the high dimensionality and rugged energy surface of the protein conformational space. As a consequence, the dimension of protein conformational space should be reduced to a proper level. In this paper, the original high-dimensional conformational space was converted into a feature space whose dimension is considerably reduced by a feature extraction technique. The underestimate space could then be constructed according to abstract convex theory. Thus, the entropy effect caused by searching in the high-dimensional conformational space could be avoided through such conversion. The tight lower bound estimate information was obtained to guide the searching direction, and the invalid searching area in which the global optimal solution is not located could be eliminated in advance. Moreover, instead of expensively calculating the energy of conformations in the original conformational space, the estimate value is employed to judge if the conformation is worth exploring to reduce the evaluation time, thereby making computational cost lower and the searching process more efficient. Additionally, fragment assembly and the Monte Carlo method are combined to generate a series of metastable conformations by sampling in the conformational space. The proposed method provides a novel technique to solve the searching problem of protein conformational space. Twenty small-to-medium structurally diverse proteins were tested, and the proposed ACUE method was compared with ItFix, HEA, Rosetta and the developed method LEDE without underestimate information.
Test results show that the ACUE method can more rapidly and more efficiently obtain the near-native protein structure.
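
    The general idea of using a cheap lower-bound estimate to decide whether a candidate deserves a costly energy evaluation can be sketched in Python as follows. This is not the paper's ACUE construction: a simple Lipschitz lower bound over a one-dimensional feature coordinate stands in for the abstract convex underestimate, and the energy function, Lipschitz constant, and candidate list are hypothetical.

```python
def lower_bound(x, samples, lipschitz):
    """Lipschitz lower bound on the energy at x implied by evaluated samples."""
    return max(f - lipschitz * abs(x - xs) for xs, f in samples)

def guided_search(energy, candidates, lipschitz):
    """Evaluate a candidate only if its lower bound could beat the incumbent."""
    samples = []                       # (position, energy) pairs evaluated so far
    best_x, best_f = None, float("inf")
    evaluations = 0
    for x in candidates:
        if samples and lower_bound(x, samples, lipschitz) >= best_f:
            continue                   # provably no better: skip the costly call
        f = energy(x)
        evaluations += 1
        samples.append((x, f))
        if f < best_f:
            best_x, best_f = x, f
    return best_x, best_f, evaluations

def toy_energy(x):
    """Toy energy with Lipschitz constant 1 and its minimum at x = 2."""
    return abs(x - 2.0)

result = guided_search(toy_energy, [10.0, 2.0, 9.0, 11.0, 3.0], lipschitz=1.0)
```

    In this toy run only the first two candidates are evaluated; the remaining three are discarded because their lower bounds already exceed the best energy found.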

  3. Molecular modeling of the human sperm associated antigen 11 B (SPAG11B) proteins.

    PubMed

    Narmadha, Ganapathy; Yenugu, Suresh

    2015-04-01

    Antimicrobial proteins and peptides are ubiquitous in nature with diverse structural and biological properties. Among them, the human beta-defensins are known to contribute to the innate immune response. Besides the defensins, a number of defensin-like proteins and peptides are expressed in many organ systems including the male reproductive system. Some of the protein isoforms encoded by the sperm associated antigen 11B (SPAG11) gene in humans are beta-defensin-like and exhibit structure dependent and salt tolerant antimicrobial activity, besides contributing to sperm maturation. Though some of the functional roles of these proteins are reported, the structural and molecular features that contribute to their antimicrobial activity are not yet reported. In this study, using in silico tools, we report the three dimensional structure of the human SPAG11B proteins and their C-terminal peptides. Web-based hydropathy, amphipathicity, and topology (WHAT) analyses and grand average of hydropathy (GRAVY) indices show that these proteins and peptides are amphipathic and highly hydrophilic. Self-optimized prediction method with alignment (SOPMA) analyses and circular dichroism data suggest that the secondary structure of these proteins and peptides primarily contains beta-sheet and random coil structure and alpha-helix to a lesser extent. Ramachandran plots show that the majority of the amino acids in these proteins and peptides fall in the permissible regions, thus indicating stable structures. The secondary structure of SPAG11B isoforms and their peptides was not perturbed with increasing NaCl concentration (0-300 mM) and at different pH (3, 7, and 10), thus reinforcing our previously reported observation that their antimicrobial activity is salt tolerant. To the best of our knowledge, for the first time, results of our study provide vital information on the structural features of SPAG11B protein isoforms and their contribution to antimicrobial activity.
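
    The GRAVY index mentioned in this abstract is the mean Kyte-Doolittle hydropathy over all residues of a sequence, with negative values indicating a hydrophilic sequence. A minimal Python sketch follows; the function name and example sequences are illustrative and are not SPAG11B data.

```python
# Kyte-Doolittle hydropathy scale, one value per standard amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(sequence):
    """Grand average of hydropathy: sum of residue values divided by length."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)

hydrophilic = gravy("RKDE")   # charged residues only, strongly negative
hydrophobic = gravy("ILVF")   # aliphatic and aromatic residues, strongly positive
```

    Residues outside the 20 standard letters raise a `KeyError` in this sketch; a production tool would need a policy for ambiguous codes.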

  4. A common feature pharmacophore for FDA-approved drugs inhibiting the Ebola virus.

    PubMed

    Ekins, Sean; Freundlich, Joel S; Coffee, Megan

    2014-01-01

    We are currently faced with a global infectious disease crisis which has been anticipated for decades. While many promising biotherapeutics are being tested, the search for a small molecule has yet to deliver an approved drug or therapeutic for the Ebola or similar filoviruses that cause haemorrhagic fever. Two recent high throughput screens published in 2013 did however identify several hits that progressed to animal studies that are FDA approved drugs used for other indications. The current computational analysis uses these molecules from two different structural classes to construct a common features pharmacophore. This ligand-based pharmacophore implicates a possible common target or mechanism that could be further explored. A recent structure based design project yielded nine co-crystal structures of pyrrolidinone inhibitors bound to the viral protein 35 (VP35). When receptor-ligand pharmacophores based on the analogs of these molecules and the protein structures were constructed, the molecular features partially overlapped with the common features of solely ligand-based pharmacophore models based on FDA approved drugs. These previously identified FDA approved drugs with activity against Ebola were therefore docked into this protein. The antimalarials chloroquine and amodiaquine docked favorably in VP35. We propose that these drugs identified to date as inhibitors of the Ebola virus may be targeting VP35. These computational models may provide preliminary insights into the molecular features that are responsible for their activity against Ebola virus in vitro and in vivo and we propose that this hypothesis could be readily tested.

  5. A common feature pharmacophore for FDA-approved drugs inhibiting the Ebola virus

    PubMed Central

    Ekins, Sean; Freundlich, Joel S.; Coffee, Megan

    2014-01-01

    We are currently faced with a global infectious disease crisis which has been anticipated for decades. While many promising biotherapeutics are being tested, the search for a small molecule has yet to deliver an approved drug or therapeutic for the Ebola or similar filoviruses that cause haemorrhagic fever. Two recent high throughput screens published in 2013 did however identify several hits that progressed to animal studies that are FDA approved drugs used for other indications. The current computational analysis uses these molecules from two different structural classes to construct a common features pharmacophore. This ligand-based pharmacophore implicates a possible common target or mechanism that could be further explored. A recent structure based design project yielded nine co-crystal structures of pyrrolidinone inhibitors bound to the viral protein 35 (VP35). When receptor-ligand pharmacophores based on the analogs of these molecules and the protein structures were constructed, the molecular features partially overlapped with the common features of solely ligand-based pharmacophore models based on FDA approved drugs. These previously identified FDA approved drugs with activity against Ebola were therefore docked into this protein. The antimalarials chloroquine and amodiaquine docked favorably in VP35. We propose that these drugs identified to date as inhibitors of the Ebola virus may be targeting VP35. These computational models may provide preliminary insights into the molecular features that are responsible for their activity against Ebola virus in vitro and in vivo and we propose that this hypothesis could be readily tested. PMID:25653841

  6. Computational Simulation of the Activation Cycle of Gα Subunit in the G Protein Cycle Using an Elastic Network Model

    PubMed Central

    Kim, Min Hyeok; Kim, Young Jin; Kim, Hee Ryung; Jeon, Tae-Joon; Choi, Jae Boong; Chung, Ka Young; Kim, Moon Ki

    2016-01-01

    Agonist-activated G protein-coupled receptors (GPCRs) interact with GDP-bound G protein heterotrimers (Gαβγ) promoting GDP/GTP exchange, which results in dissociation of Gα from the receptor and Gβγ. The GTPase activity of Gα hydrolyzes GTP to GDP, and the GDP-bound Gα interacts with Gβγ, forming a GDP-bound G protein heterotrimer. The G protein cycle is allosterically modulated by conformational changes of the Gα subunit. Although biochemical and biophysical methods have elucidated the structure and dynamics of Gα, the precise conformational mechanisms underlying the G protein cycle are not fully understood yet. Simulation methods could help to provide additional details to gain further insight into G protein signal transduction mechanisms. In this study, using the available X-ray crystal structures of Gα, we simulated the entire G protein cycle and described not only the steric features of the Gα structure, but also conformational changes at each step. Each reference structure in the G protein cycle was modeled as an elastic network model and subjected to normal mode analysis. Our simulation data suggests that activated receptors trigger conformational changes of the Gα subunit that are thermodynamically favorable for opening of the nucleotide-binding pocket and GDP release. Furthermore, the effects of GTP binding and hydrolysis on mobility changes of the C and N termini and switch regions are elucidated. In summary, our simulation results enabled us to provide detailed descriptions of the structural and dynamic features of the G protein cycle. PMID:27483005
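
    For readers unfamiliar with elastic network models, the Python sketch below assembles the Hessian of an anisotropic network model, a common ENM variant in which residues within a cutoff are joined by identical springs. The abstract does not state which ENM variant or parameters the authors used, so the 12.0 Å cutoff, the unit spring constant, and the toy coordinates are assumptions. Normal mode analysis would then diagonalize this matrix with a linear algebra library, a step omitted here.

```python
from math import dist

def anm_hessian(coords, cutoff=12.0, gamma=1.0):
    """Build the 3N x 3N Hessian of an anisotropic network model.

    coords: list of distinct (x, y, z) positions, e.g. one C-alpha per residue.
    cutoff: pairs farther apart than this (angstroms) are not connected.
    gamma:  uniform spring constant (arbitrary units).
    """
    n = len(coords)
    h = [[0.0] * (3 * n) for _ in range(3 * n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = dist(coords[i], coords[j])
            if d > cutoff:
                continue
            delta = [coords[j][k] - coords[i][k] for k in range(3)]
            for a in range(3):
                for b in range(3):
                    v = -gamma * delta[a] * delta[b] / (d * d)
                    h[3 * i + a][3 * j + b] += v   # off-diagonal block (i, j)
                    h[3 * j + a][3 * i + b] += v   # off-diagonal block (j, i)
                    h[3 * i + a][3 * i + b] -= v   # diagonal blocks hold the
                    h[3 * j + a][3 * j + b] -= v   # negative sum of the others
    return h

# Toy square of four pseudo-residues, 3.8 angstroms apart.
square = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
hessian = anm_hessian(square)
```

    The matrix is symmetric, and each coordinate direction sums to zero along every row, which reflects the zero-energy rigid-body translations expected of such a model.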

  7. MD simulations of papillomavirus DNA-E2 protein complexes hints at a protein structural code for DNA deformation.

    PubMed

    Falconi, M; Oteri, F; Eliseo, T; Cicero, D O; Desideri, A

    2008-08-01

    The structural dynamics of the DNA binding domains of the human papillomavirus strain 16 and the bovine papillomavirus strain 1, complexed with their DNA targets, has been investigated by modeling, molecular dynamics simulations, and nuclear magnetic resonance analysis. The simulations underline different dynamical features of the protein scaffolds and a different mechanical interaction of the two proteins with DNA. The two protein structures, although very similar, show differences in the relative mobility of secondary structure elements. Protein structural analyses, principal component analysis, and geometrical and energetic DNA analyses indicate that the two transcription factors utilize a different strategy in DNA recognition and deformation. Results show that the protein indirect DNA readout is not only attributable to the flexibility of the DNA molecule but is also finely tuned by the mechanical and dynamical properties of the protein scaffold involved in the interaction.

  8. Ensemble pharmacophore meets ensemble docking: a novel screening strategy for the identification of RIPK1 inhibitors

    NASA Astrophysics Data System (ADS)

    Fayaz, S. M.; Rajanikant, G. K.

    2014-07-01

    Programmed cell death has been a fascinating area of research since it throws up new challenges and questions in spite of the tremendous ongoing research in this field. Recently, necroptosis, a programmed form of necrotic cell death, has been implicated in many diseases including neurological disorders. Receptor interacting serine/threonine protein kinase 1 (RIPK1) is an important regulatory protein involved in necroptosis, and inhibition of this protein is essential to stop the necroptotic process and eventually cell death. Current structure-based virtual screening methods involve a wide range of strategies, and recently the use of multiple protein structures for pharmacophore extraction has been emphasized as a way to improve the outcome. However, using the pharmacophoric information completely during docking is very important. Further, in such methods, using the appropriate protein structures for docking is desirable. If not, potential compound hits, obtained through pharmacophore-based screening, may not have correct ranks and scores after docking. Therefore, a comprehensive integration of different ensemble methods is essential, which may provide better virtual screening results. In this study, dual ensemble screening, a novel computational strategy, was used to identify diverse and potent inhibitors against RIPK1. All the pharmacophore features present in the binding site were captured using both the apo and holo protein structures and an ensemble pharmacophore was built by combining these features. This ensemble pharmacophore was employed in pharmacophore-based screening of the ZINC database. The compound hits, thus obtained, were subjected to ensemble docking. The leads acquired through docking were further validated through feature evaluation and molecular dynamics simulation.

  9. The crystal structure of mammalian inositol 1,3,4,5,6-pentakisphosphate 2-kinase reveals a new zinc-binding site and key features for protein function

    PubMed Central

    Franco-Echevarría, Elsa; Sanz-Aparicio, Julia; Brearley, Charles A.; González-Rubio, Juana M.; González, Beatriz

    2017-01-01

    Inositol 1,3,4,5,6-pentakisphosphate 2-kinases (IP5 2-Ks) are part of a family of enzymes in charge of synthesizing inositol hexakisphosphate (IP6) in eukaryotic cells. This protein and its product IP6 present many roles in cells, participating in mRNA export, embryonic development, and apoptosis. We reported previously that the full-length IP5 2-K from Arabidopsis thaliana is a zinc metallo-enzyme, including two separated lobes (the N- and C-lobes). We have also shown conformational changes in IP5 2-K and have identified the residues involved in substrate recognition and catalysis. However, the specific features of mammalian IP5 2-Ks remain unknown. To this end, we report here the first structure for a murine IP5 2-K in complex with ATP/IP5 or IP6. Our structural findings indicated that the general folding in N- and C-lobes is conserved with A. thaliana IP5 2-K. A helical scaffold in the C-lobe constitutes the inositol phosphate-binding site, which, along with the participation of the N-lobe, endows high specificity to this protein. However, we also noted large structural differences between the orthologues from these two eukaryotic kingdoms. These differences include a novel zinc-binding site and regions unique to the mammalian IP5 2-K, such as an unexpected basic patch on the protein surface. In conclusion, our findings have uncovered distinct features of a mammalian IP5 2-K and set the stage for investigations into protein-protein or protein-RNA interactions important for IP5 2-K function and activity. PMID:28450399

  10. Atomic structures of corkscrew-forming segments of SOD1 reveal varied oligomer conformations.

    PubMed

    Sangwan, Smriti; Sawaya, Michael R; Murray, Kevin A; Hughes, Michael P; Eisenberg, David S

    2018-02-17

    The aggregation cascade of disease-related amyloidogenic proteins, terminating in insoluble amyloid fibrils, involves intermediate oligomeric states. The structural and biochemical details of these oligomers have been largely unknown. Here we report crystal structures of variants of the cytotoxic oligomer-forming segment residues 28-38 of the ALS-linked protein, SOD1. The crystal structures reveal three different architectures: corkscrew oligomeric structure, nontwisting curved sheet structure and a steric zipper proto-filament structure. Our work highlights the polymorphism of the segment 28-38 of SOD1 and identifies the molecular features of amyloidogenic entities. © 2018 The Protein Society.

  11. Contribution of low-temperature single-molecule techniques to structural issues of pigment–protein complexes from photosynthetic purple bacteria

    PubMed Central

    Löhner, Alexander; Cogdell, Richard

    2018-01-01

    As the electronic energies of the chromophores in a pigment–protein complex are imposed by the geometrical structure of the protein, this allows the spectral information obtained to be compared with predictions derived from structural models. Thereby, the single-molecule approach is particularly suited for the elucidation of specific, distinctive spectral features that are key for a particular model structure, and that would not be observable in ensemble-averaged spectra due to the heterogeneity of the biological objects. In this concise review, we illustrate with the example of the light-harvesting complexes from photosynthetic purple bacteria how results from low-temperature single-molecule spectroscopy can be used to discriminate between different structural models. Thereby the low-temperature approach provides two advantages: (i) owing to the negligible photobleaching, very long observation times become possible, and more importantly, (ii) at cryogenic temperatures, vibrational degrees of freedom are frozen out, leading to sharper spectral features and in turn to better resolved spectra. PMID:29321265

  12. Structural insights of ZIP4 extracellular domain critical for optimal zinc transport

    NASA Astrophysics Data System (ADS)

    Zhang, Tuo; Sui, Dexin; Hu, Jian

    2016-06-01

    The ZIP zinc transporter family is responsible for zinc uptake from the extracellular milieu or intracellular vesicles. The LIV-1 subfamily, containing nine out of the 14 human ZIP proteins, is featured with a large extracellular domain (ECD). The critical role of the ECD is manifested by disease-causing mutations on ZIP4, a representative LIV-1 protein. Here we report the first crystal structure of a mammalian ZIP4-ECD, which reveals two structurally independent subdomains and an unprecedented dimer centred at the signature PAL motif. Structure-guided mutagenesis, cell-based zinc uptake assays and mapping of the disease-causing mutations indicate that the two subdomains play pivotal but distinct roles and that the bridging region connecting them is particularly important for ZIP4 function. These findings lead to working hypotheses on how ZIP4-ECD exerts critical functions in zinc transport. The conserved dimeric architecture in ZIP4-ECD is also demonstrated to be a common structural feature among the LIV-1 proteins.

  13. Molecular interactions within the halophilic, thermophilic, and mesophilic prokaryotic ribosomal complexes: clues to environmental adaptation.

    PubMed

    Mallik, Saurav; Kundu, Sudip

    2015-01-01

    Using the available crystal structures of 50S ribosomal subunits from three prokaryotic species: Escherichia coli (mesophilic), Thermus thermophilus (thermophilic), and Haloarcula marismortui (halophilic), we have analyzed different structural features of ribosomal RNAs (rRNAs), proteins, and of their interfaces. We have correlated these structural features with the environmental adaptation strategies of the corresponding species. While dense intra-rRNA packing is observed in the thermophile, loose intra-rRNA packing is observed in the halophile (both compared to the mesophile). Interestingly, protein-rRNA interfaces of both the extremophiles are densely packed compared to that of the mesophile. The intersubunit bridge regions are almost devoid of cavities, probably ensuring the proper formation of each bridge (by not allowing any loosely packed region nearby). During rRNA binding, the ribosomal proteins experience some structural transitions. Here, we have analyzed the intrinsically disordered and ordered regions of the ribosomal proteins, which are subjected to such transitions. The intrinsically disordered and disorder-to-order transition sites of the thermophilic and mesophilic ribosomal proteins are simultaneously (i) highly conserved and (ii) slowly evolving compared to the rest of the protein structure. Although high conservation is observed at such sites of halophilic ribosomal proteins, a slow rate of evolution is absent. Such differences between the thermophile, mesophile, and halophile can be explained by their environmental adaptation strategies. Interestingly, a universal biophysical principle evident by a linear relationship between the free energy of interface formation, interface area, and structural changes of r-proteins during assembly is always maintained, irrespective of the environmental conditions.

  14. Biomacromolecular quantitative structure-activity relationship (BioQSAR): a proof-of-concept study on the modeling, prediction and interpretation of protein-protein binding affinity.

    PubMed

    Zhou, Peng; Wang, Congcong; Tian, Feifei; Ren, Yanrong; Yang, Chao; Huang, Jian

    2013-01-01

    Quantitative structure-activity relationship (QSAR), a regression modeling methodology that quantitatively establishes a statistical correlation between structural features and apparent behavior for a series of congeneric molecules, has been widely used to evaluate the activity, toxicity and property of various small-molecule compounds such as drugs, toxicants and surfactants. However, it is surprising to see that such a useful technique has only very limited applications to biomacromolecules, although the solved 3D atom-resolution structures of proteins, nucleic acids and their complexes have accumulated rapidly in the past decades. Here, we present a proof-of-concept paradigm for the modeling, prediction and interpretation of the binding affinity of 144 sequence-nonredundant, structure-available and affinity-known protein complexes (Kastritis et al. Protein Sci 20:482-491, 2011) using a biomacromolecular QSAR (BioQSAR) scheme. We demonstrate that the modeling performance and predictive power of BioQSAR are comparable to or even better than that of traditional knowledge-based strategies, mechanism-type methods and empirical scoring algorithms, while BioQSAR possesses certain additional features compared to the traditional methods, such as adaptability, interpretability, deep-validation and high-efficiency. The BioQSAR scheme could be readily modified to infer the biological behavior and functions of other biomacromolecules, if their X-ray crystal structures, NMR conformation assemblies or computationally modeled structures are available.

  15. Support Vector Machine-based classification of protein folds using the structural properties of amino acid residues and amino acid residue pairs.

    PubMed

    Shamim, Mohammad Tabrez Anwar; Anwaruddin, Mohammad; Nagarajaram, H A

    2007-12-15

    Fold recognition is a key step in the protein structure discovery process, especially when traditional sequence comparison methods fail to yield convincing structural homologies. Although many methods have been developed for protein fold recognition, their accuracies remain low. This can be attributed to insufficient exploitation of fold discriminatory features. We have developed a new method for protein fold recognition using structural information of amino acid residues and amino acid residue pairs. Since protein fold recognition can be treated as a protein fold classification problem, we have developed a Support Vector Machine (SVM) based classifier approach that uses secondary structural state and solvent accessibility state frequencies of amino acids and amino acid pairs as feature vectors. Among the individual properties examined secondary structural state frequencies of amino acids gave an overall accuracy of 65.2% for fold discrimination, which is better than the accuracy by any method reported so far in the literature. Combination of secondary structural state frequencies with solvent accessibility state frequencies of amino acids and amino acid pairs further improved the fold discrimination accuracy to more than 70%, which is approximately 8% higher than the best available method. In this study we have also tested, for the first time, an all-together multi-class method known as Crammer and Singer method for protein fold classification. Our studies reveal that the three multi-class classification methods, namely one versus all, one versus one and Crammer and Singer method, yield similar predictions. Dataset and stand-alone program are available upon request.
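
    As an illustration of the kind of feature vector described above, the following sketch computes the frequency of each (amino acid, secondary structural state) combination in a single protein. The three-state alphabet (H, E, C), the normalization by sequence length, and the function name are assumptions made for this example; the abstract does not specify how the authors define or normalize their features.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
SS_STATES = "HEC"  # helix, strand, coil: an assumed three-state alphabet


def ss_state_frequencies(sequence, ss_states):
    """Return a 60-element list of (amino acid, state) frequencies.

    Hypothetical helper: element i * 3 + j holds the fraction of residues
    that are AMINO_ACIDS[i] and are in state SS_STATES[j].
    """
    if len(sequence) != len(ss_states):
        raise ValueError("sequence and state string must have equal length")
    counts = Counter(zip(sequence, ss_states))
    total = len(sequence)
    return [
        counts[(aa, ss)] / total if total else 0.0
        for aa in AMINO_ACIDS
        for ss in SS_STATES
    ]
```

    A vector of this kind could then be passed to any SVM implementation; the classifier settings used in the study are not given in the abstract.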

  16. Atomic structures of fibrillar segments of hIAPP suggest tightly mated β-sheets are important for cytotoxicity

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Krotee, Pascal; Rodriguez, Jose A.; Sawaya, Michael R.

    2017-01-03

    hIAPP fibrils are associated with Type-II Diabetes, but the link of hIAPP structure to islet cell death remains elusive. Here we observe that hIAPP fibrils are cytotoxic to cultured pancreatic β-cells, leading us to determine the structure and cytotoxicity of protein segments composing the amyloid spine of hIAPP. Using the cryoEM method MicroED, we discover that one segment, 19–29 S20G, forms pairs of β-sheets mated by a dry interface that share structural features with and are similarly cytotoxic to full-length hIAPP fibrils. In contrast, a second segment, 15–25 WT, forms non-toxic labile β-sheets. These segments possess different structures and cytotoxic effects, however, both can seed full-length hIAPP, and cause hIAPP to take on the cytotoxic and structural features of that segment. These results suggest that protein segment structures represent polymorphs of their parent protein and that segment 19–29 S20G may serve as a model for the toxic spine of hIAPP.

  17. Determining crystal structures through crowdsourcing and coursework

    PubMed Central

    Horowitz, Scott; Koepnick, Brian; Martin, Raoul; Tymieniecki, Agnes; Winburn, Amanda A.; Cooper, Seth; Flatten, Jeff; Rogawski, David S.; Koropatkin, Nicole M.; Hailu, Tsinatkeab T.; Jain, Neha; Koldewey, Philipp; Ahlstrom, Logan S.; Chapman, Matthew R.; Sikkema, Andrew P.; Skiba, Meredith A.; Maloney, Finn P.; Beinlich, Felix R. M.; Caglar, Ahmet; Coral, Alan; Jensen, Alice Elizabeth; Lubow, Allen; Boitano, Amanda; Lisle, Amy Elizabeth; Maxwell, Andrew T.; Failer, Barb; Kaszubowski, Bartosz; Hrytsiv, Bohdan; Vincenzo, Brancaccio; de Melo Cruz, Breno Renan; McManus, Brian Joseph; Kestemont, Bruno; Vardeman, Carl; Comisky, Casey; Neilson, Catherine; Landers, Catherine R.; Ince, Christopher; Buske, Daniel Jon; Totonjian, Daniel; Copeland, David Marshall; Murray, David; Jagieła, Dawid; Janz, Dietmar; Wheeler, Douglas C.; Cali, Elie; Croze, Emmanuel; Rezae, Farah; Martin, Floyd Orville; Beecher, Gil; de Jong, Guido Alexander; Ykman, Guy; Feldmann, Harald; Chan, Hugo Paul Perez; Kovanecz, Istvan; Vasilchenko, Ivan; Connellan, James C.; Borman, Jami Lynne; Norrgard, Jane; Kanfer, Jebbie; Canfield, Jeffrey M.; Slone, Jesse David; Oh, Jimmy; Mitchell, Joanne; Bishop, John; Kroeger, John Douglas; Schinkler, Jonas; McLaughlin, Joseph; Brownlee, June M.; Bell, Justin; Fellbaum, Karl Willem; Harper, Kathleen; Abbey, Kirk J.; Isaksson, Lennart E.; Wei, Linda; Cummins, Lisa N.; Miller, Lori Anne; Bain, Lyn; Carpenter, Lynn; Desnouck, Maarten; Sharma, Manasa G.; Belcastro, Marcus; Szew, Martin; Szew, Martin; Britton, Matthew; Gaebel, Matthias; Power, Max; Cassidy, Michael; Pfützenreuter, Michael; Minett, Michele; Wesselingh, Michiel; Yi, Minjune; Cameron, Neil Haydn Tormey; Bolibruch, Nicholas I.; Benevides, Noah; Kathleen Kerr, Norah; Barlow, Nova; Crevits, Nykole Krystyne; Dunn, Paul; Roque, Paulo Sergio Silveira Belo Nascimento; Riber, Peter; Pikkanen, Petri; Shehzad, Raafay; Viosca, Randy; James Fraser, Robert; Leduc, Robert; Madala, Roman; Shnider, Scott; de Boisblanc, Sharon; Butkovich, Slava; Bliven, Spencer; Hettler, Stephen; Telehany, Stephen; Schwegmann, Steven A.; Parkes, Steven; Kleinfelter, Susan C.; Michael Holst, Sven; van der Laan, T. J. A.; Bausewein, Thomas; Simon, Vera; Pulley, Warwick; Hull, William; Kim, Annes Yukyung; Lawton, Alexis; Ruesch, Amanda; Sundar, Anjali; Lawrence, Anna-Lisa; Afrin, Antara; Maheshwer, Bhargavi; Turfe, Bilal; Huebner, Christian; Killeen, Courtney Elizabeth; Antebi-Lerrman, Dalia; Luan, Danny; Wolfe, Derek; Pham, Duc; Michewicz, Elaina; Hull, Elizabeth; Pardington, Emily; Galal, Galal Osama; Sun, Grace; Chen, Grace; Anderson, Halie E.; Chang, Jane; Hewlett, Jeffrey Thomas; Sterbenz, Jennifer; Lim, Jiho; Morof, Joshua; Lee, Junho; Inn, Juyoung Samuel; Hahm, Kaitlin; Roth, Kaitlin; Nair, Karun; Markin, Katherine; Schramm, Katie; Toni Eid, Kevin; Gam, Kristina; Murphy, Lisha; Yuan, Lucy; Kana, Lulia; Daboul, Lynn; Shammas, Mario Karam; Chason, Max; Sinan, Moaz; Andrew Tooley, Nicholas; Korakavi, Nisha; Comer, Patrick; Magur, Pragya; Savliwala, Quresh; Davison, Reid Michael; Sankaran, Roshun Rajiv; Lewe, Sam; Tamkus, Saule; Chen, Shirley; Harvey, Sho; Hwang, Sin Ye; Vatsia, Sohrab; Withrow, Stefan; Luther, Tahra K; Manett, Taylor; Johnson, Thomas James; Ryan Brash, Timothy; Kuhlman, Wyatt; Park, Yeonjung; Popović, Zoran; Baker, David; Khatib, Firas; Bardwell, James C. A.

    2016-01-01

    We show here that computer game players can build high-quality crystal structures. Introduction of a new feature into the computer game Foldit allows players to build and real-space refine structures into electron density maps. To assess the usefulness of this feature, we held a crystallographic model-building competition between trained crystallographers, undergraduate students, Foldit players and automatic model-building algorithms. After removal of disordered residues, a team of Foldit players achieved the most accurate structure. Analysing the target protein of the competition, YPL067C, uncovered a new family of histidine triad proteins apparently involved in the prevention of amyloid toxicity. From this study, we conclude that crystallographers can utilize crowdsourcing to interpret electron density information and to produce structure solutions of the highest quality. PMID:27633552

  18. Structural insights into the inactivation of CRISPR-Cas systems by diverse anti-CRISPR proteins.

    PubMed

    Zhu, Yuwei; Zhang, Fan; Huang, Zhiwei

    2018-03-19

    A molecular arms race is progressively being unveiled between prokaryotes and viruses. Prokaryotes utilize CRISPR-mediated adaptive immune systems to kill the invading phages and mobile genetic elements, and in turn, the viruses evolve diverse anti-CRISPR proteins to fight back. The structures of several anti-CRISPR proteins have now been reported, and here we discuss their structural features, with a particular emphasis on topology, to discover their similarities and differences. We summarize the CRISPR-Cas inhibition mechanisms of these anti-CRISPR proteins in their structural context. Considering anti-CRISPRs in this way will provide important clues for studying their origin and evolution.

  19. Enhancing membrane protein subcellular localization prediction by parallel fusion of multi-view features.

    PubMed

    Yu, Dongjun; Wu, Xiaowei; Shen, Hongbin; Yang, Jian; Tang, Zhenmin; Qi, Yong; Yang, Jingyu

    2012-12-01

    Membrane proteins are encoded by ~30% of the genome and play important roles in living organisms. Previous studies have revealed that membrane proteins' structures and functions show obvious cell organelle-specific properties. Hence, it is highly desirable to predict a membrane protein's subcellular location from the primary sequence, considering the extreme difficulty of wet-lab studies of membrane proteins. Although many models have been developed for predicting protein subcellular locations, only a few are specific to membrane proteins. Existing prediction approaches were constructed based on statistical machine learning algorithms with serial combination of multi-view features, i.e., different feature vectors are simply serially combined to form a super feature vector. However, such simple combination of features will simultaneously increase the information redundancy, which could, in turn, degrade the final prediction accuracy. This is why prediction success rates in the serial super space were often found to be even lower than those in a single-view space. The purpose of this paper is to investigate a proper method for fusing multiple multi-view protein sequential features for subcellular location prediction. Instead of the serial strategy, we propose a novel parallel framework for fusing multiple membrane protein multi-view attributes that represents protein samples in complex spaces. We also propose generalized principal component analysis (GPCA) for feature reduction in the complex geometry. All the experimental results through different machine learning algorithms on benchmark membrane protein subcellular localization datasets demonstrate that the newly proposed parallel strategy outperforms the traditional serial approach. We also demonstrate the efficacy of the parallel strategy on a soluble protein subcellular localization dataset, indicating that the parallel technique is flexible enough to suit other computational biology problems. The software and datasets are available at: http://www.csbio.sjtu.edu.cn/bioinf/mpsp.
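
    The difference between serial and parallel combination of two feature views can be sketched as follows. Serial fusion concatenates the vectors, while the parallel variant shown here places one view in the real part and the other in the imaginary part of a complex vector. The zero-padding of the shorter view and both function names are assumptions made for this example; the abstract states only that samples are represented in complex spaces and does not give the authors' exact construction.

```python
def serial_fusion(x, y):
    """Concatenate two feature views into one longer real-valued vector."""
    return list(x) + list(y)


def parallel_fusion(x, y):
    """Combine two feature views into one complex-valued vector.

    Assumption for this sketch: the shorter view is zero-padded so that
    both views have the same length before forming x + i*y.
    """
    n = max(len(x), len(y))
    x_padded = list(x) + [0.0] * (n - len(x))
    y_padded = list(y) + [0.0] * (n - len(y))
    return [complex(a, b) for a, b in zip(x_padded, y_padded)]
```

    With views of length 3 and 2, serial fusion yields 5 real components, whereas this parallel form yields 3 complex components, so the dimensionality does not grow with the sum of the view sizes.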

  20. Amyloidogenesis of Natively Unfolded Proteins

    PubMed Central

    Uversky, Vladimir N.

    2009-01-01

    Aggregation and subsequent development of protein deposition diseases originate from conformational changes in corresponding amyloidogenic proteins. The accumulated data support the model where protein fibrillogenesis proceeds via the formation of a relatively unfolded amyloidogenic conformation, which shares many structural properties with the pre-molten globule state, a partially folded intermediate first found during the equilibrium and kinetic (un)folding studies of several globular proteins and later described as one of the structural forms of natively unfolded proteins. The flexibility of this structural form is essential for the conformational rearrangements driving the formation of the core cross-beta structure of the amyloid fibril. Obviously, molecular mechanisms describing amyloidogenesis of ordered and natively unfolded proteins are different. For ordered protein to fibrillate, its unique and rigid structure has to be destabilized and partially unfolded. On the other hand, fibrillogenesis of a natively unfolded protein involves the formation of partially folded conformation; i.e., partial folding rather than unfolding. In this review recent findings are surveyed to illustrate some unique features of the natively unfolded proteins amyloidogenesis. PMID:18537543

  1. [Regression analysis to select native-like structures from decoys of antigen-antibody docking].

    PubMed

    Chen, Zhengshan; Chi, Xiangyang; Fan, Pengfei; Zhang, Guanying; Wang, Meirong; Yu, Changming; Chen, Wei

    2018-06-25

    Given the increasing exploitation of antibodies in different contexts such as molecular diagnostics and therapeutics, it would be beneficial to unravel properties of antigen-antibody interaction by modeling with computational protein-protein docking, especially in the absence of a cocrystal structure. However, obtaining a native-like antigen-antibody structure remains challenging, due in part to the failure of existing scoring functions to reliably discriminate accurate from inaccurate structures among tens of thousands of decoys after computational docking. We hypothesized that some important physicochemical and energetic features could be used to describe antigen-antibody interfaces and identify native-like antigen-antibody structures. We prepared a dataset, a subset of Protein-Protein Docking Benchmark Version 4.0, comprising 37 nonredundant 3D structures of antigen-antibody complexes, and used it to train and test a multivariate logistic regression equation that took several important physicochemical and energetic features of decoys as independent variables. Our results indicate that the ability of our method to identify native-like structures is superior to that of the ZRANK and ZDOCK scores for the subset of antigen-antibody complexes. We then used our method in a workflow for predicting the epitope of the anti-Ebola glycoprotein monoclonal antibody 4G7 and identified three accurate residues in its epitope.

  2. Time to face the fats: what can mass spectrometry reveal about the structure of lipids and their interactions with proteins?

    PubMed

    Brown, Simon H J; Mitchell, Todd W; Oakley, Aaron J; Pham, Huong T; Blanksby, Stephen J

    2012-09-01

    Since the 1950s, X-ray crystallography has been the mainstay of structural biology, providing detailed atomic-level structures that continue to revolutionize our understanding of protein function. From recent advances in this discipline, a picture has emerged of intimate and specific interactions between lipids and proteins that has driven renewed interest in the structure of lipids themselves and raised intriguing questions as to the specificity and stoichiometry in lipid-protein complexes. Herein we demonstrate some of the limitations of crystallography in resolving critical structural features of ligated lipids and thus determining how these motifs impact protein binding. As a consequence, mass spectrometry must play an important and complementary role in unraveling the complexities of lipid-protein interactions. We evaluate recent advances and highlight ongoing challenges towards the twin goals of (1) complete structure elucidation of low, abundant, and structurally diverse lipids by mass spectrometry alone, and (2) assignment of stoichiometry and specificity of lipid interactions within protein complexes.

  3. Time to Face the Fats: What Can Mass Spectrometry Reveal about the Structure of Lipids and Their Interactions with Proteins?

    NASA Astrophysics Data System (ADS)

    Brown, Simon H. J.; Mitchell, Todd W.; Oakley, Aaron J.; Pham, Huong T.; Blanksby, Stephen J.

    2012-09-01

    Since the 1950s, X-ray crystallography has been the mainstay of structural biology, providing detailed atomic-level structures that continue to revolutionize our understanding of protein function. From recent advances in this discipline, a picture has emerged of intimate and specific interactions between lipids and proteins that has driven renewed interest in the structure of lipids themselves and raised intriguing questions as to the specificity and stoichiometry in lipid-protein complexes. Herein we demonstrate some of the limitations of crystallography in resolving critical structural features of ligated lipids and thus determining how these motifs impact protein binding. As a consequence, mass spectrometry must play an important and complementary role in unraveling the complexities of lipid-protein interactions. We evaluate recent advances and highlight ongoing challenges towards the twin goals of (1) complete structure elucidation of low, abundant, and structurally diverse lipids by mass spectrometry alone, and (2) assignment of stoichiometry and specificity of lipid interactions within protein complexes.

  4. Complete fold annotation of the human proteome using a novel structural feature space

    DOE PAGES

    Middleton, Sarah A.; Illuminati, Joseph; Kim, Junhyong

    2017-04-13

    Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Finally, our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families.

  5. Insights into structural features determining odorant affinities to honey bee odorant binding protein 14.

    PubMed

    Schwaighofer, Andreas; Pechlaner, Maria; Oostenbrink, Chris; Kotlowski, Caroline; Araman, Can; Mastrogiacomo, Rosa; Pelosi, Paolo; Knoll, Wolfgang; Nowak, Christoph; Larisika, Melanie

    2014-04-18

    Molecular interactions between odorants and odorant binding proteins (OBPs) are of major importance for understanding the principles of selectivity of OBPs towards the wide range of semiochemicals. It is largely unknown on a structural basis, how an OBP binds and discriminates between odorant molecules. Here we examine this aspect in greater detail by comparing the C-minus OBP14 of the honey bee (Apis mellifera L.) to a mutant form of the protein that comprises the third disulfide bond lacking in C-minus OBPs. Affinities of structurally analogous odorants featuring an aromatic phenol group with different side chains were assessed based on changes of the thermal stability of the protein upon odorant binding monitored by circular dichroism spectroscopy. Our results indicate a tendency that odorants show higher affinity to the wild-type OBP suggesting that the introduced rigidity in the mutant protein has a negative effect on odorant binding. Furthermore, we show that OBP14 stability is very sensitive to the position and type of functional groups in the odorant. Copyright © 2014 Elsevier Inc. All rights reserved.

  6. Prediction of protein structural classes by recurrence quantification analysis based on chaos game representation.

    PubMed

    Yang, Jian-Yi; Peng, Zhen-Ling; Yu, Zu-Guo; Zhang, Rui-Jie; Anh, Vo; Wang, Desheng

    2009-04-21

    In this paper, we intend to predict protein structural classes (alpha, beta, alpha+beta, or alpha/beta) for low-homology data sets. Two data sets were used widely, 1189 (containing 1092 proteins) and 25PDB (containing 1673 proteins) with sequence homology being 40% and 25%, respectively. We propose to decompose the chaos game representation of proteins into two kinds of time series. Then, a novel and powerful nonlinear analysis technique, recurrence quantification analysis (RQA), is applied to analyze these time series. For a given protein sequence, a total of 16 characteristic parameters can be calculated with RQA, which are treated as feature representation of protein sequences. Based on such feature representation, the structural class for each protein is predicted with Fisher's linear discriminant algorithm. The jackknife test is used to test and compare our method with other existing methods. The overall accuracies with step-by-step procedure are 65.8% and 64.2% for 1189 and 25PDB data sets, respectively. With one-against-others procedure used widely, we compare our method with five other existing methods. Especially, the overall accuracies of our method are 6.3% and 4.1% higher for the two data sets, respectively. Furthermore, only 16 parameters are used in our method, which is less than that used by other methods. This suggests that the current method may play a complementary role to the existing methods and is promising to perform the prediction of protein structural classes.

  7. 3DNALandscapes: a database for exploring the conformational features of DNA.

    PubMed

    Zheng, Guohui; Colasanti, Andrew V; Lu, Xiang-Jun; Olson, Wilma K

    2010-01-01

    3DNALandscapes, located at: http://3DNAscapes.rutgers.edu, is a new database for exploring the conformational features of DNA. In contrast to most structural databases, which archive the Cartesian coordinates and/or derived parameters and images for individual structures, 3DNALandscapes enables searches of conformational information across multiple structures. The database contains a wide variety of structural parameters and molecular images, computed with the 3DNA software package and known to be useful for characterizing and understanding the sequence-dependent spatial arrangements of the DNA sugar-phosphate backbone, sugar-base side groups, base pairs, base-pair steps, groove structure, etc. The data comprise all DNA-containing structures--both free and bound to proteins, drugs and other ligands--currently available in the Protein Data Bank. The web interface allows the user to link, report, plot and analyze this information from numerous perspectives and thereby gain insight into DNA conformation, deformability and interactions in different sequence and structural contexts. The data accumulated from known, well-resolved DNA structures can serve as useful benchmarks for the analysis and simulation of new structures. The collective data can also help to understand how DNA deforms in response to proteins and other molecules and undergoes conformational rearrangements.

  8. CABS-flex 2.0: a web server for fast simulations of flexibility of protein structures.

    PubMed

    Kuriata, Aleksander; Gierut, Aleksandra Maria; Oleniecki, Tymoteusz; Ciemny, Maciej Pawel; Kolinski, Andrzej; Kurcinski, Mateusz; Kmiecik, Sebastian

    2018-05-14

    Classical simulations of protein flexibility remain computationally expensive, especially for large proteins. A few years ago, we developed a fast method for predicting protein structure fluctuations that uses a single protein model as the input. The method has been made available as the CABS-flex web server and applied in numerous studies of protein structure-function relationships. Here, we present a major update of the CABS-flex web server to version 2.0. The new features include: extension of the method to significantly larger and multimeric proteins, customizable distance restraints and simulation parameters, contact maps and a new, enhanced web server interface. CABS-flex 2.0 is freely available at http://biocomp.chem.uw.edu.pl/CABSflex2.

  9. Brownian dynamics simulation of protein diffusion in crowded environments

    NASA Astrophysics Data System (ADS)

    Mereghetti, Paolo; Wade, Rebecca C.

    2013-02-01

    High macromolecular concentrations are a distinguishing feature of living organisms. Understanding how the high concentration of solutes affects the dynamic properties of biological macromolecules is fundamental for the comprehension of biological processes in living systems. We first describe the development of a Brownian dynamics simulation methodology to investigate the dynamic and structural properties of protein solutions using atomic-detail protein structures. We then discuss insights obtained from applying this approach to simulation of solutions of a range of types of proteins.

  10. Prediction of hot regions in protein-protein interaction by combining density-based incremental clustering with feature-based classification.

    PubMed

    Hu, Jing; Zhang, Xiaolong; Liu, Xiaoming; Tang, Jinshan

    2015-06-01

    Discovering hot regions in protein-protein interaction is important for drug and protein design, while experimental identification of hot regions is a time-consuming and labor-intensive effort; thus, the development of predictive models can be very helpful. In hot region prediction research, some models are based on structure information, and others are based on a protein interaction network. However, the prediction accuracy of these methods can still be improved. In this paper, a new method is proposed for hot region prediction, which combines density-based incremental clustering with feature-based classification. The method uses density-based incremental clustering to obtain rough hot regions, and uses feature-based classification to remove the non-hot spot residues from the rough hot regions. Experimental results show that the proposed method significantly improves the prediction performance of hot regions. Copyright © 2015 Elsevier Ltd. All rights reserved.

  11. The hypothetical protein Atu4866 from Agrobacterium tumefaciens adopts a streptavidin-like fold

    PubMed Central

    Ai, Xuanjun; Semesi, Anthony; Yee, Adelinda; Arrowsmith, Cheryl H.; Choy, Wing-Yiu; Li, Shawn S.C.

    2008-01-01

    Atu4866 is a 79-residue conserved hypothetical protein of unknown function from Agrobacterium tumefaciens. Protein sequence alignments show that it shares ≥60% sequence identity with 20 other hypothetical proteins of bacterial origin. However, the structures and functions of these proteins remain unknown so far. To gain insight into the function of this family of proteins, we have determined the structure of Atu4866 as a target of a structural genomics project using solution NMR spectroscopy. Our results reveal that Atu4866 adopts a streptavidin-like fold featuring a β-barrel/sandwich formed by eight antiparallel β-strands. Further structural analysis identified a continuous patch of conserved residues on the surface of Atu4866 that may constitute a potential ligand-binding site. PMID:18042676

  12. Protons, osmolytes, and fitness of internal milieu for protein function.

    PubMed

    Somero, G N

    1986-08-01

    The composition of the intracellular milieu shows striking similarities among widely different species. Only certain values of intracellular pH, values that generally reflect alphastat regulation, and only narrow ranges of inorganic ion concentrations are found in the cytoplasm of the cells of most animals, plants, and microorganisms. In water-stressed organisms only a few types of low-molecular-weight organic molecules (osmolytes) are accumulated. These highly conserved characteristics of the intracellular fluids reflect the need to maintain critical features of macromolecules within narrow ranges optimal for life. For proteins these features include maintaining adequate rates of catalysis, a high level of regulatory responsiveness, and a precise balance between stability and lability of structure (tertiary conformation, subunit assembly, and multiprotein complexes). The optimal values for these functional and structural features of proteins often lie near the midrange of possible values for these properties, and only under specific conditions of intracellular pH, ionic strength, and osmolyte composition are these optimal midrange values conserved. In dormant cells the departure of solution conditions from values that are optimal for protein function and structure may be instrumental in reducing or shutting down metabolic functions. Seen from a broad evolutionary perspective, the evolution of the intracellular milieu is an important complement to macromolecular evolution. In certain instances appropriate modifications of the internal milieu may reduce the need for adaptive amino acid replacements in proteins.

  13. Analysis of sequencing data for probing RNA secondary structures and protein-RNA binding in studying posttranscriptional regulations.

    PubMed

    Hu, Xihao; Wu, Yang; Lu, Zhi John; Yip, Kevin Y

    2016-11-01

    High-throughput sequencing has been used to study posttranscriptional regulations, where the identification of protein-RNA binding is a major and fast-developing sub-area, which is in turn benefited by the sequencing methods for whole-transcriptome probing of RNA secondary structures. In the study of RNA secondary structures using high-throughput sequencing, bases are modified or cleaved according to their structural features, which alter the resulting composition of sequencing reads. In the study of protein-RNA binding, methods have been proposed to immuno-precipitate (IP) protein-bound RNA transcripts in vitro or in vivo. By sequencing these transcripts, the protein-RNA interactions and the binding locations can be identified. For both types of data, read counts are affected by a combination of confounding factors, including expression levels of transcripts, sequence biases, mapping errors and the probing or IP efficiency of the experimental protocols. Careful processing of the sequencing data and proper extraction of important features are fundamentally important to a successful analysis. Here we review and compare different experimental methods for probing RNA secondary structures and binding sites of RNA-binding proteins (RBPs), and the computational methods proposed for analyzing the corresponding sequencing data. We suggest how these two types of data should be integrated to study the structural properties of RBP binding sites as a systematic way to better understand posttranscriptional regulations. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  14. CASTp 3.0: computed atlas of surface topography of proteins.

    PubMed

    Tian, Wei; Chen, Chang; Lei, Xue; Zhao, Jieling; Liang, Jie

    2018-06-01

    Geometric and topological properties of protein structures, including surface pockets, interior cavities and cross channels, are of fundamental importance for proteins to carry out their functions. Computed Atlas of Surface Topography of proteins (CASTp) is a web server that provides online services for locating, delineating and measuring these geometric and topological properties of protein structures. It has been widely used since its inception in 2003. In this article, we present the latest version of the web server, CASTp 3.0. CASTp 3.0 continues to provide reliable and comprehensive identifications and quantifications of protein topography. In addition, it now provides: (i) imprints of the negative volumes of pockets, cavities and channels, (ii) topographic features of biological assemblies in the Protein Data Bank, (iii) improved visualization of protein structures and pockets, and (iv) more intuitive structural and annotated information, including information of secondary structure, functional sites, variant sites and other annotations of protein residues. The CASTp 3.0 web server is freely accessible at http://sts.bioe.uic.edu/castp/.

  15. The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4).

    PubMed

    Huntemann, Marcel; Ivanova, Natalia N; Mavromatis, Konstantinos; Tripp, H James; Paez-Espino, David; Palaniappan, Krishnaveni; Szeto, Ernest; Pillay, Manoj; Chen, I-Min A; Pati, Amrita; Nielsen, Torben; Markowitz, Victor M; Kyrpides, Nikos C

    2015-01-01

    The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. Structural annotation is followed by assignment of protein product names and functions.

  16. High Precision Prediction of Functional Sites in Protein Structures

    PubMed Central

    Buturovic, Ljubomir; Wong, Mike; Tang, Grace W.; Altman, Russ B.; Petkovic, Dragutin

    2014-01-01

    We address the problem of assigning biological function to solved protein structures. Computational tools play a critical role in identifying potential active sites and informing screening decisions for further lab analysis. A critical parameter in the practical application of computational methods is the precision, or positive predictive value. Precision measures the level of confidence the user should have in a particular computed functional assignment. Low precision annotations lead to futile laboratory investigations and waste scarce research resources. In this paper we describe an advanced version of the protein function annotation system FEATURE, which achieved 99% precision and average recall of 95% across 20 representative functional sites. The system uses a Support Vector Machine classifier operating on the microenvironment of physicochemical features around an amino acid. We also compared performance of our method with state-of-the-art sequence-level annotator Pfam in terms of precision, recall and localization. To our knowledge, no other functional site annotator has been rigorously evaluated against these key criteria. The software and predictive models are incorporated into the WebFEATURE service at http://feature.stanford.edu/wf4.0-beta. PMID:24632601

  17. Navigating through the Jungle of Allergens: Features and Applications of Allergen Databases.

    PubMed

    Radauer, Christian

    2017-01-01

    The increasing amount of available data on allergenic proteins demanded the establishment of structured, freely accessible allergen databases. In this review article, features and applications of 6 of the most widely used allergen databases are discussed. The WHO/IUIS Allergen Nomenclature Database is the official resource of allergen designations. Allergome is the most comprehensive collection of data on allergens and allergen sources. AllergenOnline is aimed at providing a peer-reviewed database of allergen sequences for prediction of allergenicity of proteins, such as those planned to be inserted into genetically modified crops. The Structural Database of Allergenic Proteins (SDAP) provides a database of allergen sequences, structures, and epitopes linked to bioinformatics tools for sequence analysis and comparison. The Immune Epitope Database (IEDB) is the largest repository of T-cell, B-cell, and major histocompatibility complex protein epitopes including epitopes of allergens. AllFam classifies allergens into families of evolutionarily related proteins using definitions from the Pfam protein family database. These databases contain mostly overlapping data, but also show differences in terms of their targeted users, the criteria for including allergens, data shown for each allergen, and the availability of bioinformatics tools. © 2017 S. Karger AG, Basel.

  18. CALCOM: a software for calculating the center of mass of proteins.

    PubMed

    Costantini, Susan; Paladino, Antonella; Facchiano, Angelo M

    2008-02-09

    The center of mass of a protein is an artificial point useful for detecting important and simple features of protein structure, shape and association. CALCOM is software that calculates the center of mass of a protein, starting from PDB protein structure files. In the case of protein complexes and of protein-small ligand complexes, the position of protein residues or of ligand atoms with respect to each protein subunit can be evaluated, as well as the distance between the centers of mass of the protein subunits, in order to compare different conformations and evaluate the relative motion of subunits. The service is available at http://bioinformatica.isa.cnr.it/CALCOM/.
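
    The calculation described in this record can be illustrated with a minimal sketch. It is not CALCOM's code: the function names, the small mass table, and the choice to read only ATOM records (so ligand HETATM records are ignored) are assumptions made for illustration. It computes a mass-weighted center per chain from fixed-column PDB text and the distance between two chain centers.

```python
import math
from collections import defaultdict

# Approximate atomic masses in daltons for elements common in proteins.
# This short table is an assumption of the sketch, not CALCOM's own data.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def parse_atoms(pdb_text):
    """Yield (chain_id, mass, (x, y, z)) for each ATOM record in PDB text."""
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # ligands (HETATM) are skipped in this sketch
        chain = line[21]                        # column 22: chain identifier
        xyz = (float(line[30:38]),              # columns 31-38: x
               float(line[38:46]),              # columns 39-46: y
               float(line[46:54]))              # columns 47-54: z
        element = line[76:78].strip().upper()   # columns 77-78: element symbol
        # An element missing from the table raises KeyError on purpose.
        yield chain, ATOMIC_MASS[element], xyz


def center_of_mass(weighted_points):
    """Return the mass-weighted mean position of (mass, (x, y, z)) pairs."""
    total = 0.0
    sums = [0.0, 0.0, 0.0]
    for mass, point in weighted_points:
        total += mass
        for axis in range(3):
            sums[axis] += mass * point[axis]
    if total == 0.0:
        raise ValueError("no atoms to average")
    return tuple(s / total for s in sums)


def chain_centers(pdb_text):
    """Map each chain identifier to the center of mass of its atoms."""
    by_chain = defaultdict(list)
    for chain, mass, xyz in parse_atoms(pdb_text):
        by_chain[chain].append((mass, xyz))
    return {chain: center_of_mass(points) for chain, points in by_chain.items()}


def subunit_distance(centers, chain_a, chain_b):
    """Euclidean distance between the centers of mass of two chains."""
    return math.dist(centers[chain_a], centers[chain_b])
```

    Comparing `subunit_distance` across two conformations of the same complex gives a simple measure of relative subunit motion, in the spirit of the use case the abstract describes.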

  19. A reduced amino acid alphabet for understanding and designing protein adaptation to mutation.

    PubMed

    Etchebest, C; Benros, C; Bornot, A; Camproux, A-C; de Brevern, A G

    2007-11-01

    The protein sequence world is considerably larger than the structure world. In consequence, numerous non-related sequences may adopt similar 3D folds, and different kinds of amino acids may thus be found in similar 3D structures. By grouping the 20 amino acids into a smaller number of representative residues with similar features, the sequence world can be simplified. This clustering defines a reduced amino acid alphabet (reduced AAA). Numerous works have shown that protein 3D structures are composed of a limited number of building blocks, defining a structural alphabet. We previously identified such an alphabet composed of 16 representative structural motifs (5 residues in length) called Protein Blocks (PBs). This alphabet makes it possible to translate the 3D structure into a 1D sequence of PBs. Based on these two concepts, reduced AAA and PBs, we analyzed the distributions of the different kinds of amino acids and their equivalences in the structural context. Different reduced sets were considered. Recurrent amino acid associations were found in all the local structures, while others were specific to some local structures (PBs) (e.g. cysteine, histidine, threonine and serine for the alpha-helix Ncap). Some similar associations are found in other reduced AAAs, e.g. Ile with Val, or the hydrophobic aromatic residues Trp with Phe and Tyr. We highlight interesting alternative associations, which shows the dependence on the information considered (sequence or structure). This approach, equivalent to a substitution matrix, could be useful for designing protein sequences with different features (for instance adaptation to environment) while largely preserving the 3D fold.
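
    As a small illustration of what a reduced amino acid alphabet does to a sequence, the sketch below maps each residue to a group representative. The grouping is hypothetical: it contains only the two associations the abstract names (Ile with Val; Trp with Phe and Tyr) and leaves every other residue unchanged, so it is not one of the reduced sets studied in the paper. The names `REDUCED_GROUPS` and `reduce_sequence` are likewise invented for this example.

```python
# Hypothetical grouping: representative letter -> members of the group.
# Only the associations named in the abstract are included here.
REDUCED_GROUPS = {
    "I": "IV",    # Ile grouped with Val
    "F": "FWY",   # aromatic residues Phe, Trp and Tyr
}


def build_lookup(groups):
    """Invert {representative: members} into {member: representative}."""
    return {aa: rep for rep, members in groups.items() for aa in members}


def reduce_sequence(sequence, groups=REDUCED_GROUPS):
    """Rewrite a one-letter amino acid sequence in the reduced alphabet.

    Residues that belong to no group are passed through unchanged.
    """
    lookup = build_lookup(groups)
    return "".join(lookup.get(aa, aa) for aa in sequence.upper())
```

    With this grouping, `reduce_sequence("IVWFYA")` collapses the first two residues to `I` and the three aromatics to `F`, leaving the alanine as it is.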

  20. Improve the prediction of RNA-binding residues using structural neighbours.

    PubMed

    Li, Quan; Cao, Zanxia; Liu, Haiyan

    2010-03-01

    The interactions between RNA-binding proteins (RBPs) and RNA play key roles in managing some of the cell's basic functions. The identification and prediction of RNA binding sites is important for understanding the RNA-binding mechanism. Computational approaches are being developed to predict RNA-binding residues based on sequence- or structure-derived features. To achieve higher prediction accuracy, improvements on current prediction methods are necessary. We identified that the structural neighbors of RNA-binding and non-RNA-binding residues have different amino acid compositions. Combining this structure-derived feature with evolutionary (PSSM) and other structural information (secondary structure and solvent accessibility) significantly improves the predictions over existing methods. Using a multiple linear regression approach and 6-fold cross validation, our best model achieves an overall correct rate of 87.8% and an MCC of 0.47, with a specificity of 93.4%, correctly predicting 52.4% of the RNA-binding residues for a dataset containing 107 non-homologous RNA-binding proteins. Compared with existing methods, including the amino acid compositions of structural neighbors leads to a clear improvement. A web server was developed for predicting RNA-binding residues in a protein sequence (or structure), which is available at http://mcgill.3322.org/RNA/.
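
    The figures quoted in this abstract (overall correct rate, specificity, fraction of binding residues recovered, and MCC) all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name and the counts in the usage note are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity and MCC from confusion counts.

    tp/fn: binding residues predicted as binding / as non-binding.
    tn/fp: non-binding residues predicted as non-binding / as binding.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total        # overall correct rate
    sensitivity = tp / (tp + fn)        # fraction of true binders recovered
    specificity = tn / (tn + fp)        # fraction of non-binders recovered
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }
```

    For example, `binary_metrics(tp=50, fp=10, tn=90, fn=50)` describes a classifier with high specificity but only half of the positives recovered; on a class-imbalanced residue set the MCC summarises that trade-off better than accuracy alone.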

  1. Structural and functional studies of a 50 kDa antigenic protein from Salmonella enterica serovar Typhi.

    PubMed

    Choong, Yee Siew; Lim, Theam Soon; Chew, Ai Lan; Aziah, Ismail; Ismail, Asma

    2011-04-01

    The high typhoid incidence rate in developing and under-developed countries emphasizes the need for a rapid, affordable and accessible diagnostic test for effective therapy and disease management. TYPHIDOT®, a rapid dot enzyme immunoassay test for typhoid, was developed from the discovery of a ∼50 kDa protein specific for Salmonella enterica serovar Typhi. However, the structure of this antigen remains unknown to date. Studies on the structure of this antigen are important to elucidate its function, which will in turn increase the efficiency of the development and improvement of the typhoid detection test. This paper describes the predicted structure and function of the antigenically specific protein. The homology modeling approach was employed to construct the three-dimensional structure of the antigen. The built structure possesses the features of a TolC-like outer membrane protein. Molecular docking simulation was also performed to further probe the functionality of the antigen. Docking results showed that hexamminecobalt, [Co(NH₃)₆]³⁺, an inhibitor of TolC protein, formed favorable hydrogen bonds with D368 and D371 of the antigen. The single point (D368A, D371A) and double point (D368A and D371A) mutations of the antigen showed a decrease (single point mutation) and loss (double point mutations) of binding affinity towards hexamminecobalt. The architectural features of the built model and the docking simulation support the conclusion that this antigen is indeed a variant of the outer membrane protein TolC. As channel proteins are important for the virulence and survival of bacteria, this ∼50 kDa channel protein is a good specific target for a typhoid detection test. Copyright © 2011 Elsevier Inc. All rights reserved.

  2. Understanding the Structural Ensembles of a Highly Extended Disordered Protein

    PubMed Central

    Daughdrill, Gary W.; Kashtanov, Stepan; Stancik, Amber; Hill, Shannon E.; Helms, Gregory; Muschol, Martin

    2013-01-01

    Developing a comprehensive description of the equilibrium structural ensembles for intrinsically disordered proteins (IDPs) is essential to understanding their function. The p53 transactivation domain (p53TAD) is an IDP that interacts with multiple protein partners and contains numerous phosphorylation sites. Multiple techniques were used to investigate the equilibrium structural ensemble of p53TAD in its native and chemically unfolded states. The results from these experiments show that the native state of p53TAD has dimensions similar to a classical random coil while the chemically unfolded state is more extended. To investigate the molecular properties responsible for this behavior, a novel algorithm that generates diverse and unbiased structural ensembles of IDPs was developed. This algorithm was used to generate a large pool of plausible p53TAD structures that were reweighted to identify a subset of structures with the best fit to small angle X-ray scattering data. High weight structures in the native state ensemble show features that are localized to protein binding sites and regions with high proline content. The features localized to the protein binding sites are mostly eliminated in the chemically unfolded ensemble, while the regions with high proline content remain relatively unaffected. Data from NMR experiments support these results, showing that residues from the protein binding sites experience larger environmental changes upon unfolding by urea than regions with high proline content. This behavior is consistent with the urea-induced exposure of nonpolar and aromatic side-chains in the protein binding sites that are partially excluded from solvent in the native state ensemble. PMID:21979461

  3. Protein-directed assembly of arbitrary three-dimensional nanoporous silica architectures.

    PubMed

    Khripin, Constantine Y; Pristinski, Denis; Dunphy, Darren R; Brinker, C Jeffrey; Kaehr, Bryan

    2011-02-22

    Through precise control of nanoscale building blocks, such as proteins and polyamines, silica condensing microorganisms are able to create intricate mineral structures displaying hierarchical features from nano- to millimeter-length scales. The creation of artificial structures of similar characteristics is facilitated through biomimetic approaches, for instance, by first creating a bioscaffold comprised of silica condensing moieties which, in turn, govern silica deposition into three-dimensional (3D) structures. In this work, we demonstrate a protein-directed approach to template silica into true arbitrary 3D architectures by employing cross-linked protein hydrogels to controllably direct silica condensation. Protein hydrogels are fabricated using multiphoton lithography, which enables user-defined control over template features in three dimensions. Silica deposition, under acidic conditions, proceeds throughout protein hydrogel templates via flocculation of silica nanoparticles by protein molecules, as indicated by dynamic light scattering (DLS) and time-dependent measurements of elastic modulus. Following silica deposition, the protein template can be removed using mild thermal processing yielding high surface area (625 m²/g) porous silica replicas that do not undergo significant volume change compared to the starting template. We demonstrate the capabilities of this approach to create bioinspired silica microstructures displaying hierarchical features over broad length scales and the infiltration/functionalization capabilities of the nanoporous silica matrix by laser printing a 3D gold image within a 3D silica matrix. This work provides a foundation to potentially understand and mimic biogenic silica condensation under the constraints of user-defined biotemplates and further should enable a wide range of complex inorganic architectures to be explored using silica transformational chemistries, for instance silica to silicon, as demonstrated herein.

  4. An in-silico method for identifying aggregation rate enhancer and mitigator mutations in proteins.

    PubMed

    Rawat, Puneet; Kumar, Sandeep; Michael Gromiha, M

    2018-06-24

    Newly synthesized polypeptides must pass stringent quality controls in cells to ensure appropriate folding and function. However, mutations, environmental stresses and aging can reduce efficiencies of these controls, leading to accumulation of protein aggregates, amyloid fibrils and plaques. In-vitro experiments have shown that even single amino acid substitutions can drastically enhance or mitigate protein aggregation kinetics. In this work, we have collected a dataset of 220 unique mutations in 25 proteins and classified them as enhancers or mitigators on the basis of their effect on protein aggregation rate. The data were analyzed via machine learning to identify features capable of distinguishing between aggregation rate enhancers and mitigators. Our initial Support Vector Machine (SVM) model separated such mutations with an overall accuracy of 69%. When local secondary structures at the mutation sites were considered, the accuracies further improved by 13-15%. The machine-learnt features are distinct for each secondary structure class at mutation sites. Protein stability and flexibility changes are important features for mutations in α-helices. β-strand propensity, polarity and charge become important when mutations occur in β-strands and ability to form secondary structure, helical tendency and aggregation propensity are important for mutations lying in coils. These results have been incorporated into a sequence-based algorithm (available at http://www.iitm.ac.in/bioinfo/aggrerate-disc/) capable of predicting whether a mutation will enhance or mitigate a protein's aggregation rate. This algorithm will find several applications towards understanding protein aggregation in human diseases, enable in-silico optimization of biopharmaceuticals and enzymes for improved biophysical attributes and de novo design of bio-nanomaterials. Copyright © 2018. Published by Elsevier B.V.

  5. Characterization of Three Different Unusual S-Layer Proteins from Viridibacillus arvi JG-B58 That Exhibits Two Super-Imposed S-Layer Proteins

    PubMed Central

    Günther, Tobias J.; Raff, Johannes; Pollmann, Katrin

    2016-01-01

    Genomic analyses of Viridibacillus arvi JG-B58, which was previously isolated from a heavy-metal-contaminated environment, identified three different putative surface layer (S-layer) protein genes, namely slp1, slp2, and slp3. All three genes are expressed during cultivation. At least two of the V. arvi JG-B58 S-layer proteins were visualized on the surface of living cells via atomic force microscopy (AFM). These S-layer proteins form a double layer with p4 symmetry. The S-layer proteins were isolated from the cells using two different methods. Purified S-layer proteins were recrystallized on SiO2 substrates in order to study the structure of the arrays and self-assembling properties. The primary structure of all examined S-layer proteins lacks some features that are typical for Bacillus or Lysinibacillus S-layers. For example, they possess no SLH domains, which are usually responsible for anchoring the proteins to the cell wall. Further, the pI values are relatively high, ranging from 7.84 to 9.25 for the matured proteins. Such features are typical for S-layer proteins of Lactobacillus species, although sequence comparisons indicate a close relationship to S-layer proteins of Lysinibacillus and Bacillus strains. In comparison to the numerous descriptions of S-layers, there are only a few studies reporting the concomitant existence of two different S-layer proteins on cell surfaces. Together with the genomic data, this is the first description of a novel type of S-layer proteins showing features of Lactobacillus as well as of Bacillus-type S-layer proteins and the first study of the cell envelope of Viridibacillus arvi. PMID:27285458

  6. Hidden Markov model-derived structural alphabet for proteins: the learning of protein local shapes captures sequence specificity.

    PubMed

    Camproux, A C; Tufféry, P

    2005-08-05

    Understanding and predicting protein structures depend on the complexity and the accuracy of the models used to represent them. We have recently set up a Hidden Markov Model to optimally compress protein three-dimensional conformations into a one-dimensional series of letters of a structural alphabet. Such a model learns simultaneously the shape of representative structural letters describing the local conformation and the logic of their connections, i.e. the transition matrix between the letters. Here, we move one step further and report some evidence that such a model of protein local architecture also captures some accurate amino acid features. All the letters have specific and distinct amino acid distributions. Moreover, we show that words of amino acids can have significant propensities for some letters. Perspectives point towards the prediction of the series of letters describing the structure of a protein from its amino acid sequence.

  7. Structure-Based Design of Highly Selective Inhibitors of the CREB Binding Protein Bromodomain.

    PubMed

    Denny, R Aldrin; Flick, Andrew C; Coe, Jotham; Langille, Jonathan; Basak, Arindrajit; Liu, Shenping; Stock, Ingrid; Sahasrabudhe, Parag; Bonin, Paul; Hay, Duncan A; Brennan, Paul E; Pletcher, Mathew; Jones, Lyn H; Chekler, Eugene L Piatnitski

    2017-07-13

    Chemical probes are required for preclinical target validation to interrogate novel biological targets and pathways. Selective inhibitors of the CREB binding protein (CREBBP)/EP300 bromodomains are required to facilitate the elucidation of biology associated with these important epigenetic targets. Medicinal chemistry optimization that paid particular attention to physiochemical properties delivered chemical probes with desirable potency, selectivity, and permeability attributes. An important feature of the optimization process was the successful application of rational structure-based drug design to address bromodomain selectivity issues (particularly against the structurally related BRD4 protein).

  8. Unusual biophysics of intrinsically disordered proteins.

    PubMed

    Uversky, Vladimir N

    2013-05-01

    Research of the past decade and a half leaves no doubt that complete understanding of protein functionality requires close consideration of the fact that many functional proteins do not have well-folded structures. These intrinsically disordered proteins (IDPs) and proteins with intrinsically disordered protein regions (IDPRs) are highly abundant in nature and play a number of crucial roles in a living cell. Their functions, which are typically associated with a wide range of intermolecular interactions where IDPs possess remarkable binding promiscuity, complement the functional repertoire of ordered proteins. All this requires close attention to the peculiarities of the biophysics of these proteins. In this review, some key biophysical features of IDPs are covered. In addition to the peculiar sequence characteristics of IDPs, these biophysical features include sequential, structural, and spatiotemporal heterogeneity of IDPs; their rough and relatively flat energy landscapes; their ability to undergo both induced folding and induced unfolding; the ability to interact specifically with structurally unrelated partners; the ability to gain different structures upon binding to different partners; and the ability to keep an essential amount of disorder even in the bound form. IDPs are also characterized by the "turned-out" response to the changes in their environment, where they gain some structure under conditions resulting in denaturation or even unfolding of ordered proteins. It is proposed that the heterogeneous spatiotemporal structure of IDPs/IDPRs can be described as a set of foldons, inducible foldons, semi-foldons, non-foldons, and unfoldons. They may lose their function when folded, and activation of some IDPs is associated with the awakening of the dormant disorder. It is possible that IDPs represent the "edge of chaos" systems which operate in a region between order and complete randomness or chaos, where the complexity is maximal. This article is part of a Special Issue entitled: The emerging dynamic view of proteins: Protein plasticity in allostery, evolution and self-assembly. Copyright © 2012 Elsevier B.V. All rights reserved.

  9. Fundamental Characteristics of AAA+ Protein Family Structure and Function.

    PubMed

    Miller, Justin M; Enemark, Eric J

    2016-01-01

    Many complex cellular events depend on multiprotein complexes known as molecular machines to efficiently couple the energy derived from adenosine triphosphate hydrolysis to the generation of mechanical force. Members of the AAA+ ATPase superfamily (ATPases Associated with various cellular Activities) are critical components of many molecular machines. AAA+ proteins are defined by conserved modules that precisely position the active site elements of two adjacent subunits to catalyze ATP hydrolysis. In many cases, AAA+ proteins form a ring structure that translocates a polymeric substrate through the central channel using specialized loops that project into the central channel. We discuss the major features of AAA+ protein structure and function with an emphasis on pivotal aspects elucidated with archaeal proteins.

  10. Fast iodide-SAD phasing for high-throughput membrane protein structure determination.

    PubMed

    Melnikov, Igor; Polovinkin, Vitaly; Kovalev, Kirill; Gushchin, Ivan; Shevtsov, Mikhail; Shevchenko, Vitaly; Mishin, Alexey; Alekseev, Alexey; Rodriguez-Valera, Francisco; Borshchevskiy, Valentin; Cherezov, Vadim; Leonard, Gordon A; Gordeliy, Valentin; Popov, Alexander

    2017-05-01

    We describe a fast, easy, and potentially universal method for the de novo solution of the crystal structures of membrane proteins via iodide-single-wavelength anomalous diffraction (I-SAD). The potential universality of the method is based on a common feature of membrane proteins: the availability at the hydrophobic-hydrophilic interface of positively charged amino acid residues with which iodide strongly interacts. We demonstrate the solution using I-SAD of four crystal structures representing different classes of membrane proteins, including a human G protein-coupled receptor (GPCR), and we show that I-SAD can be applied using data collection strategies based on either standard or serial x-ray crystallography techniques.

  11. Predicting the Effect of Mutations on Protein-Protein Binding Interactions through Structure-Based Interface Profiles

    PubMed Central

    Brender, Jeffrey R.; Zhang, Yang

    2015-01-01

    The formation of protein-protein complexes is essential for proteins to perform their physiological functions in the cell. Mutations that prevent the proper formation of the correct complexes can have serious consequences for the associated cellular processes. Since experimental determination of protein-protein binding affinity remains difficult when performed on a large scale, computational methods for predicting the consequences of mutations on binding affinity are highly desirable. We show that a scoring function based on interface structure profiles collected from analogous protein-protein interactions in the PDB is a powerful predictor of protein binding affinity changes upon mutation. As a standalone feature, the difference between the interface profile scores of the mutant and wild-type proteins has an accuracy equivalent to the best all-atom potentials, despite being two orders of magnitude faster once the profile has been constructed. Due to its unique sensitivity in collecting the evolutionary profiles of analogous binding interactions and the high speed of calculation, the interface profile score has additional advantages as a complementary feature to combine with physics-based potentials for improving the accuracy of composite scoring approaches. By incorporating the sequence-derived and residue-level coarse-grained potentials with the interface structure profile score, a composite model was constructed through random forest training, which generates a Pearson correlation coefficient >0.8 between the predicted and observed binding free-energy changes upon mutation. This accuracy is comparable to, or outperforms in most cases, the current best methods, but does not require high-resolution full-atomic models of the mutant structures. The binding interface profiling approach should find useful application in human-disease mutation recognition and protein interface design studies. PMID:26506533

  12. DSSR-enhanced visualization of nucleic acid structures in Jmol

    PubMed Central

    Hanson, Robert M.

    2017-01-01

    Sophisticated and interactive visualizations are essential for making sense of the intricate 3D structures of macromolecules. For proteins, secondary structural components are routinely featured in molecular graphics visualizations. However, the field of RNA structural bioinformatics is still lagging behind; for example, current molecular graphics tools lack built-in support even for base pairs, double helices, or hairpin loops. DSSR (Dissecting the Spatial Structure of RNA) is an integrated and automated command-line tool for the analysis and annotation of RNA tertiary structures. It calculates a comprehensive and unique set of features for characterizing RNA, as well as DNA structures. Jmol is a widely used, open-source Java viewer for 3D structures, with a powerful scripting language. JSmol, its reincarnation based on native JavaScript, has a predominant position in the post Java-applet era for web-based visualization of molecular structures. The DSSR-Jmol integration presented here makes salient features of DSSR readily accessible, either via the Java-based Jmol application itself, or its HTML5-based equivalent, JSmol. The DSSR web service accepts 3D coordinate files (in mmCIF or PDB format) initiated from a Jmol or JSmol session and returns DSSR-derived structural features in JSON format. This seamless combination of DSSR and Jmol/JSmol brings the molecular graphics of 3D RNA structures to a similar level as that for proteins, and enables a much deeper analysis of structural characteristics. It fills a gap in RNA structural bioinformatics, and is freely accessible (via the Jmol application or the JSmol-based website http://jmol.x3dna.org). PMID:28472503

  13. In silico methods for co-transcriptional RNA secondary structure prediction and for investigating alternative RNA structure expression.

    PubMed

    Meyer, Irmtraud M

    2017-05-01

    RNA transcripts are the primary products of active genes in any living organism, including many viruses. Their cellular destiny not only depends on primary sequence signals, but can also be determined by RNA structure. Recent experimental evidence shows that many transcripts can be assigned more than a single functional RNA structure throughout their cellular life and that structure formation happens co-transcriptionally, i.e. as the transcript is synthesised in the cell. Moreover, functional RNA structures are not limited to non-coding transcripts, but can also feature in coding transcripts. The picture that now emerges is that RNA structures constitute an additional layer of information that can be encoded in any RNA transcript (and on top of other layers of information such as protein-context) in order to exert a wide range of functional roles. Moreover, different encoded RNA structures can be expressed at different stages of a transcript's life in order to alter the transcript's behaviour depending on its actual cellular context. Similar to the concept of alternative splicing for protein-coding genes, where a single transcript can yield different proteins depending on cellular context, it is thus appropriate to propose the notion of alternative RNA structure expression for any given transcript. This review introduces several computational strategies that my group developed to detect different aspects of RNA structure expression in vivo. Two aspects are of particular interest to us: (1) RNA secondary structure features that emerge during co-transcriptional folding and (2) functional RNA structure features that are expressed at different times of a transcript's life and potentially mutually exclusive. Copyright © 2017. Published by Elsevier Inc.

  14. SCOPe: Manual Curation and Artifact Removal in the Structural Classification of Proteins - extended Database.

    PubMed

    Chandonia, John-Marc; Fox, Naomi K; Brenner, Steven E

    2017-02-03

    SCOPe (Structural Classification of Proteins-extended, http://scop.berkeley.edu) is a database of relationships between protein structures that extends the Structural Classification of Proteins (SCOP) database. SCOP is an expert-curated ordering of domains from the majority of proteins of known structure in a hierarchy according to structural and evolutionary relationships. SCOPe classifies the majority of protein structures released since SCOP development concluded in 2009, using a combination of manual curation and highly precise automated tools, aiming to have the same accuracy as fully hand-curated SCOP releases. SCOPe also incorporates and updates the ASTRAL compendium, which provides several databases and tools to aid in the analysis of the sequences and structures of proteins classified in SCOPe. SCOPe continues high-quality manual classification of new superfamilies, a key feature of SCOP. Artifacts such as expression tags are now separated into their own class, in order to distinguish them from the homology-based annotations in the remainder of the SCOPe hierarchy. SCOPe 2.06 contains 77,439 Protein Data Bank entries, double the 38,221 structures classified in SCOP. Copyright © 2016 The Author(s). Published by Elsevier Ltd. All rights reserved.

  15. VMD-SS: A graphical user interface plug-in to calculate the protein secondary structure in VMD program.

    PubMed

    Yahyavi, Masoumeh; Falsafi-Zadeh, Sajad; Karimi, Zahra; Kalatarian, Giti; Galehdari, Hamid

    2014-01-01

    Investigating the types of secondary structure (SS) of a protein is important. The evolution of secondary structures during molecular dynamics simulations is a useful parameter to analyze protein structures. Therefore, it is of interest to describe VMD-SS (a software program) for the identification of secondary structure elements and their trajectories during simulation for known structures available at the Protein Data Bank (PDB). The program helps to calculate (1) percentage SS, (2) SS occurrence in each residue, (3) percentage SS during simulation, and (4) percentage residues in all SS types during simulation. The VMD-SS plug-in was designed using a Tcl script and STRIDE to calculate secondary structure features. The database is available for free at http://science.scu.ac.ir/HomePage.aspx?TabID=13755.
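
    The statistics listed in this record can be sketched in a few lines once each trajectory frame is available as a string of per-residue secondary structure codes. The sketch below is not the VMD-SS plug-in, which is written in Tcl and calls STRIDE. The function names and the input format (equal-length, non-empty strings such as "HHEC", one character per residue) are assumptions for illustration.

```python
from collections import Counter


def percent_ss(frame):
    """Percentage of residues in each SS type for one frame.

    `frame` is a non-empty string with one SS code per residue.
    """
    n_residues = len(frame)
    return {code: 100.0 * count / n_residues
            for code, count in Counter(frame).items()}


def per_residue_occupancy(frames):
    """For each residue, the percentage of frames spent in each SS type.

    `frames` is a non-empty list of equal-length SS strings, one per frame.
    """
    n_frames = len(frames)
    # zip(*frames) regroups the codes so each tuple follows one residue
    # through the whole trajectory.
    return [
        {code: 100.0 * count / n_frames
         for code, count in Counter(column).items()}
        for column in zip(*frames)
    ]


def percent_ss_over_trajectory(frames):
    """Percentage of each SS type pooled over all residues and frames."""
    return percent_ss("".join(frames))
```

    For a two-frame toy trajectory `["HHEC", "HECC"]`, `per_residue_occupancy` reports the first residue as helical in every frame and the second as helical in half of them, which is the kind of per-residue view item (2) in the abstract refers to.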

  16. Salvage of failed protein targets by reductive alkylation.

    PubMed

    Tan, Kemin; Kim, Youngchang; Hatzos-Skintges, Catherine; Chang, Changsoo; Cuff, Marianne; Chhor, Gekleng; Osipiuk, Jerzy; Michalska, Karolina; Nocek, Boguslaw; An, Hao; Babnigg, Gyorgy; Bigelow, Lance; Joachimiak, Grazyna; Li, Hui; Mack, Jamey; Makowska-Grzyska, Magdalena; Maltseva, Natalia; Mulligan, Rory; Tesar, Christine; Zhou, Min; Joachimiak, Andrzej

    2014-01-01

    The growth of diffraction-quality single crystals is of primary importance in protein X-ray crystallography. Chemical modification of proteins can alter their surface properties and crystallization behavior. The Midwest Center for Structural Genomics (MCSG) has previously reported how reductive methylation of lysine residues in proteins can improve crystallization of unique proteins that initially failed to produce diffraction-quality crystals. Recently, this approach has been expanded to include ethylation and isopropylation in the MCSG protein crystallization pipeline. Applying standard methods, 180 unique proteins were alkylated and screened using standard crystallization procedures. Crystal structures of 12 new proteins were determined, including the first ethylated and the first isopropylated protein structures. In a few cases, the structures of native and methylated or ethylated states were obtained and the impact of reductive alkylation of lysine residues was assessed. Reductive methylation tends to be more efficient and produces the most alkylated protein structures. Structures of methylated proteins typically have higher resolution limits. A number of well-ordered alkylated lysine residues have been identified, which make both intermolecular and intramolecular contacts. The previous report is updated and complemented with the following new data; a description of a detailed alkylation protocol with results, structural features, and roles of alkylated lysine residues in protein crystals. These contribute to improved crystallization properties of some proteins.

  17. Salvage of Failed Protein Targets by Reductive Alkylation

    PubMed Central

    Tan, Kemin; Kim, Youngchang; Hatzos-Skintges, Catherine; Chang, Changsoo; Cuff, Marianne; Chhor, Gekleng; Osipiuk, Jerzy; Michalska, Karolina; Nocek, Boguslaw; An, Hao; Babnigg, Gyorgy; Bigelow, Lance; Joachimiak, Grazyna; Li, Hui; Mack, Jamey; Makowska-Grzyska, Magdalena; Maltseva, Natalia; Mulligan, Rory; Tesar, Christine; Zhou, Min; Joachimiak, Andrzej

    2014-01-01

    The growth of diffraction-quality single crystals is of primary importance in protein X-ray crystallography. Chemical modification of proteins can alter their surface properties and crystallization behavior. The Midwest Center for Structural Genomics (MCSG) has previously reported how reductive methylation of lysine residues in proteins can improve crystallization of unique proteins that initially failed to produce diffraction-quality crystals. Recently, this approach has been expanded to include ethylation and isopropylation in the MCSG protein crystallization pipeline. Applying standard methods, 180 unique proteins were alkylated and screened using standard crystallization procedures. Crystal structures of 12 new proteins were determined, including the first ethylated and the first isopropylated protein structures. In a few cases, the structures of native and methylated or ethylated states were obtained and the impact of reductive alkylation of lysine residues was assessed. Reductive methylation tends to be more efficient and produces the most alkylated protein structures. Structures of methylated proteins typically have higher resolution limits. A number of well-ordered alkylated lysine residues have been identified, which make both intermolecular and intramolecular contacts. The previous report is updated and complemented with the following new data; a description of a detailed alkylation protocol with results, structural features, and roles of alkylated lysine residues in protein crystals. These contribute to improved crystallization properties of some proteins. PMID:24590719

  18. Patchwork structure-function analysis of the Sendai virus matrix protein.

    PubMed

    Mottet-Osman, Geneviève; Miazza, Vincent; Vidalain, Pierre-Olivier; Roux, Laurent

    2014-09-01

    Paramyxoviruses contain a bi-lipidic envelope decorated by two transmembrane glycoproteins and carpeted on the inner surface with a layer of matrix proteins (M), thought to bridge the glycoproteins with the viral nucleocapsids. To characterize M structure-function features, a set of M domains were mutated or deleted. The genes encoding these modified M were incorporated into recombinant Sendai viruses and expressed as supplemental proteins. Using a method of integrated suppression complementation system (ISCS), the functions of these M mutants were analyzed in the context of the infection. Cellular membrane association, localization at the cell periphery, nucleocapsid binding, cellular protein interactions and promotion of viral particle formation were characterized in relation with the mutations. In the end, lack of nucleocapsid binding goes together with lack of cell surface localization and both features definitely correlate with loss of M global function estimated by viral particle production. Copyright © 2014 Elsevier Inc. All rights reserved.

  19. Platyhelminth Venom Allergen-Like (VAL) proteins: revealing structural diversity, class-specific features and biological associations across the phylum

    PubMed Central

    CHALMERS, IAIN W.; HOFFMANN, KARL F.

    2012-01-01

    SUMMARY During platyhelminth infection, a cocktail of proteins is released by the parasite to aid invasion, initiate feeding, facilitate adaptation and mediate modulation of the host immune response. Included amongst these proteins is the Venom Allergen-Like (VAL) family, part of the larger sperm coating protein/Tpx-1/Ag5/PR-1/Sc7 (SCP/TAPS) superfamily. To explore the significance of this protein family during Platyhelminthes development and host interactions, we systematically summarize all published proteomic, genomic and immunological investigations of the VAL protein family to date. By conducting new genomic and transcriptomic interrogations to identify over 200 VAL proteins (228) from species in all 4 traditional taxonomic classes (Trematoda, Cestoda, Monogenea and Turbellaria), we further expand our knowledge related to platyhelminth VAL diversity across the phylum. Subsequent phylogenetic and tertiary structural analyses reveal several class-specific VAL features, which likely indicate a range of roles mediated by this protein family. Our comprehensive analysis of platyhelminth VALs represents a unifying synopsis for understanding diversity within this protein family and a firm context in which to initiate future functional characterization of these enigmatic members. PMID:22717097

  20. Fourier transform infrared microspectroscopic analysis of the effects of cereal type and variety within a type of grain on structural makeup in relation to rumen degradation kinetics.

    PubMed

    Walker, Amanda M; Yu, Peiqiang; Christensen, Colleen R; Christensen, David A; McKinnon, John J

    2009-08-12

    The objectives of this study were to use Fourier transform infrared microspectroscopy (FTIRM) to determine structural makeup (features) of cereal grain endosperm tissue and to reveal and identify differences in protein and carbohydrate structural makeup between different cereal types (corn vs barley) and between different varieties within a grain (barley CDC Bold, CDC Dolly, Harrington, and Valier). Another objective was to investigate how these structural features relate to rumen degradation kinetics. The items assessed included (1) structural differences in protein amide I to nonstructural carbohydrate (NSC, starch) intensity and ratio within cellular dimensions; (2) molecular structural differences in the secondary structure profile of protein, alpha-helix, beta-sheet, and their ratio; (3) structural differences in NSC to amide I ratio profile. From the results, it was observed that (1) comparison between grain types [corn (cv. Pioneer 39P78) vs barley (cv. Harrington)] showed significant differences in structural makeup in terms of NSC, amide I to NSC ratio, and rumen degradation kinetics (degradation ratio, effective degradability of dry matter, protein and NSC) (P < 0.05); (2) comparison between varieties within a grain (barley varieties) also showed significant differences in structural makeup in terms of amide I, NSC, amide I to NSC ratio, alpha-helix and beta-sheet protein structures, and rumen degradation kinetics (effective degradability of dry matter, protein, and NSC) (P < 0.05); (3) correlation analysis showed that the amide I to NSC ratio was strongly correlated with rumen degradation kinetics in terms of the degradation rate (R = 0.91, P = 0.086) and effective degradability of dry matter (R = 0.93, P = 0.071). The results suggest that with the FTIRM technique, the structural makeup differences between cereal types and between different varieties within a type of grain could be revealed. These structural makeup differences were related to the rate and extent of rumen degradation.

  1. Molecular properties of food allergens.

    PubMed

    Breiteneder, Heimo; Mills, E N Clare

    2005-01-01

    Plant food allergens belong to a rather limited number of protein families and are also characterized by a number of biochemical and physicochemical properties, many of which are also shared by food allergens of animal origin. These include thermal stability and resistance to proteolysis, which are enhanced by an ability to bind ligands, such as metal ions, lipids, or steroids. Other types of lipid interaction, including membranes or other lipid structures, represent another feature that might promote the allergenic properties of certain food proteins. A structural feature clearly related to stability is intramolecular disulfide bonds alongside posttranslational modifications, such as N-glycosylation. Some plant food allergens, such as the cereal seed storage prolamins, are rheomorphic proteins with polypeptide chains that adopt an ensemble of secondary structures resembling unfolded or partially folded proteins. Other plant food allergens are characterized by the presence of repetitive structures, the ability to form oligomers, and the tendency to aggregate. A summary of our current knowledge regarding the molecular properties of food allergens is presented. Although we cannot as yet predict the allergenicity of a given food protein, understanding of the molecular properties that might predispose them to becoming allergens is an important first step and will undoubtedly contribute to the integrative allergenic risk assessment process being adopted by regulators.

  2. Modularity in protein structures: study on all-alpha proteins.

    PubMed

    Khan, Taushif; Ghosh, Indira

    2015-01-01

    Modularity is known as one of the most important features of protein's robust and efficient design. The architecture and topology of proteins play a vital role by providing necessary robust scaffolds to support organism's growth and survival in constant evolutionary pressure. These complex biomolecules can be represented by several layers of modular architecture, but it is pivotal to understand and explore the smallest biologically relevant structural component. In the present study, we have developed a component-based method, using protein's secondary structures and their arrangements (i.e. patterns) in order to investigate its structural space. Our result on all-alpha protein shows that the known structural space is highly populated with limited set of structural patterns. We have also noticed that these frequently observed structural patterns are present as modules or "building blocks" in large proteins (i.e. higher secondary structure content). From structural descriptor analysis, observed patterns are found to be within similar deviation; however, frequent patterns are found to be distinctly occurring in diverse functions e.g. in enzymatic classes and reactions. In this study, we are introducing a simple approach to explore protein structural space using combinatorial- and graph-based geometry methods, which can be used to describe modularity in protein structures. Moreover, analysis indicates that protein function seems to be the driving force that shapes the known structure space.

  3. Structure-activity relationships of phenothiazines and related drugs for inhibition of protein kinase C.

    PubMed

    Aftab, D T; Ballas, L M; Loomis, C R; Hait, W N

    1991-11-01

    Phenothiazines are known to inhibit the activity of protein kinase C. To identify structural features that determine inhibitory activity against the enzyme, we utilized a semiautomated assay [Anal. Biochem. 187:84-88 (1990)] to compare the potency of greater than 50 phenothiazines and related compounds. Potency was decreased by trifluoro substitution at position 2 on the phenothiazine nucleus and increased by quinoid structures on the nucleus. An alkyl bridge of at least three carbons connecting the terminal amine to the nucleus was required for activity. Primary amines and unsubstituted piperazines were the most potent amino side chains. We selected 7,8-dihydroxychlorpromazine (DHCP) (IC50 = 8.3 microM) and 2-chloro-9-(3-[1-piperazinyl]propylidene)thioxanthene (N751) (IC50 = 14 microM) for further study because of their potency and distinct structural features. Under standard (vesicle) assay conditions, DHCP was noncompetitive with respect to phosphatidylserine and a mixed-type inhibitor with respect to ATP. N751 was competitive with respect to phosphatidylserine and noncompetitive with respect to ATP. Using the mixed micelle assay, DHCP was a competitive inhibitor with respect to both phosphatidylserine and ATP. DHCP was selective for protein kinase C compared with cAMP-dependent protein kinase, calmodulin-dependent protein kinase type II, and casein kinase. N751 was more potent against protein kinase C compared with cAMP-dependent protein kinase and casein kinase but less potent against protein kinase C compared with calmodulin-dependent protein kinase type II. DHCP was analyzed for its ability to inhibit different isoenzymes of protein kinase C, and no significant isozyme selectivity was detected. These data provide important information for the rational design of more potent and selective inhibitors of protein kinase C.

  4. SVM-PB-Pred: SVM based protein block prediction method using sequence profiles and secondary structures.

    PubMed

    Suresh, V; Parthasarathy, S

    2014-01-01

    We developed a support vector machine based web server called SVM-PB-Pred, to predict the Protein Block for any given amino acid sequence. The input features of SVM-PB-Pred include i) sequence profiles (PSSM) and ii) actual secondary structures (SS) from DSSP method or predicted secondary structures from NPS@ and GOR4 methods. There were three combined input features PSSM+SS(DSSP), PSSM+SS(NPS@) and PSSM+SS(GOR4) used to test and train the SVM models. Similarly, four datasets RS90, DB433, LI1264 and SP1577 were used to develop the SVM models. These four SVM models developed were tested using three different benchmarking tests namely; (i) self consistency, (ii) seven fold cross validation test and (iii) independent case test. The maximum possible prediction accuracy of ~70% was observed in self consistency test for the SVM models of both LI1264 and SP1577 datasets, where PSSM+SS(DSSP) input features was used to test. The prediction accuracies were reduced to ~53% for PSSM+SS(NPS@) and ~43% for PSSM+SS(GOR4) in independent case test, for the SVM models of above two same datasets. Using our method, it is possible to predict the protein block letters for any query protein sequence with ~53% accuracy, when the SP1577 dataset and predicted secondary structure from NPS@ server were used. The SVM-PB-Pred server can be freely accessed through http://bioinfo.bdu.ac.in/~svmpbpred.

  5. A protein relational database and protein family knowledge bases to facilitate structure-based design analyses.

    PubMed

    Mobilio, Dominick; Walker, Gary; Brooijmans, Natasja; Nilakantan, Ramaswamy; Denny, R Aldrin; Dejoannis, Jason; Feyfant, Eric; Kowticwar, Rupesh K; Mankala, Jyoti; Palli, Satish; Punyamantula, Sairam; Tatipally, Maneesh; John, Reji K; Humblet, Christine

    2010-08-01

    The Protein Data Bank is the most comprehensive source of experimental macromolecular structures. It can, however, be difficult at times to locate relevant structures with the Protein Data Bank search interface. This is particularly true when searching for complexes containing specific interactions between protein and ligand atoms. Moreover, searching within a family of proteins can be tedious. For example, one cannot search for some conserved residue as residue numbers vary across structures. We describe herein three databases, Protein Relational Database, Kinase Knowledge Base, and Matrix Metalloproteinase Knowledge Base, containing protein structures from the Protein Data Bank. In Protein Relational Database, atom-atom distances between protein and ligand have been precalculated allowing for millisecond retrieval based on atom identity and distance constraints. Ring centroids, centroid-centroid and centroid-atom distances and angles have also been included permitting queries for pi-stacking interactions and other structural motifs involving rings. Other geometric features can be searched through the inclusion of residue pair and triplet distances. In Kinase Knowledge Base and Matrix Metalloproteinase Knowledge Base, the catalytic domains have been aligned into common residue numbering schemes. Thus, by searching across Protein Relational Database and Kinase Knowledge Base, one can easily retrieve structures wherein, for example, a ligand of interest is making contact with the gatekeeper residue.
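
    The idea of precalculating protein-ligand atom-atom distances so that later queries reduce to a lookup can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the atom names, coordinates, and the functions `precompute_contacts` and `query` are hypothetical and do not reflect the actual schema or interface of the Protein Relational Database.

```python
import math
from itertools import product


def precompute_contacts(protein_atoms, ligand_atoms):
    """Build a table of (protein atom, ligand atom, distance) rows, sorted by distance."""
    rows = []
    for (p_name, p_xyz), (l_name, l_xyz) in product(protein_atoms, ligand_atoms):
        rows.append((p_name, l_name, math.dist(p_xyz, l_xyz)))
    rows.sort(key=lambda row: row[2])
    return rows


def query(rows, protein_atom, ligand_atom, max_distance):
    """Return precomputed rows matching atom identity and a distance cutoff."""
    return [
        row
        for row in rows
        if row[0] == protein_atom and row[1] == ligand_atom and row[2] <= max_distance
    ]


# Hypothetical coordinates in angstroms, chosen only to keep the arithmetic obvious.
protein_atoms = [("NZ", (0.0, 0.0, 0.0)), ("OD1", (5.0, 0.0, 0.0))]
ligand_atoms = [("O1", (3.0, 0.0, 0.0)), ("N1", (0.0, 4.0, 0.0))]

table = precompute_contacts(protein_atoms, ligand_atoms)
hits = query(table, "NZ", "O1", 3.5)
```

    A real relational store would index such a table on atom identity and distance rather than scanning a list, which is presumably what makes the millisecond retrieval described in the abstract possible.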

  6. HMPAS: Human Membrane Protein Analysis System

    PubMed Central

    2013-01-01

    Background Membrane proteins perform essential roles in diverse cellular functions and are regarded as major pharmaceutical targets. The significance of membrane proteins has led to the development of dozens of resources related to membrane proteins. However, most of these resources are built for specific well-known membrane protein groups, making it difficult to find common and specific features of various membrane protein groups. Methods We collected human membrane proteins from the dispersed resources and predicted novel membrane protein candidates by using ortholog information and our membrane protein classifiers. The membrane proteins were classified according to the type of interaction with the membrane, subcellular localization, and molecular function. We also made a new feature dataset to characterize the membrane proteins in various aspects including membrane protein topology, domain, biological process, disease, and drug. Moreover, protein structure and ICD-10-CM based integrated disease and drug information was newly included. To analyze the comprehensive information of membrane proteins, we implemented analysis tools to identify novel sequence and functional features of the classified membrane protein groups and to extract features from protein sequences. Results We constructed HMPAS with 28,509 collected known membrane proteins and 8,076 newly predicted candidates. This system provides integrated information of human membrane proteins individually and in groups organized by 45 subcellular locations and 1,401 molecular functions. As a case study, we identified associations between the membrane proteins and diseases and show that membrane proteins are promising targets for diseases related to the nervous system and circulatory system. A web-based interface of this system was constructed to enable researchers not only to retrieve organized information of individual proteins but also to use the tools to analyze the membrane proteins. Conclusions HMPAS provides comprehensive information about human membrane proteins including specific features of certain membrane protein groups. In this system, users can acquire the information of individual proteins and specified groups focused on their conserved sequence features, involved cellular processes, and diseases. HMPAS may contribute as a valuable resource for the inference of novel cellular mechanisms and pharmaceutical targets associated with the human membrane proteins. HMPAS is freely available at http://fcode.kaist.ac.kr/hmpas. PMID:24564858

  7. Short-Time Glassy Dynamics in Viscous Protein Solutions with Competing Interactions

    DOE PAGES

    Godfrin, P. Douglas; Hudson, Steven; Hong, Kunlun; ...

    2015-11-24

    Although there have been numerous investigations of the glass transition for colloidal dispersions with only a short-ranged attraction, less is understood for systems interacting with a long-ranged repulsion in addition to this attraction, which is ubiquitous in aqueous protein solutions at low ionic strength. Highly purified concentrated lysozyme solutions are used as a model system and investigated over a large range of protein concentrations at very low ionic strength. Newtonian liquid behavior is observed at all concentrations, even up to 480 mg/mL, where the zero shear viscosity increases by more than three orders of magnitude with increasing concentration. Remarkably, despite this macroscopic liquid-like behavior, the measurements of the dynamics in the short-time limit show features typical of glassy colloidal systems. Investigation of the inter-protein structure indicates that the reduced short-time mobility of the protein is caused by localized regions of high density within a heterogeneous density distribution. This structural heterogeneity occurs on an intermediate-range length scale, driven by the competing potential features, and is distinct from commonly studied colloidal gel systems in which a heterogeneous density distribution tends to extend to the whole system. The presence of long-ranged repulsion also allows for more mobility over large length and long time scales resulting in the macroscopic relaxation of the structure. The experimental results provide evidence for the need to explicitly include intermediate range order in theories for the macroscopic properties of protein solutions interacting via competing potential features.

  8. Critical Features of Fragment Libraries for Protein Structure Prediction

    PubMed Central

    dos Santos, Karina Baptista

    2017-01-01

    The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. Particularly, we analyze the effect of using secondary structure prediction guiding fragments selection, different fragments sizes and the effect of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than the longer. Libraries composed of multiple fragment lengths generate even better structures, where longer fragments show to be more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can be critical guidelines for the use of fragment libraries in protein structure prediction. PMID:28085928

  9. Critical Features of Fragment Libraries for Protein Structure Prediction.

    PubMed

    Trevizani, Raphael; Custódio, Fábio Lima; Dos Santos, Karina Baptista; Dardenne, Laurent Emmanuel

    2017-01-01

    The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. Particularly, we analyze the effect of using secondary structure prediction guiding fragments selection, different fragments sizes and the effect of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than the longer. Libraries composed of multiple fragment lengths generate even better structures, where longer fragments show to be more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can be critical guidelines for the use of fragment libraries in protein structure prediction.

  10. Structural disorder in plant proteins: where plasticity meets sessility.

    PubMed

    Covarrubias, Alejandra A; Cuevas-Velazquez, Cesar L; Romero-Pérez, Paulette S; Rendón-Luna, David F; Chater, Caspar C C

    2017-09-01

    Plants are sessile organisms. This intriguing nature provokes the question of how they survive despite the continual perturbations caused by their constantly changing environment. The large amount of knowledge accumulated to date demonstrates the fascinating dynamic and plastic mechanisms, which underpin the diverse strategies selected in plants in response to the fluctuating environment. This phenotypic plasticity requires an efficient integration of external cues to their growth and developmental programs that can only be achieved through the dynamic and interactive coordination of various signaling networks. Given the versatility of intrinsic structural disorder within proteins, this feature appears as one of the leading characters of such complex functional circuits, critical for plant adaptation and survival in their wild habitats. In this review, we present information of those intrinsically disordered proteins (IDPs) from plants for which their high level of predicted structural disorder has been correlated with a particular function, or where there is experimental evidence linking this structural feature with its protein function. Using examples of plant IDPs involved in the control of cell cycle, metabolism, hormonal signaling and regulation of gene expression, development and responses to stress, we demonstrate the critical importance of IDPs throughout the life of the plant.

  11. From Sequence and Forces to Structure, Function and Evolution of Intrinsically Disordered Proteins

    PubMed Central

    Forman-Kay, Julie D.; Mittag, Tanja

    2015-01-01

    Intrinsically disordered proteins (IDPs), which lack persistent structure, are a challenge to structural biology due to the inapplicability of standard methods for characterization of folded proteins as well as their deviation from the dominant structure/function paradigm. Their widespread presence and involvement in biological function, however, has spurred the growing acceptance of the importance of IDPs and the development of new tools for studying their structure, dynamics and function. The interplay of folded and disordered domains or regions for function and the existence of a continuum of protein states with respect to conformational energetics, motional timescales and compactness is shaping a unified understanding of structure-dynamics-disorder/function relationships. On the 20th anniversary of this journal, Structure, we provide a historical perspective on the investigation of IDPs and summarize the sequence features and physical forces that underlie their unique structural, functional and evolutionary properties. PMID:24010708

  12. From sequence and forces to structure, function, and evolution of intrinsically disordered proteins.

    PubMed

    Forman-Kay, Julie D; Mittag, Tanja

    2013-09-03

    Intrinsically disordered proteins (IDPs), which lack persistent structure, are a challenge to structural biology due to the inapplicability of standard methods for characterization of folded proteins as well as their deviation from the dominant structure/function paradigm. Their widespread presence and involvement in biological function, however, has spurred the growing acceptance of the importance of IDPs and the development of new tools for studying their structure, dynamics, and function. The interplay of folded and disordered domains or regions for function and the existence of a continuum of protein states with respect to conformational energetics, motional timescales, and compactness are shaping a unified understanding of structure-dynamics-disorder/function relationships. In the 20th anniversary of Structure, we provide a historical perspective on the investigation of IDPs and summarize the sequence features and physical forces that underlie their unique structural, functional, and evolutionary properties. Copyright © 2013 Elsevier Ltd. All rights reserved.

  13. WRF-TMH: predicting transmembrane helix by fusing composition index and physicochemical properties of amino acids.

    PubMed

    Hayat, Maqsood; Khan, Asifullah

    2013-05-01

    Membrane protein is the prime constituent of a cell, which performs a role of mediator between intra and extracellular processes. The prediction of transmembrane (TM) helix and its topology provides essential information regarding the function and structure of membrane proteins. However, prediction of TM helix and its topology is a challenging issue in bioinformatics and computational biology due to experimental complexities and lack of its established structures. Therefore, the location and orientation of TM helix segments are predicted from topogenic sequences. In this regard, we propose WRF-TMH model for effectively predicting TM helix segments. In this model, information is extracted from membrane protein sequences using compositional index and physicochemical properties. The redundant and irrelevant features are eliminated through singular value decomposition. The selected features provided by these feature extraction strategies are then fused to develop a hybrid model. Weighted random forest is adopted as a classification approach. We have used two benchmark datasets including low and high-resolution datasets. Tenfold cross validation is employed to assess the performance of WRF-TMH model at different levels including per protein, per segment, and per residue. The success rates of WRF-TMH model are quite promising and are the best reported so far on the same datasets. It is observed that WRF-TMH model might play a substantial role, and will provide essential information for further structural and functional studies on membrane proteins. The accompanied web predictor is accessible at http://111.68.99.218/WRF-TMH/.

  14. Structural and functional features of lysine acetylation of plant and animal tubulins.

    PubMed

    Rayevsky, Alexey V; Sharifi, Mohsen; Samofalova, Dariya A; Karpov, Pavel A; Blume, Yaroslav B

    2017-10-10

    The study of the genome and the proteome of different species and representatives of distinct kingdoms, especially detection of the proteome via wide-scale analyses, poses various challenges and pitfalls. Attempts to combine all available information and isolate common features for determination of the pathways and their mechanism of action are generally highly complicated. However, microtubule (MT) monomers are highly conserved protein structures, and microtubules are structurally conserved from Homo sapiens to Arabidopsis thaliana. The interaction of MT elements with microtubule-associated proteins and post-translational modifiers is fully dependent on protein interfaces, and almost all MT modifications are well described except acetylation. Crystallography and interactome data using different approaches were combined to identify conserved proteins important in acetylation of microtubules. Application of computational methods and comparative analysis of binding modes generated a robust predictive model of acetylation of the ϵ-amino group of Lys40 in α-tubulins. In turn, the model discarded some probable mechanisms of interaction between elements of interest. Reconstruction of unresolved protein structures was carried out with modeling by homology to the existing crystal structure (PDBID: 1Z2B) from B. taurus using the Swiss-model server, followed by a molecular dynamics simulation. Docking of the human tubulin fragment with Lys40 into the active site of α-tubulin acetyltransferase reproduces the binding mode of the peptidomimetic from the X-ray structure (PDBID: 4PK3). © 2017 International Federation for Cell Biology.

  15. The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4)

    DOE PAGES

    Huntemann, Marcel; Ivanova, Natalia N.; Mavromatis, Konstantinos; ...

    2015-10-26

    The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. In conclusion, structural annotation is followed by assignment of protein product names and functions.

  16. The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huntemann, Marcel; Ivanova, Natalia N.; Mavromatis, Konstantinos

    The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. In conclusion, structural annotation is followed by assignment of protein product names and functions.

  17. A feature-based approach to modeling protein-protein interaction hot spots.

    PubMed

    Cho, Kyu-il; Kim, Dongsup; Lee, Doheon

    2009-05-01

    Identifying features that effectively represent the energetic contribution of an individual interface residue to the interactions between proteins remains problematic. Here, we present several new features and show that they are more effective than conventional features. By combining the proposed features with conventional features, we develop a predictive model for interaction hot spots. Initially, 54 multifaceted features, composed of different levels of information including structure, sequence and molecular interaction information, are quantified. Then, to identify the best subset of features for predicting hot spots, feature selection is performed using a decision tree. Based on the selected features, a predictive model for hot spots is created using support vector machine (SVM) and tested on an independent test set. Our model shows better overall predictive accuracy than previous methods such as the alanine scanning methods Robetta and FOLDEF, and the knowledge-based method KFC. Subsequent analysis yields several findings about hot spots. As expected, hot spots have a larger relative surface area burial and are more hydrophobic than other residues. Unexpectedly, however, residue conservation displays a rather complicated tendency depending on the types of protein complexes, indicating that this feature is not good for identifying hot spots. Of the selected features, the weighted atomic packing density, relative surface area burial and weighted hydrophobicity are the top 3, with the weighted atomic packing density proving to be the most effective feature for predicting hot spots. Notably, we find that hot spots are closely related to pi-related interactions, especially pi . . . pi interactions.

  18. Structurally detailed coarse-grained model for Sec-facilitated co-translational protein translocation and membrane integration

    PubMed Central

    Miller, Thomas F.

    2017-01-01

    We present a coarse-grained simulation model that is capable of simulating the minute-timescale dynamics of protein translocation and membrane integration via the Sec translocon, while retaining sufficient chemical and structural detail to capture many of the sequence-specific interactions that drive these processes. The model includes accurate geometric representations of the ribosome and Sec translocon, obtained directly from experimental structures, and interactions parameterized from nearly 200 μs of residue-based coarse-grained molecular dynamics simulations. A protocol for mapping amino-acid sequences to coarse-grained beads enables the direct simulation of trajectories for the co-translational insertion of arbitrary polypeptide sequences into the Sec translocon. The model reproduces experimentally observed features of membrane protein integration, including the efficiency with which polypeptide domains integrate into the membrane, the variation in integration efficiency upon single amino-acid mutations, and the orientation of transmembrane domains. The central advantage of the model is that it connects sequence-level protein features to biological observables and timescales, enabling direct simulation for the mechanistic analysis of co-translational integration and for the engineering of membrane proteins with enhanced membrane integration efficiency. PMID:28328943

  19. AUTO-MUTE 2.0: A Portable Framework with Enhanced Capabilities for Predicting Protein Functional Consequences upon Mutation.

    PubMed

    Masso, Majid; Vaisman, Iosif I

    2014-01-01

    The AUTO-MUTE 2.0 stand-alone software package includes a collection of programs for predicting functional changes to proteins upon single residue substitutions, developed by combining structure-based features with trained statistical learning models. Three of the predictors evaluate changes to protein stability upon mutation, each complementing a distinct experimental approach. Two additional classifiers are available, one for predicting activity changes due to residue replacements and the other for determining the disease potential of mutations associated with nonsynonymous single nucleotide polymorphisms (nsSNPs) in human proteins. These five command-line driven tools, as well as all the supporting programs, complement those that run our AUTO-MUTE web-based server. Nevertheless, all the codes have been rewritten and substantially altered for the new portable software, and they incorporate several new features based on user feedback. Included among these upgrades is the ability to perform three highly requested tasks: to run "big data" batch jobs; to generate predictions using modified protein data bank (PDB) structures, and unpublished personal models prepared using standard PDB file formatting; and to utilize NMR structure files that contain multiple models.

  20. LoopX: A Graphical User Interface-Based Database for Comprehensive Analysis and Comparative Evaluation of Loops from Protein Structures.

    PubMed

    Kadumuri, Rajashekar Varma; Vadrevu, Ramakrishna

    2017-10-01

    Due to their crucial role in function, folding, and stability, protein loops are being targeted for grafting/designing to create novel or alter existing functionality and improve stability and foldability. With a view to facilitating a thorough analysis and effective search options for extracting and comparing loops for sequence and structural compatibility, we developed LoopX, a comprehensively compiled library of sequence and conformational features of ∼700,000 loops from protein structures. The database, equipped with a graphical user interface, offers diverse query tools and search algorithms, with various rendering options to visualize the sequence- and structural-level information along with hydrogen bonding patterns and backbone φ, ψ dihedral angles of both the target and candidate loops. Two new features, (i) conservation of the polar/nonpolar environment and (ii) conservation of sequence and conformation of specific residues within the loops, have also been incorporated in the search and retrieval of compatible loops for a chosen target loop. Thus, the LoopX server not only serves as a database and visualization tool for sequence and structural analysis of protein loops but also aids in extracting and comparing candidate loops for a given target loop based on user-defined search options.

  1. Protein 3D Structure and Electron Microscopy Map Retrieval Using 3D-SURFER2.0 and EM-SURFER.

    PubMed

    Han, Xusi; Wei, Qing; Kihara, Daisuke

    2017-12-08

    With the rapid growth in the number of solved protein structures stored in the Protein Data Bank (PDB) and the Electron Microscopy Data Bank (EMDB), it is essential to develop tools to perform real-time structure similarity searches against the entire structure database. Since conventional structure alignment methods need to sample different orientations of proteins in the three-dimensional space, they are time consuming and unsuitable for rapid, real-time database searches. To this end, we have developed 3D-SURFER and EM-SURFER, which utilize 3D Zernike descriptors (3DZD) to conduct high-throughput protein structure comparison, visualization, and analysis. Taking an atomic structure or an electron microscopy map of a protein or a protein complex as input, the 3DZD of a query protein is computed and compared with the 3DZD of all other proteins in PDB or EMDB. In addition, local geometrical characteristics of a query protein can be analyzed using VisGrid and LIGSITE CSC in 3D-SURFER. This article describes how to use 3D-SURFER and EM-SURFER to carry out protein surface shape similarity searches, local geometric feature analysis, and interpretation of the search results. © 2017 by John Wiley & Sons, Inc.
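
    Because a rotation-invariant descriptor reduces each structure to a fixed-length vector, a database search becomes a nearest-neighbour ranking. The sketch below illustrates that idea only. The abstract says descriptors are "compared" without naming a metric, so the Euclidean distance is an assumption, and the entry names and three-element vectors are hypothetical toy data (real 3DZD vectors are much longer).

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(query, database, top_k=5):
    """Return up to top_k (entry_id, distance) pairs, closest first."""
    scored = [(entry_id, euclidean_distance(query, vec))
              for entry_id, vec in database.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


# Hypothetical toy database: entry id -> descriptor vector.
toy_db = {
    "entry_A": [0.1, 0.2, 0.3],
    "entry_B": [0.9, 0.8, 0.7],
    "entry_C": [0.1, 0.25, 0.3],
}
hits = rank_by_similarity([0.1, 0.2, 0.3], toy_db, top_k=2)
```

    No structural alignment or orientation sampling is needed at query time, which is the property the abstract credits for real-time searches.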

  2. Weak conservation of structural features in the interfaces of homologous transient protein–protein complexes

    PubMed Central

    Sudha, Govindarajan; Singh, Prashant; Swapna, Lakshmipuram S; Srinivasan, Narayanaswamy

    2015-01-01

    Residue types at the interface of protein–protein complexes (PPCs) are known to be reasonably well conserved. However, we show, using a dataset of known 3-D structures of homologous transient PPCs, that the 3-D location of interfacial residues and their interaction patterns are only moderately and poorly conserved, respectively. Another surprising observation is that a residue at the interface that is conserved is not necessarily in the interface in the homolog. Such differences in homologous complexes are manifested by substitution of the residues that are spatially proximal to the conserved residue and structural differences at the interfaces as well as differences in spatial orientations of the interacting proteins. Conservation of interface location and the interaction pattern at the core of the interfaces is higher than at the periphery of the interface patch. Extents of variability of various structural features reported here for homologous transient PPCs are higher than the variation in homologous permanent homomers. Our findings suggest that straightforward extrapolation of interfacial nature and inter-residue interaction patterns from template to target could lead to serious errors in the modeled complex structure. Understanding the evolution of interfaces provides insights to improve comparative modeling of PPC structures. PMID:26311309

  3. Fundamental Characteristics of AAA+ Protein Family Structure and Function

    PubMed Central

    2016-01-01

    Many complex cellular events depend on multiprotein complexes known as molecular machines to efficiently couple the energy derived from adenosine triphosphate hydrolysis to the generation of mechanical force. Members of the AAA+ ATPase superfamily (ATPases Associated with various cellular Activities) are critical components of many molecular machines. AAA+ proteins are defined by conserved modules that precisely position the active site elements of two adjacent subunits to catalyze ATP hydrolysis. In many cases, AAA+ proteins form a ring structure that translocates a polymeric substrate through the central channel using specialized loops that project into the central channel. We discuss the major features of AAA+ protein structure and function with an emphasis on pivotal aspects elucidated with archaeal proteins. PMID:27703410

  4. Evolutionarily Conserved Linkage between Enzyme Fold, Flexibility, and Catalysis

    PubMed Central

    Ramanathan, Arvind; Agarwal, Pratul K.

    2011-01-01

    Proteins are intrinsically flexible molecules. The role of internal motions in a protein's designated function is widely debated. The role of protein structure in enzyme catalysis is well established, and conservation of structural features provides vital clues to their role in function. Recently, it has been proposed that the protein function may involve multiple conformations: the observed deviations are not random thermodynamic fluctuations; rather, flexibility may be closely linked to protein function, including enzyme catalysis. We hypothesize that the argument of conservation of important structural features can also be extended to identification of protein flexibility in interconnection with enzyme function. Three classes of enzymes (prolyl-peptidyl isomerase, oxidoreductase, and nuclease) that catalyze diverse chemical reactions have been examined using detailed computational modeling. For each class, the identification and characterization of the internal protein motions coupled to the chemical step in enzyme mechanisms in multiple species show identical enzyme conformational fluctuations. In addition to the active-site residues, motions of protein surface loop regions (>10 Å away) are observed to be identical across species, and networks of conserved interactions/residues connect these highly flexible surface regions to the active-site residues that make direct contact with substrates. More interestingly, examination of reaction-coupled motions in non-homologous enzyme systems (with no structural or sequence similarity) that catalyze the same biochemical reaction shows motions that induce remarkably similar changes in the enzyme–substrate interactions during catalysis. The results indicate that the reaction-coupled flexibility is a conserved aspect of the enzyme molecular architecture. 
    Protein motions in distal areas of homologous and non-homologous enzyme systems mediate similar changes in the active-site enzyme–substrate interactions, thereby impacting the mechanism of catalyzed chemistry. These results have implications for understanding the mechanism of allostery, and for protein engineering and drug design. PMID:22087074

  5. Evolutionarily conserved linkage between enzyme fold, flexibility, and catalysis.

    PubMed

    Ramanathan, Arvind; Agarwal, Pratul K

    2011-11-01

    Proteins are intrinsically flexible molecules. The role of internal motions in a protein's designated function is widely debated. The role of protein structure in enzyme catalysis is well established, and conservation of structural features provides vital clues to their role in function. Recently, it has been proposed that the protein function may involve multiple conformations: the observed deviations are not random thermodynamic fluctuations; rather, flexibility may be closely linked to protein function, including enzyme catalysis. We hypothesize that the argument of conservation of important structural features can also be extended to identification of protein flexibility in interconnection with enzyme function. Three classes of enzymes (prolyl-peptidyl isomerase, oxidoreductase, and nuclease) that catalyze diverse chemical reactions have been examined using detailed computational modeling. For each class, the identification and characterization of the internal protein motions coupled to the chemical step in enzyme mechanisms in multiple species show identical enzyme conformational fluctuations. In addition to the active-site residues, motions of protein surface loop regions (>10 Å away) are observed to be identical across species, and networks of conserved interactions/residues connect these highly flexible surface regions to the active-site residues that make direct contact with substrates. More interestingly, examination of reaction-coupled motions in non-homologous enzyme systems (with no structural or sequence similarity) that catalyze the same biochemical reaction shows motions that induce remarkably similar changes in the enzyme-substrate interactions during catalysis. The results indicate that the reaction-coupled flexibility is a conserved aspect of the enzyme molecular architecture. 
    Protein motions in distal areas of homologous and non-homologous enzyme systems mediate similar changes in the active-site enzyme-substrate interactions, thereby impacting the mechanism of catalyzed chemistry. These results have implications for understanding the mechanism of allostery, and for protein engineering and drug design.

  6. Evolutionarily conserved linkage between enzyme fold, flexibility, and catalysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ramanathan, Arvind; Agarwal, Pratul K

    Proteins are intrinsically flexible molecules. The role of internal motions in a protein's designated function is widely debated. The role of protein structure in enzyme catalysis is well established, and conservation of structural features provides vital clues to their role in function. Recently, it has been proposed that the protein function may involve multiple conformations: the observed deviations are not random thermodynamic fluctuations; rather, flexibility may be closely linked to protein function, including enzyme catalysis. We hypothesize that the argument of conservation of important structural features can also be extended to identification of protein flexibility in interconnection with enzyme function. Three classes of enzymes (prolyl-peptidyl isomerase, oxidoreductase, and nuclease) that catalyze diverse chemical reactions have been examined using detailed computational modeling. For each class, the identification and characterization of the internal protein motions coupled to the chemical step in enzyme mechanisms in multiple species show identical enzyme conformational fluctuations. In addition to the active-site residues, motions of protein surface loop regions (>10 Å away) are observed to be identical across species, and networks of conserved interactions/residues connect these highly flexible surface regions to the active-site residues that make direct contact with substrates. More interestingly, examination of reaction-coupled motions in non-homologous enzyme systems (with no structural or sequence similarity) that catalyze the same biochemical reaction shows motions that induce remarkably similar changes in the enzyme-substrate interactions during catalysis. The results indicate that the reaction-coupled flexibility is a conserved aspect of the enzyme molecular architecture.
    Protein motions in distal areas of homologous and non-homologous enzyme systems mediate similar changes in the active-site enzyme-substrate interactions, thereby impacting the mechanism of catalyzed chemistry. These results have implications for understanding the mechanism of allostery, and for protein engineering and drug design.

  7. Role of indirect readout mechanism in TATA box binding protein-DNA interaction.

    PubMed

    Mondal, Manas; Choudhury, Devapriya; Chakrabarti, Jaydeb; Bhattacharyya, Dhananjay

    2015-03-01

    Gene expression generally initiates with the binding of TATA-box binding protein (TBP) to the minor groove of DNA at the TATA box sequence, where the DNA structure is significantly different from B-DNA. We have carried out molecular dynamics simulation studies of the TBP-DNA system to understand how the DNA structure alters for efficient binding. We observed that the protein is rigid, while the DNA of the TATA box sequence has an inherent flexibility in terms of bending and minor groove widening. The bending analysis of the free DNA and the TBP-bound DNA systems indicates the presence of some similar structures. Principal coordinate ordination analysis also indicates that some structural features of the protein-bound and free DNA are similar. Thus we suggest that the DNA of the TATA box sequence regularly oscillates between several alternate structures, and the one suitable for TBP binding is induced further by the protein for proper complex formation.

  8. Structure of the uncleaved ectodomain of the paramyxovirus (hPIV3) fusion protein

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yin, Hsien-Sheng; Paterson, Reay G.; Wen, Xiaolin

    2010-03-08

    Class I viral fusion proteins share common mechanistic and structural features but little sequence similarity. Structural insights into the protein conformational changes associated with membrane fusion are based largely on studies of the influenza virus hemagglutinin in pre- and postfusion conformations. Here, we present the crystal structure of the secreted, uncleaved ectodomain of the paramyxovirus, human parainfluenza virus 3 fusion (F) protein, a member of the class I viral fusion protein group. The secreted human parainfluenza virus 3 F forms a trimer with distinct head, neck, and stalk regions. Unexpectedly, the structure reveals a six-helix bundle associated with the postfusion form of F, suggesting that the anchor-minus ectodomain adopts a conformation largely similar to the postfusion state. The transmembrane anchor domains of F may therefore profoundly influence the folding energetics that establish and maintain a metastable, prefusion state.

  9. PDB@: an offline toolkit for exploration and analysis of PDB files.

    PubMed

    Mani, Udayakumar; Ravisankar, Sadhana; Ramakrishnan, Sai Mukund

    2013-12-01

    The Protein Data Bank (PDB) is a freely accessible archive of the 3-D structural data of biological molecules. Structure-based studies offer a unique vantage point for inferring the properties of a protein molecule from structural data. This is too big a task to be done manually. Moreover, there is no single tool, software or server that comprehensively analyses all structure-based properties. The objective of the present work is to develop an offline computational toolkit, PDB@, containing in-built algorithms that help categorize the structural properties of a protein molecule. The user can view and edit the PDB file as needed. Some features of the present work are unique and others are an improvement over existing tools. Also, the representation of protein properties in both graphical and textual formats helps in predicting all the necessary details of a protein molecule on a single platform.

  10. PSOFuzzySVM-TMH: identification of transmembrane helix segments using ensemble feature space by incorporated fuzzy support vector machine.

    PubMed

    Hayat, Maqsood; Tahir, Muhammad

    2015-08-01

    Membrane proteins are central components of the cell that manage intra- and extracellular processes. Membrane proteins execute a diversity of functions that are vital for the survival of organisms. The topology of a transmembrane protein describes the number of transmembrane (TM) helix segments and their orientation. However, owing to the lack of recognized structures, the identification of TM helices and their topology through experimental methods is laborious and has low throughput. In order to identify TM helix segments reliably, accurately, and effectively from topogenic sequences, we propose the PSOFuzzySVM-TMH model. In this model, evolutionary information in the form of a position specific scoring matrix and discrete information in the form of the 6-letter exchange group are used to formulate transmembrane protein sequences. Noisy and extraneous attributes are removed from both feature spaces using an optimization-based selection technique, particle swarm optimization. Finally, the selected feature spaces are combined to form an ensemble feature space. A fuzzy support vector machine is utilized as the classification algorithm. Two benchmark datasets, a low-resolution and a high-resolution dataset, are used. The performance of the PSOFuzzySVM-TMH model is assessed at various levels through a 10-fold cross-validation test. The empirical results reveal that the proposed PSOFuzzySVM-TMH framework outperforms existing methods in classification performance on the examined datasets. The proposed model might be a useful, high-throughput tool for the academic and research community for further structural and functional studies on transmembrane proteins.

  11. Prediction of beta-turns at over 80% accuracy based on an ensemble of predicted secondary structures and multiple alignments.

    PubMed

    Zheng, Ce; Kurgan, Lukasz

    2008-10-10

    Beta-turn is a secondary protein structure type that plays a significant role in protein folding, stability, and molecular recognition. To date, several methods for prediction of beta-turns from protein sequences have been developed, but they are characterized by relatively poor prediction quality. The novelty of the proposed sequence-based beta-turn predictor stems from the use of window-based information extracted from four predicted three-state secondary structures, which together with a selected set of position specific scoring matrix (PSSM) values serve as an input to the support vector machine (SVM) predictor. We show that (1) all four predicted secondary structures are useful; (2) the most useful information extracted from the predicted secondary structure includes the structure of the predicted residue, secondary structure content in a window around the predicted residue, and features that indicate whether the predicted residue is inside a secondary structure segment; (3) the PSSM values of Asn, Asp, Gly, Ile, Leu, Met, Pro, and Val were among the top ranked features, which corroborates recent studies. The Asn, Asp, Gly, and Pro indicate potential beta-turns, while the remaining four amino acids are useful to predict non-beta-turns. Empirical evaluation using three nonredundant datasets shows favorable Q total, Q predicted and MCC values when compared with over a dozen modern competing methods. Our method is the first to break the 80% Q total barrier and achieves Q total = 80.9%, MCC = 0.47, and Q predicted higher by over 6% when compared with the second best method. We use feature selection to reduce the dimensionality of the feature vector used as the input for the proposed prediction method. The applied feature set is smaller by 86, 62 and 37% when compared with the second and two third-best (with respect to MCC) competing methods, respectively.
    Experiments show that the proposed method constitutes an improvement over the competing prediction methods. The proposed prediction model can better discriminate between beta-turns and non-beta-turns due to obtaining lower numbers of false positive predictions. The prediction model and datasets are freely available at http://biomine.ece.ualberta.ca/BTNpred/BTNpred.html.
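
    The quality measures quoted in the abstract above can be computed from per-residue confusion counts. The sketch below assumes the definitions commonly used for beta-turn prediction: Q total as overall accuracy, Q predicted as the share of predicted turns that are correct, and MCC as the Matthews correlation coefficient. The abstract itself does not spell these out, and the function name beta_turn_metrics is hypothetical.

```python
import math


def beta_turn_metrics(tp, tn, fp, fn):
    """Return (q_total, q_predicted, mcc) from confusion-matrix counts.

    tp/fn count true beta-turn residues predicted as turn/non-turn;
    tn/fp count true non-turn residues predicted as non-turn/turn.
    Q values are percentages; MCC lies in [-1, 1].
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one residue is required")
    q_total = 100.0 * (tp + tn) / total
    # Fraction of predicted turns that are real turns (0 if none predicted).
    q_predicted = 100.0 * tp / (tp + fp) if (tp + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return q_total, q_predicted, mcc
```

    Because beta-turn residues are a minority class, a high Q total alone can be reached by over-predicting non-turns, which is why MCC is reported alongside it.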

  12. Prediction of beta-turns at over 80% accuracy based on an ensemble of predicted secondary structures and multiple alignments

    PubMed Central

    Zheng, Ce; Kurgan, Lukasz

    2008-01-01

    Background: β-turn is a secondary protein structure type that plays a significant role in protein folding, stability, and molecular recognition. To date, several methods for prediction of β-turns from protein sequences have been developed, but they are characterized by relatively poor prediction quality. The novelty of the proposed sequence-based β-turn predictor stems from the use of window-based information extracted from four predicted three-state secondary structures, which together with a selected set of position specific scoring matrix (PSSM) values serve as an input to the support vector machine (SVM) predictor. Results: We show that (1) all four predicted secondary structures are useful; (2) the most useful information extracted from the predicted secondary structure includes the structure of the predicted residue, secondary structure content in a window around the predicted residue, and features that indicate whether the predicted residue is inside a secondary structure segment; (3) the PSSM values of Asn, Asp, Gly, Ile, Leu, Met, Pro, and Val were among the top ranked features, which corroborates recent studies. The Asn, Asp, Gly, and Pro indicate potential β-turns, while the remaining four amino acids are useful to predict non-β-turns. Empirical evaluation using three nonredundant datasets shows favorable Qtotal, Qpredicted and MCC values when compared with over a dozen modern competing methods. Our method is the first to break the 80% Qtotal barrier and achieves Qtotal = 80.9%, MCC = 0.47, and Qpredicted higher by over 6% when compared with the second best method. We use feature selection to reduce the dimensionality of the feature vector used as the input for the proposed prediction method. The applied feature set is smaller by 86, 62 and 37% when compared with the second and two third-best (with respect to MCC) competing methods, respectively.
    Conclusion: Experiments show that the proposed method constitutes an improvement over the competing prediction methods. The proposed prediction model can better discriminate between β-turns and non-β-turns due to obtaining lower numbers of false positive predictions. The prediction model and datasets are freely available at . PMID:18847492

  13. RNA helicase proteins as chaperones and remodelers

    PubMed Central

    Jarmoskaite, Inga; Russell, Rick

    2014-01-01

    Superfamily 2 helicase proteins are ubiquitous in RNA biology and have an extraordinarily broad set of functional roles. Central among these roles are to promote rearrangements of structured RNAs and to remodel RNA-protein complexes (RNPs), allowing formation of native RNA structure or progression through a functional cycle of structures. While all superfamily 2 helicases share a conserved helicase core, they are divided evolutionarily into several families, and it is principally proteins from three families, the DEAD-box, DEAH/RHA and Ski2-like families, that function to manipulate structured RNAs and RNPs. Strikingly, there are emerging differences in the mechanisms of these proteins, both between families and within the largest family (DEAD-box), and these differences appear to be tuned to their RNA or RNP substrates and their specific roles. This review outlines basic mechanistic features of the three families and surveys individual proteins and the current understanding of their biological substrates and mechanisms. PMID:24635478

  14. Protein remote homology detection based on bidirectional long short-term memory.

    PubMed

    Li, Shumin; Chen, Junjie; Liu, Bin

    2017-10-10

    Protein remote homology detection plays a vital role in studies of protein structures and functions. Almost all of the traditional machine learning methods require fixed length features to represent the protein sequences. However, it is never an easy task to extract the discriminative features with limited knowledge of proteins. On the other hand, deep learning techniques have demonstrated their advantage in automatically learning representations. It is worthwhile to explore the applications of deep learning techniques to the protein remote homology detection. In this study, we employ the Bidirectional Long Short-Term Memory (BLSTM) to learn effective features from pseudo proteins, and propose a predictor called ProDec-BLSTM: it includes input layer, bidirectional LSTM, time distributed dense layer and output layer. This neural network can automatically extract the discriminative features by using bidirectional LSTM and the time distributed dense layer. Experimental results on a widely-used benchmark dataset show that ProDec-BLSTM outperforms other related methods in terms of both the mean ROC and mean ROC50 scores. This promising result shows that ProDec-BLSTM is a useful tool for protein remote homology detection. Furthermore, the hidden patterns learnt by ProDec-BLSTM can be interpreted and visualized, and therefore, additional useful information can be obtained.

  15. Conserved Features in the Structure, Mechanism, and Biogenesis of the Inverse Autotransporter Protein Family

    PubMed Central

    Heinz, Eva; Stubenrauch, Christopher J.; Grinter, Rhys; Croft, Nathan P.; Purcell, Anthony W.; Strugnell, Richard A.; Dougan, Gordon; Lithgow, Trevor

    2016-01-01

    The bacterial cell surface proteins intimin and invasin are virulence factors that share a common domain structure and bind selectively to host cell receptors in the course of bacterial pathogenesis. The β-barrel domains of intimin and invasin show significant sequence and structural similarities. Conversely, a variety of proteins with sometimes limited sequence similarity have also been annotated as “intimin-like” and “invasin” in genome datasets, while other recent work on apparently unrelated virulence-associated proteins ultimately revealed similarities to intimin and invasin. Here we characterize the sequence and structural relationships across this complex protein family. Surprisingly, intimins and invasins represent a very small minority of the sequence diversity in what has been previously the “intimin/invasin protein family”. Analysis of the assembly pathway for expression of the classic intimin, EaeA, and a characteristic example of the most prevalent members of the group, FdeC, revealed a dependence on the translocation and assembly module as a common feature for both these proteins. While the majority of the sequences in the grouping are most similar to FdeC, a further and widespread group is two-partner secretion systems that use the β-barrel domain as the delivery device for secretion of a variety of virulence factors. This comprehensive analysis supports the adoption of the “inverse autotransporter protein family” as the most accurate nomenclature for the family and, in turn, has important consequences for our overall understanding of the Type V secretion systems of bacterial pathogens. PMID:27190006

  16. The Widespread Prevalence and Functional Significance of Silk-Like Structural Proteins in Metazoan Biological Materials

    PubMed Central

    McDougall, Carmel; Woodcroft, Ben J.

    2016-01-01

    In nature, numerous mechanisms have evolved by which organisms fabricate biological structures with an impressive array of physical characteristics. Some examples of metazoan biological materials include the highly elastic byssal threads by which bivalves attach themselves to rocks, biomineralized structures that form the skeletons of various animals, and spider silks that are renowned for their exceptional strength and elasticity. The remarkable properties of silks, which are perhaps the best studied biological materials, are the result of the highly repetitive, modular, and biased amino acid composition of the proteins that compose them. Interestingly, similar levels of modularity/repetitiveness and similar bias in amino acid compositions have been reported in proteins that are components of structural materials in other organisms, however the exact nature and extent of this similarity, and its functional and evolutionary relevance, is unknown. Here, we investigate this similarity and use sequence features common to silks and other known structural proteins to develop a bioinformatics-based method to identify similar proteins from large-scale transcriptome and whole-genome datasets. We show that a large number of proteins identified using this method have roles in biological material formation throughout the animal kingdom. Despite the similarity in sequence characteristics, most of the silk-like structural proteins (SLSPs) identified in this study appear to have evolved independently and are restricted to a particular animal lineage. Although the exact function of many of these SLSPs is unknown, the apparent independent evolution of proteins with similar sequence characteristics in divergent lineages suggests that these features are important for the assembly of biological materials. 
The identification of these characteristics enables the generation of testable hypotheses regarding the mechanisms by which these proteins assemble and direct the construction of biological materials with diverse morphologies. The SilkSlider predictor software developed here is available at https://github.com/wwood/SilkSlider. PMID:27415783

  17. Prediction of Peptide and Protein Propensity for Amyloid Formation

    PubMed Central

    Família, Carlos; Dennison, Sarah R.; Quintas, Alexandre; Phoenix, David A.

    2015-01-01

    Understanding which peptides and proteins have the potential to undergo amyloid formation and what driving forces are responsible for amyloid-like fiber formation and stabilization remains limited. This is mainly because proteins that can undergo structural changes, which lead to amyloid formation, are quite diverse and share no obvious sequence or structural homology, despite the structural similarity found in the fibrils. To address these issues, a novel approach based on recursive feature selection and feed-forward neural networks was undertaken to identify key features highly correlated with the self-assembly problem. This approach allowed the identification of seven physicochemical and biochemical properties of the amino acids highly associated with the self-assembly of peptides and proteins into amyloid-like fibrils (normalized frequency of β-sheet, normalized frequency of β-sheet from LG, weights for β-sheet at the window position of 1, isoelectric point, atom-based hydrophobic moment, helix termination parameter at position j+1 and ΔG° values for peptides extrapolated in 0 M urea). Moreover, these features enabled the development of a new predictor (available at http://cran.r-project.org/web/packages/appnn/index.html) capable of accurately and reliably predicting the amyloidogenic propensity from the polypeptide sequence alone with a prediction accuracy of 84.9 % against an external validation dataset of sequences with experimental in vitro, evidence of amyloid formation. PMID:26241652

  18. Predicting turns in proteins with a unified model.

    PubMed

    Song, Qi; Li, Tonghua; Cong, Peisheng; Sun, Jiangming; Li, Dapeng; Tang, Shengnan

    2012-01-01

    Turns are a critical element of the structure of a protein; turns play a crucial role in loops, folds, and interactions. Current prediction methods are well developed for the prediction of individual turn types, including α-turn, β-turn, and γ-turn, etc. However, for further protein structure and function prediction it is necessary to develop a uniform model that can accurately predict all types of turns simultaneously. In this study, we present a novel approach, TurnP, which offers the ability to investigate all the turns in a protein based on a unified model. The main characteristics of TurnP are: (i) using newly exploited features of structural evolution information (secondary structure and shape string of protein) based on structure homologies, (ii) considering all types of turns in a unified model, and (iii) practical capability of accurate prediction of all turns simultaneously for a query. TurnP utilizes predicted secondary structures and predicted shape strings, both of which have greater accuracy, based on innovative technologies which were both developed by our group. Then, sequence and structural evolution features, which are profile of sequence, profile of secondary structures and profile of shape strings are generated by sequence and structure alignment. When TurnP was validated on a non-redundant dataset (4,107 entries) by five-fold cross-validation, we achieved an accuracy of 88.8% and a sensitivity of 71.8%, which exceeded the most state-of-the-art predictors of certain type of turn. Newly determined sequences, the EVA and CASP9 datasets were used as independent tests and the results we achieved were outstanding for turn predictions and confirmed the good performance of TurnP for practical applications.

  19. Predicting Turns in Proteins with a Unified Model

    PubMed Central

    Song, Qi; Li, Tonghua; Cong, Peisheng; Sun, Jiangming; Li, Dapeng; Tang, Shengnan

    2012-01-01

    Motivation Turns are a critical element of the structure of a protein; turns play a crucial role in loops, folds, and interactions. Current prediction methods are well developed for the prediction of individual turn types, including α-turn, β-turn, and γ-turn, etc. However, for further protein structure and function prediction it is necessary to develop a uniform model that can accurately predict all types of turns simultaneously. Results In this study, we present a novel approach, TurnP, which offers the ability to investigate all the turns in a protein based on a unified model. The main characteristics of TurnP are: (i) using newly exploited features of structural evolution information (secondary structure and shape string of protein) based on structure homologies, (ii) considering all types of turns in a unified model, and (iii) practical capability of accurate prediction of all turns simultaneously for a query. TurnP utilizes predicted secondary structures and predicted shape strings, both of which have greater accuracy, based on innovative technologies which were both developed by our group. Then, sequence and structural evolution features, which are profile of sequence, profile of secondary structures and profile of shape strings are generated by sequence and structure alignment. When TurnP was validated on a non-redundant dataset (4,107 entries) by five-fold cross-validation, we achieved an accuracy of 88.8% and a sensitivity of 71.8%, which exceeded the most state-of-the-art predictors of certain type of turn. Newly determined sequences, the EVA and CASP9 datasets were used as independent tests and the results we achieved were outstanding for turn predictions and confirmed the good performance of TurnP for practical applications. PMID:23144872

  20. Evaluating a variety of text-mined features for automatic protein function prediction with GOstruct.

    PubMed

    Funk, Christopher S; Kahanda, Indika; Ben-Hur, Asa; Verspoor, Karin M

    2015-01-01

    Most computational methods that predict protein function do not take advantage of the large amount of information contained in the biomedical literature. In this work we evaluate both ontology term co-mention and bag-of-words features mined from the biomedical literature and analyze their impact in the context of a structured output support vector machine model, GOstruct. We find that even simple literature based features are useful for predicting human protein function (F-max: Molecular Function =0.408, Biological Process =0.461, Cellular Component =0.608). One advantage of using literature features is their ability to offer easy verification of automated predictions. We find through manual inspection of misclassifications that some false positive predictions could be biologically valid predictions based upon support extracted from the literature. Additionally, we present a "medium-throughput" pipeline that was used to annotate a large subset of co-mentions; we suggest that this strategy could help to speed up the rate at which proteins are curated.
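
    The F-max values quoted in this abstract summarize precision and recall over all score thresholds. The sketch below follows the protein-centric definition commonly used in function-prediction benchmarks; whether it matches the exact evaluation protocol of the study is an assumption, and the protein and term identifiers in the example are hypothetical.

```python
def f_max(predictions: dict, truth: dict, steps: int = 100) -> float:
    """Protein-centric F-max over thresholds 1/steps, 2/steps, ..., 1.

    predictions: protein -> {term: score in [0, 1]}
    truth:       protein -> non-empty set of true terms
    """
    best = 0.0
    for k in range(1, steps + 1):
        threshold = k / steps
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            scored = predictions.get(protein, {})
            predicted = {term for term, score in scored.items() if score >= threshold}
            hits = len(predicted & true_terms)
            # Precision is averaged only over proteins with at least one prediction.
            if predicted:
                precisions.append(hits / len(predicted))
            # Recall is averaged over every benchmark protein.
            recalls.append(hits / len(true_terms))
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


if __name__ == "__main__":
    # Hypothetical proteins and term labels, for illustration only.
    preds = {"P1": {"term_a": 0.9, "term_b": 0.4}, "P2": {"term_c": 0.8}}
    gold = {"P1": {"term_a"}, "P2": {"term_d"}}
    print(round(f_max(preds, gold), 3))
```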

  1. Computational 3D structures of drug-targeting proteins in the 2009-H1N1 influenza A virus

    NASA Astrophysics Data System (ADS)

    Du, Qi-Shi; Wang, Shu-Qing; Huang, Ri-Bo; Chou, Kuo-Chen

    2010-01-01

    The neuraminidase (NA) and M2 proton channel of influenza virus are drug-targeting proteins against which several drugs were developed. However, these once powerful drugs have encountered drug resistance in H5N1 and H1N1 flu. To address this problem, the computational 3D structures of the NA and M2 proteins of 2009-H1N1 influenza virus were built using molecular modeling and computational chemistry methods. Based on the models, the structural features of the NA and M2 proteins were analyzed, the docking structures of drug-protein complexes were computed, and the residue mutations were annotated. The results may help to solve the drug-resistance problem and stimulate the design of more effective drugs against the 2009-H1N1 influenza pandemic.

  2. A Thermoacidophile-Specific Protein Family, DUF3211, Functions as a Fatty Acid Carrier with Novel Binding Mode

    PubMed Central

    Miyakawa, Takuya; Sawano, Yoriko; Miyazono, Ken-ichi; Miyauchi, Yumiko; Hatano, Ken-ichi

    2013-01-01

    STK_08120 is a member of the thermoacidophile-specific DUF3211 protein family from Sulfolobus tokodaii strain 7. Its molecular function remains obscure, and sequence similarities for obtaining functional remarks are not available. In this study, the crystal structure of STK_08120 was determined at 1.79-Å resolution to predict its probable function using structure similarity searches. The structure adopts an α/β structure of a helix-grip fold, which is found in the START domain proteins with cavities for hydrophobic substrates or ligands. The detailed structural features implied that fatty acids are the primary ligand candidates for STK_08120, and binding assays revealed that the protein bound long-chain saturated fatty acids (>C14) and their trans-unsaturated types with an affinity equal to that for major fatty acid binding proteins in mammals and plants. Moreover, the structure of an STK_08120-myristic acid complex revealed a unique binding mode among fatty acid binding proteins. These results suggest that the thermoacidophile-specific protein family DUF3211 functions as a fatty acid carrier with a novel binding mode. PMID:23836863

  3. Structural basis for the fast maturation of Arthropoda green fluorescent protein

    PubMed Central

    Evdokimov, Artem G; Pokross, Matthew E; Egorov, Nikolay S; Zaraisky, Andrey G; Yampolsky, Ilya V; Merzlyak, Ekaterina M; Shkoporov, Andrey N; Sander, Ian; Lukyanov, Konstantin A; Chudakov, Dmitriy M

    2006-01-01

    Since the cloning of Aequorea victoria green fluorescent protein (GFP) in 1992, a family of known GFP-like proteins has been growing rapidly. Today, it includes more than a hundred proteins with different spectral characteristics cloned from Cnidaria species. For some of these proteins, crystal structures have been solved, showing diversity in chromophore modifications and conformational states. However, we are still far from a complete understanding of the origin, functions and evolution of the GFP family. Novel proteins of the family were recently cloned from evolutionarily distant marine Copepoda species, phylum Arthropoda, demonstrating an extremely rapid generation of fluorescent signal. Here, we have generated a non-aggregating mutant of Copepoda fluorescent protein and solved its high-resolution crystal structure. It was found that the protein β-barrel contains a pore, leading to the chromophore. Using site-directed mutagenesis, we showed that this feature is critical for the fast maturation of the chromophore. PMID:16936637

  4. DSSR-enhanced visualization of nucleic acid structures in Jmol.

    PubMed

    Hanson, Robert M; Lu, Xiang-Jun

    2017-07-03

    Sophisticated and interactive visualizations are essential for making sense of the intricate 3D structures of macromolecules. For proteins, secondary structural components are routinely featured in molecular graphics visualizations. However, the field of RNA structural bioinformatics is still lagging behind; for example, current molecular graphics tools lack built-in support even for base pairs, double helices, or hairpin loops. DSSR (Dissecting the Spatial Structure of RNA) is an integrated and automated command-line tool for the analysis and annotation of RNA tertiary structures. It calculates a comprehensive and unique set of features for characterizing RNA, as well as DNA structures. Jmol is a widely used, open-source Java viewer for 3D structures, with a powerful scripting language. JSmol, its reincarnation based on native JavaScript, has a predominant position in the post Java-applet era for web-based visualization of molecular structures. The DSSR-Jmol integration presented here makes salient features of DSSR readily accessible, either via the Java-based Jmol application itself, or its HTML5-based equivalent, JSmol. The DSSR web service accepts 3D coordinate files (in mmCIF or PDB format) initiated from a Jmol or JSmol session and returns DSSR-derived structural features in JSON format. This seamless combination of DSSR and Jmol/JSmol brings the molecular graphics of 3D RNA structures to a similar level as that for proteins, and enables a much deeper analysis of structural characteristics. It fills a gap in RNA structural bioinformatics, and is freely accessible (via the Jmol application or the JSmol-based website http://jmol.x3dna.org). © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. StruLocPred: structure-based protein subcellular localisation prediction using multi-class support vector machine.

    PubMed

    Zhou, Wengang; Dickerson, Julie A

    2012-01-01

    Knowledge of protein subcellular locations can help decipher a protein's biological function. This work proposes new features: one sequence-based, Hybrid Amino Acid Pair (HAAP), and two structure-based, Secondary Structural Element Composition (SSEC) and solvent accessibility state frequency. A multi-class Support Vector Machine is developed to predict the locations. Testing on two established data sets yields better prediction accuracies than the best available systems. Comparisons with existing methods show comparable results to ESLPred2. When StruLocPred is applied to the entire Arabidopsis proteome, over 77% of proteins with known locations match the prediction results. An implementation of this system is at http://wgzhou.ece.iastate.edu/StruLocPred/.

  6. Tissue-Engineered Nanofibrous Nerve Grafts for Enhancing the Rate of Nerve Regeneration

    DTIC Science & Technology

    2015-10-01

    structured nanofibrous biodegradable nerve graft system that present ECM protein, neurotrophic factor, and pre-seeded with bone marrow stromal cells in...nanofibrous biodegradable nerve graft system that present extracellular matrix (ECM) protein, nerve growth factor, and pre-seeded with bone marrow stromal...proposed novel structured nanofibrous biodegradable grafts will provide the micro environment, bioactivity, transport features and mechanics ideal for

  7. Functional correlation of bacterial LuxS with their quaternary associations: interface analysis of the structure networks

    PubMed Central

    Bhattacharyya, Moitrayee; Vishveshwara, Saraswathi

    2009-01-01

    Background The genome of a wide variety of prokaryotes contains the luxS gene homologue, which encodes for the protein S-ribosylhomocysteinelyase (LuxS). This protein is responsible for the production of the quorum sensing molecule, AI-2 and has been implicated in a variety of functions such as flagellar motility, metabolic regulation, toxin production and even in pathogenicity. A high structural similarity is present in the LuxS structures determined from a few species. In this study, we have modelled the structures from several other species and have investigated their dimer interfaces. We have attempted to correlate the interface features of LuxS with the phenotypic nature of the organisms. Results The protein structure networks (PSN) are constructed and graph theoretical analysis is performed on the structures obtained from X-ray crystallography and on the modelled ones. The interfaces, which are known to contain the active site, are characterized from the PSNs of these homodimeric proteins. The key features presented by the protein interfaces are investigated for the classification of the proteins in relation to their function. From our analysis, structural interface motifs are identified for each class in our dataset, which showed distinctly different pattern at the interface of LuxS for the probiotics and some extremophiles. Our analysis also reveals potential sites of mutation and geometric patterns at the interface that was not evident from conventional sequence alignment studies. Conclusion The structure network approach employed in this study for the analysis of dimeric interfaces in LuxS has brought out certain structural details at the side-chain interaction level, which were elusive from the conventional structure comparison methods. The results from this study provide a better understanding of the relation between the luxS gene and its functional role in the prokaryotes. 
This study also makes it possible to explore the potential direction towards the design of inhibitors of LuxS and thus towards a wide range of antimicrobials. PMID:19243584

  8. Crystal structure of the YGR205w protein from Saccharomyces cerevisiae: close structural resemblance to E. coli pantothenate kinase.

    PubMed

    Li de La Sierra-Gallay, Ines; Collinet, Bruno; Graille, Marc; Quevillon-Cheruel, Sophie; Liger, Dominique; Minard, Philippe; Blondeau, Karine; Henckes, Gilles; Aufrère, Robert; Leulliot, Nicolas; Zhou, Cong-Zhao; Sorel, Isabelle; Ferrer, Jean-Luc; Poupon, Anne; Janin, Joël; van Tilbeurgh, Herman

    2004-03-01

    The protein product of the YGR205w gene of Saccharomyces cerevisiae was targeted as part of our yeast structural genomics project. YGR205w codes for a small (290 amino acids) protein with unknown structure and function. The only recognizable sequence feature is the presence of a Walker A motif (P loop) indicating a possible nucleotide binding/converting function. We determined the three-dimensional crystal structure of Se-methionine substituted protein using multiple anomalous diffraction. The structure revealed a well known mononucleotide fold and strong resemblance to the structure of small metabolite phosphorylating enzymes such as pantothenate and phosphoribulo kinase. Biochemical experiments show that YGR205w binds specifically ATP and, less tightly, ADP. The structure also revealed the presence of two bound sulphate ions, occupying opposite niches in a canyon that corresponds to the active site of the protein. One sulphate is bound to the P-loop in a position that corresponds to the position of beta-phosphate in mononucleotide protein ATP complex, suggesting the protein is indeed a kinase. The nature of the phosphate accepting substrate remains to be determined. Copyright 2004 Wiley-Liss, Inc.

  9. Protein Aggregation/Folding: The Role of Deterministic Singularities of Sequence Hydrophobicity as Determined by Nonlinear Signal Analysis of Acylphosphatase and Aβ(1–40)

    PubMed Central

    Zbilut, Joseph P.; Colosimo, Alfredo; Conti, Filippo; Colafranceschi, Mauro; Manetti, Cesare; Valerio, MariaCristina; Webber, Charles L.; Giuliani, Alessandro

    2003-01-01

    The problem of protein folding vs. aggregation was investigated in acylphosphatase and the amyloid protein Aβ(1–40) by means of nonlinear signal analysis of their chain hydrophobicity. Numerical descriptors of recurrence patterns provided the basis for statistical evaluation of folding/aggregation distinctive features. Static and dynamic approaches were used to elucidate conditions coincident with folding vs. aggregation using comparisons with known protein secondary structure classifications, site-directed mutagenesis studies of acylphosphatase, and molecular dynamics simulations of amyloid protein, Aβ(1–40). The results suggest that a feature derived from principal component space characterized by the smoothness of singular, deterministic hydrophobicity patches plays a significant role in the conditions governing protein aggregation. PMID:14645049
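
    The recurrence-based descriptors mentioned in this abstract can be illustrated with the simplest of them, the recurrence rate of a hydrophobicity series. This is a minimal sketch under stated assumptions: it uses the Kyte-Doolittle scale, an embedding dimension of 1, and an arbitrary radius, none of which is confirmed by the abstract, and it does not reproduce the principal-component analysis the authors describe.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the abstract does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_series(sequence: str) -> list:
    """Map a one-letter amino acid sequence to its hydropathy values."""
    return [KYTE_DOOLITTLE[residue] for residue in sequence]


def recurrence_rate(series: list, radius: float) -> float:
    """Fraction of distinct position pairs whose values lie within `radius`.

    Requires at least two points. Embedding dimension is fixed at 1 here.
    """
    n = len(series)
    recurrent = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if abs(series[i] - series[j]) <= radius
    )
    return recurrent / (n * (n - 1) / 2)


if __name__ == "__main__":
    print(recurrence_rate(hydrophobicity_series("AAVV"), radius=0.5))
```

    Fuller recurrence quantification would also measure how recurrent points line up along diagonals (determinism), which this sketch omits.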

  10. A first line of stress defense: small heat shock proteins and their function in protein homeostasis.

    PubMed

    Haslbeck, Martin; Vierling, Elizabeth

    2015-04-10

    Small heat shock proteins (sHsps) are virtually ubiquitous molecular chaperones that can prevent the irreversible aggregation of denaturing proteins. sHsps complex with a variety of non-native proteins in an ATP-independent manner and, in the context of the stress response, form a first line of defense against protein aggregation in order to maintain protein homeostasis. In vertebrates, they act to maintain the clarity of the eye lens, and in humans, sHsp mutations are linked to myopathies and neuropathies. Although found in all domains of life, sHsps are quite diverse and have evolved independently in metazoans, plants and fungi. sHsp monomers range in size from approximately 12 to 42kDa and are defined by a conserved β-sandwich α-crystallin domain, flanked by variable N- and C-terminal sequences. Most sHsps form large oligomeric ensembles with a broad distribution of different, sphere- or barrel-like oligomers, with the size and structure of the oligomers dictated by features of the N- and C-termini. The activity of sHsps is regulated by mechanisms that change the equilibrium distribution in tertiary features and/or quaternary structure of the sHsp ensembles. Cooperation and/or co-assembly between different sHsps in the same cellular compartment add an underexplored level of complexity to sHsp structure and function. Copyright © 2015 Elsevier Ltd. All rights reserved.

  11. Heterochiral Knottin Protein: Folding and Solution Structure.

    PubMed

    Mong, Surin K; Cochran, Frank V; Yu, Hongtao; Graziano, Zachary; Lin, Yu-Shan; Cochran, Jennifer R; Pentelute, Bradley L

    2017-10-31

    Homochirality is a general feature of biological macromolecules, and Nature includes few examples of heterochiral proteins. Herein, we report on the design, chemical synthesis, and structural characterization of heterochiral proteins possessing loops of amino acids of chirality opposite to that of the rest of a protein scaffold. Using the protein Ecballium elaterium trypsin inhibitor II, we discover that selective β-alanine substitution favors the efficient folding of our heterochiral constructs. Solution nuclear magnetic resonance spectroscopy of one such heterochiral protein reveals a homogeneous global fold. Additionally, steered molecular dynamics simulations indicate that β-alanine reduces the free energy required to fold the protein. We also find these heterochiral proteins to be more resistant to proteolysis than homochiral l-proteins. This work informs the design of heterochiral protein architectures containing stretches of both d- and l-amino acids.

  12. Protein Solubility and Protein Homeostasis: A Generic View of Protein Misfolding Disorders

    PubMed Central

    Vendruscolo, Michele; Knowles, Tuomas P.J.; Dobson, Christopher M.

    2011-01-01

    According to the “generic view” of protein aggregation, the ability to self-assemble into stable and highly organized structures such as amyloid fibrils is not an unusual feature exhibited by a small group of peptides and proteins with special sequence or structural properties, but rather a property shared by most proteins. At the same time, through a wide variety of techniques, many of which were originally devised for applications in other disciplines, it has also been established that the maintenance of proteins in a soluble state is a fundamental aspect of protein homeostasis. Taken together, these advances offer a unified framework for understanding the molecular basis of protein aggregation and for the rational development of therapeutic strategies based on the biological and chemical regulation of protein solubility. PMID:21825020

  13. Proteome-wide Prediction of Self-interacting Proteins Based on Multiple Properties

    PubMed Central

    Liu, Zhongyang; Guo, Feifei; Zhang, Jiyang; Wang, Jian; Lu, Liang; Li, Dong; He, Fuchu

    2013-01-01

    Self-interacting proteins, whose two or more copies can interact with each other, play important roles in cellular functions and the evolution of protein interaction networks (PINs). Knowing whether a protein can self-interact can contribute to and sometimes is crucial for the elucidation of its functions. Previous related research has mainly focused on the structures and functions of specific self-interacting proteins, whereas knowledge on their overall properties is limited. Meanwhile, the two current most common high throughput protein interaction assays have limited ability to detect self-interactions because of biological artifacts and design limitations, whereas bioinformatic prediction methods for self-interacting proteins are lacking. This study aims to systematically study and predict self-interacting proteins from an overall perspective. We find that compared with other proteins the self-interacting proteins in the structural aspect contain more domains; in the evolutionary aspect they tend to be conserved and ancient; in the functional aspect they are significantly enriched with enzyme genes, housekeeping genes, and drug targets, and in the topological aspect tend to occupy important positions in PINs. Furthermore, based on these features, after feature selection, we use logistic regression to integrate six representative features, including Gene Ontology term, domain, paralogous interactor, enzyme, model organism self-interacting protein, and betweenness centrality in the PIN, to develop a proteome-wide prediction model of self-interacting proteins. Using 5-fold cross-validation and an independent test, this model shows good performance. Finally, the prediction model is developed into a user-friendly web service SLIPPER (SeLf-Interacting Protein PrEdictoR). Users may submit a list of proteins, and then SLIPPER will return the probability scores measuring their possibility to be self-interacting proteins and various related annotation information. 
This work helps us understand the role self-interacting proteins play in cellular functions from an overall perspective, and the constructed prediction model may contribute to the high throughput finding of self-interacting proteins and provide clues for elucidating their functions. PMID:23422585
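
    How logistic regression turns several features into a single probability score can be sketched as follows. The six feature names mirror those listed in the abstract, but the weights, the bias, and the 0/1 encoding are hypothetical values chosen for illustration; they are not the fitted SLIPPER model.

```python
import math


def self_interaction_probability(features: dict, weights: dict, bias: float) -> float:
    """Logistic model: sigmoid of a weighted sum of feature values."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Hypothetical weights and bias, not taken from the paper.
    weights = {
        "go_term": 1.2,
        "domain": 0.8,
        "paralogous_interactor": 0.9,
        "enzyme": 0.4,
        "model_organism_sip": 1.5,
        "betweenness_centrality": 0.6,
    }
    protein = {
        "go_term": 1.0,
        "domain": 1.0,
        "paralogous_interactor": 0.0,
        "enzyme": 1.0,
        "model_organism_sip": 0.0,
        "betweenness_centrality": 0.3,
    }
    print(round(self_interaction_probability(protein, weights, bias=-2.0), 3))
```

    In practice the weights would be estimated from labeled self-interacting and non-self-interacting proteins and assessed by cross-validation, as the abstract reports.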

  14. The triple helix of collagens - an ancient protein structure that enabled animal multicellularity and tissue evolution.

    PubMed

    Fidler, Aaron L; Boudko, Sergei P; Rokas, Antonis; Hudson, Billy G

    2018-04-09

    The cellular microenvironment, characterized by an extracellular matrix (ECM), played an essential role in the transition from unicellularity to multicellularity in animals (metazoans), and in the subsequent evolution of diverse animal tissues and organs. Major ECM components are members of the collagen superfamily, comprising 28 types in vertebrates, that exist in diverse supramolecular assemblies ranging from networks to fibrils. Each assembly is characterized by a hallmark feature, a protein structure called a triple helix. A current gap in knowledge is understanding the mechanisms of how the triple helix encodes and utilizes information in building scaffolds on the outside of cells. Type IV collagen, recently revealed as the evolutionarily most ancient member of the collagen superfamily, serves as an archetype for a fresh view of fundamental structural features of a triple helix that underlie the diversity of biological activities of collagens. In this Opinion, we argue that the triple helix is a protein structure of fundamental importance in building the extracellular matrix, which enabled animal multicellularity and tissue evolution. © 2018. Published by The Company of Biologists Ltd.

  15. Protein-protein interaction inference based on semantic similarity of Gene Ontology terms.

    PubMed

    Zhang, Shu-Bo; Tang, Qiang-Rong

    2016-07-21

    Identifying protein-protein interactions is important in molecular biology. Experimental methods for this task have their limitations, and computational approaches have attracted increasing attention from the biological community. The semantic similarity derived from the Gene Ontology (GO) annotation has been regarded as one of the most powerful indicators for protein interaction. However, conventional methods based on GO similarity fail to take advantage of the specificity of GO terms in the ontology graph. We proposed a GO-based method to predict protein-protein interaction by integrating different kinds of similarity measures derived from the intrinsic structure of the GO graph. We extended five existing methods to derive the semantic similarity measures from the descending part of two GO terms in the GO graph, then adopted a feature integration strategy that combines both the ascending and the descending similarity scores derived from the three sub-ontologies to construct various kinds of features to characterize each protein pair. Support vector machines (SVM) were employed as discriminative classifiers, and five-fold cross-validation experiments were conducted on both human and yeast protein-protein interaction datasets to evaluate the performance of different kinds of integrated features. The experimental results suggest that the best performance is achieved by the feature that combines information from both the ascending and the descending parts of the three ontologies. Our method is appealing for effective prediction of protein-protein interaction. Copyright © 2016 Elsevier Ltd. All rights reserved.

  16. ORION: a web server for protein fold recognition and structure prediction using evolutionary hybrid profiles

    PubMed Central

    Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G.; Gelly, Jean-Christophe

    2016-01-01

    Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation —with Protein Blocks—, which give an accurate description of the local protein structure. ORION has recently been improved, increasing by 5% the quality of its results. The ORION web server accepts a single protein sequence as input and searches homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the ‘Hard’ category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/. PMID:27319297

  17. ORION: a web server for protein fold recognition and structure prediction using evolutionary hybrid profiles.

    PubMed

    Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G; Gelly, Jean-Christophe

    2016-06-20

    Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation -with Protein Blocks-, which give an accurate description of the local protein structure. ORION has recently been improved, increasing by 5% the quality of its results. The ORION web server accepts a single protein sequence as input and searches homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the 'Hard' category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/.

  18. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields.

    PubMed

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-11

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.
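
    For readers unfamiliar with the Q3 metric quoted above, the following minimal sketch computes it as the fraction of residues whose three-state label (H, E or C) is predicted correctly. The function name and the example label strings are hypothetical and are not taken from DeepCNF.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose 3-state label (H, E, C) matches."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    if not observed:
        raise ValueError("label strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical 10-residue example: two of ten labels disagree.
predicted = "HHHHCCEEEC"
observed = "HHHCCCEEEE"
print(f"Q3 = {q3_accuracy(predicted, observed):.2f}")  # Q3 = 0.80
```

    Q8 accuracy is computed the same way over the eight-state label alphabet.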

  19. Quantitative theory of hydrophobic effect as a driving force of protein structure

    PubMed Central

    Perunov, Nikolay; England, Jeremy L

    2014-01-01

    Various studies suggest that the hydrophobic effect plays a major role in driving the folding of proteins. In the past, however, it has been challenging to translate this understanding into a predictive, quantitative theory of how the full pattern of sequence hydrophobicity in a protein shapes functionally important features of its tertiary structure. Here, we extend and apply such a phenomenological theory of the sequence-structure relationship in globular protein domains, which had previously been applied to the study of allosteric motion. In an effort to optimize parameters for the model, we first analyze the patterns of backbone burial found in single-domain crystal structures, and discover that classic hydrophobicity scales derived from bulk physicochemical properties of amino acids are already nearly optimal for prediction of burial using the model. Subsequently, we apply the model to studying structural fluctuations in proteins and establish a means of identifying ligand-binding and protein–protein interaction sites using this approach. PMID:24408023

  20. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields

    NASA Astrophysics Data System (ADS)

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-01

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.

  1. Sphingomyelin metabolism controls the shape and function of the Golgi cisternae

    PubMed Central

    Campelo, Felix; van Galen, Josse; Turacchio, Gabriele; Parashuraman, Seetharaman; Kozlov, Michael M; García-Parajo, María F; Malhotra, Vivek

    2017-01-01

    The flat Golgi cisterna is a highly conserved feature of eukaryotic cells, but how is this morphology achieved and is it related to its function in cargo sorting and export? A physical model of cisterna morphology led us to propose that sphingomyelin (SM) metabolism at the trans-Golgi membranes in mammalian cells essentially controls the structural features of a Golgi cisterna by regulating its association to curvature-generating proteins. An experimental test of this hypothesis revealed that affecting SM homeostasis converted flat cisternae into highly curled membranes with a concomitant dissociation of membrane curvature-generating proteins. These data lend support to our hypothesis that SM metabolism controls the structural organization of a Golgi cisterna. Together with our previously presented role of SM in controlling the location of proteins involved in glycosylation and vesicle formation, our data reveal the significance of SM metabolism in the structural organization and function of Golgi cisternae. DOI: http://dx.doi.org/10.7554/eLife.24603.001 PMID:28500756

  2. Sequence, Structure, and Context Preferences of Human RNA Binding Proteins.

    PubMed

    Dominguez, Daniel; Freese, Peter; Alexis, Maria S; Su, Amanda; Hochman, Myles; Palden, Tsultrim; Bazile, Cassandra; Lambert, Nicole J; Van Nostrand, Eric L; Pratt, Gabriel A; Yeo, Gene W; Graveley, Brenton R; Burge, Christopher B

    2018-06-07

    RNA binding proteins (RBPs) orchestrate the production, processing, and function of mRNAs. Here, we present the affinity landscapes of 78 human RBPs using an unbiased assay that determines the sequence, structure, and context preferences of these proteins in vitro by deep sequencing of bound RNAs. These data enable construction of "RNA maps" of RBP activity without requiring crosslinking-based assays. We found an unexpectedly low diversity of RNA motifs, implying frequent convergence of binding specificity toward a relatively small set of RNA motifs, many with low compositional complexity. Offsetting this trend, however, we observed extensive preferences for contextual features distinct from short linear RNA motifs, including spaced "bipartite" motifs, biased flanking nucleotide composition, and bias away from or toward RNA structure. Our results emphasize the importance of contextual features in RNA recognition, which likely enable targeting of distinct subsets of transcripts by different RBPs that recognize the same linear motif. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.

  3. Homology modeling a fast tool for drug discovery: current perspectives.

    PubMed

    Vyas, V K; Ukawala, R D; Ghate, M; Chintha, C

    2012-01-01

    A major goal of structural biology involves the formation of protein-ligand complexes, in which the protein molecules act energetically in the course of binding. Therefore, an understanding of protein-ligand interactions is very important for structure-based drug design. Lack of knowledge of 3D structures has hindered efforts to understand the binding specificities of ligands with proteins. With the increase in modeling software and the growing number of known protein structures, homology modeling is rapidly becoming the method of choice for obtaining 3D coordinates of proteins. Homology modeling is a representation of the similarity of environmental residues at topologically corresponding positions in the reference proteins. In the absence of experimental data, model building on the basis of a known 3D structure of a homologous protein is at present the only reliable method to obtain the structural information. Knowledge of the 3D structures of proteins provides invaluable insights into the molecular basis of their functions. The recent advances in homology modeling, particularly in detecting and aligning sequences with template structures, detecting distant homologues, and modeling loops and side chains, as well as in detecting errors in a model, have contributed to consistent prediction of protein structure, which was not possible even several years ago. This review focuses on the features and role of homology modeling in predicting protein structure and describes current developments in this field, with successful applications at different stages of drug design and discovery.

  4. Homology Modeling a Fast Tool for Drug Discovery: Current Perspectives

    PubMed Central

    Vyas, V. K.; Ukawala, R. D.; Ghate, M.; Chintha, C.

    2012-01-01

    A major goal of structural biology involves the formation of protein-ligand complexes, in which the protein molecules act energetically in the course of binding. Therefore, an understanding of protein-ligand interactions is very important for structure-based drug design. Lack of knowledge of 3D structures has hindered efforts to understand the binding specificities of ligands with proteins. With the increase in modeling software and the growing number of known protein structures, homology modeling is rapidly becoming the method of choice for obtaining 3D coordinates of proteins. Homology modeling is a representation of the similarity of environmental residues at topologically corresponding positions in the reference proteins. In the absence of experimental data, model building on the basis of a known 3D structure of a homologous protein is at present the only reliable method to obtain the structural information. Knowledge of the 3D structures of proteins provides invaluable insights into the molecular basis of their functions. The recent advances in homology modeling, particularly in detecting and aligning sequences with template structures, detecting distant homologues, and modeling loops and side chains, as well as in detecting errors in a model, have contributed to consistent prediction of protein structure, which was not possible even several years ago. This review focuses on the features and role of homology modeling in predicting protein structure and describes current developments in this field, with successful applications at different stages of drug design and discovery. PMID:23204616

  5. Hill-Climbing search and diversification within an evolutionary approach to protein structure prediction.

    PubMed

    Chira, Camelia; Horvath, Dragos; Dumitrescu, D

    2011-07-30

    Proteins are complex structures made of amino acids having a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a protein structure with minimum energy starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic-polar (HP) model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optima. The main features of the resulting evolutionary algorithm (the hill-climbing mechanism and the diversification strategy) are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact on the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.
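
    To make the optimisation target concrete, the sketch below scores a conformation in the standard 2D square-lattice HP model, where every pair of hydrophobic (H) residues that are lattice neighbours but not chain neighbours contributes -1 to the energy. It shows only this objective function. The move encoding (U, D, L, R) and the function name are assumptions made for illustration, and the authors' hill-climbing operators and diversification stage are not reproduced here.

```python
def hp_energy(sequence, moves):
    """Energy of a 2D square-lattice HP conformation.

    sequence: string over {'H', 'P'}.
    moves: string over {'U', 'D', 'L', 'R'}, one step per bond.
    Returns None if the chain overlaps itself, otherwise -1 for each
    pair of H residues that touch on the lattice without being bonded.
    """
    steps = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move per bond")
    x, y = 0, 0
    coords = [(x, y)]
    for move in moves:
        dx, dy = steps[move]
        x, y = x + dx, y + dy
        coords.append((x, y))
    if len(set(coords)) != len(coords):
        return None  # not a self-avoiding walk
    index_at = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (cx, cy) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((cx + dx, cy + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


print(hp_energy("HPPH", "RUL"))  # folded square: one H-H contact, prints -1
print(hp_energy("HPPH", "RRR"))  # extended chain: no contacts, prints 0
```

    A search procedure such as hill climbing would repeatedly alter the move string and keep changes that lower this energy.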

  6. Protein Multifunctionality: Principles and Mechanisms

    PubMed Central

    Zaretsky, Joseph Z.; Wreschner, Daniel H.

    2008-01-01

    In the review, the nature of protein multifunctionality is analyzed. In the first part of the review the principles of structural/functional organization of protein are discussed. In the second part, the main mechanisms involved in development of multiple functions on a single gene product(s) are analyzed. The last part represents a number of examples showing that multifunctionality is a basic feature of biologically active proteins. PMID:21566747

  7. Small-angle X-Ray analysis of macromolecular structure: the structure of protein NS2 (NEP) in solution

    NASA Astrophysics Data System (ADS)

    Shtykova, E. V.; Bogacheva, E. N.; Dadinova, L. A.; Jeffries, C. M.; Fedorova, N. V.; Golovko, A. O.; Baratova, L. A.; Batishchev, O. V.

    2017-11-01

    A complex structural analysis of nuclear export protein NS2 (NEP) of influenza virus A has been performed using bioinformatics predictive methods and small-angle X-ray scattering data. The behavior of NEP molecules in a solution (their aggregation, oligomerization, and dissociation, depending on the buffer composition) has been investigated. It was shown that stable associates are formed even in a conventional aqueous salt solution at physiological pH value. For the first time we have managed to get NEP dimers in solution, to analyze their structure, and to compare the models obtained using the method of the molecular tectonics with the spatial protein structure predicted by us using the bioinformatics methods. The results of the study provide a new insight into the structural features of nuclear export protein NS2 (NEP) of the influenza virus A, which is very important for viral infection development.

  8. Proteopedia: Exciting Advances in the 3D Encyclopedia of Biomolecular Structure

    NASA Astrophysics Data System (ADS)

    Prilusky, Jaime; Hodis, Eran; Sussman, Joel L.

    Proteopedia is a collaborative, 3D web-encyclopedia of protein, nucleic acid and other structures. Proteopedia ( http://www.proteopedia.org ) presents 3D biomolecule structures in a broadly accessible manner to a diverse scientific audience through easy-to-use molecular visualization tools integrated into a wiki environment that anyone with a user account can edit. We describe recent advances in the web resource in the areas of content and software. In terms of content, we describe a large growth in user-added content as well as improvements in automatically-generated content for all PDB entry pages in the resource. In terms of software, we describe new features ranging from the capability to create pages hidden from public view to the capability to export pages for offline viewing. New software features also include an improved file-handling system and availability of biological assemblies of protein structures alongside their asymmetric units.

  9. Protein–DNA Interactions: The Story so Far and a New Method for Prediction

    DOE PAGES

    Jones, Susan; Thornton, Janet M.

    2003-01-01

    This review describes methods for the prediction of DNA binding function, and specifically summarizes a new method using 3D structural templates. The new method features the HTH motif that is found in approximately one-third of DNA-binding protein families. A library of 3D structural templates of HTH motifs was derived from proteins in the PDB. Templates were scanned against complete protein structures and the optimal superposition of a template on a structure calculated. Significance thresholds in terms of a minimum root mean squared deviation (rmsd) of an optimal superposition, and a minimum motif accessible surface area (ASA), have been calculated. In this way, it is possible to scan the template library against proteins of unknown function to make predictions about DNA-binding functionality.

  10. Crystal structure of casein kinase-1, a phosphate-directed protein kinase.

    PubMed Central

    Xu, R M; Carmel, G; Sweet, R M; Kuret, J; Cheng, X

    1995-01-01

    The structure of a truncated variant of casein kinase-1 from Schizosaccharomyces pombe has been determined in complex with MgATP at 2.0 Å resolution. The model resembles the 'closed', ATP-bound conformations of the cyclin-dependent kinase 2 and the cAMP-dependent protein kinase, with clear differences in the structure of surface loops that impart unique features to casein kinase-1. The structure is of unphosphorylated, active conformation of casein kinase-1 and the peptide-binding site is fully accessible to substrate. PMID:7889932

  11. Distilling the essential features of a protein surface for improving protein-ligand docking, scoring, and virtual screening

    NASA Astrophysics Data System (ADS)

    Zavodszky, Maria I.; Sanschagrin, Paul C.; Kuhn, Leslie A.; Korde, Rajesh S.

    2002-12-01

    For the successful identification and docking of new ligands to a protein target by virtual screening, the essential features of the protein and ligand surfaces must be captured and distilled in an efficient representation. Since the running time for docking increases exponentially with the number of points representing the protein and each ligand candidate, it is important to place these points where the best interactions can be made between the protein and the ligand. This definition of favorable points of interaction can also guide protein structure-based ligand design, which typically focuses on which chemical groups provide the most energetically favorable contacts. In this paper, we present an alternative method of protein template and ligand interaction point design that identifies the most favorable points for making hydrophobic and hydrogen-bond interactions by using a knowledge base. The knowledge-based protein and ligand representations have been incorporated in version 2.0 of SLIDE and resulted in dockings closer to the crystal structure orientations when screening a set of 57 known thrombin and glutathione S-transferase (GST) ligands against the apo structures of these proteins. There was also improved scoring enrichment of the dockings, meaning better differentiation between the chemically diverse known ligands and a ~15,000-molecule dataset of randomly-chosen small organic molecules. This approach for identifying the most important points of interaction between proteins and their ligands can equally well be used in other docking and design techniques. While much recent effort has focused on improving scoring functions for protein-ligand docking, our results indicate that improving the representation of the chemistry of proteins and their ligands is another avenue that can lead to significant improvements in the identification, docking, and scoring of ligands.

  12. Drug Target Protein-Protein Interaction Networks: A Systematic Perspective

    PubMed Central

    2017-01-01

    The identification and validation of drug targets are crucial in biomedical research, and many studies have analyzed drug target features to gain a better understanding of the principles of their mechanisms. But most of them are based on either strong biological hypotheses or the chemical and physical properties of those targets separately. In this paper, we investigated three main ways to understand the functional biomolecules based on the topological features of drug targets. There are no significant differences between targets and common proteins in the protein-protein interaction network, indicating the drug targets are neither hub proteins which are dominant nor the bridge proteins. According to some special topological structures of the drug targets, there are significant differences between known targets and other proteins. Furthermore, the drug targets mainly belong to three typical communities based on their modularity. These topological features are helpful to understand how the drug targets work in the PPI network. Particularly, it is an alternative way to predict potential targets or extract nontargets to test a new drug target efficiently and economically. In this way, a drug target's homologue set containing 102 potential target proteins is predicted in the paper. PMID:28691014

  13. Structure determination of a sugar-binding protein from the phytopathogenic bacterium Xanthomonas citri

    PubMed Central

    Medrano, Francisco Javier; de Souza, Cristiane Santos; Romero, Antonio; Balan, Andrea

    2014-01-01

    The uptake of maltose and related sugars in Gram-negative bacteria is mediated by an ABC transporter encompassing a periplasmic component (the maltose-binding protein or MalE), a pore-forming membrane protein (MalF and MalG) and a membrane-associated ATPase (MalK). In the present study, the structure determination of the apo form of the putative maltose/trehalose-binding protein (Xac-MalE) from the citrus pathogen Xanthomonas citri in space group P6522 is described. The crystals contained two protein molecules in the asymmetric unit and diffracted to 2.8 Å resolution. Xac-MalE conserves the structural and functional features of sugar-binding proteins and a ligand-binding pocket with similar characteristics to eight different orthologues, including the residues for maltose and trehalose interaction. This is the first structure of a sugar-binding protein from a phytopathogenic bacterium, which is highly conserved in all species from the Xanthomonas genus. PMID:24817711

  14. Defining an essence of structure determining residue contacts in proteins.

    PubMed

    Sathyapriya, R; Duarte, Jose M; Stehr, Henning; Filippis, Ioannis; Lappe, Michael

    2009-12-01

    The network of native non-covalent residue contacts determines the three-dimensional structure of a protein. However, not all contacts are of equal structural significance, and little knowledge exists about a minimal, yet sufficient, subset required to define the global features of a protein. Characterisation of this "structural essence" has remained elusive so far: no algorithmic strategy has been devised to date that could outperform a random selection in terms of 3D reconstruction accuracy (measured as the Cα RMSD). It is not only of theoretical interest (i.e., for design of advanced statistical potentials) to identify the number and nature of essential native contacts; such a subset of spatial constraints is very useful in a number of novel experimental methods (like EPR) which rely heavily on constraint-based protein modelling. To derive accurate three-dimensional models from distance constraints, we implemented a reconstruction pipeline using distance geometry. We selected a test-set of 12 protein structures from the four major SCOP fold classes and performed our reconstruction analysis. As a reference set, series of random subsets (ranging from 10% to 90% of native contacts) are generated for each protein, and the reconstruction accuracy is computed for each subset. We have developed a rational strategy, termed "cone-peeling" that combines sequence features and network descriptors to select minimal subsets that outperform the reference sets. We present, for the first time, a rational strategy to derive a structural essence of residue contacts and provide an estimate of the size of this minimal subset. Our algorithm computes sparse subsets capable of determining the tertiary structure at approximately 4.8 Å Cα RMSD with as little as 8% of the native contacts (Cα-Cα and Cβ-Cβ). At the same time, a randomly chosen subset of native contacts needs about twice as many contacts to reach the same level of accuracy. This "structural essence" opens new avenues in the fields of structure prediction, empirical potentials and docking.
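
    The reconstruction accuracy in this abstract is reported as a Cα RMSD. The sketch below shows how that quantity can be computed for two equal-length lists of Cα coordinates. It assumes the two structures have already been optimally superposed (the superposition step is not shown), and the function name and coordinates are hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root mean square deviation between paired 3D coordinates.

    Both inputs are sequences of (x, y, z) tuples in angstroms,
    assumed to be superposed and listed in the same residue order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical two-residue example.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
print(f"RMSD = {ca_rmsd(native, model):.3f}")  # sqrt((9 + 16) / 2) = 3.536
```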

  15. Defining an Essence of Structure Determining Residue Contacts in Proteins

    PubMed Central

    Sathyapriya, R.; Duarte, Jose M.; Stehr, Henning; Filippis, Ioannis; Lappe, Michael

    2009-01-01

    The network of native non-covalent residue contacts determines the three-dimensional structure of a protein. However, not all contacts are of equal structural significance, and little knowledge exists about a minimal, yet sufficient, subset required to define the global features of a protein. Characterisation of this “structural essence” has remained elusive so far: no algorithmic strategy has been devised to date that could outperform a random selection in terms of 3D reconstruction accuracy (measured as the Cα RMSD). It is not only of theoretical interest (i.e., for design of advanced statistical potentials) to identify the number and nature of essential native contacts; such a subset of spatial constraints is very useful in a number of novel experimental methods (like EPR) which rely heavily on constraint-based protein modelling. To derive accurate three-dimensional models from distance constraints, we implemented a reconstruction pipeline using distance geometry. We selected a test-set of 12 protein structures from the four major SCOP fold classes and performed our reconstruction analysis. As a reference set, series of random subsets (ranging from 10% to 90% of native contacts) are generated for each protein, and the reconstruction accuracy is computed for each subset. We have developed a rational strategy, termed “cone-peeling” that combines sequence features and network descriptors to select minimal subsets that outperform the reference sets. We present, for the first time, a rational strategy to derive a structural essence of residue contacts and provide an estimate of the size of this minimal subset. Our algorithm computes sparse subsets capable of determining the tertiary structure at approximately 4.8 Å Cα RMSD with as little as 8% of the native contacts (Cα-Cα and Cβ-Cβ). At the same time, a randomly chosen subset of native contacts needs about twice as many contacts to reach the same level of accuracy. This “structural essence” opens new avenues in the fields of structure prediction, empirical potentials and docking. PMID:19997489

  16. G-LoSA: An efficient computational tool for local structure-centric biological studies and drug design.

    PubMed

    Lee, Hui Sun; Im, Wonpil

    2016-04-01

    Molecular recognition by protein mostly occurs in a local region on the protein surface. Thus, an efficient computational method for accurate characterization of protein local structural conservation is necessary to better understand biology and drug design. We present a novel local structure alignment tool, G-LoSA. G-LoSA aligns protein local structures in a sequence order independent way and provides a GA-score, a chemical feature-based and size-independent structure similarity score. Our benchmark validation shows the robust performance of G-LoSA to the local structures of diverse sizes and characteristics, demonstrating its universal applicability to local structure-centric comparative biology studies. In particular, G-LoSA is highly effective in detecting conserved local regions on the entire surface of a given protein. In addition, the applications of G-LoSA to identifying template ligands and predicting ligand and protein binding sites illustrate its strong potential for computer-aided drug design. We hope that G-LoSA can be a useful computational method for exploring interesting biological problems through large-scale comparison of protein local structures and facilitating drug discovery research and development. G-LoSA is freely available to academic users at http://im.compbio.ku.edu/GLoSA/. © 2016 The Protein Society.

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Osipiuk, J.; Gornicki, P.; Maj, L.

    The structure of the YlxR protein of unknown function from Streptococcus pneumonia was determined to 1.35 Angstroms. YlxR is expressed from the nusA/infB operon in bacteria and belongs to a small protein family (COG2740) that shares a conserved sequence motif GRGA(Y/W). The family shows no significant amino-acid sequence similarity with other proteins. Three-wavelength diffraction MAD data were collected to 1.7 Angstroms from orthorhombic crystals using synchrotron radiation and the structure was determined using a semi-automated approach. The YlxR structure resembles a two-layer α/β sandwich with the overall shape of a cylinder and shows no structural homology to proteins of known structure. Structural analysis revealed that the YlxR structure represents a new protein fold that belongs to the α-β plait superfamily. The distribution of the electrostatic surface potential shows a large positively charged patch on one side of the protein, a feature often found in nucleic acid-binding proteins. Three sulfate ions bind to this positively charged surface. Analysis of potential binding sites uncovered several substantial clefts, with the largest spanning 3/4 of the protein. A similar distribution of binding sites and a large sharply bent cleft are observed in RNA-binding proteins that are unrelated in sequence and structure. It is proposed that YlxR is an RNA-binding protein.

  18. Streptococcus pneumonia YlxR at 1.35 A shows a putative new fold.

    PubMed

    Osipiuk, J; Górnicki, P; Maj, L; Dementieva, I; Laskowski, R; Joachimiak, A

    2001-11-01

    The structure of the YlxR protein of unknown function from Streptococcus pneumonia was determined to 1.35 A. YlxR is expressed from the nusA/infB operon in bacteria and belongs to a small protein family (COG2740) that shares a conserved sequence motif GRGA(Y/W). The family shows no significant amino-acid sequence similarity with other proteins. Three-wavelength diffraction MAD data were collected to 1.7 A from orthorhombic crystals using synchrotron radiation and the structure was determined using a semi-automated approach. The YlxR structure resembles a two-layer alpha/beta sandwich with the overall shape of a cylinder and shows no structural homology to proteins of known structure. Structural analysis revealed that the YlxR structure represents a new protein fold that belongs to the alpha-beta plait superfamily. The distribution of the electrostatic surface potential shows a large positively charged patch on one side of the protein, a feature often found in nucleic acid-binding proteins. Three sulfate ions bind to this positively charged surface. Analysis of potential binding sites uncovered several substantial clefts, with the largest spanning 3/4 of the protein. A similar distribution of binding sites and a large sharply bent cleft are observed in RNA-binding proteins that are unrelated in sequence and structure. It is proposed that YlxR is an RNA-binding protein.

  19. Structure based re-design of the binding specificity of anti-apoptotic Bcl-xL

    PubMed Central

    Chen, T. Scott; Palacios, Hector; Keating, Amy E.

    2012-01-01

    Many native proteins are multi-specific and interact with numerous partners, which can confound analysis of their functions. Protein design provides a potential route to generating synthetic variants of native proteins with more selective binding profiles. Re-designed proteins could be used as research tools, diagnostics or therapeutics. In this work, we used a library screening approach to re-engineer the multi-specific anti-apoptotic protein Bcl-xL to remove its interactions with many of its binding partners, making it a high affinity and selective binder of the BH3 region of pro-apoptotic protein Bad. To overcome the enormity of the potential Bcl-xL sequence space, we developed and applied a computational/experimental framework that used protein structure information to generate focused combinatorial libraries. Sequence features were identified using structure-based modeling, and an optimization algorithm based on integer programming was used to select degenerate codons that maximally covered these features. A constraint on library size was used to ensure thorough sampling. Using yeast surface display to screen a designed library of Bcl-xL variants, we successfully identified a protein with ~1,000-fold improvement in binding specificity for the BH3 region of Bad over the BH3 region of Bim. Although negative design was targeted only against the BH3 region of Bim, the best re-designed protein was globally specific against binding to 10 other peptides corresponding to native BH3 motifs. Our design framework demonstrates an efficient route to highly specific protein binders and may readily be adapted for application to other design problems. PMID:23154169

  20. Automatic classification of protein structures using physicochemical parameters.

    PubMed

    Mohan, Abhilash; Rao, M Divya; Sunderrajan, Shruthi; Pennathur, Gautam

    2014-09-01

    Protein classification is the first step to functional annotation; SCOP and Pfam databases are currently the most relevant protein classification schemes. However, the disproportion in the number of three dimensional (3D) protein structures generated versus their classification into relevant superfamilies/families emphasizes the need for automated classification schemes. Predicting function of novel proteins based on sequence information alone has proven to be a major challenge. The present study focuses on the use of physicochemical parameters in conjunction with machine learning algorithms (Naive Bayes, Decision Trees, Random Forest and Support Vector Machines) to classify proteins into their respective SCOP superfamily/Pfam family, using sequence derived information. Spectrophores™, a 1D descriptor of the 3D molecular field surrounding a structure was used as a benchmark to compare the performance of the physicochemical parameters. The machine learning algorithms were modified to select features based on information gain for each SCOP superfamily/Pfam family. The effect of combining physicochemical parameters and spectrophores on classification accuracy (CA) was studied. Machine learning algorithms trained with the physicochemical parameters consistently classified SCOP superfamilies and Pfam families with a classification accuracy above 90%, while spectrophores performed with a CA of around 85%. Feature selection improved classification accuracy for both physicochemical parameters and spectrophores based machine learning algorithms. Combining both attributes resulted in a marginal loss of performance. Physicochemical parameters were able to classify proteins from both schemes with classification accuracy ranging from 90-96%. These results suggest the usefulness of this method in classifying proteins from amino acid sequences.

  1. Protein disorder in the human diseasome: unfoldomics of human genetic diseases

    PubMed Central

    Midic, Uros; Oldfield, Christopher J; Dunker, A Keith; Obradovic, Zoran; Uversky, Vladimir N

    2009-01-01

    Background Intrinsically disordered proteins lack stable structure under physiological conditions, yet carry out many crucial biological functions, especially functions associated with regulation, recognition, signaling and control. Recently, human genetic diseases and related genes were organized into a bipartite graph (Goh KI, Cusick ME, Valle D, Childs B, Vidal M, et al. (2007) The human disease network. Proc Natl Acad Sci U S A 104: 8685–8690). This diseasome network revealed several significant features such as the common genetic origin of many diseases. Methods and findings We analyzed the abundance of intrinsic disorder in these diseasome network proteins by means of several prediction algorithms, and we analyzed the functional repertoires of these proteins based on prior studies relating disorder to function. Our analyses revealed that (i) Intrinsic disorder is common in proteins associated with many human genetic diseases; (ii) Different disease classes vary in the IDP contents of their associated proteins; (iii) Molecular recognition features, which are relatively short loosely structured protein regions within mostly disordered sequences and which gain structure upon binding to partners, are common in the diseasome, and their abundance correlates with the intrinsic disorder level; (iv) Some disease classes have a significant fraction of genes affected by alternative splicing, and the alternatively spliced regions in the corresponding proteins are predicted to be highly disordered; and (v) Correlations were found among the various diseasome graph-related properties and intrinsic disorder. Conclusion These observations provide the basis for the construction of the human-genetic-disease-associated unfoldome. PMID:19594871

  2. Mapping protein-RNA interactions by RCAP, RNA-cross-linking and peptide fingerprinting.

    PubMed

    Vaughan, Robert C; Kao, C Cheng

    2015-01-01

    RNA nanotechnology often features protein-RNA complexes. The interactions between proteins and large RNAs are difficult to study using traditional structure-based methods like NMR or X-ray crystallography. RCAP, an approach that couples reversible cross-linking and affinity purification with mass spectrometry, has been developed to map regions within proteins that contact RNA. This chapter details how RCAP is applied to map protein-RNA contacts within virions.

  3. An Algorithm for Protein Helix Assignment Using Helix Geometry

    PubMed Central

    Cao, Chen; Xu, Shutan; Wang, Lincong

    2015-01-01

    Helices are one of the most common and were among the earliest recognized secondary structure elements in proteins. The assignment of helices in a protein underlies the analysis of its structure and function. Though the mathematical expression for a helical curve is simple, no previous assignment programs have used a genuine helical curve as a model for helix assignment. In this paper we present a two-step assignment algorithm. The first step searches for a series of bona fide helical curves, each of which best fits the coordinates of four successive backbone Cα atoms. The second step uses the best-fit helical curves as input to make the helix assignment. The application to the protein structures in the PDB (Protein Data Bank) shows that the algorithm is able to assign accurately not only regular α-helices but also 3₁₀ and π helices as well as their left-handed versions. One salient feature of the algorithm is that the assigned helices are structurally more uniform than those assigned by the previous programs. The structural uniformity should be useful for protein structure classification and prediction, while the accurate assignment of a helix to a particular type underlies structure-function relationships in proteins. PMID:26132394
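    As an illustration of what describing four successive Cα atoms by a helical curve can involve, the sketch below recovers the radius, rise per residue and twist angle from four points using the classical bisector construction. This is a stand-in technique chosen for brevity, not the authors' fitting procedure; the function names and the ideal-helix test values (radius 2.3 Å, rise 1.5 Å, twist 100°) are assumptions made for the example, and the construction is exact only when the four points lie on an ideal helix.

```python
import math

def _sub(a, b): return tuple(x - y for x, y in zip(a, b))
def _add(a, b): return tuple(x + y for x, y in zip(a, b))
def _dot(a, b): return sum(x * y for x, y in zip(a, b))
def _norm(a): return math.sqrt(_dot(a, a))
def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def helix_parameters(p0, p1, p2, p3):
    """Return (radius, rise, twist_deg) for four successive points.

    Hypothetical helper: the bisector at each inner point is
    perpendicular to the helix axis, so the cross product of two
    successive bisectors gives the axis direction. A negative rise
    indicates a left-handed helix under this sign convention.
    """
    b1 = _add(_sub(p0, p1), _sub(p2, p1))   # bisector at p1, points toward the axis
    b2 = _add(_sub(p1, p2), _sub(p3, p2))   # bisector at p2
    cos_t = _dot(b1, b2) / (_norm(b1) * _norm(b2))
    cos_t = max(-1.0, min(1.0, cos_t))      # guard against rounding
    axis = _cross(b1, b2)
    n = _norm(axis)
    if n < 1e-12:
        raise ValueError("degenerate geometry: bisectors are parallel")
    axis = tuple(x / n for x in axis)
    rise = _dot(_sub(p2, p1), axis)              # translation along the axis
    radius = _norm(b1) / (2.0 * (1.0 - cos_t))   # from |b| = 2 r (1 - cos(twist))
    return radius, rise, math.degrees(math.acos(cos_t))

def ideal_helix(radius, rise, twist_deg, n=4):
    """Generate n points on an ideal helix around the z axis (test data only)."""
    t = math.radians(twist_deg)
    return [(radius * math.cos(i * t), radius * math.sin(i * t), rise * i)
            for i in range(n)]

right = helix_parameters(*ideal_helix(2.3, 1.5, 100.0))
left = helix_parameters(*ideal_helix(2.3, 1.5, -100.0))
```

    With real coordinates the four atoms never sit exactly on one helix, which is why a best-fit search, as described in the abstract, is needed rather than this closed-form shortcut.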

  4. Structural analyses of the CRISPR protein Csc2 reveal the RNA-binding interface of the type I-D Cas7 family.

    PubMed

    Hrle, Ajla; Maier, Lisa-Katharina; Sharma, Kundan; Ebert, Judith; Basquin, Claire; Urlaub, Henning; Marchfelder, Anita; Conti, Elena

    2014-01-01

    Upon pathogen invasion, bacteria and archaea activate an RNA-interference-like mechanism termed CRISPR (clustered regularly interspaced short palindromic repeats). A large family of Cas (CRISPR-associated) proteins mediates the different stages of this sophisticated immune response. Bioinformatic studies have classified the Cas proteins into families, according to their sequences and respective functions. These range from the insertion of the foreign genetic elements into the host genome to the activation of the interference machinery as well as target degradation upon attack. Cas7 family proteins are central to the type I and type III interference machineries as they constitute the backbone of the large interference complexes. Here we report the crystal structure of Thermofilum pendens Csc2, a Cas7 family protein of type I-D. We found that Csc2 forms a core RRM-like domain, flanked by three peripheral insertion domains: a lid domain, a Zinc-binding domain and a helical domain. Comparison with other Cas7 family proteins reveals a set of similar structural features both in the core and in the peripheral domains, despite the absence of significant sequence similarity. T. pendens Csc2 binds single-stranded RNA in vitro in a sequence-independent manner. Using a crosslinking - mass-spectrometry approach, we mapped the RNA-binding surface to a positively charged surface patch on T. pendens Csc2. Thus our analysis of the key structural and functional features of T. pendens Csc2 highlights recurring themes and evolutionary relationships in type I and type III Cas proteins.

  5. Fluid Mechanical Properties of Silkworm Fibroin Solutions

    NASA Astrophysics Data System (ADS)

    Matsumoto, Akira

    2005-11-01

    The aqueous solution behavior of silk fibroin is of interest due to the assembly and processing of this protein related to the spinning of protein fibers that exhibit remarkable mechanical properties. To gain insight into the origins of this functional feature, it is desirable to determine how the protein behaves under a range of solution conditions. Pure fibroin at different concentrations in water was studied for surface tension, as a measure of surfactancy. In addition, shear-induced changes in the structure and morphology of these solutions were also determined. Fibroin solutions exhibited shear rate-sensitive viscosity changes and precipitated at a critical shear rate where a dramatic increase of 75-150% of the initial value was observed along with a decrease in viscosity. In surface tension measurements, critical micelle concentrations were in the range of 3-4% w/v. The influence of additional factors, such as sericin protein, divalent and monovalent cations, and pH on the solution behavior in relation to structural and morphological features will also be described.

  6. Common features in the unfolding and misfolding of PDZ domains and beyond: the modulatory effect of domain swapping and extra-elements.

    PubMed

    Murciano-Calles, Javier; Güell-Bosch, Jofre; Villegas, Sandra; Martinez, Jose C

    2016-01-12

    PDZ domains are protein-protein interaction modules sharing the same structural arrangement. To discern whether they display common features in their unfolding/misfolding behaviour we have analyzed in this work the unfolding thermodynamics, together with the misfolding kinetics, of the PDZ fold using three archetypical examples: the second and third PDZ domains of the PSD95 protein and the Erbin PDZ domain. Results showed that all domains passed through a common intermediate, which populated upon unfolding, and that this in turn drove the misfolding towards worm-like fibrillar structures. Thus, the unfolding/misfolding behaviour appears to be shared within these domains. We have also analyzed how this landscape can be modified upon the inclusion of extra-elements, as it is in the nNOS PDZ domain, or the organization of swapped species, as happens in the second PDZ domain of the ZO2 protein. Although the intermediates still formed upon thermal unfolding, the misfolding was prevented to varying degrees.

  7. Surfing the wave, cycle, life history, and genes/proteins expressed by testicular germ cells. Part 4: intercellular bridges, mitochondria, nuclear envelope, apoptosis, ubiquitination, membrane/voltage-gated channels, methylation/acetylation, and transcription factors.

    PubMed

    Hermo, Louis; Pelletier, R-Marc; Cyr, Daniel G; Smith, Charles E

    2010-04-01

    As germ cells divide and differentiate from spermatogonia to spermatozoa, they share a number of structural and functional features that are common to all generations of germ cells and these features are discussed herein. Germ cells are linked to one another by large intercellular bridges which serve to move molecules and even large organelles from the cytoplasm of one cell to another. Mitochondria take on different shapes and features and topographical arrangements to accommodate their specific needs during spermatogenesis. The nuclear envelope and pore complex also undergo extensive modifications concomitant with the development of germ cell generations. Apoptosis is an event that is normally triggered by germ cells and involves many proteins. It occurs to limit the germ cell pool and acts as a quality control mechanism. The ubiquitin pathway comprises enzymes that ubiquitinate as well as deubiquitinate target proteins and this pathway is present and functional in germ cells. Germ cells express many proteins involved in water balance and pH control as well as voltage-gated ion channel movement. In the nucleus, proteins undergo epigenetic modifications which include methylation, acetylation, and phosphorylation, with each of these modifications signaling changes in chromatin structure. Germ cells contain specialized transcription complexes that coordinate the differentiation program of spermatogenesis, and there are many male germ cell-specific differences in the components of this machinery. All of the above features of germ cells will be discussed along with the specific proteins/genes and abnormalities to fertility related to each topic. Copyright 2009 Wiley-Liss, Inc.

  8. Unique Features of Halophilic Proteins.

    PubMed

    Arakawa, Tsutomu; Yamaguchi, Rui; Tokunaga, Hiroko; Tokunaga, Masao

    2017-01-01

    Proteins from moderate and extreme halophiles have unique characteristics. They are highly acidic and hydrophilic, similar to intrinsically disordered proteins. These characteristics make the halophilic proteins soluble in water and fold reversibly. In addition to reversible folding, the rate of refolding of halophilic proteins from denatured structure is generally slow, often taking several days, for example, for extremely halophilic proteins. This slow folding rate makes the halophilic proteins a novel model system for folding mechanism analysis. High solubility and reversible folding also make the halophilic proteins excellent fusion partners for soluble expression of recombinant proteins.

  9. Modulation of a Pore in the Capsid of JC Polyomavirus Reduces Infectivity and Prevents Exposure of the Minor Capsid Proteins

    PubMed Central

    Nelson, Christian D. S.; Ströh, Luisa J.; Gee, Gretchen V.; O'Hara, Bethany A.; Stehle, Thilo

    2015-01-01

    ABSTRACT JC polyomavirus (JCPyV) infection of immunocompromised individuals results in the fatal demyelinating disease progressive multifocal leukoencephalopathy (PML). The viral capsid of JCPyV is composed primarily of the major capsid protein virus protein 1 (VP1), and pentameric arrangement of VP1 monomers results in the formation of a pore at the 5-fold axis of symmetry. While the presence of this pore is conserved among polyomaviruses, its functional role in infection or assembly is unknown. Here, we investigate the role of the 5-fold pore in assembly and infection of JCPyV by generating a panel of mutant viruses containing amino acid substitutions of the residues lining this pore. Multicycle growth assays demonstrated that the fitness of all mutants was reduced compared to that of the wild-type virus. Bacterial expression of VP1 pentamers containing substitutions to residues lining the 5-fold pore did not affect pentamer assembly or prevent association with the VP2 minor capsid protein. The X-ray crystal structures of selected pore mutants contained subtle changes to the 5-fold pore, and no other changes to VP1 were observed. Pore mutant pseudoviruses were not deficient in assembly, packaging of the minor capsid proteins, or binding to cells or in transport to the host cell endoplasmic reticulum. Instead, these mutant viruses were unable to expose VP2 upon arrival to the endoplasmic reticulum, a step that is critical for infection. This study demonstrated that the 5-fold pore is an important structural feature of JCPyV and that minor modifications to this structure have significant impacts on infectious entry. IMPORTANCE JCPyV is an important human pathogen that causes a severe neurological disease in immunocompromised individuals. While the high-resolution X-ray structure of the major capsid protein of JCPyV has been solved, the importance of a major structural feature of the capsid, the 5-fold pore, remains poorly understood. This pore is conserved across polyomaviruses and suggests either that these viruses have limited structural plasticity in this region or that this pore is important in infection or assembly. Using a structure-guided mutational approach, we showed that modulation of this pore severely inhibits JCPyV infection. These mutants do not appear deficient in assembly or early steps in infectious entry and are instead reduced in their ability to expose a minor capsid protein in the host cell endoplasmic reticulum. Our work demonstrates that the 5-fold pore is an important structural feature for JCPyV. PMID:25609820

  10. GIRAF: a method for fast search and flexible alignment of ligand binding interfaces in proteins at atomic resolution

    PubMed Central

    Kinjo, Akira R.; Nakamura, Haruki

    2012-01-01

    Comparison and classification of protein structures are fundamental means to understand protein functions. Due to the computational difficulty and the ever-increasing amount of structural data, however, it is in general not feasible to perform exhaustive all-against-all structure comparisons necessary for comprehensive classifications. To efficiently handle such situations, we have previously proposed a method, now called GIRAF. We herein describe further improvements in the GIRAF protein structure search and alignment method. The GIRAF method achieves extremely efficient search of similar structures of ligand binding sites of proteins by exploiting database indexing of structural features of local coordinate frames. In addition, it produces refined atom-wise alignments by iterative applications of the Hungarian method to the bipartite graph defined for a pair of superimposed structures. By combining the refined alignments based on different local coordinate frames, it is made possible to align structures involving domain movements. We provide detailed accounts for the database design, the search and alignment algorithms as well as some benchmark results. PMID:27493524

  11. @TOME-2: a new pipeline for comparative modeling of protein-ligand complexes.

    PubMed

    Pons, Jean-Luc; Labesse, Gilles

    2009-07-01

    @TOME 2.0 is a new web pipeline dedicated to protein structure modeling and small ligand docking based on comparative analyses. @TOME 2.0 allows fold recognition, template selection, structural alignment editing, structure comparisons, 3D-model building and evaluation. These tasks are routinely used in sequence analyses for structure prediction. In our pipeline the necessary software is efficiently interconnected in an original manner to accelerate all the processes. Furthermore, we have also connected comparative docking of small ligands that is performed using protein-protein superposition. The input is a simple protein sequence in one-letter code with no comment. The resulting 3D model, protein-ligand complexes and structural alignments can be visualized through dedicated Web interfaces or can be downloaded for further studies. These original features will aid in the functional annotation of proteins and the selection of templates for molecular modeling and virtual screening. Several examples are described to highlight some of the new functionalities provided by this pipeline. The server and its documentation are freely available at http://abcis.cbs.cnrs.fr/AT2/

  12. Protein domain organisation: adding order.

    PubMed

    Kummerfeld, Sarah K; Teichmann, Sarah A

    2009-01-29

    Domains are the building blocks of proteins. During evolution, they have been duplicated, fused and recombined, to produce proteins with novel structures and functions. Structural and genome-scale studies have shown that pairs or groups of domains observed together in a protein are almost always found in only one N to C terminal order and are the result of a single recombination event that has been propagated by duplication of the multi-domain unit. Previous studies of domain organisation have used graph theory to represent the co-occurrence of domains within proteins. We build on this approach by adding directionality to the graphs and connecting nodes based on their relative order in the protein. Most of the time, the linear order of domains is conserved. However, using the directed graph representation we have identified non-linear features of domain organization that are over-represented in genomes. Recognising these patterns and unravelling how they have arisen may allow us to understand the functional relationships between domains and understand how the protein repertoire has evolved. We identify groups of domains that are not linearly conserved, but instead have been shuffled during evolution so that they occur in multiple different orders. We consider 192 genomes across all three kingdoms of life and use domain and protein annotation to understand their functional significance. To identify these features and assess their statistical significance, we represent the linear order of domains in proteins as a directed graph and apply graph theoretical methods. We describe two higher-order patterns of domain organisation: clusters and bi-directionally associated domain pairs and explore their functional importance and phylogenetic conservation. Taking into account the order of domains, we have derived a novel picture of global protein organization. We found that all genomes have a higher than expected degree of clustering and more domain pairs in forward and reverse orientation in different proteins relative to random graphs with identical degree distributions. While these features were statistically over-represented, they are still fairly rare. Looking in detail at the proteins involved, we found strong functional relationships within each cluster. In addition, the domains tended to be involved in protein-protein interaction and are able to function as independent structural units. A particularly striking example was the human Jak-STAT signalling pathway which makes use of a set of domains in a range of orders and orientations to provide nuanced signaling functionality. This illustrated the importance of functional and structural constraints (or lack thereof) on domain organisation.
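    The directed-graph representation described in this abstract can be pictured with a small sketch. It assumes that an edge links each domain to the one immediately following it in N-to-C order (the study may connect nodes differently), and the protein and domain labels (P1, D1, and so on) are made-up placeholders rather than real architectures.

```python
from collections import defaultdict

def build_domain_graph(proteins):
    """Map each directed edge (a, b) to the proteins in which domain a
    is immediately followed by domain b (N-to-C order).

    `proteins` maps a protein name to its ordered list of domains.
    Adjacent-only edges are an assumption of this sketch.
    """
    edges = defaultdict(set)
    for name, domains in proteins.items():
        for a, b in zip(domains, domains[1:]):
            if a != b:                      # ignore tandem repeats of one domain
                edges[(a, b)].add(name)
    return edges

def bidirectional_pairs(edges):
    """Return domain pairs observed in both orders, each reported once."""
    return sorted({tuple(sorted(e)) for e in edges if (e[1], e[0]) in edges})

# Toy input: D1 and D2 occur in both orders, D2 and D3 in one order only.
toy_proteins = {
    "P1": ["D1", "D2", "D3"],
    "P2": ["D2", "D1"],
    "P3": ["D1", "D2"],
}
graph = build_domain_graph(toy_proteins)
pairs = bidirectional_pairs(graph)
```

    Here `pairs` holds the single pair found in both forward and reverse orientation, which corresponds to the bi-directionally associated pairs mentioned above. Judging whether such pairs are over-represented would additionally require comparison with random graphs of identical degree distribution, which this sketch does not attempt.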

  13. Four structural risk factors identify most fibril-forming kappa light chains.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stevens, F. J.; Biosciences Division

    2000-09-01

    Antibody light chains (LCs) comprise the most structurally diverse family of proteins involved in amyloidosis. Many antibody LCs incorporate structural features that impair their stability and solubility, leading to their assembly into fibrils and to their subsequent pathological deposition when produced in excess during multiple myeloma and primary amyloidosis. The particular amino acid variations in antibody LCs that account for fibril formation and amyloidogenesis have not been identified. This study focuses on amyloidogenesis within the κI family of human LCs. Reanalysis of the current database of primary structures of proteins from more than 100 patients who produced κI LCs, 37 of which were amyloidogenic, reveals apparent structural features that may contribute to amyloidosis. These features include loss of conserved residues or the gain of particular residues through mutation at sites involving a repertoire of approximately 20% of the amino acid positions in the light chain variable domain (VL). Moreover, 80% of all κI amyloidogenic VLs are identifiable by the presence of at least one of three single-site substitutions or the acquisition of an N-linked glycosylation site through mutations. These findings suggest that it is feasible to predict fibril propensity by analysis of primary structure.

  14. Biological and functional relevance of CASP predictions

    PubMed Central

    Liu, Tianyun; Ish‐Shalom, Shirbi; Torng, Wen; Lafita, Aleix; Bock, Christian; Mort, Matthew; Cooper, David N; Bliven, Spencer; Capitani, Guido; Mooney, Sean D.

    2017-01-01

    Abstract Our goal is to answer the question: compared with experimental structures, how useful are predicted models for functional annotation? We assessed the functional utility of predicted models by comparing the performances of a suite of methods for functional characterization on the predictions and the experimental structures. We identified 28 sites in 25 protein targets to perform functional assessment. These 28 sites included nine sites with known ligand binding (holo‐sites), nine sites that are expected or suggested by experimental authors for small molecule binding (apo‐sites), and ten sites containing important motifs, loops, or key residues with important disease‐associated mutations. We evaluated the utility of the predictions by comparing their microenvironments to the experimental structures. Overall structural quality correlates with functional utility. However, the best‐ranked predictions (global) may not have the best functional quality (local). Our assessment provides an ability to discriminate between predictions with high structural quality. When assessing ligand‐binding sites, most prediction methods have higher performance on apo‐sites than holo‐sites. Some servers show consistently high performance for certain types of functional sites. Finally, many functional sites are associated with protein‐protein interaction. We also analyzed biologically relevant features from the protein assemblies of two targets where the active site spanned the protein‐protein interface. For the assembly targets, we find that the features in the models are mainly determined by the choice of template. PMID:28975675

  15. Characterizing informative sequence descriptors and predicting binding affinities of heterodimeric protein complexes.

    PubMed

    Srinivasulu, Yerukala Sathipati; Wang, Jyun-Rong; Hsu, Kai-Ti; Tsai, Ming-Ju; Charoenkwan, Phasit; Huang, Wen-Lin; Huang, Hui-Ling; Ho, Shinn-Ying

    2015-01-01

    Protein-protein interactions (PPIs) are involved in various biological processes, and underlying mechanism of the interactions plays a crucial role in therapeutics and protein engineering. Most machine learning approaches have been developed for predicting the binding affinity of protein-protein complexes based on structure and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. This work proposes a support vector machine (SVM) based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity. SVM-BAC yielded the training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, better than existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (Pkd) of 200 heterodimeric protein complexes. Prediction performance of a Jackknife test was the correlation coefficient of 0.34 and mean absolute error of 1.4. We further analyze three informative physicochemical properties according to their contribution to prediction performance. Results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn. The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features to classify and predict binding affinity of heterodimeric protein complexes. 
The characterization analysis revealed that the average numbers of beta turns and hydrogen bonds at protein-protein interfaces in high binding affinity complexes are more than those in low binding affinity complexes.
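The evaluation measures quoted in this record (accuracy, sensitivity, specificity, correlation coefficient and mean absolute error) can be illustrated with a minimal sketch. This is not the SVM-BAC code; the function names and the toy labels are assumptions made for illustration only.

```python
import math

def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for 0/1 labels (1 = high affinity)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mean_absolute_error(x, y):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

# Hypothetical toy data, not taken from the study.
metrics = binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
```

In a jackknife (leave-one-out) test such as the one mentioned above, each complex would be predicted by a model trained on all the others, and the pooled predictions would then be scored with `pearson_r` and `mean_absolute_error`.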

  16. Characterizing informative sequence descriptors and predicting binding affinities of heterodimeric protein complexes

    PubMed Central

    2015-01-01

    Background: Protein-protein interactions (PPIs) are involved in various biological processes, and underlying mechanism of the interactions plays a crucial role in therapeutics and protein engineering. Most machine learning approaches have been developed for predicting the binding affinity of protein-protein complexes based on structure and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. Results: This work proposes a support vector machine (SVM) based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity. SVM-BAC yielded the training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, better than existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (Pkd) of 200 heterodimeric protein complexes. Prediction performance of a Jackknife test was the correlation coefficient of 0.34 and mean absolute error of 1.4. We further analyze three informative physicochemical properties according to their contribution to prediction performance. Results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn.
Conclusions: The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features to classify and predict binding affinity of heterodimeric protein complexes. The characterization analysis revealed that the average numbers of beta turns and hydrogen bonds at protein-protein interfaces in high binding affinity complexes are more than those in low binding affinity complexes. PMID:26681483

  17. Activity and conformation of lysozyme in molecular solvents, protic ionic liquids (PILs) and salt-water systems.

    PubMed

    Wijaya, Emmy C; Separovic, Frances; Drummond, Calum J; Greaves, Tamar L

    2016-09-21

    Improving protein stabilisation is important for the further development of many applications in the pharmaceutical, specialty chemical, consumer product and agricultural sectors. However, protein stabilization is highly dependent on the solvent environment and, hence, it is very complex to tailor protein-solvent combinations for stable protein maintenance. Understanding solvent features that govern protein stabilization will enable selection or design of suitable media with favourable solution environments to retain protein native conformation. In this work the structural conformation and activity of lysozyme in 29 solvent systems were investigated to determine the role of various solvent features on the stability of the enzyme. The solvent systems consisted of 19 low molecular weight polar solvents and 4 protic ionic liquids (PILs), both at different water content levels, and 6 aqueous salt solutions. Small angle X-ray scattering, Fourier transform infrared spectroscopy and UV-vis spectroscopy were used to investigate the tertiary and secondary structure of lysozyme along with the corresponding activity in various solvation systems. At low non-aqueous solvent concentrations (high water content), the presence of solvents and salts generally maintained lysozyme in its native structure and enhanced its activity. Due to the presence of a net surface charge on lysozyme, electrostatic interactions in PIL-water systems and salt solutions enhanced lysozyme activity more than the specific hydrogen-bond interactions present in non-ionic molecular solvents. At higher solvent concentrations (lower water content), solvents with a propensity to exhibit the solvophobic effect, analogous to the hydrophobic effect in water, retained lysozyme native conformation and activity. This solvophobic effect was observed particularly for solvents which contained hydroxyl moieties. 
Preferential solvophobic effects along with bulky chemical structures were postulated to result in less competition with water at the specific hydration layer around the protein, thus reducing protein-solvent interactions and retaining lysozyme's native conformation. The structure-property links established in this study are considered to be applicable to other proteins.

  18. VarMod: modelling the functional effects of non-synonymous variants.

    PubMed

    Pappalardo, Morena; Wass, Mark N

    2014-07-01

    Unravelling the genotype-phenotype relationship in humans remains a challenging task in genomics studies. Recent advances in sequencing technologies mean there are now thousands of sequenced human genomes, revealing millions of single nucleotide variants (SNVs). For non-synonymous SNVs present in proteins the difficulties of the problem lie in first identifying those nsSNVs that result in a functional change in the protein among the many non-functional variants and in turn linking this functional change to phenotype. Here we present VarMod (Variant Modeller) a method that utilises both protein sequence and structural features to predict nsSNVs that alter protein function. VarMod develops recent observations that functional nsSNVs are enriched at protein-protein interfaces and protein-ligand binding sites and uses these characteristics to make predictions. In benchmarking on a set of nearly 3000 nsSNVs VarMod performance is comparable to an existing state of the art method. The VarMod web server provides extensive resources to investigate the sequence and structural features associated with the predictions including visualisation of protein models and complexes via an interactive JSmol molecular viewer. VarMod is available for use at http://www.wasslab.org/varmod. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. SMOG 2: A Versatile Software Package for Generating Structure-Based Models.

    PubMed

    Noel, Jeffrey K; Levi, Mariana; Raghunathan, Mohit; Lammert, Heiko; Hayes, Ryan L; Onuchic, José N; Whitford, Paul C

    2016-03-01

    Molecular dynamics simulations with coarse-grained or simplified Hamiltonians have proven to be an effective means of capturing the functionally important long-time and large-length scale motions of proteins and RNAs. Originally developed in the context of protein folding, structure-based models (SBMs) have since been extended to probe a diverse range of biomolecular processes, spanning from protein and RNA folding to functional transitions in molecular machines. The hallmark feature of a structure-based model is that part, or all, of the potential energy function is defined by a known structure. Within this general class of models, there exist many possible variations in resolution and energetic composition. SMOG 2 is a downloadable software package that reads user-designated structural information and user-defined energy definitions, in order to produce the files necessary to use SBMs with high performance molecular dynamics packages: GROMACS and NAMD. SMOG 2 is bundled with XML-formatted template files that define commonly used SBMs, and it can process template files that are altered according to the needs of each user. This computational infrastructure also allows for experimental or bioinformatics-derived restraints or novel structural features to be included, e.g. novel ligands, prosthetic groups and post-translational/transcriptional modifications. The code and user guide can be downloaded at http://smog-server.org/smog2.
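The idea that a structure-based potential is defined by a known structure can be shown with a small sketch. The 12-6 contact term below is a generic illustrative choice whose minimum sits at the native distance; it is an assumption and is not claimed to be the functional form, units or file format used by SMOG 2.

```python
def native_contact_energy(r, r0, epsilon=1.0):
    """Contact term with its minimum (-epsilon) at the native distance r0.

    r       : current distance between the two atoms (same units as r0)
    r0      : distance observed in the known (native) structure
    epsilon : well depth, a hypothetical uniform contact strength
    """
    if r <= 0 or r0 <= 0:
        raise ValueError("distances must be positive")
    ratio = r0 / r
    return epsilon * (ratio ** 12 - 2.0 * ratio ** 6)

def total_contact_energy(distances, native_distances, epsilon=1.0):
    """Sum the contact term over all native contacts."""
    return sum(
        native_contact_energy(r, r0, epsilon)
        for r, r0 in zip(distances, native_distances)
    )

# Hypothetical native contact distances (in Å) for three contacts.
native = [6.0, 5.5, 7.2]
energy_at_native = total_contact_energy(native, native)
```

Because every term reaches its minimum when the current distance equals the native one, the native structure is the lowest-energy state of this toy potential by construction, which is the defining property described in the abstract.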

  20. PDB2Graph: A toolbox for identifying critical amino acids map in proteins based on graph theory.

    PubMed

    Niknam, Niloofar; Khakzad, Hamed; Arab, Seyed Shahriar; Naderi-Manesh, Hossein

    2016-05-01

    The integrative and cooperative nature of protein structure involves the assessment of topological and global features of constituent parts. Network concept takes complete advantage of both of these properties in the analysis concomitantly. High compatibility to structural concepts or physicochemical properties in addition to exploiting a remarkable simplification in the system has made network an ideal tool to explore biological systems. There are numerous examples in which different protein structural and functional characteristics have been clarified by the network approach. Here, we present an interactive and user-friendly Matlab-based toolbox, PDB2Graph, devoted to protein structure network construction, visualization, and analysis. Moreover, PDB2Graph is an appropriate tool for identifying critical nodes involved in protein structural robustness and function based on centrality indices. It maps critical amino acids in protein networks and can greatly aid structural biologists in selecting proper amino acid candidates for manipulating protein structures in a more reasonable and rational manner. To introduce the capability and efficiency of PDB2Graph in detail, the structural modification of Calmodulin through allosteric binding of Ca(2+) is considered. In addition, a mutational analysis for three well-identified model proteins including Phage T4 lysozyme, Barnase and Ribonuclease HI, was performed to inspect the influence of mutating important central residues on protein activity. Copyright © 2016 Elsevier Ltd. All rights reserved.
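The network view of a protein structure described above can be sketched in a few lines. PDB2Graph itself is a Matlab toolbox; the Python below is only an illustration, and the 7 Å cutoff, the coordinates and the function names are assumptions rather than anything taken from the toolbox.

```python
import math
from itertools import combinations

def contact_graph(coords, cutoff=7.0):
    """Build an adjacency map: residues are nodes, and two residues are
    linked when their representative atoms lie within `cutoff` Å."""
    graph = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            graph[i].add(j)
            graph[j].add(i)
    return graph

def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}

# Hypothetical coordinates (Å) for four residues.
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (5.0, 5.0, 0.0)]
centrality = degree_centrality(contact_graph(coords))
most_central = max(centrality, key=centrality.get)
```

Degree is the simplest centrality index; other indices such as betweenness or closeness would rank nodes by their role in shortest paths rather than by their number of direct contacts.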

  1. Molecular-Scale Features that Govern the Effects of O-Glycosylation on a Carbohydrate-Binding Module

    DOE PAGES

    Guan, Xiaoyang; Chaffey, Patrick K.; Zeng, Chen; ...

    2015-09-21

    Protein glycosylation is a ubiquitous post-translational modification in all kingdoms of life. Despite its importance in molecular and cellular biology, the molecular-level ramifications of O-glycosylation on biomolecular structure and function remain elusive. Here, we took a small model glycoprotein and changed the glycan structure and size, amino acid residues near the glycosylation site, and glycosidic linkage while monitoring any corresponding changes to physical stability and cellulose binding affinity. The results of this study reveal the collective importance of all the studied features in controlling the most pronounced effects of O-glycosylation in this system. This study suggests the possibility of designing proteins with multiple improved properties by simultaneously varying the structures of O-glycans and amino acids local to the glycosylation site.

  2. Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions.

    PubMed

    Najibi, Seyed Morteza; Maadooliat, Mehdi; Zhou, Lan; Huang, Jianhua Z; Gao, Xin

    2017-01-01

    Recently, the study of protein structures using angular representations has attracted much attention among structural biologists. The main challenge is how to efficiently model the continuous conformational space of the protein structures based on the differences and similarities between different Ramachandran plots. Despite the presence of statistical methods for modeling angular data of proteins, there is still a substantial need for more sophisticated and faster statistical tools to model the large-scale circular datasets. To address this need, we have developed a nonparametric method for collective estimation of multiple bivariate density functions for a collection of populations of protein backbone angles. The proposed method takes into account the circular nature of the angular data using trigonometric spline which is more efficient compared to existing methods. This collective density estimation approach is widely applicable when there is a need to estimate multiple density functions from different populations with common features. Moreover, the coefficients of adaptive basis expansion for the fitted densities provide a low-dimensional representation that is useful for visualization, clustering, and classification of the densities. The proposed method provides a novel and unique perspective to two important and challenging problems in protein structure research: structure-based protein classification and angular-sampling-based protein loop structure prediction.
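The "circular nature of the angular data" matters because backbone angles wrap around at ±180°. The sketch below uses elementary circular statistics to show the effect; it is not the trigonometric-spline density estimator proposed in this record, and the function names are illustrative assumptions.

```python
import math

def circular_mean(angles_deg):
    """Mean direction of angles in degrees, computed on the unit circle."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c))

def angular_difference(a, b):
    """Smallest signed difference a - b in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

# Two hypothetical angles that sit on either side of the ±180° boundary.
angles = [170.0, -170.0]
naive_mean = sum(angles) / len(angles)   # 0.0, which points the wrong way
wrapped_mean = circular_mean(angles)     # close to ±180
```

The arithmetic mean of 170° and -170° is 0°, the opposite side of the circle from both observations, whereas the circular mean lands at the ±180° boundary between them.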

  3. 2.4 Å resolution crystal structure of human TRAP1NM, the Hsp90 paralog in the mitochondrial matrix.

    PubMed

    Sung, Nuri; Lee, Jungsoon; Kim, Ji Hyun; Chang, Changsoo; Tsai, Francis T F; Lee, Sukyeong

    2016-08-01

    TRAP1 is an organelle-specific Hsp90 paralog that is essential for neoplastic growth. As a member of the Hsp90 family, TRAP1 is presumed to be a general chaperone facilitating the late-stage folding of Hsp90 client proteins in the mitochondrial matrix. Interestingly, TRAP1 cannot replace cytosolic Hsp90 in protein folding, and none of the known Hsp90 co-chaperones are found in mitochondria. Thus, the three-dimensional structure of TRAP1 must feature regulatory elements that are essential to the ATPase activity and chaperone function of TRAP1. Here, the crystal structure of a human TRAP1NM dimer is presented, featuring an intact N-domain and M-domain structure, bound to adenosine 5'-β,γ-imidotriphosphate (ADPNP). The crystal structure together with epitope-mapping results shows that the TRAP1 M-domain loop 1 contacts the neighboring subunit and forms a previously unobserved third dimer interface that mediates the specific interaction with mitochondrial Hsp70.

  4. Solving the mystery of the internal structure of casein micelles.

    PubMed

    Ingham, B; Erlangga, G D; Smialowska, A; Kirby, N M; Wang, C; Matia-Merino, L; Haverkamp, R G; Carr, A J

    2015-04-14

    The interpretation of milk X-ray and neutron scattering data in relation to the internal structure of the casein micelle is an ongoing debate. We performed resonant X-ray scattering measurements on liquid milk and conclusively identified key scattering features, namely those corresponding to the size of and the distance between colloidal calcium phosphate particles. An X-ray scattering feature commonly assigned to the particle size is instead due to protein inhomogeneities.

  5. Functional Evolution of PLP-dependent Enzymes based on Active-Site Structural Similarities

    PubMed Central

    Catazaro, Jonathan; Caprez, Adam; Guru, Ashu; Swanson, David; Powers, Robert

    2014-01-01

    Families of distantly related proteins typically have very low sequence identity, which hinders evolutionary analysis and functional annotation. Slowly evolving features of proteins, such as an active site, are therefore valuable for annotating putative and distantly related proteins. To date, a complete evolutionary analysis of the functional relationship of an entire enzyme family based on active-site structural similarities has not yet been undertaken. Pyridoxal-5’-phosphate (PLP) dependent enzymes are primordial enzymes that diversified in the last universal ancestor. Using the Comparison of Protein Active Site Structures (CPASS) software and database, we show that the active site structures of PLP-dependent enzymes can be used to infer evolutionary relationships based on functional similarity. The enzymes successfully clustered together based on substrate specificity, function, and three-dimensional fold. This study demonstrates the value of using active site structures for functional evolutionary analysis and the effectiveness of CPASS. PMID:24920327

  6. Functional evolution of PLP-dependent enzymes based on active-site structural similarities.

    PubMed

    Catazaro, Jonathan; Caprez, Adam; Guru, Ashu; Swanson, David; Powers, Robert

    2014-10-01

    Families of distantly related proteins typically have very low sequence identity, which hinders evolutionary analysis and functional annotation. Slowly evolving features of proteins, such as an active site, are therefore valuable for annotating putative and distantly related proteins. To date, a complete evolutionary analysis of the functional relationship of an entire enzyme family based on active-site structural similarities has not yet been undertaken. Pyridoxal-5'-phosphate (PLP) dependent enzymes are primordial enzymes that diversified in the last universal ancestor. Using the comparison of protein active site structures (CPASS) software and database, we show that the active site structures of PLP-dependent enzymes can be used to infer evolutionary relationships based on functional similarity. The enzymes successfully clustered together based on substrate specificity, function, and three-dimensional fold. This study demonstrates the value of using active site structures for functional evolutionary analysis and the effectiveness of CPASS. © 2014 Wiley Periodicals, Inc.

  7. VarMod: modelling the functional effects of non-synonymous variants

    PubMed Central

    Pappalardo, Morena; Wass, Mark N.

    2014-01-01

    Unravelling the genotype–phenotype relationship in humans remains a challenging task in genomics studies. Recent advances in sequencing technologies mean there are now thousands of sequenced human genomes, revealing millions of single nucleotide variants (SNVs). For non-synonymous SNVs present in proteins the difficulties of the problem lie in first identifying those nsSNVs that result in a functional change in the protein among the many non-functional variants and in turn linking this functional change to phenotype. Here we present VarMod (Variant Modeller) a method that utilises both protein sequence and structural features to predict nsSNVs that alter protein function. VarMod develops recent observations that functional nsSNVs are enriched at protein–protein interfaces and protein–ligand binding sites and uses these characteristics to make predictions. In benchmarking on a set of nearly 3000 nsSNVs VarMod performance is comparable to an existing state of the art method. The VarMod web server provides extensive resources to investigate the sequence and structural features associated with the predictions including visualisation of protein models and complexes via an interactive JSmol molecular viewer. VarMod is available for use at http://www.wasslab.org/varmod. PMID:24906884

  8. Optimization of protein-protein docking for predicting Fc-protein interactions.

    PubMed

    Agostino, Mark; Mancera, Ricardo L; Ramsland, Paul A; Fernández-Recio, Juan

    2016-11-01

    The antibody crystallizable fragment (Fc) is recognized by effector proteins as part of the immune system. Pathogens produce proteins that bind Fc in order to subvert or evade the immune response. The structural characterization of the determinants of Fc-protein association is essential to improve our understanding of the immune system at the molecular level and to develop new therapeutic agents. Furthermore, Fc-binding peptides and proteins are frequently used to purify therapeutic antibodies. Although several structures of Fc-protein complexes are available, numerous others have not yet been determined. Protein-protein docking could be used to investigate Fc-protein complexes; however, improved approaches are necessary to efficiently model such cases. In this study, a docking-based structural bioinformatics approach is developed for predicting the structures of Fc-protein complexes. Based on the available set of X-ray structures of Fc-protein complexes, three regions of the Fc, loosely corresponding to three turns within the structure, were defined as containing the essential features for protein recognition and used as restraints to filter the initial docking search. Rescoring the filtered poses with an optimal scoring strategy provided a success rate of approximately 80% of the test cases examined within the top ranked 20 poses, compared to approximately 20% by the initial unrestrained docking. The developed docking protocol provides a significant improvement over the initial unrestrained docking and will be valuable for predicting the structures of currently undetermined Fc-protein complexes, as well as in the design of peptides and proteins that target Fc. Copyright © 2016 John Wiley & Sons, Ltd.
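The success rate quoted above (the share of test cases with an acceptable pose among the 20 top-ranked poses) can be computed as in the following sketch. The input layout, function name and example ranks are assumptions for illustration; they are not data or code from the study.

```python
def top_n_success_rate(first_hit_ranks, n=20):
    """Fraction of docking cases with an acceptable pose ranked within the top n.

    first_hit_ranks : one entry per test case, giving the 1-based rank of the
                      best-ranked acceptable pose, or None if no pose qualified.
    """
    if not first_hit_ranks:
        raise ValueError("at least one test case is required")
    hits = sum(1 for rank in first_hit_ranks if rank is not None and rank <= n)
    return hits / len(first_hit_ranks)

# Hypothetical ranks for five test cases.
ranks = [1, 5, 20, 21, None]
rate = top_n_success_rate(ranks, n=20)
```

Comparing this figure before and after restraint filtering and rescoring is how an improvement such as the one reported here would typically be expressed.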

  9. Simultaneous optimization of biomolecular energy function on features from small molecules and macromolecules

    PubMed Central

    Park, Hahnbeom; Bradley, Philip; Greisen, Per; Liu, Yuan; Mulligan, Vikram Khipple; Kim, David E.; Baker, David; DiMaio, Frank

    2017-01-01

    Most biomolecular modeling energy functions for structure prediction, sequence design, and molecular docking have been parameterized using existing macromolecular structural data; this contrasts with molecular mechanics force fields, which are largely optimized using small-molecule data. In this study, we describe an integrated method that enables optimization of a biomolecular modeling energy function simultaneously against small-molecule thermodynamic data and high-resolution macromolecular structural data. We use this approach to develop a next-generation Rosetta energy function that utilizes a new anisotropic implicit solvation model, and an improved electrostatics and Lennard-Jones model, illustrating how energy functions can be considerably improved in their ability to describe large-scale energy landscapes by incorporating both small-molecule and macromolecule data. The energy function improves performance in a wide range of protein structure prediction challenges, including monomeric structure prediction, protein-protein and protein-ligand docking, protein sequence design, and prediction of the free energy changes by mutation, while reasonably recapitulating small-molecule thermodynamic properties. PMID:27766851

  10. Web3DMol: interactive protein structure visualization based on WebGL.

    PubMed

    Shi, Maoxiang; Gao, Juntao; Zhang, Michael Q

    2017-07-03

    A growing number of web-based databases and tools for protein research are being developed. There is now a widespread need for visualization tools to present the three-dimensional (3D) structure of proteins in web browsers. Here, we introduce our 3D modeling program, Web3DMol, a web application focusing on protein structure visualization in modern web browsers. Users submit a PDB identification code or select a PDB archive from their local disk, and Web3DMol will display and allow interactive manipulation of the 3D structure. Featured functions, such as sequence plot, fragment segmentation, measure tool and meta-information display, are offered for users to gain a better understanding of protein structure. Easy-to-use APIs are available for developers to reuse and extend Web3DMol. Web3DMol can be freely accessed at http://web3dmol.duapp.com/, and the source code is distributed under the MIT license. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. SNAPPI-DB: a database and API of Structures, iNterfaces and Alignments for Protein–Protein Interactions

    PubMed Central

    Jefferson, Emily R.; Walsh, Thomas P.; Roberts, Timothy J.; Barton, Geoffrey J.

    2007-01-01

    SNAPPI-DB, a high performance database of Structures, iNterfaces and Alignments of Protein–Protein Interactions, and its associated Java Application Programming Interface (API) is described. SNAPPI-DB contains structural data, down to the level of atom co-ordinates, for each structure in the Protein Data Bank (PDB) together with associated data including SCOP, CATH, Pfam, SWISSPROT, InterPro, GO terms, Protein Quaternary Structures (PQS) and secondary structure information. Domain–domain interactions are stored for multiple domain definitions and are classified by their Superfamily/Family pair and interaction interface. Each set of classified domain–domain interactions has an associated multiple structure alignment for each partner. The API facilitates data access via PDB entries, domains and domain–domain interactions. Rapid development, fast database access and the ability to perform advanced queries without the requirement for complex SQL statements are provided via an object oriented database and the Java Data Objects (JDO) API. SNAPPI-DB contains many features which are not available in other databases of structural protein–protein interactions. It has been applied in three studies on the properties of protein–protein interactions and is currently being employed to train a protein–protein interaction predictor and a functional residue predictor. The database, API and manual are available for download at: . PMID:17202171

  12. POOL server: machine learning application for functional site prediction in proteins.

    PubMed

    Somarowthu, Srinivas; Ondrechen, Mary Jo

    2012-08-01

    We present an automated web server for partial order optimum likelihood (POOL), a machine learning application that combines computed electrostatic and geometric information for high-performance prediction of catalytic residues from 3D structures. Input features consist of THEMATICS electrostatics data and pocket information from ConCavity. THEMATICS measures deviation from typical, sigmoidal titration behavior to identify functionally important residues and ConCavity identifies binding pockets by analyzing the surface geometry of protein structures. Both THEMATICS and ConCavity (structure only) do not require the query protein to have any sequence or structure similarity to other proteins. Hence, POOL is applicable to proteins with novel folds and engineered proteins. As an additional option for cases where sequence homologues are available, users can include evolutionary information from INTREPID for enhanced accuracy in site prediction. The web site is free and open to all users with no login requirements at http://www.pool.neu.edu. Contact: m.ondrechen@neu.edu. Supplementary data are available at Bioinformatics online.

  13. KFC Server: interactive forecasting of protein interaction hot spots.

    PubMed

    Darnell, Steven J; LeGault, Laura; Mitchell, Julie C

    2008-07-01

    The KFC Server is a web-based implementation of the KFC (Knowledge-based FADE and Contacts) model, a machine learning approach for the prediction of binding hot spots, or the subset of residues that account for most of a protein interface's binding free energy. The server facilitates the automated analysis of a user submitted protein-protein or protein-DNA interface and the visualization of its hot spot predictions. For each residue in the interface, the KFC Server characterizes its local structural environment, compares that environment to the environments of experimentally determined hot spots and predicts if the interface residue is a hot spot. After the computational analysis, the user can visualize the results using an interactive job viewer able to quickly highlight predicted hot spots and surrounding structural features within the protein structure. The KFC Server is accessible at http://kfc.mitchell-lab.org.

  14. Carbohydrate-protein interactions: molecular modeling insights.

    PubMed

    Pérez, Serge; Tvaroška, Igor

    2014-01-01

    The article reviews the significant contributions to, and the present status of, applications of computational methods for the characterization and prediction of protein-carbohydrate interactions. After a presentation of the specific features of carbohydrate modeling, along with a brief description of the experimental data and general features of carbohydrate-protein interactions, the survey provides a thorough coverage of the available computational methods and tools. At the quantum-mechanical level, the use of both molecular orbitals and density-functional theory is critically assessed. These are followed by a presentation and critical evaluation of the applications of semiempirical and empirical methods: QM/MM, molecular dynamics, free-energy calculations, metadynamics, molecular robotics, and others. The usefulness of molecular docking in structural glycobiology is evaluated by considering recent docking-validation studies on a range of protein targets. The range of applications of these theoretical methods provides insights into the structural, energetic, and mechanistic facets that occur in the course of the recognition processes. Selected examples are provided to exemplify the usefulness and the present limitations of these computational methods in their ability to assist in elucidation of the structural basis underlying the diverse function and biological roles of carbohydrates in their dialogue with proteins. These test cases cover the field of both carbohydrate biosynthesis and glycosyltransferases, as well as glycoside hydrolases. The phenomenon of (macro)molecular recognition is illustrated for the interactions of carbohydrates with such proteins as lectins, monoclonal antibodies, GAG-binding proteins, porins, and viruses. © 2014 Elsevier Inc. All rights reserved.

  15. Hot spot of structural ambivalence in prion protein revealed by secondary structure principal component analysis.

    PubMed

    Yamamoto, Norifumi

    2014-08-21

    The conformational conversion of proteins into an aggregation-prone form is a common feature of various neurodegenerative disorders including Alzheimer's, Huntington's, Parkinson's, and prion diseases. In the early stage of prion diseases, secondary structure conversion in prion protein (PrP) causing β-sheet expansion facilitates the formation of a pathogenic isoform with a high content of β-sheets and strong aggregation tendency to form amyloid fibrils. Herein, we propose a straightforward method to extract essential information regarding the secondary structure conversion of proteins from molecular simulations, named secondary structure principal component analysis (SSPCA). The definite existence of a PrP isoform with an increased β-sheet structure was confirmed in a free-energy landscape constructed by mapping protein structural data into a reduced space according to the principal components determined by the SSPCA. We suggest a "spot" of structural ambivalence in PrP (the C-terminal part of helix 2) that lacks a strong intrinsic secondary structure, thus promoting a partial α-helix-to-β-sheet conversion. This result is important to understand how the pathogenic conformational conversion of PrP is initiated in prion diseases. The SSPCA has great potential to solve various challenges in studying highly flexible molecular systems, such as intrinsically disordered proteins, structurally ambivalent peptides, and chameleon sequences.

  16. The crystal structure of the Leishmania infantum Silent Information Regulator 2 related protein 1: Implications to protein function and drug design.

    PubMed

    Ronin, Céline; Costa, David Mendes; Tavares, Joana; Faria, Joana; Ciesielski, Fabrice; Ciapetti, Paola; Smith, Terry K; MacDougall, Jane; Cordeiro-da-Silva, Anabela; Pemberton, Iain K

    2018-01-01

    The de novo crystal structure of the Leishmania infantum Silent Information Regulator 2 related protein 1 (LiSir2rp1) has been solved at 1.99 Å in complex with an acetyl-lysine peptide substrate. The structure is broadly commensurate with Hst2/SIRT2 proteins of yeast and human origin, reproducing many of the structural features common to these sirtuin deacetylases, including the characteristic small zinc-binding domain, and the larger Rossmann-fold domain involved in NAD+-binding interactions. The two domains are linked via a cofactor binding loop ordered in open conformation. The peptide substrate binds to the LiSir2rp1 protein via a cleft formed between the small and large domains, with the acetyl-lysine side chain inserting further into the resultant hydrophobic tunnel. Crystals were obtained only with recombinant LiSir2rp1 possessing an extensive internal deletion of a proteolytically-sensitive region unique to the sirtuins of kinetoplastid origin. Deletion of 51 internal amino acids (P253-E303) from LiSir2rp1 did not appear to alter peptide substrate interactions in deacetylation assays, but was indispensable to obtain crystals. Removal of this potentially flexible region, that otherwise extends from the classical structural elements of the Rossmann-fold, specifically the β8-β9 connector, appears to result in lower accumulation of the protein when expressed from episomal vectors in L. infantum SIR2rp1 single knockout promastigotes. The biological function of the large serine-rich insertion in kinetoplastid/trypanosomatid sirtuins, highlighted as a disordered region with strong potential for post-translational modification, remains unknown but may confer additional cellular functions that are distinct from their human counterparts.
These unique molecular features, along with the resolution of the first kinetoplastid sirtuin deacetylase structure, present novel opportunities for drug design against a protein target previously established as essential to parasite survival and proliferation.

  17. Structural characterization of more potent alternatives to HAMLET, a tumoricidal complex of α-lactalbumin and oleic acid.

    PubMed

    Nemashkalova, Ekaterina L; Kazakov, Alexei S; Khasanova, Leysan M; Permyakov, Eugene A; Permyakov, Sergei E

    2013-09-10

    HAMLET is a complex of human α-lactalbumin (hLA) with oleic acid (OA) that kills various tumor cells and strains of Streptococcus pneumoniae. More potent protein-OA complexes were previously reported for bovine α-lactalbumin (bLA) and β-lactoglobulin (bLG), and pike parvalbumin (pPA), and here we explore their structural features. The concentration dependencies of the tryptophan fluorescence of hLA, bLA, and bLG complexes with OA reveal their disintegration at protein concentrations below the micromolar level. Chemical cross-linking experiments provide evidence that association with OA shifts the distribution of oligomeric forms of hLA, bLA, bLG, and pPA toward higher-order oligomers. This effect is confirmed for bLA and bLG using the dynamic light scattering method, while pPA is shown to associate with OA vesicles. Like hLA binding, OA binding increases the affinity of bLG for small unilamellar dipalmitoylphosphatidylcholine vesicles, while pPA efficiently binds to the vesicles irrespective of OA binding. The association of OA with bLG and pPA increases their α-helix and cross-β-sheet content and resistance to enzymatic proteolysis, which is indicative of OA-induced protein structuring. The lack of excess heat sorption during melting of bLG and pPA in complex with OA and the presence of a cooperative thermal transition at the level of their secondary structure suggest that the OA-bound forms of bLG and pPA lack a fixed tertiary structure but exhibit a continuous thermal transition. Overall, despite marked differences, the HAMLET-like complexes that were studied exhibit a common feature: a tendency toward protein oligomerization. Because OA-induced oligomerization has been reported for other proteins, this phenomenon is inherent to many proteins.

  18. Prediction of interface residue based on the features of residue interaction network.

    PubMed

    Jiao, Xiong; Ranganathan, Shoba

    2017-11-07

    Protein-protein interaction plays a crucial role in cellular biological processes. Interface prediction can improve our understanding of the molecular mechanisms of the related processes and functions. In this work, we propose a classification method to recognize interface residues based on the features of a weighted residue interaction network. The random forest algorithm is used for the prediction, with 16 network parameters and the B-factor as the elements of the input feature vector. Compared with other similar work, the method is feasible and effective. The relative importance of these features is also analyzed to identify the key features for the prediction. Some biological meaning of the important features is explained. The results of this work can be used in related work on structure-function relationship analysis via a residue interaction network model. Copyright © 2017 Elsevier Ltd. All rights reserved.

  19. Phylogenetic continuum indicates "galaxies" in the protein universe: preliminary results on the natural group structures of proteins.

    PubMed

    Ladunga, I

    1992-04-01

    The markedly nonuniform, even systematic distribution of sequences in the protein "universe" has been analyzed by methods of protein taxonomy. Mapping of the natural hierarchical system of proteins has revealed some dense cores, i.e., well-defined clusterings of proteins that seem to be natural structural groupings, possibly seeds for a future protein taxonomy. The aim was not to force proteins into more or less man-made categories by discriminant analysis, but to find structurally similar groups, possibly of common evolutionary origin. Single-valued distance measures between pairs of superfamilies from the Protein Identification Resource were defined by two χ²-like methods on tripeptide frequencies and the variable-length subsequence identity method derived from dot-matrix comparisons. Distance matrices were processed by several methods of cluster analysis to detect phylogenetic continuum between highly divergent proteins. Only well-defined clusters characterized by relatively unique structural, intracellular environmental, organismal, and functional attribute states were selected as major protein groups, including subsets of viral and Escherichia coli proteins, hormones, inhibitors, plant, ribosomal, serum and structural proteins, amino acid synthases, and clusters dominated by certain oxidoreductases and apolar and DNA-associated enzymes. The limited repertoire of functional patterns due to small genome size, the high rate of recombination, specific features of the bacterial membranes, or of the virus cycle canalize certain proteins of viruses and Gram-negative bacteria, respectively, to organismal groups.
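
The abstract above does not give the exact form of its χ²-like measures, so the following is only a minimal sketch of one common chi-square-style distance between tripeptide frequency profiles. The function names and the specific formula are assumptions for illustration, not the author's method.

```python
from collections import Counter


def tripeptide_counts(seq):
    """Count overlapping tripeptides (3-residue windows) in a protein sequence."""
    return Counter(seq[i:i + 3] for i in range(len(seq) - 2))


def chi2_like_distance(seq_a, seq_b):
    """Symmetric chi-square-style distance between two tripeptide frequency profiles.

    Illustrative formula (an assumption): sum over tripeptides of
    (fa - fb)^2 / (fa + fb), where fa and fb are relative frequencies.
    The value is 0 for identical profiles and 2 for profiles with no
    tripeptide in common.
    """
    a, b = tripeptide_counts(seq_a), tripeptide_counts(seq_b)
    na, nb = sum(a.values()), sum(b.values())
    if na == 0 or nb == 0:
        raise ValueError("each sequence needs at least 3 residues")
    dist = 0.0
    for tri in set(a) | set(b):
        fa, fb = a[tri] / na, b[tri] / nb  # Counter returns 0 for missing keys
        dist += (fa - fb) ** 2 / (fa + fb)
    return dist
```

A matrix of such pairwise distances is the kind of input a cluster analysis would then consume.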

  20. Physical-chemical features of non-detergent sulfobetaines active as protein-folding helpers.

    PubMed

    Expert-Bezançon, Nicole; Rabilloud, Thierry; Vuillard, Laurent; Goldberg, Michel E

    2003-01-01

    Some non-detergent sulfobetaines had been shown to prevent aggregation and improve the yield of active proteins when added to the buffer during in vitro protein renaturation. With the aim of designing more efficient folding helpers, a series of non-detergent sulfobetaines has been synthesized and their efficiency in improving the renaturation of a variety of proteins (E. coli tryptophan synthase and beta-D-galactosidase, hen lysozyme, bovine serum albumin, a monoclonal antibody) has been investigated. Attempts to correlate the structure of each sulfobetaine with its effect on folding revealed some molecular features that appear important in helping renaturation. This enabled us to design and synthesize new non-detergent sulfobetaines that act as potent folding helpers.

  1. Structural studies of G protein-coupled receptors.

    PubMed

    Lu, Mengjie; Wu, Beili

    2016-11-01

    G protein-coupled receptors (GPCRs) comprise the largest membrane protein family. These receptors sense a variety of signaling molecules, activate multiple intracellular signal pathways, and act as the targets of over 40% of marketed drugs. Recent progress on GPCR structural studies provides invaluable insights into the structure-function relationship of the GPCR superfamily, deepening our understanding about the molecular mechanisms of GPCR signal transduction. Here, we review recent breakthroughs on GPCR structure determination and the structural features of GPCRs, and take the structures of chemokine receptor CCR5 and purinergic receptors P2Y1R and P2Y12R as examples to discuss the importance of GPCR structures on functional studies and drug discovery. In addition, we discuss the prospect of GPCR structure-based drug discovery. © 2016 IUBMB Life, 68(11):894-903, 2016. © 2016 International Union of Biochemistry and Molecular Biology.

  2. α-Crystallins Are Small Heat Shock Proteins: Functional and Structural Properties.

    PubMed

    Tikhomirova, T S; Selivanova, O M; Galzitskaya, O V

    2017-02-01

    During its life cycle, a cell can be subjected to various external negative effects. Many proteins provide cell protection, including small heat shock proteins (sHsp) that have chaperone-like activity. These proteins have several important functions involving prevention of apoptosis and retention of cytoskeletal integrity; also, sHsp take part in the recovery of enzyme activity. The action mechanism of sHsp is based on the binding of hydrophobic regions exposed to the surface of a molten globule. α-Crystallins presented in chordate cells as two αA- and αB-isoforms are the most studied small heat shock proteins. In this review, we describe the main functions of α-crystallins, features of their secondary and tertiary structures, and examples of their partners in protein-protein interactions.

  3. Structural features based genome-wide characterization and prediction of nucleosome organization

    PubMed Central

    2012-01-01

    Background Nucleosome distribution along chromatin dictates genomic DNA accessibility and thus profoundly influences gene expression. However, the underlying mechanism of nucleosome formation remains elusive. Here, taking a structural perspective, we systematically explored nucleosome formation potential of genomic sequences and the effect on chromatin organization and gene expression in S. cerevisiae. Results We analyzed twelve structural features related to flexibility, curvature and energy of DNA sequences. The results showed that some structural features such as DNA denaturation, DNA-bending stiffness, Stacking energy, Z-DNA, Propeller twist and free energy, were highly correlated with in vitro and in vivo nucleosome occupancy. Specifically, they can be classified into two classes, one positively and the other negatively correlated with nucleosome occupancy. These two kinds of structural features facilitated nucleosome binding in centromere regions and repressed nucleosome formation in the promoter regions of protein-coding genes to mediate transcriptional regulation. Based on these analyses, we integrated all twelve structural features in a model to predict more accurately nucleosome occupancy in vivo than the existing methods that mainly depend on sequence compositional features. Furthermore, we developed a novel approach, named DLaNe, that located nucleosomes by detecting peaks of structural profiles, and built a meta predictor to integrate information from different structural features. As a comparison, we also constructed a hidden Markov model (HMM) to locate nucleosomes based on the profiles of these structural features. The result showed that the meta DLaNe and HMM-based method performed better than the existing methods, demonstrating the power of these structural features in predicting nucleosome positions. 
Conclusions Our analysis revealed that DNA structures significantly contribute to nucleosome organization and influence chromatin structure and gene expression regulation. The results indicated that our proposed methods are effective in predicting nucleosome occupancy and positions and that these structural features are highly predictive of nucleosome organization. The implementation of our DLaNe method based on structural features is available online. PMID:22449207
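
The abstract states that nucleosomes are located by detecting peaks of structural profiles but gives no algorithmic detail. The sketch below shows a generic local-maximum peak picker over a one-dimensional profile. It is not the DLaNe implementation, and the function name and the `min_height` and `min_separation` parameters are hypothetical.

```python
def find_peaks(profile, min_height=0.0, min_separation=1):
    """Return sorted indices of local maxima in a 1-D profile.

    A position is a candidate peak if it is strictly higher than its left
    neighbour, at least as high as its right neighbour, and at least
    min_height. Candidates are accepted strongest-first, and a candidate is
    dropped if it lies closer than min_separation to an accepted peak.
    """
    candidates = [
        i for i in range(1, len(profile) - 1)
        if profile[i] > profile[i - 1]
        and profile[i] >= profile[i + 1]
        and profile[i] >= min_height
    ]
    candidates.sort(key=lambda i: profile[i], reverse=True)
    kept = []
    for i in candidates:
        if all(abs(i - j) >= min_separation for j in kept):
            kept.append(i)
    return sorted(kept)
```

In a setting like the one described, `profile` would hold one structural feature value per genomic position, and the separation constraint would keep called positions from overlapping.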

  4. When a domain isn’t a domain, and why it’s important to properly filter proteins in databases

    PubMed Central

    Towse, Clare-Louise; Daggett, Valerie

    2013-01-01

    Summary Membership in a protein domain database does not a domain make; a feature we realized when generating a consensus view of protein fold space with our Consensus Domain Dictionary (CDD). This dictionary was used to select representative structures for characterization of the protein dynameome: the Dynameomics initiative. Through this endeavor we rejected a surprising 40% of the 1695 folds in the CDD as being non-autonomous folding units. Although some of this was due to the challenges of grouping similar fold topologies, the dissonance between the cataloguing and structural qualification of protein domains remains surprising. Another potential factor is previously overlooked intrinsic disorder; predicted estimates suggest 40% of proteins to have either local or global disorder. One thing is clear, filtering a structural database and ensuring a consistent definition for protein domains is crucial, and caution is prescribed when generalizations of globular domains are drawn from unfiltered protein domain datasets. PMID:23108912

  5. A 3D sequence-independent representation of the protein data bank.

    PubMed

    Fischer, D; Tsai, C J; Nussinov, R; Wolfson, H

    1995-10-01

    Here we address the following questions. How many structurally different entries are there in the Protein Data Bank (PDB)? How do the proteins populate the structural universe? To investigate these questions a structurally non-redundant set of representative entries was selected from the PDB. Construction of such a dataset is not trivial: (i) the considerable size of the PDB requires a large number of comparisons (there were more than 3250 structures of protein chains available in May 1994); (ii) the PDB is highly redundant, containing many structurally similar entries, not necessarily with significant sequence homology, and (iii) there is no clear-cut definition of structural similarity. The latter depends on the criteria and methods used. Here, we analyze structural similarity ignoring protein topology. To date, representative sets have been selected either by hand, by sequence comparison techniques which ignore the three-dimensional (3D) structures of the proteins or by using sequence comparisons followed by linear structural comparison (i.e. the topology, or the sequential order of the chains, is enforced in the structural comparison). Here we describe a 3D sequence-independent automated and efficient method to obtain a representative set of protein molecules from the PDB which contains all unique structures and which is structurally non-redundant. The method has two novel features. The first is the use of strictly structural criteria in the selection process without taking into account the sequence information. To this end we employ a fast structural comparison algorithm which requires on average approximately 2 s per pairwise comparison on a workstation. The second novel feature is the iterative application of a heuristic clustering algorithm that greatly reduces the number of comparisons required. We obtain a representative set of 220 chains with resolution better than 3.0 Å, or 268 chains including lower resolution entries, NMR entries and models.
The resulting set can serve as a basis for extensive structural classification and studies of 3D recurring motifs and of sequence-structure relationships. The clustering algorithm succeeds in classifying into the same structural family chains with no significant sequence homology, e.g. all the globins in one single group, all the trypsin-like serine proteases in another or all the immunoglobulin-like folds into a third. In addition, unexpected structural similarities of interest have been automatically detected between pairs of chains. A cluster analysis of the representative structures demonstrates the way the "structural universe" is populated.
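
The abstract does not describe its heuristic clustering in detail. As a rough illustration of why comparing new entries only against current representatives cuts the number of pairwise comparisons, here is a generic greedy ("leader") clustering sketch. The `distance` callable is a hypothetical stand-in for a structural comparison score, and the whole function is an assumption rather than the authors' algorithm.

```python
def select_representatives(items, distance, threshold):
    """Greedy representative selection.

    Each item is compared only against the representatives chosen so far.
    It joins the first representative within `threshold`; otherwise it
    becomes a new representative. Items must be hashable.
    Returns a dict mapping representative -> list of cluster members.
    """
    clusters = {}
    for item in items:
        for rep in clusters:
            if distance(item, rep) <= threshold:
                clusters[rep].append(item)
                break
        else:
            # No existing representative is close enough.
            clusters[item] = [item]
    return clusters
```

With numbers standing in for structures and absolute difference as the toy distance, `select_representatives([1.0, 1.2, 5.0, 5.1, 9.0], lambda a, b: abs(a - b), 0.5)` yields three representatives. The result of a greedy pass like this depends on input order, which is one reason such a procedure might be applied iteratively.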

  6. STRUM: structure-based prediction of protein stability changes upon single-point mutation.

    PubMed

    Quan, Lijun; Lv, Qiang; Zhang, Yang

    2016-10-01

    Mutations in human genome are mainly through single nucleotide polymorphism, some of which can affect stability and function of proteins, causing human diseases. Several methods have been proposed to predict the effect of mutations on protein stability; but most require features from experimental structure. Given the fast progress in protein structure prediction, this work explores the possibility to improve the mutation-induced stability change prediction using low-resolution structure modeling. We developed a new method (STRUM) for predicting stability change caused by single-point mutations. Starting from wild-type sequences, 3D models are constructed by the iterative threading assembly refinement (I-TASSER) simulations, where physics- and knowledge-based energy functions are derived on the I-TASSER models and used to train STRUM models through gradient boosting regression. STRUM was assessed by 5-fold cross validation on 3421 experimentally determined mutations from 150 proteins. The Pearson correlation coefficient (PCC) between predicted and measured changes of Gibbs free-energy gap, ΔΔG, upon mutation reaches 0.79 with a root-mean-square error 1.2 kcal/mol in the mutation-based cross-validations. The PCC reduces if separating training and test mutations from non-homologous proteins, which reflects inherent correlations in the current mutation sample. Nevertheless, the results significantly outperform other state-of-the-art methods, including those built on experimental protein structures. Detailed analyses show that the most sensitive features in STRUM are the physics-based energy terms on I-TASSER models and the conservation scores from multiple-threading template alignments. However, the ΔΔG prediction accuracy has only a marginal dependence on the accuracy of protein structure models as long as the global fold is correct. 
These data demonstrate the feasibility to use low-resolution structure modeling for high-accuracy stability change prediction upon point mutations. http://zhanglab.ccmb.med.umich.edu/STRUM/ CONTACT: qiang@suda.edu.cn and zhng@umich.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
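
The two metrics quoted in this abstract, the Pearson correlation coefficient (PCC) and the root-mean-square error between predicted and measured ΔΔG, can be computed as in the sketch below. The function names are illustrative, and any values passed in would be the reader's own data, not figures from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def rmse(predicted, measured):
    """Root-mean-square error between predictions and measurements."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(
        sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(predicted)
    )
```

Note that the PCC is insensitive to a constant offset or scale in the predictions, whereas the RMSE (in kcal/mol for ΔΔG) is not, which is why the two are usually reported together.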

  7. STRUM: structure-based prediction of protein stability changes upon single-point mutation

    PubMed Central

    Quan, Lijun; Lv, Qiang; Zhang, Yang

    2016-01-01

    Motivation: Mutations in human genome are mainly through single nucleotide polymorphism, some of which can affect stability and function of proteins, causing human diseases. Several methods have been proposed to predict the effect of mutations on protein stability; but most require features from experimental structure. Given the fast progress in protein structure prediction, this work explores the possibility to improve the mutation-induced stability change prediction using low-resolution structure modeling. Results: We developed a new method (STRUM) for predicting stability change caused by single-point mutations. Starting from wild-type sequences, 3D models are constructed by the iterative threading assembly refinement (I-TASSER) simulations, where physics- and knowledge-based energy functions are derived on the I-TASSER models and used to train STRUM models through gradient boosting regression. STRUM was assessed by 5-fold cross validation on 3421 experimentally determined mutations from 150 proteins. The Pearson correlation coefficient (PCC) between predicted and measured changes of Gibbs free-energy gap, ΔΔG, upon mutation reaches 0.79 with a root-mean-square error 1.2 kcal/mol in the mutation-based cross-validations. The PCC reduces if separating training and test mutations from non-homologous proteins, which reflects inherent correlations in the current mutation sample. Nevertheless, the results significantly outperform other state-of-the-art methods, including those built on experimental protein structures. Detailed analyses show that the most sensitive features in STRUM are the physics-based energy terms on I-TASSER models and the conservation scores from multiple-threading template alignments. However, the ΔΔG prediction accuracy has only a marginal dependence on the accuracy of protein structure models as long as the global fold is correct. 
These data demonstrate the feasibility to use low-resolution structure modeling for high-accuracy stability change prediction upon point mutations. Availability and Implementation: http://zhanglab.ccmb.med.umich.edu/STRUM/ Contact: qiang@suda.edu.cn and zhng@umich.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27318206

  8. Arabidopsis thaliana telomeres exhibit euchromatic features

    PubMed Central

    Vaquero-Sedas, María I.; Gámez-Arjona, Francisco M.; Vega-Palas, Miguel A.

    2011-01-01

    Telomere function is influenced by chromatin structure and organization, which usually involves epigenetic modifications. We describe here the chromatin structure of Arabidopsis thaliana telomeres. Based on the study of six different epigenetic marks we show that Arabidopsis telomeres exhibit euchromatic features. In contrast, subtelomeric regions and telomeric sequences present at interstitial chromosomal loci are heterochromatic. Histone methyltransferases and the chromatin remodeling protein DDM1 control subtelomeric heterochromatin formation. Whereas histone methyltransferases are required for histone H3K9(2Me) and non-CpG DNA methylation, DDM1 directs CpG methylation but not H3K9(2Me) or non-CpG methylation. These results argue that both kinds of proteins participate in different pathways to reinforce subtelomeric heterochromatin formation. PMID:21071395

  9. A systematic analysis of atomic protein-ligand interactions in the PDB.

    PubMed

    Ferreira de Freitas, Renato; Schapira, Matthieu

    2017-10-01

    As the protein databank (PDB) recently passed the cap of 123 456 structures, it stands more than ever as an important resource not only to analyze structural features of specific biological systems, but also to study the prevalence of structural patterns observed in a large body of unrelated structures, that may reflect rules governing protein folding or molecular recognition. Here, we compiled a list of 11 016 unique structures of small-molecule ligands bound to proteins - 6444 of which have experimental binding affinity - representing 750 873 protein-ligand atomic interactions, and analyzed the frequency, geometry and impact of each interaction type. We find that hydrophobic interactions are generally enriched in high-efficiency ligands, but polar interactions are over-represented in fragment inhibitors. While most observations extracted from the PDB will be familiar to seasoned medicinal chemists, less expected findings, such as the high number of C-H···O hydrogen bonds or the relatively frequent amide-π stacking between the backbone amide of proteins and aromatic rings of ligands, uncover underused ligand design strategies.

  10. The CWB2 Cell Wall-Anchoring Module Is Revealed by the Crystal Structures of the Clostridium difficile Cell Wall Proteins Cwp8 and Cwp6.

    PubMed

    Usenik, Aleksandra; Renko, Miha; Mihelič, Marko; Lindič, Nataša; Borišek, Jure; Perdih, Andrej; Pretnar, Gregor; Müller, Uwe; Turk, Dušan

    2017-03-07

    Bacterial cell wall proteins play crucial roles in cell survival, growth, and environmental interactions. In Gram-positive bacteria, cell wall proteins include several types that are non-covalently attached via cell wall binding domains. Of the two conserved surface-layer (S-layer)-anchoring modules composed of three tandem SLH or CWB2 domains, the latter have so far eluded structural insight. The crystal structures of Cwp8 and Cwp6 reveal multi-domain proteins, each containing an embedded CWB2 module. It consists of a triangular trimer of Rossmann-fold CWB2 domains, a feature common to 29 cell wall proteins in Clostridium difficile 630. The structural basis of the intact module fold necessary for its binding to the cell wall is revealed. A comparison with previously reported atomic force microscopy data of S-layers suggests that C. difficile S-layers are complex oligomeric structures, likely composed of several different proteins. Copyright © 2017 Elsevier Ltd. All rights reserved.

  11. Structural Influence on the Dominance of Virus-Specific CD4 T Cell Epitopes in Zika Virus Infection.

    PubMed

    Koblischke, Maximilian; Stiasny, Karin; Aberle, Stephan W; Malafa, Stefan; Tschouchnikas, Georgios; Schwaiger, Julia; Kundi, Michael; Heinz, Franz X; Aberle, Judith H

    2018-01-01

    Zika virus (ZIKV) has recently caused explosive outbreaks in Pacific islands, South- and Central America. Like with other flaviviruses, protective immunity is strongly dependent on potently neutralizing antibodies (Abs) directed against the viral envelope protein E. Such Ab formation is promoted by CD4 T cells through direct interaction with B cells that present epitopes derived from E or other structural proteins of the virus. Here, we examined the extent and epitope dominance of CD4 T cell responses to capsid (C) and envelope proteins in Zika patients. All patients developed ZIKV-specific CD4 T cell responses, with substantial contributions of C and E. In both proteins, immunodominant epitopes clustered at sites that are structurally conserved among flaviviruses but have highly variable sequences, suggesting a strong impact of protein structural features on immunodominant CD4 T cell responses. Our data are particularly relevant for designing flavivirus vaccines and their evaluation in T cell assays and provide insights into the importance of viral protein structure for epitope selection and antigenicity.

  12. mTM-align: a server for fast protein structure database search and multiple protein structure alignment.

    PubMed

    Dong, Runze; Pan, Shuo; Peng, Zhenling; Zhang, Yang; Yang, Jianyi

    2018-05-21

    With the rapid increase of the number of protein structures in the Protein Data Bank, it becomes urgent to develop algorithms for efficient protein structure comparisons. In this article, we present the mTM-align server, which consists of two closely related modules: one for structure database search and the other for multiple structure alignment. The database search is speeded up based on a heuristic algorithm and a hierarchical organization of the structures in the database. The multiple structure alignment is performed using the recently developed algorithm mTM-align. Benchmark tests demonstrate that our algorithms outperform other peering methods for both modules, in terms of speed and accuracy. One of the unique features for the server is the interplay between database search and multiple structure alignment. The server provides service not only for performing fast database search, but also for making accurate multiple structure alignment with the structures found by the search. For the database search, it takes about 2-5 min for a structure of a medium size (∼300 residues). For the multiple structure alignment, it takes a few seconds for ∼10 structures of medium sizes. The server is freely available at: http://yanglab.nankai.edu.cn/mTM-align/.

  13. Comparative Analysis of the 15.5kD Box C/D snoRNP Core Protein in the Primitive Eukaryote Giardia lamblia Reveals Unique Structural and Functional Features

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Biswas, Shyamasri; Buhrman, Greg; Gagnon, Keith

    2012-07-11

    Box C/D ribonucleoproteins (RNP) guide the 2'-O-methylation of targeted nucleotides in archaeal and eukaryotic rRNAs. The archaeal L7Ae and eukaryotic 15.5kD box C/D RNP core protein homologues initiate RNP assembly by recognizing kink-turn (K-turn) motifs. The crystal structure of the 15.5kD core protein from the primitive eukaryote Giardia lamblia is described here to a resolution of 1.8 Å. The Giardia 15.5kD protein exhibits the typical α-β-α sandwich fold exhibited by both archaeal L7Ae and eukaryotic 15.5kD proteins. Characteristic of eukaryotic homologues, the Giardia 15.5kD protein binds the K-turn motif but not the variant K-loop motif. The highly conserved residues of loop 9, critical for RNA binding, also exhibit conformations similar to those of the human 15.5kD protein when bound to the K-turn motif. However, comparative sequence analysis indicated a distinct evolutionary position between Archaea and Eukarya. Indeed, assessment of the Giardia 15.5kD protein in denaturing experiments demonstrated an intermediate stability in protein structure when compared with that of the eukaryotic mouse 15.5kD and archaeal Methanocaldococcus jannaschii L7Ae proteins. Most notable was the ability of the Giardia 15.5kD protein to assemble in vitro a catalytically active chimeric box C/D RNP utilizing the archaeal M. jannaschii Nop56/58 and fibrillarin core proteins. In contrast, a catalytically competent chimeric RNP could not be assembled using the mouse 15.5kD protein. Collectively, these analyses suggest that the G. lamblia 15.5kD protein occupies a unique position in the evolution of this box C/D RNP core protein retaining structural and functional features characteristic of both archaeal L7Ae and higher eukaryotic 15.5kD homologues.

  14. Chemical property based sequence characterization of PpcA and its homolog proteins PpcB-E: A mathematical approach

    PubMed Central

    Pal Choudhury, Pabitra

    2017-01-01

    Periplasmic c7 type cytochrome A (PpcA) protein is determined in Geobacter sulfurreducens along with its other four homologs (PpcB-E). From the crystal structure viewpoint, the observation emerges that PpcA can bind deoxycholate (DXCA), while its other homologs do not. The reason behind this has yet to be established with certainty from primary protein sequence information. This study is primarily based on primary protein sequence analysis through the chemical basis of the embedded amino acids. Firstly, we look for the chemical group specific score of amino acids. Along with this, we have developed a new methodology for phylogenetic analysis based on chemical group dissimilarities of amino acids. This new methodology is applied to the cytochrome c7 family members to pinpoint how a particular sequence differs from the others. Secondly, we build a graph theoretic model using amino acid sequences, which is also applied to the cytochrome c7 family members, and some unique characteristics and their domains are highlighted. Thirdly, we search for unique patterns as subsequences that are common among the group or specific to an individual member. In all cases, we are able to show some distinct features of PpcA that make it stand out from its other homologs, resulting in its binding with deoxycholate. Similarly, some notable features of the structurally dissimilar protein PpcD compared to the other homologs are also brought out. Further, because the five members of the cytochrome family are homologous proteins, they must have some common significant features, which are also enumerated in this study. PMID:28362850

  15. Mining protein database using machine learning techniques.

    PubMed

    Camargo, Renata da Silva; Niranjan, Mahesan

    2008-08-25

    With a large amount of information relating to proteins accumulating in databases widely available online, it is of interest to apply machine learning techniques that, by extracting underlying statistical regularities in the data, make predictions about the functional and evolutionary characteristics of unseen proteins. Such predictions can help in achieving a reduction in the space over which experiment designers need to search in order to improve our understanding of the biochemical properties. Previously it has been suggested that an integration of features computable by comparing a pair of proteins can be achieved by an artificial neural network, hence predicting the degree to which they may be evolutionarily related and homologous.
    We compiled two datasets of pairs of proteins, each pair being characterised by seven distinct features. We performed an exhaustive search through all possible combinations of features for the problem of separating remote homologous from analogous pairs, and we note that a significant performance gain was obtained by the inclusion of sequence and structure information. We find that the use of a linear classifier was enough to discriminate a protein pair at the family level. However, at the superfamily level, detecting remote homologous pairs was a relatively harder problem. We find that the use of nonlinear classifiers achieves significantly higher accuracies.
    In this paper, we compare three different pattern classification methods on two problems formulated as detecting evolutionary and functional relationships between pairs of proteins, and from extensive cross validation and feature selection based studies quantify the average limits and uncertainties with which such predictions may be made. Feature selection points to a "knowledge gap" in currently available functional annotations. We demonstrate how the scheme may be employed in a framework to associate an individual protein with an existing family of evolutionarily related proteins.
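
An exhaustive search over all combinations of seven features, as mentioned in the abstract, means evaluating 2^7 - 1 = 127 non-empty subsets. The sketch below enumerates them with the standard library. The `score` callable is a hypothetical placeholder for whatever cross-validated classifier accuracy one would actually measure, and nothing here is taken from the authors' code.

```python
from itertools import combinations


def all_feature_subsets(features):
    """Yield every non-empty subset of `features` as a tuple, smallest first."""
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            yield subset


def best_subset(features, score):
    """Return the subset with the highest value of the user-supplied score."""
    return max(all_feature_subsets(features), key=score)
```

Because the number of subsets doubles with every added feature, this brute-force approach is practical for seven features but quickly becomes infeasible for larger feature sets.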

  16. The structure of people's hair.

    PubMed

    Yang, Fei-Chi; Zhang, Yuchen; Rheinstädter, Maikel C

    2014-01-01

    Hair is a filamentous biomaterial consisting mainly of proteins in particular keratin. The structure of human hair is well known: the medulla is a loosely packed, disordered region near the centre of the hair surrounded by the cortex, which contains the major part of the fibre mass, mainly consisting of keratin proteins and structural lipids. The cortex is surrounded by the cuticle, a layer of dead, overlapping cells forming a protective layer around the hair. The corresponding structures have been studied extensively using a variety of different techniques, such as light, electron and atomic force microscopes, and also X-ray diffraction. We were interested in the question how much the molecular hair structure differs from person to person, between male and female hair, hair of different appearances such as colour and waviness. We included hair from parent and child, identical and fraternal twins in the study to see if genetically similar hair would show similar structural features. The molecular structure of the hair samples was studied using high-resolution X-ray diffraction, which covers length scales from molecules up to the organization of secondary structures. Signals due to the coiled-coil phase of α-helical keratin proteins, intermediate keratin filaments in the cortex and from the lipid layers in the cell membrane complex were observed in the specimen of all individuals, with very small deviations. Despite the relatively small number of individuals (12) included in this study, some conclusions can be drawn. While the general features were observed in all individuals and the corresponding molecular structures were almost identical, additional signals were observed in some specimen and assigned to different types of lipids in the cell membrane complex. Genetics seem to play a role in this composition as identical patterns were observed in hair from father and daughter and identical twins, however, not for fraternal twins. 
    Identification and characterization of these features is an important step towards the detection of abnormalities in the molecular structure of hair as a potential diagnostic tool for certain diseases.

  17. Discovering rules for protein-ligand specificity using support vector inductive logic programming.

    PubMed

    Kelley, Lawrence A; Shrimpton, Paul J; Muggleton, Stephen H; Sternberg, Michael J E

    2009-09-01

    Structural genomics initiatives are rapidly generating vast numbers of protein structures. Comparative modelling is also capable of producing accurate structural models for many protein sequences. However, for many of the known structures, functions are not yet determined, and in many modelling tasks, an accurate structural model does not necessarily tell us about function. Thus, there is a pressing need for high-throughput methods for determining function from structure. The spatial arrangement of key amino acids in a folded protein, on the surface or buried in clefts, is often the determinants of its biological function. A central aim of molecular biology is to understand the relationship between such substructures or surfaces and biological function, leading both to function prediction and to function design. We present a new general method for discovering the features of binding pockets that confer specificity for particular ligands. Using a recently developed machine-learning technique which couples the rule-discovery approach of inductive logic programming with the statistical learning power of support vector machines, we are able to discriminate, with high precision (90%) and recall (86%) between pockets that bind FAD and those that bind NAD on a large benchmark set given only the geometry and composition of the backbone of the binding pocket without the use of docking. In addition, we learn rules governing this specificity which can feed into protein functional design protocols. An analysis of the rules found suggests that key features of the binding pocket may be tied to conformational freedom in the ligand. The representation is sufficiently general to be applicable to any discriminatory binding problem. All programs and data sets are freely available to non-commercial users at http://www.sbg.bio.ic.ac.uk/svilp_ligand/.

  18. Biophysical characterization of the structural change of Nopp140, an intrinsically disordered protein, in the interaction with CK2α

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Na, Jung-Hyun; Biomedical Research Institute, Korea Institute of Science and Technology, Seoul 02792; Department of Chemistry and Nano Science, Ewha Womans University, Seoul 03760

    2016-08-19

    Nucleolar phosphoprotein 140 (Nopp140) is a nucleolar protein, more than 80% of which is disordered. Previous studies have shown that the C-terminal region of Nopp140 (residues 568–596) interacts with protein kinase CK2α, and inhibits the catalytic activity of CK2. Although the region of Nopp140 responsible for the interaction with CK2α was identified, the structural features and the effect of this interaction on the structure of Nopp140 have not been defined due to the difficulty of structural characterization of disordered proteins. In this study, the disordered feature of Nopp140 and the effect of CK2α on the structure of Nopp140 were examined using single-molecule fluorescence resonance energy transfer (smFRET) and electron paramagnetic resonance (EPR). The interaction with CK2α increased the conformational rigidity of the CK2α-interacting region of Nopp140 (Nopp140C), suggesting that the disordered and flexible conformation of Nopp140C became more rigid as it binds to CK2α. In addition, site-specific spin labeling and EPR analysis confirmed that residues 574–589 of Nopp140 are critical for binding to CK2α. Similar technical approaches can be applied to analyze the conformational changes in other IDPs during their interactions with binding partners. - Highlights: • Nopp140 is an intrinsically disordered protein (IDP). • The conformation of Nopp140 became more rigid due to interaction with CK2α. • smFRET and EPR could be applied to analyze the structural changes of IDPs.

  19. NMR studies of protein-nucleic acid interactions.

    PubMed

    Varani, Gabriele; Chen, Yu; Leeper, Thomas C

    2004-01-01

    Protein-DNA and protein-RNA complexes play key functional roles in every living organism. Therefore, the elucidation of their structure and dynamics is an important goal of structural and molecular biology. Nuclear magnetic resonance (NMR) studies of protein and nucleic acid complexes have common features with studies of protein-protein complexes: the interaction surfaces between the molecules must be carefully delineated, the relative orientation of the two species needs to be accurately and precisely determined, and close intermolecular contacts defined by nuclear Overhauser effects (NOEs) must be obtained. However, differences in NMR properties (e.g., chemical shifts) and biosynthetic pathways for sample productions generate important differences. Chemical shift differences between the protein and nucleic acid resonances can aid the NMR structure determination process; however, the relatively limited dispersion of the RNA ribose resonances makes the process of assigning intermolecular NOEs more difficult. The analysis of the resulting structures requires computational tools unique to nucleic acid interactions. This chapter summarizes the most important elements of the structure determination by NMR of protein-nucleic acid complexes and their analysis. The main emphasis is on recent developments (e.g., residual dipolar couplings and new Web-based analysis tools) that have facilitated NMR studies of these complexes and expanded the type of biological problems to which NMR techniques of structural elucidation can now be applied.

  20. Intrinsic disorder in pathogen effectors: protein flexibility as an evolutionary hallmark in a molecular arms race.

    PubMed

    Marín, Macarena; Uversky, Vladimir N; Ott, Thomas

    2013-09-01

    Effector proteins represent a refined mechanism of bacterial pathogens to overcome plants' innate immune systems. These modular proteins often manipulate host physiology by directly interfering with immune signaling of plant cells. Even if host cells have developed efficient strategies to perceive the presence of pathogenic microbes and to recognize intracellular effector activity, it remains an open question why only few effectors are recognized directly by plant resistance proteins. Based on in-silico genome-wide surveys and a reevaluation of published structural data, we estimated that bacterial effectors of phytopathogens are highly enriched in long-disordered regions (>50 residues). These structurally flexible segments have no secondary structure under physiological conditions but can fold in a stimulus-dependent manner (e.g., during protein-protein interactions). The high abundance of intrinsic disorder in effectors strongly suggests positive evolutionary selection of this structural feature and highlights the dynamic nature of these proteins. We postulate that such structural flexibility may be essential for (1) effector translocation, (2) evasion of the innate immune system, and (3) host function mimicry. The study of these dynamical regions will greatly complement current structural approaches to understand the molecular mechanisms of these proteins and may help in the prediction of new effectors.

  1. ExDom: an integrated database for comparative analysis of the exon–intron structures of protein domains in eukaryotes

    PubMed Central

    Bhasi, Ashwini; Philip, Philge; Manikandan, Vinu; Senapathy, Periannan

    2009-01-01

    We have developed ExDom, a unique database for the comparative analysis of the exon–intron structures of 96 680 protein domains from seven eukaryotic organisms (Homo sapiens, Mus musculus, Bos taurus, Rattus norvegicus, Danio rerio, Gallus gallus and Arabidopsis thaliana). ExDom provides integrated access to exon-domain data through a sophisticated web interface which has the following analytical capabilities: (i) intergenomic and intragenomic comparative analysis of exon–intron structure of domains; (ii) color-coded graphical display of the domain architecture of proteins correlated with their corresponding exon-intron structures; (iii) graphical analysis of multiple sequence alignments of amino acid and coding nucleotide sequences of homologous protein domains from seven organisms; (iv) comparative graphical display of exon distributions within the tertiary structures of protein domains; and (v) visualization of exon–intron structures of alternative transcripts of a gene correlated to variations in the domain architecture of corresponding protein isoforms. These novel analytical features are highly suited for detailed investigations on the exon–intron structure of domains and make ExDom a powerful tool for exploring several key questions concerning the function, origin and evolution of genes and proteins. ExDom database is freely accessible at: http://66.170.16.154/ExDom/. PMID:18984624

  2. Selective binding of choline by a phosphate-coordination-based triple helicate featuring an aromatic box.

    PubMed

    Jia, Chuandong; Zuo, Wei; Yang, Dong; Chen, Yanming; Cao, Liping; Custelcean, Radu; Hostaš, Jiří; Hobza, Pavel; Glaser, Robert; Wang, Yao-Yu; Yang, Xiao-Juan; Wu, Biao

    2017-10-16

    In nature, proteins have evolved sophisticated cavities tailored for capturing target guests selectively among competitors of similar size, shape, and charge. The fundamental principles guiding the molecular recognition, such as self-assembly and complementarity, have inspired the development of biomimetic receptors. In the current work, we report a self-assembled triple anion helicate (host 2) featuring a cavity resembling that of the choline-binding protein ChoX, as revealed by crystal and density functional theory (DFT)-optimized structures, which binds choline in a unique dual-site-binding mode. This similarity in structure leads to a similarly high selectivity of host 2 for choline over its derivatives, as demonstrated by the NMR and fluorescence competition experiments. Furthermore, host 2 is able to act as a fluorescence displacement sensor for discriminating choline, acetylcholine, L-carnitine, and glycine betaine effectively. The choline-binding protein ChoX exhibits a synergistic dual-site binding mode that allows it to discriminate choline over structural analogues. Here, the authors design a biomimetic triple anion helicate receptor whose selectivity for choline arises from a similar binding mechanism.

  3. Solution structure of Syrian hamster prion protein rPrP(90-231).

    PubMed

    Liu, H; Farr-Jones, S; Ulyanov, N B; Llinas, M; Marqusee, S; Groth, D; Cohen, F E; Prusiner, S B; James, T L

    1999-04-27

    NMR has been used to refine the structure of Syrian hamster (SHa) prion protein rPrP(90-231), which is commensurate with the infectious protease-resistant core of the scrapie prion protein PrPSc. The structure of rPrP(90-231), refolded to resemble the normal cellular isoform PrPC spectroscopically and immunologically, has been studied using multidimensional NMR; initial results were published [James et al. (1997) Proc. Natl. Acad. Sci. U.S.A. 94, 10086-10091]. We now report refinement with better definition revealing important structural and dynamic features which can be related to biological observations pertinent to prion diseases. Structure refinement was based on 2778 unambiguously assigned nuclear Overhauser effect (NOE) connectivities, 297 ambiguous NOE restraints, and 63 scalar coupling constants (3J(HN,Hα)). The structure is represented by an ensemble of 25 best-scoring structures from 100 structures calculated using ARIA/X-PLOR and further refined with restrained molecular dynamics using the AMBER 4.1 force field with an explicit shell of water molecules. The rPrP(90-231) structure features a core domain (residues 125-228), with a backbone atomic root-mean-square deviation (RMSD) of 0.67 Å, consisting of three alpha-helices (residues 144-154, 172-193, and 200-227) and two short antiparallel beta-strands (residues 129-131 and 161-163). The N-terminus (residues 90-119) is largely unstructured despite some sparse and weak medium-range NOEs implying the existence of bends or turns. The transition region between the core domain and flexible N-terminus, i.e., residues 113-128, consists of hydrophobic residues or glycines and does not adopt any regular secondary structure in aqueous solution. There are about 30 medium- and long-range NOEs within this hydrophobic cluster, so it clearly manifests structure. Multiple discrete conformations are evident, implying the possible existence of one or more metastable states, which may feature in conversion of PrPC to PrPSc. To obtain a more comprehensive picture of rPrP(90-231), dynamics have been studied using amide hydrogen-deuterium exchange and 15N NMR relaxation times (T1 and T2) and 15N{1H} NOE measurements. Comparison of the structure with previous reports suggests sequence-dependent features that may be reflected in a species barrier to prion disease transmission.
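
    As a rough illustration of the RMSD figure quoted for the structure ensemble, the sketch below computes the root-mean-square deviation between coordinate sets and averages it over an ensemble relative to its mean structure. It assumes the models are already superimposed and that the ensemble value is taken against the mean structure; the abstract does not state which convention was used, and the function names and toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples.

    Assumes the two coordinate sets are already superimposed; no fitting
    (rotation or translation) is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def mean_structure(ensemble):
    """Per-atom average position over all models in the ensemble."""
    n_models = len(ensemble)
    n_atoms = len(ensemble[0])
    return [
        tuple(sum(model[i][k] for model in ensemble) / n_models for k in range(3))
        for i in range(n_atoms)
    ]


def ensemble_rmsd(ensemble):
    """Average RMSD of each model to the ensemble mean structure (assumed convention)."""
    mean = mean_structure(ensemble)
    return sum(rmsd(model, mean) for model in ensemble) / len(ensemble)


# Toy example (invented coordinates, in Å): two one-atom models 2 Å apart.
toy_ensemble = [[(0.0, 0.0, 0.0)], [(0.0, 0.0, 2.0)]]
print(ensemble_rmsd(toy_ensemble))
```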

  4. Designing and benchmarking the MULTICOM protein structure prediction system

    PubMed Central

    2013-01-01

    Background: Predicting protein structure from sequence is one of the most significant and challenging problems in bioinformatics. Numerous bioinformatics techniques and tools have been developed to tackle almost every aspect of protein structure prediction ranging from structural feature prediction, template identification and query-template alignment to structure sampling, model quality assessment, and model refinement. How to synergistically select, integrate and improve the strengths of the complementary techniques at each prediction stage and build a high-performance system is becoming a critical issue for constructing a successful, competitive protein structure predictor. Results: Over the past several years, we have constructed a standalone protein structure prediction system MULTICOM that combines multiple sources of information and complementary methods at all five stages of the protein structure prediction process including template identification, template combination, model generation, model assessment, and model refinement. The system was blindly tested during the ninth Critical Assessment of Techniques for Protein Structure Prediction (CASP9) in 2010 and yielded very good performance. In addition to studying the overall performance on the CASP9 benchmark, we thoroughly investigated the performance and contributions of each component at each stage of prediction. Conclusions: Our comprehensive and comparative study not only provides useful and practical insights about how to select, improve, and integrate complementary methods to build a cutting-edge protein structure prediction system but also identifies a few new sources of information that may help improve the design of a protein structure prediction system. Several components used in the MULTICOM system are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/. PMID:23442819

  5. Sequence, structure and function relationships in flaviviruses as assessed by evolutive aspects of its conserved non-structural protein domains.

    PubMed

    da Fonseca, Néli José; Lima Afonso, Marcelo Querino; Pedersolli, Natan Gonçalves; de Oliveira, Lucas Carrijo; Andrade, Dhiego Souto; Bleicher, Lucas

    2017-10-28

    Flaviviruses are responsible for serious diseases such as dengue, yellow fever, and zika fever. Their genomes encode a polyprotein which, after cleavage, results in three structural and seven non-structural proteins. Homologous proteins can be studied by conservation and coevolution analysis as detected in multiple sequence alignments, usually reporting positions which are strictly necessary for the structure and/or function of all members in a protein family or which are involved in a specific sub-class feature requiring the coevolution of residue sets. This study provides a complete conservation and coevolution analysis on all flaviviruses non-structural proteins, with results mapped on all well-annotated available sequences. A literature review on the residues found in the analysis enabled us to compile available information on their roles and distribution among different flaviviruses. Also, we provide the mapping of conserved and coevolved residues for all sequences currently in SwissProt as a supplementary material, so that particularities in different viruses can be easily analyzed. Copyright © 2017 Elsevier Inc. All rights reserved.

  6. Protein–Mineral Interactions: Molecular Dynamics Simulations Capture Importance of Variations in Mineral Surface Composition and Structure

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Andersen, Amity; Reardon, Patrick N.; Chacon, Stephany S.

    Molecular dynamics simulations, conventional and metadynamics, were performed to determine the interaction of model protein Gb1 over kaolinite (001), Na+-montmorillonite (001), Ca2+-montmorillonite (001), goethite (100), and Na+-birnessite (001) mineral surfaces. Gb1, a small (56 residue) protein with a well-characterized solution-state nuclear magnetic resonance (NMR) structure and having α-helix, four-fold β-sheet, and hydrophobic core features, is used as a model protein to study protein–soil mineral interactions and gain insights on structural changes and potential degradation of protein. From our simulations, we observe little change to the hydrated Gb1 structure over the kaolinite, montmorillonite, and goethite surfaces relative to its solvated structure without these mineral surfaces present. Over the Na+-birnessite basal surface, however, the Gb1 structure is highly disturbed as a result of interaction with this birnessite surface. Unraveling of the Gb1 β-sheet at specific turns and a partial unraveling of the α-helix is observed over birnessite, which suggests specific vulnerable residue sites for oxidation or hydrolysis possibly leading to fragmentation.

  7. Modeling Protein Expression and Protein Signaling Pathways

    PubMed Central

    Telesca, Donatello; Müller, Peter; Kornblau, Steven M.; Suchard, Marc A.; Ji, Yuan

    2015-01-01

    High-throughput functional proteomic technologies provide a way to quantify the expression of proteins of interest. Statistical inference centers on identifying the activation state of proteins and their patterns of molecular interaction formalized as dependence structure. Inference on dependence structure is particularly important when proteins are selected because they are part of a common molecular pathway. In that case, inference on dependence structure reveals properties of the underlying pathway. We propose a probability model that represents molecular interactions at the level of hidden binary latent variables that can be interpreted as indicators for active versus inactive states of the proteins. The proposed approach exploits available expert knowledge about the target pathway to define an informative prior on the hidden conditional dependence structure. An important feature of this prior is that it provides an instrument to explicitly anchor the model space to a set of interactions of interest, favoring a local search approach to model determination. We apply our model to reverse-phase protein array data from a study on acute myeloid leukemia. Our inference identifies relevant subpathways in relation to the unfolding of the biological process under study. PMID:26246646

  8. The RCSB protein data bank: integrative view of protein, gene and 3D structural information

    PubMed Central

    Rose, Peter W.; Prlić, Andreas; Altunkaya, Ali; Bi, Chunxiao; Bradley, Anthony R.; Christie, Cole H.; Costanzo, Luigi Di; Duarte, Jose M.; Dutta, Shuchismita; Feng, Zukang; Green, Rachel Kramer; Goodsell, David S.; Hudson, Brian; Kalro, Tara; Lowe, Robert; Peisach, Ezra; Randle, Christopher; Rose, Alexander S.; Shao, Chenghua; Tao, Yi-Ping; Valasatava, Yana; Voigt, Maria; Westbrook, John D.; Woo, Jesse; Yang, Huangwang; Young, Jasmine Y.; Zardecki, Christine; Berman, Helen M.; Burley, Stephen K.

    2017-01-01

    The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB, http://rcsb.org), the US data center for the global PDB archive, makes PDB data freely available to all users, from structural biologists to computational biologists and beyond. New tools and resources have been added to the RCSB PDB web portal in support of a ‘Structural View of Biology.’ Recent developments have improved the User experience, including the high-speed NGL Viewer that provides 3D molecular visualization in any web browser, improved support for data file download and enhanced organization of website pages for query, reporting and individual structure exploration. Structure validation information is now visible for all archival entries. PDB data have been integrated with external biological resources, including chromosomal position within the human genome; protein modifications; and metabolic pathways. PDB-101 educational materials have been reorganized into a searchable website and expanded to include new features such as the Geis Digital Archive. PMID:27794042

  9. A Comparative Study of Human Saposins.

    PubMed

    Garrido-Arandia, María; Cuevas-Zuviría, Bruno; Díaz-Perales, Araceli; Pacios, Luis F

    2018-02-14

    Saposins are small proteins implicated in trafficking and loading of lipids onto Cluster of Differentiation 1 (CD1) receptor proteins that in turn present lipid antigens to T cells and a variety of T-cell receptors, thus playing a crucial role in innate and adaptive immune responses in humans. Despite their low sequence identity, the four types of human saposins share a similar folding pattern consisting of four helices linked by three conserved disulfide bridges. However, their lipid-binding abilities as well as their activities in extracting, transporting and loading onto CD1 molecules a variety of sphingo- and phospholipids in biological membranes display two striking characteristics: a strong pH-dependence and a structural change between a compact, closed conformation and an open conformation. In this work, we present a comparative computational study of structural, electrostatic, and dynamic features of human saposins based upon their available experimental structures. By means of structural alignments, surface analyses, calculation of pH-dependent protonation states, Poisson-Boltzmann electrostatic potentials, and molecular dynamics simulations at three pH values representative of biological media where saposins fulfill their function, our results shed light into their intrinsic features. The similarities and differences in this class of proteins depend on tiny variations of local structural details that allow saposins to be key players in triggering responses in the human immune system.

  10. Predicting hot spots in protein interfaces based on protrusion index, pseudo hydrophobicity and electron-ion interaction pseudopotential features

    PubMed Central

    Xia, Junfeng; Yue, Zhenyu; Di, Yunqiang; Zhu, Xiaolei; Zheng, Chun-Hou

    2016-01-01

    The identification of hot spots, a small subset of protein interfaces that accounts for the majority of binding free energy, is becoming more important for the research of drug design and cancer development. Based on our previous methods (APIS and KFC2), here we propose a novel hot spot prediction method. For each hot spot residue, we first constructed a wide variety of 108 sequence, structural, and neighborhood features to characterize potential hot spot residues, including conventional ones and a new one (pseudo hydrophobicity) exploited in this study. We then selected the 3 top-ranking features that contribute the most to the classification by a two-step feature selection process consisting of the minimal-redundancy-maximal-relevance algorithm and an exhaustive search method. We used support vector machines to build our final prediction model. When tested on an independent test set, our method showed the highest F1-score of 0.70 and MCC of 0.46 compared with the existing state-of-the-art hot spot prediction methods. Our results indicate that these features are more effective than the conventional features considered previously, and that the combination of our and traditional features may support the creation of a discriminative feature set for efficient prediction of hot spots in protein interfaces. PMID:26934646
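
    For readers unfamiliar with the two metrics reported in this abstract, the following minimal sketch computes the F1-score and the Matthews correlation coefficient (MCC) from confusion-matrix counts using their standard definitions. The counts in the example are invented for illustration and are not taken from the paper.

```python
import math


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Invented counts: 6 true hot spots found, 2 false alarms, 2 missed, 10 true negatives.
print(f1_score(tp=6, fp=2, fn=2))
print(mcc(tp=6, fp=2, fn=2, tn=10))
```

    Unlike the F1-score, MCC also takes true negatives into account, which is one reason both are often reported together on imbalanced data such as hot spot sets.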

  11. Evolution of plant cell wall: Arabinogalactan-proteins from three moss genera show structural differences compared to seed plants.

    PubMed

    Bartels, Desirée; Baumann, Alexander; Maeder, Malte; Geske, Thomas; Heise, Esther Marie; von Schwartzenberg, Klaus; Classen, Birgit

    2017-05-01

    Arabinogalactan-proteins (AGPs) are important proteoglycans of plant cell walls. They seem to be present in most, if not all, seed plants, but their occurrence and structure in bryophytes is largely unknown and currently the focus of AGP research. With regard to the evolution of the plant cell wall, we isolated AGPs from the three mosses Sphagnum sp., Physcomitrella patens and Polytrichastrum formosum. The moss AGPs show structural characteristics common for AGPs of seed plants, but also unique features, especially 3-O-methyl-rhamnose (trivial name acofriose) as terminal monosaccharide not found in arabinogalactan-proteins of angiosperms and 1,2,3-linked galactose as branching point never found in arabinogalactan-proteins before. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. In Silico Analysis of the Structural and Biochemical Features of the NMD Factor UPF1 in Ustilago maydis.

    PubMed

    Martínez-Montiel, Nancy; Morales-Lara, Laura; Hernández-Pérez, Julio M; Martínez-Contreras, Rebeca D

    2016-01-01

    The molecular mechanisms regulating the accuracy of gene expression are still not fully understood. Among these mechanisms, Nonsense-mediated Decay (NMD) is a quality control process that detects post-transcriptionally abnormal transcripts and leads them to degradation. The UPF1 protein lies at the heart of NMD as shown by several structural and functional features reported for this factor mainly for Homo sapiens and Saccharomyces cerevisiae. This process is highly conserved in eukaryotes but functional diversity can be observed in various species. Ustilago maydis is a basidiomycete and the best-known smut, which has become a model to study molecular and cellular eukaryotic mechanisms. In this study, we performed in silico analysis to investigate the structural and biochemical properties of the putative UPF1 homolog in Ustilago maydis. The putative homolog for UPF1 was recognized in the annotated genome for the basidiomycete, exhibiting 66% identity with its human counterpart at the protein level. The known structural and functional domains characteristic of UPF1 homologs were also found. Based on the crystal structures available for UPF1, we constructed different three-dimensional models for umUPF1 in order to analyze the secondary and tertiary structural features of this factor. Using these models, we studied the spatial arrangement of umUPF1 and its capability to interact with UPF2. Moreover, we identified the critical amino acids that mediate the interaction of umUPF1 with UPF2, ATP, RNA and with UPF1 itself. Mutating these amino acids in silico showed an important effect on the native structure. Finally, we performed molecular dynamics simulations for UPF1 proteins from H. sapiens and U. maydis and the results obtained show a similar behavior and physicochemical properties for the protein in both organisms. Overall, our results indicate that the putative UPF1 identified in U. maydis shows a very similar sequence, structural organization, mechanical stability, physicochemical properties and spatial organization in comparison to the NMD factor depicted for Homo sapiens. These observations strongly support the notion that human and fungal UPF1 could perform equivalent biological activities.

  13. Legume Lectins: Proteins with Diverse Applications

    PubMed Central

    Lagarda-Diaz, Irlanda; Guzman-Partida, Ana Maria; Vazquez-Moreno, Luz

    2017-01-01

    Lectins are a diverse class of proteins distributed extensively in nature. Among these proteins, legume lectins display a variety of interesting features including antimicrobial, insecticidal and antitumor activities. Because lectins recognize and bind to specific glycoconjugates present on the surface of cells and intracellular structures, they can serve as potential target molecules for developing practical applications in the fields of food, agriculture, health and pharmaceutical research. This review presents the current knowledge of the main structural characteristics of legume lectins and the relationship of structure to the exhibited specificities, provides an overview of their particular antimicrobial, insecticidal and antitumor biological activities and describes possible applications based on the pattern of recognized glyco-targets. PMID:28604616

  14. Fast iodide-SAD phasing for high-throughput membrane protein structure determination

    PubMed Central

    Melnikov, Igor; Polovinkin, Vitaly; Kovalev, Kirill; Gushchin, Ivan; Shevtsov, Mikhail; Shevchenko, Vitaly; Mishin, Alexey; Alekseev, Alexey; Rodriguez-Valera, Francisco; Borshchevskiy, Valentin; Cherezov, Vadim; Leonard, Gordon A.; Gordeliy, Valentin; Popov, Alexander

    2017-01-01

    We describe a fast, easy, and potentially universal method for the de novo solution of the crystal structures of membrane proteins via iodide–single-wavelength anomalous diffraction (I-SAD). The potential universality of the method is based on a common feature of membrane proteins—the availability at the hydrophobic-hydrophilic interface of positively charged amino acid residues with which iodide strongly interacts. We demonstrate the solution using I-SAD of four crystal structures representing different classes of membrane proteins, including a human G protein–coupled receptor (GPCR), and we show that I-SAD can be applied using data collection strategies based on either standard or serial x-ray crystallography techniques. PMID:28508075

  15. PDBe: towards reusable data delivery infrastructure at protein data bank in Europe

    PubMed Central

    Alhroub, Younes; Anyango, Stephen; Armstrong, David R; Berrisford, John M; Clark, Alice R; Conroy, Matthew J; Dana, Jose M; Gupta, Deepti; Gutmanas, Aleksandras; Haslam, Pauline; Mak, Lora; Mukhopadhyay, Abhik; Nadzirin, Nurul; Paysan-Lafosse, Typhaine; Sehnal, David; Sen, Sanchayita; Smart, Oliver S; Varadi, Mihaly; Kleywegt, Gerard J

    2018-01-01

    The Protein Data Bank in Europe (PDBe, pdbe.org) is actively engaged in the deposition, annotation, remediation, enrichment and dissemination of macromolecular structure data. This paper describes new developments and improvements at PDBe addressing three challenging areas: data enrichment, data dissemination and functional reusability. New features of the PDBe Web site are discussed, including a context dependent menu providing links to raw experimental data and improved presentation of structures solved by hybrid methods. The paper also summarizes the features of the LiteMol suite, which is a set of services enabling fast and interactive 3D visualization of structures, with associated experimental maps, annotations and quality assessment information. We introduce a library of Web components which can be easily reused to port data and functionality available at PDBe to other services. We also introduce updates to the SIFTS resource which maps PDB data to other bioinformatics resources, and the PDBe REST API. PMID:29126160

  16. Molecular Precision at Micrometer Length Scales: Hierarchical Assembly of DNA-Protein Nanostructures.

    PubMed

    Schiffels, Daniel; Szalai, Veronika A; Liddle, J Alexander

    2017-07-25

    Robust self-assembly across length scales is a ubiquitous feature of biological systems but remains challenging for synthetic structures. Taking a cue from biology, where disparate molecules work together to produce large, functional assemblies, we demonstrate how to engineer microscale structures with nanoscale features. Our self-assembly approach begins by using DNA polymerase to controllably create double-stranded DNA (dsDNA) sections on a single-stranded template. The single-stranded DNA (ssDNA) sections are then folded into a mechanically flexible skeleton by the origami method. This process simultaneously shapes the structure at the nanoscale and directs the large-scale geometry. The DNA skeleton guides the assembly of RecA protein filaments, which provide rigidity at the micrometer scale. We use our modular design strategy to assemble tetrahedral, rectangular, and linear shapes of defined dimensions. This method enables the robust construction of complex assemblies, greatly extending the range of DNA-based self-assembly methods.

  17. An estimated 5% of new protein structures solved today represent a new Pfam family

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mistry, Jaina; Kloppmann, Edda; Rost, Burkhard

    2013-11-01

    This study uses the Pfam database to show that the sequence redundancy of protein structures deposited in the PDB is increasing. The possible reasons behind this trend are discussed. High-resolution structural knowledge is key to understanding how proteins function at the molecular level. The number of entries in the Protein Data Bank (PDB), the repository of all publicly available protein structures, continues to increase, with more than 8000 structures released in 2012 alone. The authors of this article have studied how structural coverage of the protein-sequence space has changed over time by monitoring the number of Pfam families that acquired their first representative structure each year from 1976 to 2012. Twenty years ago, for every 100 new PDB entries released, an estimated 20 Pfam families acquired their first structure. By 2012, this decreased to only about five families per 100 structures. The reasons behind the slower pace at which previously uncharacterized families are being structurally covered were investigated. It was found that although more than 50% of current Pfam families are still without a structural representative, this set is enriched in families that are small, functionally uncharacterized or rich in problem features such as intrinsically disordered and transmembrane regions. While these are important constraints, the reasons why it may not yet be time to give up the pursuit of a targeted but more comprehensive structural coverage of the protein-sequence space are discussed.

  18. 3D Complex: A Structural Classification of Protein Complexes

    PubMed Central

    Levy, Emmanuel D; Pereira-Leal, Jose B; Chothia, Cyrus; Teichmann, Sarah A

    2006-01-01

    Most of the proteins in a cell assemble into complexes to carry out their function. It is therefore crucial to understand the physicochemical properties as well as the evolution of interactions between proteins. The Protein Data Bank represents an important source of information for such studies, because more than half of the structures are homo- or heteromeric protein complexes. Here we propose the first hierarchical classification of whole protein complexes of known 3-D structure, based on representing their fundamental structural features as a graph. This classification provides the first overview of all the complexes in the Protein Data Bank and allows nonredundant sets to be derived at different levels of detail. This reveals that between one-half and two-thirds of known structures are multimeric, depending on the level of redundancy accepted. We also analyse the structures in terms of the topological arrangement of their subunits and find that they form a small number of arrangements compared with all theoretically possible ones. This is because most complexes contain four subunits or less, and the large majority are homomeric. In addition, there is a strong tendency for symmetry in complexes, even for heteromeric complexes. Finally, through comparison of Biological Units in the Protein Data Bank with the Protein Quaternary Structure database, we identified many possible errors in quaternary structure assignments. Our classification, available as a database and Web server at http://www.3Dcomplex.org, will be a starting point for future work aimed at understanding the structure and evolution of protein complexes. PMID:17112313

  19. New Protein Mimetics: The Zinc Finger Motif as a Locked-In Tertiary Fold.

    PubMed

    Tuchscherer, Gabriele; Lehmann, Christian; Mathieu, Marc

    1998-11-16

    The principle of a molecular kit is used for the covalent assembly of secondary structure forming peptide blocks to predetermined packing topologies. The resulting locked-in folds (LIFs; depicted schematically) are readily accessible and bypass the intriguing folding problem of linear peptide chains. This strategy allows, for example, mimicking of the essential structural and functional features of zinc finger proteins. © 1998 WILEY-VCH Verlag GmbH, Weinheim, Fed. Rep. of Germany.

  20. Synchrotron IR microspectroscopy for protein structure analysis: Potential and questions

    DOE PAGES

    Yu, Peiqiang

    2006-01-01

    Synchrotron radiation-based Fourier transform infrared microspectroscopy (S-FTIR) has been developed as a rapid, direct, non-destructive, bioanalytical technique. This technique takes advantage of synchrotron light brightness and small effective source size and is capable of exploring the molecular chemical make-up within microstructures of a biological tissue without destruction of inherent structures at ultra-spatial resolutions within cellular dimension. To date there has been very little application of this advanced technique to the study of pure protein inherent structure at a cellular level in biological tissues. In this review, a novel approach is introduced to show the potential of the newly developed, advanced synchrotron-based analytical technology, which can be used to localize relatively “pure” protein in plant tissues and to relatively reveal protein inherent structure and protein molecular chemical make-up within intact tissue at cellular and subcellular levels. Several complex protein IR spectra data analytical techniques (Gaussian and Lorentzian multi-component peak modeling, univariate and multivariate analysis, principal component analysis (PCA), and hierarchical cluster analysis (CLA)) are employed to relatively reveal features of protein inherent structure and to distinguish protein inherent structure differences between varieties/species and treatments in plant tissues. By using a multi-peak modeling procedure, RELATIVE estimates (but not EXACT determinations) for protein secondary structure analysis can be made for comparison purposes. The arguments for and against the multi-peak modeling/fitting procedure for relative estimation of protein structure are discussed. By using the PCA and CLA analyses, plant molecular structures can be qualitatively separated one group from another, statistically, even though the spectral assignments are not known. The synchrotron-based technology provides a new approach for protein structure research in biological tissues at ultraspatial resolutions.

  1. Predicting beta-turns in proteins using support vector machines with fractional polynomials

    PubMed Central

    2013-01-01

    Background: β-turns are a secondary structure type that has an essential role in molecular recognition, protein folding, and stability. They are found to be the most common type of non-repetitive structure, since 25% of amino acids in protein structures are situated on them. Their prediction is considered to be one of the crucial problems in bioinformatics and molecular biology, and can provide valuable insights and inputs for fold recognition and drug design. Results: We propose an approach that combines support vector machines (SVMs) and logistic regression (LR) in a hybrid prediction method, which we call H-SVM-LR, to predict β-turns in proteins. Fractional polynomials are used for LR modeling. We utilize position specific scoring matrices (PSSMs) and predicted secondary structure (PSS) as features. Our simulation studies show that H-SVM-LR achieves Qtotal of 82.87%, 82.84%, and 82.32% on the BT426, BT547, and BT823 datasets respectively. These values are the highest among β-turn prediction methods that are based on PSSMs and secondary structure information. H-SVM-LR also achieves favorable performance in predicting β-turns as measured by the Matthews correlation coefficient (MCC) on these datasets. Furthermore, H-SVM-LR shows good performance when considering shape strings as additional features. Conclusions: In this paper, we present a comprehensive approach for β-turn prediction. Experiments show that our proposed approach achieves better performance compared to other competing prediction methods. PMID:24565438
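
    The two figures of merit quoted above, Qtotal and the Matthews correlation coefficient, can be illustrated with a minimal sketch. It assumes per-residue binary labels (1 = β-turn, 0 = non-turn) and the usual definitions (Qtotal as the percentage of correctly classified residues); the function names and the toy labels are hypothetical and are not taken from the paper.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels: 1 = beta-turn, 0 = non-turn."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def q_total(y_true, y_pred):
    """Percentage of residues whose class is predicted correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)

def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Toy labels for eight residues (hypothetical data, for illustration only).
y_true = [1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 0]
print(q_total(y_true, y_pred))  # 75.0
print(mcc(y_true, y_pred))      # 7/15, about 0.467
```

    Because β-turn residues are a minority class, MCC is usually reported alongside Qtotal: a predictor that labels every residue as non-turn can still reach a high Qtotal while its MCC is zero.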

  2. Predicting beta-turns in proteins using support vector machines with fractional polynomials.

    PubMed

    Elbashir, Murtada; Wang, Jianxin; Wu, Fang-Xiang; Wang, Lusheng

    2013-11-07

    β-turns are a secondary structure type that has an essential role in molecular recognition, protein folding, and stability. They are found to be the most common type of non-repetitive structure, since 25% of amino acids in protein structures are situated on them. Their prediction is considered to be one of the crucial problems in bioinformatics and molecular biology, and can provide valuable insights and inputs for fold recognition and drug design. We propose an approach that combines support vector machines (SVMs) and logistic regression (LR) in a hybrid prediction method, which we call H-SVM-LR, to predict β-turns in proteins. Fractional polynomials are used for LR modeling. We utilize position specific scoring matrices (PSSMs) and predicted secondary structure (PSS) as features. Our simulation studies show that H-SVM-LR achieves Qtotal of 82.87%, 82.84%, and 82.32% on the BT426, BT547, and BT823 datasets respectively. These values are the highest among β-turn prediction methods that are based on PSSMs and secondary structure information. H-SVM-LR also achieves favorable performance in predicting β-turns as measured by the Matthews correlation coefficient (MCC) on these datasets. Furthermore, H-SVM-LR shows good performance when considering shape strings as additional features. In this paper, we present a comprehensive approach for β-turn prediction. Experiments show that our proposed approach achieves better performance compared to other competing prediction methods.

  3. Characterization and crystal structure of lysine insensitive Corynebacterium glutamicum dihydrodipicolinate synthase (cDHDPS) protein

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rice, E.A.; Bannon, G.A.; Glenn, K.C.

    2008-11-21

    The lysine insensitive Corynebacterium glutamicum dihydrodipicolinate synthase enzyme (cDHDPS) was recently successfully introduced into maize plants to enhance the level of lysine in the grain. To better understand lysine insensitivity of the cDHDPS, we expressed, purified, and kinetically characterized the protein, and solved its X-ray crystal structure. The cDHDPS enzyme has a fold and overall structure that is highly similar to other DHDPS proteins. A noteworthy feature of the active site is the evidence that the catalytic lysine residue forms a Schiff base adduct with pyruvate. Analyses of the cDHDPS structure in the vicinity of the putative binding site for S-lysine revealed that the allosteric binding site in the Escherichia coli DHDPS protein does not exist in cDHDPS due to three non-conservative amino acid substitutions, and this is likely why cDHDPS is not feedback inhibited by lysine.

  4. Recent developments in the theory of protein folding: searching for the global energy minimum.

    PubMed

    Scheraga, H A

    1996-04-16

    Statistical mechanical theories and computer simulation are being used to gain an understanding of the fundamental features of protein folding. A major obstacle in the computation of protein structures is the multiple-minima problem arising from the existence of many local minima in the multidimensional energy landscape of the protein. This problem has been surmounted for small open-chain and cyclic peptides, and for regular-repeating sequences of models of fibrous proteins. Progress is being made in resolving this problem for globular proteins.

  5. Distribution of genotype network sizes in sequence-to-structure genotype-phenotype maps.

    PubMed

    Manrubia, Susanna; Cuesta, José A

    2017-04-01

    An essential quantity to ensure evolvability of populations is the navigability of the genotype space. Navigability, understood as the ease with which alternative phenotypes are reached, relies on the existence of sufficiently large and mutually attainable genotype networks. The size of genotype networks (e.g. the number of RNA sequences folding into a particular secondary structure or the number of DNA sequences coding for the same protein structure) is astronomically large in all functional molecules investigated: an exhaustive experimental or computational study of all RNA folds or all protein structures becomes impossible even for moderately long sequences. Here, we analytically derive the distribution of genotype network sizes for a hierarchy of models which successively incorporate features of increasingly realistic sequence-to-structure genotype-phenotype maps. The main feature of these models relies on the characterization of each phenotype through a prototypical sequence whose sites admit a variable fraction of letters of the alphabet. Our models interpolate between two limit distributions: a power-law distribution, when the ordering of sites in the prototypical sequence is strongly constrained, and a lognormal distribution, as suggested for RNA, when different orderings of the same set of sites yield different phenotypes. Our main result is the qualitative and quantitative identification of those features of sequence-to-structure maps that lead to different distributions of genotype network sizes. © 2017 The Author(s).
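
    The idea of a prototypical sequence whose sites each admit some of the alphabet's letters can be made concrete with a toy calculation. This is only a sketch under a simplifying assumption that is not stated in the abstract: sites are treated as independent, so the network size is the product of the number of letters admitted at each site. The function name and the example values are hypothetical.

```python
import math

def network_size(admitted_letters):
    """Number of sequences compatible with a prototype, assuming independent sites.

    admitted_letters[i] is how many alphabet letters site i tolerates.
    """
    size = 1
    for v in admitted_letters:
        size *= v
    return size

# RNA-like alphabet of 4 letters and a 6-site prototype (hypothetical values).
alphabet_size = 4
admitted = [4, 1, 2, 4, 1, 3]

size = network_size(admitted)                    # 4*1*2*4*1*3 = 96
fraction = size / alphabet_size ** len(admitted)  # share of all 4**6 sequences
log_size = sum(math.log(v) for v in admitted)     # log of a product is a sum

print(size, fraction, log_size)
```

    In this simplified picture the logarithm of the network size is a sum of per-site terms, which gives some intuition for why broad, lognormal-like size distributions can arise when the per-site contributions vary independently; the paper's analytical derivation covers a hierarchy of more realistic models.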

  6. TANGLE: Two-Level Support Vector Regression Approach for Protein Backbone Torsion Angle Prediction from Primary Sequences

    PubMed Central

    Song, Jiangning; Tan, Hao; Wang, Mingjun; Webb, Geoffrey I.; Akutsu, Tatsuya

    2012-01-01

    Protein backbone torsion angles (Phi) and (Psi) involve two rotation angles rotating around the Cα-N bond (Phi) and the Cα-C bond (Psi). Due to the planarity of the linked rigid peptide bonds, these two angles can essentially determine the backbone geometry of proteins. Accordingly, the accurate prediction of protein backbone torsion angle from sequence information can assist the prediction of protein structures. In this study, we develop a new approach called TANGLE (Torsion ANGLE predictor) to predict the protein backbone torsion angles from amino acid sequences. TANGLE uses a two-level support vector regression approach to perform real-value torsion angle prediction using a variety of features derived from amino acid sequences, including the evolutionary profiles in the form of position-specific scoring matrices, predicted secondary structure, solvent accessibility and natively disordered region as well as other global sequence features. When evaluated based on a large benchmark dataset of 1,526 non-homologous proteins, the mean absolute errors (MAEs) of the Phi and Psi angle prediction are 27.8° and 44.6°, respectively, which are 1% and 3% respectively lower than that using one of the state-of-the-art prediction tools ANGLOR. Moreover, the prediction of TANGLE is significantly better than a random predictor that was built on the amino acid-specific basis, with the p-value<1.46e-147 and 7.97e-150, respectively by the Wilcoxon signed rank test. As a complementary approach to the current torsion angle prediction algorithms, TANGLE should prove useful in predicting protein structural properties and assisting protein fold recognition by applying the predicted torsion angles as useful restraints. TANGLE is freely accessible at http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/TANGLE/. PMID:22319565
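
    A mean absolute error for torsion angles has to account for periodicity, since 170° and -170° are only 20° apart. The sketch below shows one common convention, taking the shorter way around the circle; whether TANGLE's reported MAEs use exactly this convention is an assumption, and the function names and example angles are hypothetical.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)

def angular_mae(predicted, observed):
    """Mean absolute error between two equal-length lists of angles in degrees."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(angular_difference(p, o) for p, o in zip(predicted, observed))
    return total / len(predicted)

# Hypothetical predicted and observed Phi angles for three residues.
phi_pred = [-60.0, 170.0, -175.0]
phi_obs = [-65.0, -170.0, 175.0]
print(angular_mae(phi_pred, phi_obs))  # (5 + 20 + 10) / 3, about 11.67
```

    Without the wrap-around step the second and third residues would contribute errors of 340° and 350°, which would badly overstate the disagreement.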

  7. Prediction of homoprotein and heteroprotein complexes by protein docking and template‐based modeling: A CASP‐CAPRI experiment

    PubMed Central

    Velankar, Sameer; Kryshtafovych, Andriy; Huang, Shen‐You; Schneidman‐Duhovny, Dina; Sali, Andrej; Segura, Joan; Fernandez‐Fuentes, Narcis; Viswanath, Shruthi; Elber, Ron; Grudinin, Sergei; Popov, Petr; Neveu, Emilie; Lee, Hasup; Baek, Minkyung; Park, Sangwoo; Heo, Lim; Rie Lee, Gyu; Seok, Chaok; Qin, Sanbo; Zhou, Huan‐Xiang; Ritchie, David W.; Maigret, Bernard; Devignes, Marie‐Dominique; Ghoorah, Anisah; Torchala, Mieczyslaw; Chaleil, Raphaël A.G.; Bates, Paul A.; Ben‐Zeev, Efrat; Eisenstein, Miriam; Negi, Surendra S.; Weng, Zhiping; Vreven, Thom; Pierce, Brian G.; Borrman, Tyler M.; Yu, Jinchao; Ochsenbein, Françoise; Guerois, Raphaël; Vangone, Anna; Rodrigues, João P.G.L.M.; van Zundert, Gydo; Nellen, Mehdi; Xue, Li; Karaca, Ezgi; Melquiond, Adrien S.J.; Visscher, Koen; Kastritis, Panagiotis L.; Bonvin, Alexandre M.J.J.; Xu, Xianjin; Qiu, Liming; Yan, Chengfei; Li, Jilong; Ma, Zhiwei; Cheng, Jianlin; Zou, Xiaoqin; Shen, Yang; Peterson, Lenna X.; Kim, Hyung‐Rae; Roy, Amit; Han, Xusi; Esquivel‐Rodriguez, Juan; Kihara, Daisuke; Yu, Xiaofeng; Bruce, Neil J.; Fuller, Jonathan C.; Wade, Rebecca C.; Anishchenko, Ivan; Kundrotas, Petras J.; Vakser, Ilya A.; Imai, Kenichiro; Yamada, Kazunori; Oda, Toshiyuki; Nakamura, Tsukasa; Tomii, Kentaro; Pallara, Chiara; Romero‐Durana, Miguel; Jiménez‐García, Brian; Moal, Iain H.; Férnandez‐Recio, Juan; Joung, Jong Young; Kim, Jong Yun; Joo, Keehyoung; Lee, Jooyoung; Kozakov, Dima; Vajda, Sandor; Mottarella, Scott; Hall, David R.; Beglov, Dmitri; Mamonov, Artem; Xia, Bing; Bohnuud, Tanggis; Del Carpio, Carlos A.; Ichiishi, Eichiro; Marze, Nicholas; Kuroda, Daisuke; Roy Burman, Shourya S.; Gray, Jeffrey J.; Chermak, Edrisse; Cavallo, Luigi; Oliva, Romina; Tovchigrechko, Andrey

    2016-01-01

    We present the results for CAPRI Round 30, the first joint CASP‐CAPRI experiment, which brought together experts from the protein structure prediction and protein–protein docking communities. The Round comprised 25 targets from amongst those submitted for the CASP11 prediction experiment of 2014. The targets included mostly homodimers, a few homotetramers, and two heterodimers, and comprised protein chains that could readily be modeled using templates from the Protein Data Bank. On average 24 CAPRI groups and 7 CASP groups submitted docking predictions for each target, and 12 CAPRI groups per target participated in the CAPRI scoring experiment. In total more than 9500 models were assessed against the 3D structures of the corresponding target complexes. Results show that the prediction of homodimer assemblies by homology modeling techniques and docking calculations is quite successful for targets featuring large enough subunit interfaces to represent stable associations. Targets with ambiguous or inaccurate oligomeric state assignments, often featuring crystal contact‐sized interfaces, represented a confounding factor. For those, a much poorer prediction performance was achieved, while nonetheless often providing helpful clues on the correct oligomeric state of the protein. The prediction performance was very poor for genuine tetrameric targets, where the inaccuracy of the homology‐built subunit models and the smaller pair‐wise interfaces severely limited the ability to derive the correct assembly mode. Our analysis also shows that docking procedures tend to perform better than standard homology modeling techniques and that highly accurate models of the protein components are not always required to identify their association modes with acceptable accuracy. Proteins 2016; 84(Suppl 1):323–348. © 2016 The Authors Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc. PMID:27122118

  8. TFBSshape: a motif database for DNA shape features of transcription factor binding sites.

    PubMed

    Yang, Lin; Zhou, Tianyin; Dror, Iris; Mathelier, Anthony; Wasserman, Wyeth W; Gordân, Raluca; Rohs, Remo

    2014-01-01

    Transcription factor binding sites (TFBSs) are most commonly characterized by the nucleotide preferences at each position of the DNA target. Whereas these sequence motifs are quite accurate descriptions of DNA binding specificities of transcription factors (TFs), proteins recognize DNA as a three-dimensional object. DNA structural features refine the description of TF binding specificities and provide mechanistic insights into protein-DNA recognition. Existing motif databases contain extensive nucleotide sequences identified in binding experiments based on their selection by a TF. To utilize DNA shape information when analysing the DNA binding specificities of TFs, we developed a new tool, the TFBSshape database (available at http://rohslab.cmb.usc.edu/TFBSshape/), for calculating DNA structural features from nucleotide sequences provided by motif databases. The TFBSshape database can be used to generate heat maps and quantitative data for DNA structural features (i.e., minor groove width, roll, propeller twist and helix twist) for 739 TF datasets from 23 different species derived from the motif databases JASPAR and UniPROBE. As demonstrated for the basic helix-loop-helix and homeodomain TF families, our TFBSshape database can be used to compare, qualitatively and quantitatively, the DNA binding specificities of closely related TFs and, thus, uncover differential DNA binding specificities that are not apparent from nucleotide sequence alone.

  9. The structure of the casein micelle of milk and its changes during processing.

    PubMed

    Dalgleish, Douglas G; Corredig, Milena

    2012-01-01

    The majority of the protein in cow's milk is contained in the particles known as casein micelles. This review describes the main structural features of these particles and the different models that have been used to define the interior structures. The reactions of the micelles during processing operations are described in terms of the structural models.

  10. Improved data visualization techniques for analyzing macromolecule structural changes.

    PubMed

    Kim, Jae Hyun; Iyer, Vidyashankara; Joshi, Sangeeta B; Volkin, David B; Middaugh, C Russell

    2012-10-01

    The empirical phase diagram (EPD) is a colored representation of overall structural integrity and conformational stability of macromolecules in response to various environmental perturbations. Numerous proteins and macromolecular complexes have been analyzed by EPDs to summarize results from large data sets from multiple biophysical techniques. The current EPD method suffers from a number of deficiencies including lack of a meaningful relationship between color and actual molecular features, difficulties in identifying contributions from individual techniques, and a limited ability to be interpreted by color-blind individuals. In this work, three improved data visualization approaches are proposed as techniques complementary to the EPD. The secondary, tertiary, and quaternary structural changes of multiple proteins as a function of environmental stress were first measured using circular dichroism, intrinsic fluorescence spectroscopy, and static light scattering, respectively. Data sets were then visualized as (1) RGB colors using three-index EPDs, (2) equiangular polygons using radar charts, and (3) human facial features using Chernoff face diagrams. Data as a function of temperature and pH for bovine serum albumin, aldolase, and chymotrypsin as well as candidate protein vaccine antigens including a serine threonine kinase protein (SP1732) and surface antigen A (SP1650) from S. pneumoniae and hemagglutinin from an H1N1 influenza virus are used to illustrate the advantages and disadvantages of each type of data visualization technique. Copyright © 2012 The Protein Society.
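
    The three-index idea, one color channel per structural level, can be sketched as follows. This is an illustration rather than the authors' procedure: the min-max normalization, the assignment of secondary, tertiary and quaternary signals to red, green and blue, and all names and numbers are assumptions.

```python
def normalize(values):
    """Min-max scale a list of measurements to the range 0..1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def three_index_colors(secondary, tertiary, quaternary):
    """Map three per-condition signals to one (R, G, B) tuple per condition."""
    channels = [normalize(signal) for signal in (secondary, tertiary, quaternary)]
    return [tuple(round(255 * c) for c in triple) for triple in zip(*channels)]

# Hypothetical readings at three temperatures: CD signal, fluorescence peak
# position (nm), and static light scattering intensity.
cd_signal = [-20.0, -15.0, -5.0]
fluorescence_peak = [330.0, 335.0, 350.0]
light_scattering = [1.0, 1.0, 4.0]

colors = three_index_colors(cd_signal, fluorescence_peak, light_scattering)
print(colors)  # [(0, 0, 0), (85, 64, 0), (255, 255, 255)]
```

    With a fixed channel per technique, a mostly red cell points to a change in secondary structure alone, which addresses the stated difficulty of relating color to an actual molecular feature.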

  11. Impact of mutations on the allosteric conformational equilibrium

    PubMed Central

    Weinkam, Patrick; Chen, Yao Chi; Pons, Jaume; Sali, Andrej

    2012-01-01

    Allostery in a protein involves effector binding at an allosteric site that changes the structure and/or dynamics at a distant, functional site. In addition to the chemical equilibrium of ligand binding, allostery involves a conformational equilibrium between one protein substate that binds the effector and a second substate that less strongly binds the effector. We run molecular dynamics simulations using simple, smooth energy landscapes to sample specific ligand-induced conformational transitions, as defined by the effector-bound and unbound protein structures. These simulations can be performed using our web server: http://salilab.org/allosmod/. We then develop a set of features to analyze the simulations and capture the relevant thermodynamic properties of the allosteric conformational equilibrium. These features are based on molecular mechanics energy functions, stereochemical effects, and structural/dynamic coupling between sites. Using a machine-learning algorithm on a dataset of 10 proteins and 179 mutations, we predict both the magnitude and sign of the allosteric conformational equilibrium shift by the mutation; the impact of a large identifiable fraction of the mutations can be predicted with an average unsigned error of 1 kBT. With similar accuracy, we predict the mutation effects for an 11th protein that was omitted from the initial training and testing of the machine-learning algorithm. We also assess which calculated thermodynamic properties contribute most to the accuracy of the prediction. PMID:23228330
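
    To give a sense of scale for an error of 1 kBT, the sketch below converts between a conformational equilibrium constant and a free energy in kBT units. It assumes K is the population ratio of the two substates and uses the standard relation ΔG = -kBT ln K; the function names and the direction in which K is defined are hypothetical choices, not details from the paper.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol K)

def equilibrium_shift_kbt(k_wild_type, k_mutant):
    """Shift in conformational free energy caused by a mutation, in units of kBT.

    K is taken as the population ratio of the two substates (assumed convention).
    """
    return -math.log(k_mutant / k_wild_type)

def kbt_to_kcal_per_mol(energy_kbt, temperature_kelvin=298.15):
    """Convert an energy expressed in kBT to kcal/mol at the given temperature."""
    return energy_kbt * GAS_CONSTANT_KCAL * temperature_kelvin

# A mutation that multiplies the equilibrium constant by e shifts it by -1 kBT.
shift = equilibrium_shift_kbt(1.0, math.e)
print(shift)                     # -1.0
print(kbt_to_kcal_per_mol(1.0))  # about 0.59 kcal/mol near room temperature
```

    An unsigned error of 1 kBT therefore corresponds to roughly a factor of e (about 2.7) in the equilibrium constant, or about 0.6 kcal/mol near room temperature.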

  12. Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs.

    PubMed

    Regad, Leslie; Martin, Juliette; Camproux, Anne-Claude

    2011-06-20

    One of the strategies for protein function annotation is to search for particular structural motifs that are known to be shared by proteins with a given function. Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs such as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.
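
    Once loops are encoded as strings of structural letters, extracting seven-residue motifs amounts to counting seven-letter words. The sketch below counts words and forms a naive frequency ratio between a superfamily and a background set. The ratio is a deliberately simple stand-in for the pattern statistics used in the paper, and the function names and toy strings are hypothetical.

```python
from collections import Counter

def count_words(loops, k=7):
    """Count every overlapping k-letter word across a list of encoded loops."""
    counts = Counter()
    for loop in loops:
        for i in range(len(loop) - k + 1):
            counts[loop[i:i + k]] += 1
    return counts

def frequency_ratio(word, family_loops, background_loops, k=7):
    """Relative frequency of a word in a family divided by that in the background.

    Returns None when the ratio is undefined (no words, or word absent from background).
    """
    fam = count_words(family_loops, k)
    bg = count_words(background_loops, k)
    fam_total, bg_total = sum(fam.values()), sum(bg.values())
    if fam_total == 0 or bg_total == 0 or bg[word] == 0:
        return None
    return (fam[word] / fam_total) / (bg[word] / bg_total)

# Toy encoded loops (hypothetical strings, not real HMM-SA output).
family_loops = ["ABCDEFGH", "XABCDEFG"]
background_loops = family_loops + ["QRSTUVWXYZ"]

print(frequency_ratio("ABCDEFG", family_loops, background_loops))  # 2.0
```

    A ratio above 1 only suggests enrichment; with short sequences and many candidate words, a proper significance test with multiple-testing control is needed before calling a motif over-represented.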

  13. Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs

    PubMed Central

    2011-01-01

    Background: One of the strategies for protein function annotation is to search for particular structural motifs that are known to be shared by proteins with a given function. Results: Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs such as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Conclusions: Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins. PMID:21689388

  14. Interrogating viral capsid assembly with ion mobility-mass spectrometry

    NASA Astrophysics Data System (ADS)

    Uetrecht, Charlotte; Barbu, Ioana M.; Shoemaker, Glen K.; van Duijn, Esther; Heck, Albert J. R.

    2011-02-01

    Most proteins fulfil their function as part of large protein complexes. Surprisingly, little is known about the pathways and regulation of protein assembly. Several viral coat proteins can spontaneously assemble into capsids in vitro with morphologies identical to the native virion and thus represent ideal model systems for studying protein complex formation. Even for these systems, the mechanism for self-assembly is still poorly understood, although it is generally thought that smaller oligomeric structures form key intermediates. This assembly nucleus and larger viral assembly intermediates are typically of low abundance and difficult to monitor. Here, we characterised small oligomers of Hepatitis B virus (HBV) and norovirus under equilibrium conditions using native ion mobility mass spectrometry. These data, in conjunction with computational modelling, enabled us to elucidate structural features of these oligomers. Instead of more globular shapes, the intermediates exhibit sheet-like structures, suggesting that they are assembly competent. We propose pathways for the formation of both capsids.

  15. The flavivirus capsid protein: Structure, function and perspectives towards drug design.

    PubMed

    Oliveira, Edson R A; Mohana-Borges, Ronaldo; de Alencastro, Ricardo B; Horta, Bruno A C

    2017-01-02

    Flaviviruses, such as dengue and Zika viruses, are etiologic agents transmitted to humans mainly by arthropods and are of great epidemiological interest. The flavivirus capsid protein is a structural element required for the viral nucleocapsid assembly that presents the classical function of sheltering the viral genome. After decades of research, many reports have shown its different functionalities and influence over normal cell functioning. The subcellular distribution of this protein, which involves accumulation around lipid droplets and nuclear localization, is also consistent with its multifunctional character. As flavivirus diseases are still in need of global control and in view of the possible key functionalities that the capsid protein promotes over flavivirus biology, novel considerations arise towards anti-flavivirus drug research. This review covers the main aspects concerning structural and functional features of the flavivirus C protein, ultimately highlighting prospects in drug discovery based on this viral target. Copyright © 2016 Elsevier B.V. All rights reserved.

  16. Amino Acid Distribution Rules Predict Protein Fold: Protein Grammar for Beta-Strand Sandwich-Like Structures

    PubMed Central

    Kister, Alexander

    2015-01-01

    We present an alternative approach to protein 3D folding prediction based on determination of rules that specify distribution of “favorable” residues, that are mainly responsible for a given fold formation, and “unfavorable” residues, that are incompatible with that fold, in polypeptide sequences. The process of determining favorable and unfavorable residues is iterative. The starting assumptions are based on the general principles of protein structure formation as well as structural features peculiar to a protein fold under investigation. The initial assumptions are tested one-by-one for a set of all known proteins with a given structure. The assumption is accepted as a “rule of amino acid distribution” for the protein fold if it holds true for all, or near all, structures. If the assumption is not accepted as a rule, it can be modified to better fit the data and then tested again in the next step of the iterative search algorithm, or rejected. We determined the set of amino acid distribution rules for a large group of beta sandwich-like proteins characterized by a specific arrangement of strands in two beta sheets. It was shown that this set of rules is highly sensitive (~90%) and very specific (~99%) for identifying sequences of proteins with specified beta sandwich fold structure. The advantage of the proposed approach is that it does not require that query proteins have a high degree of homology to proteins with known structure. So long as the query protein satisfies residue distribution rules, it can be confidently assigned to its respective protein fold. Another advantage of our approach is that it allows for a better understanding of which residues play an essential role in protein fold formation. It may, therefore, facilitate rational protein engineering design. PMID:25625198

  17. Structural Analysis of PTM Hotspots (SAPH-ire) – A Quantitative Informatics Method Enabling the Discovery of Novel Regulatory Elements in Protein Families

    PubMed Central

    Dewhurst, Henry M.; Choudhury, Shilpa; Torres, Matthew P.

    2015-01-01

    Predicting the biological function potential of post-translational modifications (PTMs) is becoming increasingly important in light of the exponential increase in available PTM data from high-throughput proteomics. We developed structural analysis of PTM hotspots (SAPH-ire)—a quantitative PTM ranking method that integrates experimental PTM observations, sequence conservation, protein structure, and interaction data to allow rank order comparisons within or between protein families. Here, we applied SAPH-ire to the study of PTMs in diverse G protein families, a conserved and ubiquitous class of proteins essential for maintenance of intracellular structure (tubulins) and signal transduction (large and small Ras-like G proteins). A total of 1728 experimentally verified PTMs from eight unique G protein families were clustered into 451 unique hotspots, 51 of which have a known and cited biological function or response. Using customized software, the hotspots were analyzed in the context of 598 unique protein structures. By comparing distributions of hotspots with known versus unknown function, we show that SAPH-ire analysis is predictive for PTM biological function. Notably, SAPH-ire revealed high-ranking hotspots for which a functional impact has not yet been determined, including phosphorylation hotspots in the N-terminal tails of G protein gamma subunits—conserved protein structures never before reported as regulators of G protein coupled receptor signaling. To validate this prediction we used the yeast model system for G protein coupled receptor signaling, revealing that gamma subunit–N-terminal tail phosphorylation is activated in response to G protein coupled receptor stimulation and regulates protein stability in vivo. These results demonstrate the utility of integrating protein structural and sequence features into PTM prioritization schemes that can improve the analysis and functional power of modification-specific proteomics data. PMID:26070665

  18. What amyloidoses may tell us about normal protein folding: The Alzheimer's disease story

    NASA Astrophysics Data System (ADS)

    Teplow, David B.

    2002-03-01

    Alzheimer's disease (AD) is a progressive, neurodegenerative disorder characterized by severe neuronal injury and death. A prominent histopathologic feature of AD is disseminated parenchymal and vascular amyloid deposition. The fibrils in these deposits are composed of the amyloid β-protein (Aβ), a peptide of 4 kDa mass. In vitro and in vivo studies of Aβ fibril formation have shown that both oligomeric and polymeric Aβ assemblies have neurotoxic activity. Understanding how these assemblies form thus could be of direct therapeutic relevance. However, the aggregation and fibril-forming propensities of Aβ have complicated structure determination. Nevertheless, careful morphologic, spectroscopic, protein chemical, and physiologic analyses of the time-dependent changes in Aβ conformation, assembly state, and biological activity which occur during fibrillogenesis have significantly advanced our understanding of this clinically important process. Here, I will discuss recent findings about the pathway(s) of Aβ folding and assembly and about key structural features of Aβ which control the associated kinetics. Interestingly, the amyloidogenic folding pathway of Aβ is in some respects the mirror image of that through which natively folded amyloidogenic proteins proceed.

  19. Structural and sequence features of two residue turns in beta-hairpins.

    PubMed

    Madan, Bharat; Seo, Sung Yong; Lee, Sun-Gu

    2014-09-01

    Beta-turns in beta-hairpins have been implicated as important sites in protein folding. In particular, two residue β-turns, the most abundant connecting elements in beta-hairpins, have been a major target for engineering protein stability and folding. In this study, we attempted to investigate and update the structural and sequence properties of two residue turns in beta-hairpins with a large data set. For this, 3977 beta-turns were extracted from 2394 nonhomologous protein chains and analyzed. First, the distribution, dihedral angles and twists of two residue turn types were determined, and compared with previous data. The trend of turn type occurrence and most structural features of the turn types were similar to previous results, but for the first time Type II turns in beta-hairpins were identified. Second, sequence motifs for the turn types were devised based on amino acid positional potentials of two-residue turns, and their distributions were examined. From this study, we could identify code-like sequence motifs for the two residue beta-turn types. Finally, structural and sequence properties of beta-strands in the beta-hairpins were analyzed, which revealed that the beta-strands showed no specific sequence and structural patterns for turn types. The analytical results in this study are expected to be a reference in the engineering or design of beta-hairpin turn structures and sequences. © 2014 Wiley Periodicals, Inc.

  20. Smooth muscle membrane organization in the normal and dysfunctional human urinary bladder: a structural analysis.

    PubMed

    Burkhard, Fiona C; Monastyrskaya, Katia; Studer, Urs E; Draeger, Annette

    2005-01-01

    The decline in contractile properties is a characteristic feature of the dysfunctional bladder as a result of infravesical outlet obstruction. During clinical progression of the disease, smooth muscle cells undergo structural modifications. Since adaptations to constant changes in length require a high degree of structural organization within the sarcolemma, we have investigated the expression of several proteins, which are involved in smooth muscle membrane organization, in specimens derived from normal and dysfunctional organs. Specimens from patients with urodynamically normal/equivocal (n = 4), obstructed (n = 2), and acontractile (n = 2) bladders were analyzed relative to their structural features and sarcolemmal protein profile. Smooth muscle cells within the normal urinary bladder display a distinct sarcolemmal domain structure, characterized by firm actin-attachment sites, alternating with flexible "hinge" regions. In obstructed bladders, foci of cells displaying degenerative sarcolemmal changes alternate with areas of hypertrophic cells in which the membrane appears unaffected. In acontractile organs, the overall membrane structure remains intact; however, annexin 6, a protein belonging to a family of Ca2+-dependent "membrane-organizers," is downregulated. Degenerative changes in smooth muscle cells, which are chronically working against high resistance, are preferentially located within the actin-attachment sites. In acontractile bladders, the downregulation of annexin 6 might have a bearing on the fine-tuning of the plasma membrane during contraction/relaxation cycles. Copyright 2005 Wiley-Liss, Inc.

  1. Structural and evolutionary adaptation of rhoptry kinases and pseudokinases, a family of coccidian virulence factors

    PubMed Central

    2013-01-01

    Background The widespread protozoan parasite Toxoplasma gondii interferes with host cell functions by exporting the contents of a unique apical organelle, the rhoptry. Among the mix of secreted proteins are an expanded, lineage-specific family of protein kinases termed rhoptry kinases (ROPKs), several of which have been shown to be key virulence factors, including the pseudokinase ROP5. The extent and details of the diversification of this protein family are poorly understood. Results In this study, we comprehensively catalogued the ROPK family in the genomes of Toxoplasma gondii, Neospora caninum and Eimeria tenella, as well as portions of the unfinished genome of Sarcocystis neurona, and classified the identified genes into 42 distinct subfamilies. We systematically compared the rhoptry kinase protein sequences and structures to each other and to the broader superfamily of eukaryotic protein kinases to study the patterns of diversification and neofunctionalization in the ROPK family and its subfamilies. We identified three ROPK sub-clades of particular interest: those bearing a structurally conserved N-terminal extension to the kinase domain (NTE), an E. tenella-specific expansion, and a basal cluster including ROP35 and BPK1 that we term ROPKL. Structural analysis in light of the solved structures ROP2, ROP5, ROP8 and in comparison to typical eukaryotic protein kinases revealed ROPK-specific conservation patterns in two key regions of the kinase domain, surrounding a ROPK-conserved insert in the kinase hinge region and a disulfide bridge in the kinase substrate-binding lobe. We also examined conservation patterns specific to the NTE-bearing clade. We discuss the possible functional consequences of each. Conclusions Our work sheds light on several important but previously unrecognized features shared among rhoptry kinases, as well as the essential differences between active and degenerate protein kinases. We identify the most distinctive ROPK-specific features conserved across both active kinases and pseudokinases, and discuss these in terms of sequence motifs, evolutionary context, structural impact and potential functional relevance. By characterizing the proteins that enable these parasites to invade the host cell and co-opt its signaling mechanisms, we provide guidance on potential therapeutic targets for the diseases caused by coccidian parasites. PMID:23742205

  2. Ab initio folding of proteins using all-atom discrete molecular dynamics

    PubMed Central

    Ding, Feng; Tsao, Douglas; Nie, Huifen; Dokholyan, Nikolay V.

    2008-01-01

    Summary Discrete molecular dynamics (DMD) is a rapid sampling method used in protein folding and aggregation studies. Until now, DMD was used to perform simulations of simplified protein models in conjunction with structure-based force fields. Here, we develop an all-atom protein model and a transferable force field featuring packing, solvation, and environment-dependent hydrogen bond interactions. Using the replica exchange method, we perform folding simulations of six small proteins (20–60 residues) with distinct native structures. In all cases, native or near-native states are reached in simulations. For three small proteins, multiple folding transitions are observed and the computationally-characterized thermodynamics are in quantitative agreement with experiments. The predictive power of all-atom DMD highlights the importance of environment-dependent hydrogen bond interactions in modeling protein folding. The developed approach can be used for accurate and rapid sampling of conformational spaces of proteins and protein-protein complexes, and applied to protein engineering and design of protein-protein interactions. PMID:18611374

  3. The new protein topology graph library web server.

    PubMed

    Schäfer, Tim; Scheck, Andreas; Bruneß, Daniel; May, Patrick; Koch, Ina

    2016-02-01

    We present a new, extended version of the Protein Topology Graph Library web server. The Protein Topology Graph Library describes the protein topology on the super-secondary structure level. It allows users to compute and visualize protein ligand graphs and search for protein structural motifs. The new server features additional information on ligand binding to secondary structure elements, increased usability and an application programming interface (API) to retrieve data, allowing for an automated analysis of protein topology. The Protein Topology Graph Library server is freely available on the web at http://ptgl.uni-frankfurt.de. The website is implemented in PHP, JavaScript, PostgreSQL and Apache. It is supported by all major browsers. The VPLG software that was used to compute the protein ligand graphs and all other data in the database is available under the GNU public license 2.0 from http://vplg.sourceforge.net. Contact: tim.schaefer@bioinformatik.uni-frankfurt.de; ina.koch@bioinformatik.uni-frankfurt.de. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  4. Internally bridging water molecule in transmembrane alpha-helical kink.

    PubMed

    Miyano, Masashi; Ago, Hideo; Saino, Hiromichi; Hori, Tetsuya; Ida, Koh

    2010-08-01

    There are hundreds of membrane protein atomic coordinates in the Protein Data Bank (PDB), and high-resolution structures of better than 2.5 Å enable the visualization of a sizable number of amphiphiles (lipid and/or detergent) and bound water molecules as essential parts of the structure. Upon scrutinizing these high-resolution structures, water molecules were found to 'wedge' and stabilize large kink angles (30-40 degrees in a simple cylindrical model) at the transmembrane helical kinks so as to form an inter-helical cavity to accommodate a ligand binding or active site as a crucial structural feature in alpha-helical integral membrane proteins. Furthermore, some of these water molecules are proposed to play a pivotal role in the conformational changes through which these proteins exert their functional regulation. Copyright (c) 2010 Elsevier Ltd. All rights reserved.

  5. Mechanism of Resilin Elasticity

    PubMed Central

    Qin, Guokui; Hu, Xiao; Cebe, Peggy; Kaplan, David L.

    2012-01-01

    Resilin is critical in the flight and jumping systems of insects as a polymeric rubber-like protein with outstanding elasticity. However, insight into the underlying molecular mechanisms responsible for resilin elasticity remains undefined. Here we report the structure and function of resilin from Drosophila CG15920. A reversible beta-turn transition was identified in the peptide encoded by exon III and for full length resilin during energy input and release, features that correlate to the rapid deformation of resilin during functions in vivo. Micellar structures and nano-porous patterns formed after beta-turn structures were present via changes in either the thermal or mechanical inputs. A model is proposed to explain the super elasticity and energy conversion mechanisms of resilin, providing important insight into structure-function relationships for this protein. Further, this model offers a view of elastomeric proteins in general where beta-turn related structures serve as fundamental units of the structure and elasticity. PMID:22893127

  6. Aminotryptophan-containing barstar: structure--function tradeoff in protein design and engineering with an expanded genetic code.

    PubMed

    Rubini, Marina; Lepthien, Sandra; Golbik, Ralph; Budisa, Nediljko

    2006-07-01

    The indole ring of the canonical amino acid tryptophan (Trp) possesses distinguished features, such as sterical bulk, hydrophobicity and the nitrogen atom which is capable of acting as a hydrogen bond donor. The introduction of an amino group into the indole moiety of Trp yields the structural analogs 4-aminotryptophan ((4-NH(2))Trp) and 5-aminotryptophan ((5-NH(2))Trp). Their hydrophobicity and spectral properties are substantially different when compared to those of Trp. They resemble the purine bases of DNA and share their capacity for pH-sensitive intramolecular charge transfer. The Trp → aminotryptophan substitution in proteins during ribosomal translation is expected to result in related protein variants that acquire these features. These expectations have been fulfilled by incorporating (4-NH(2))Trp and (5-NH(2))Trp into barstar, an intracellular inhibitor of the ribonuclease barnase from Bacillus amyloliquefaciens. The crystal structure of (4-NH(2))Trp-barstar is similar to that of the parent protein, whereas its spectral and thermodynamic behavior is found to be remarkably different. The T(m) value of (4-NH(2))Trp- and (5-NH(2))Trp-barstar is lowered by about 20 degrees Celsius, and they exhibit a strongly reduced unfolding cooperativity and substantial loss of free energy in folding. Furthermore, a folding kinetic study of (4-NH(2))Trp-barstar revealed that the denatured state is even preferred over the native one. The combination of structural and thermodynamic analyses clearly shows how structures of substituted barstar display a typical structure-function tradeoff: the acquirement of unique pH-sensitive charge transfer as a novel function is achieved at the expense of protein stability. These findings provide a new insight into the evolution of the amino acid repertoire of the universal genetic code and highlight possible problems regarding protein engineering and design by using an expanded genetic code.

  7. LS-SNP/PDB: annotated non-synonymous SNPs mapped to Protein Data Bank structures.

    PubMed

    Ryan, Michael; Diekhans, Mark; Lien, Stephanie; Liu, Yun; Karchin, Rachel

    2009-06-01

    LS-SNP/PDB is a new WWW resource for genome-wide annotation of human non-synonymous (amino acid changing) SNPs. It serves high-quality protein graphics rendered with UCSF Chimera molecular visualization software. The system is kept up-to-date by an automated, high-throughput build pipeline that systematically maps human nsSNPs onto Protein Data Bank structures and annotates several biologically relevant features. LS-SNP/PDB is available at (http://ls-snp.icm.jhu.edu/ls-snp-pdb) and via links from protein data bank (PDB) biology and chemistry tabs, UCSC Genome Browser Gene Details and SNP Details pages and PharmGKB Gene Variants Downloads/Cross-References pages.

  8. Tertiary structure of human λ6 light chains.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pokkuluri, P. R.; Solomon, A.; Weiss, D. T.

    1999-01-01

    AL amyloidosis is a disease process characterized by the pathologic deposition of monoclonal light chains in tissue. To date, only limited information has been obtained on the molecular features that render such light chains amyloidogenic. Although protein products of the major human V kappa and V lambda gene families have been identified in AL deposits, one particular subgroup--lambda 6--has been found to be preferentially associated with this disease. Notably, the variable region of lambda 6 proteins (V lambda 6) has distinctive primary structural features including the presence in the third framework region (FR3) of two additional amino acid residues that distinguish members of this subgroup from other types of light chains. However, the structural consequences of these alterations have not been elucidated. To determine if lambda 6 proteins possess unique tertiary structural features, as compared to light chains of other V lambda subgroups, we have obtained x-ray diffraction data on crystals prepared from two recombinant V lambda 6 molecules. These components, isolated from a bacterial expression system, were generated from lambda 6-related cDNAs cloned from bone marrow-derived plasma cells from a patient (Wil) who had documented AL amyloidosis and another (Jto) with multiple myeloma and tubular cast nephropathy, but no evident fibrillar deposits. The x-ray crystallographic analyses revealed that the two-residue insertion located between positions 68 and 69 (not between 66 and 67 as previously surmised) extended an existing loop region that effectively increased the surface area adjacent to the first complementarity determining region (CDR1). Further, an unusual interaction between the Arg 25 and Phe 2 residues commonly found in lambda 6 molecules was noted. However, the structures of V lambda 6 Wil and Jto also differed from each other, as evidenced by the presence in the latter of certain ionic and hydrophobic interactions that we posit increased protein stability and thus prevented amyloid formation.

  9. Identification of DNA-binding proteins using multi-features fusion and binary firefly optimization algorithm.

    PubMed

    Zhang, Jian; Gao, Bo; Chai, Haiting; Ma, Zhiqiang; Yang, Guifu

    2016-08-26

    DNA-binding proteins (DBPs) play fundamental roles in many biological processes. Therefore, the development of effective computational tools for identifying DBPs is becoming highly desirable. In this study, we proposed an accurate method for the prediction of DBPs. Firstly, we focused on the challenge of improving DBP prediction accuracy with information solely from the sequence. Secondly, we used multiple informative features to encode the protein. These features included evolutionary conservation profile, secondary structure motifs, and physicochemical properties. Thirdly, we introduced a novel improved Binary Firefly Algorithm (BFA) to remove redundant or noisy features as well as select optimal parameters for the classifier. The experimental results of our predictor on two benchmark datasets outperformed many state-of-the-art predictors, which revealed the effectiveness of our method. The promising prediction performance on a newly compiled independent testing dataset from PDB and a large-scale dataset from UniProt proved the good generalization ability of our method. In addition, the BFA developed in this research would be of great potential in practical applications in optimization fields, especially in feature selection problems. A highly accurate method was proposed for the identification of DBPs. A user-friendly web-server named iDbP (identification of DNA-binding Proteins) was constructed and provided for academic use.

  10. Columba: an integrated database of proteins, structures, and annotations.

    PubMed

    Trissl, Silke; Rother, Kristian; Müller, Heiko; Steinke, Thomas; Koch, Ina; Preissner, Robert; Frömmel, Cornelius; Leser, Ulf

    2005-03-31

    Structural and functional research often requires the computation of sets of protein structures based on certain properties of the proteins, such as sequence features, fold classification, or functional annotation. Compiling such sets using current web resources is tedious because the necessary data are spread over many different databases. To facilitate this task, we have created COLUMBA, an integrated database of annotations of protein structures. COLUMBA currently integrates twelve different databases, including PDB, KEGG, Swiss-Prot, CATH, SCOP, the Gene Ontology, and ENZYME. The database can be searched using either keyword search or data source-specific web forms. Users can thus quickly select and download PDB entries that, for instance, participate in a particular pathway, are classified as containing a certain CATH architecture, are annotated as having a certain molecular function in the Gene Ontology, and whose structures have a resolution under a defined threshold. The results of queries are provided in both machine-readable extensible markup language and human-readable format. The structures themselves can be viewed interactively on the web. The COLUMBA database facilitates the creation of protein structure data sets for many structure-based studies. It allows queries to be combined across a number of structure-related databases not covered by other projects at present. Thus, information on both many and few protein structures can be used efficiently. The web interface for COLUMBA is available at http://www.columba-db.de.

  11. Extraction, integration and analysis of alternative splicing and protein structure distributed information

    PubMed Central

    D'Antonio, Matteo; Masseroli, Marco

    2009-01-01

    Background Alternative splicing has been demonstrated to affect most human genes; different isoforms from the same gene encode proteins that differ in a limited number of residues, thus yielding similar structures. This suggests possible correlations between alternative splicing and protein structure. In order to support the investigation of such relationships, we have developed the Alternative Splicing and Protein Structure Scrutinizer (PASS), a Web application to automatically extract, integrate and analyze human alternative splicing and protein structure data sparsely available in the Alternative Splicing Database, Ensembl databank and Protein Data Bank. Primary data from these databases have been integrated and analyzed using the Protein Identifier Cross-Reference, BLAST, CLUSTALW and FeatureMap3D software tools. Results A database has been developed to store the considered primary data and the results from their analysis; a system of Perl scripts has been implemented to automatically create and update the database and analyze the integrated data; a Web interface has been implemented to make the analyses easily accessible; a database has been created to manage user accesses to the PASS Web application and store user's data and searches. Conclusion PASS automatically integrates data from the Alternative Splicing Database with protein structure data from the Protein Data Bank. Additionally, it comprehensively analyzes the integrated data with publicly available well-known bioinformatics tools in order to generate structural information of isoform pairs. Further analysis of such valuable information might reveal interesting relationships between alternative splicing and protein structure differences, which may be significantly associated with different functions. PMID:19828075

  12. Cas9 versus Cas12a/Cpf1: Structure-function comparisons and implications for genome editing.

    PubMed

    Swarts, Daan C; Jinek, Martin

    2018-05-22

    Cas9 and Cas12a are multidomain CRISPR-associated nucleases that can be programmed with a guide RNA to bind and cleave complementary DNA targets. The guide RNA sequence can be varied, making these effector enzymes versatile tools for genome editing and gene regulation applications. While Cas9 is currently the best-characterized and most widely used nuclease for such purposes, Cas12a (previously named Cpf1) has recently emerged as an alternative for Cas9. Cas9 and Cas12a have distinct evolutionary origins and exhibit different structural architectures, resulting in distinct molecular mechanisms. Here we compare the structural and mechanistic features that distinguish Cas9 and Cas12a, and describe how these features modulate their activity. We discuss implications for genome editing, and how they may influence the choice of Cas9 or Cas12a for specific applications. Finally, we review recent studies in which Cas12a has been utilized as a genome editing tool. This article is categorized under: RNA Interactions with Proteins and Other Molecules > Protein-RNA Interactions: Functional Implications Regulatory RNAs/RNAi/Riboswitches > Biogenesis of Effector Small RNAs RNA Interactions with Proteins and Other Molecules > RNA-Protein Complexes. © 2018 Wiley Periodicals, Inc.

  13. Solution Binding and Structural Analyses Reveal Potential Multidrug Resistance Functions for SAV2435 and CTR107 and Other GyrI-like Proteins.

    PubMed

    Moreno, Andrew; Froehlig, John R; Bachas, Sharrol; Gunio, Drew; Alexander, Teressa; Vanya, Aaron; Wade, Herschel

    2016-08-30

    Multidrug resistance (MDR) refers to the acquired ability of cells to tolerate a broad range of toxic compounds. One mechanism cells employ is to increase the level of expression of efflux pumps for the expulsion of xenobiotics. A key feature uniting efflux-related mechanisms is multidrug (MD) recognition, either by efflux pumps themselves or by their transcriptional regulators. However, models describing MD binding by MDR effectors are incomplete, underscoring the importance of studies focused on the recognition elements and key motifs that dictate polyspecific binding. One such motif is the GyrI-like domain, which is found in several MDR proteins and is postulated to have been adapted for small-molecule binding and signaling. Here we report the solution binding properties and crystal structures of two proteins containing GyrI-like domains, SAV2435 and CTR107, bound to various ligands. Furthermore, we provide a comparison with deposited crystal structures of GyrI-like proteins, revealing key features of GyrI-like domains that not only support polyspecific binding but also are conserved among GyrI-like domains. Together, our studies suggest that GyrI-like domains perform evolutionarily conserved functions connected to multidrug binding and highlight the utility of these types of studies for elucidating mechanisms of MDR.

  14. Predicting DNA binding proteins using support vector machine with hybrid fractal features.

    PubMed

    Niu, Xiao-Hui; Hu, Xue-Hai; Shi, Feng; Xia, Jing-Bo

    2014-02-21

    DNA-binding proteins play a vitally important role in many biological processes. Prediction of DNA-binding proteins from amino acid sequence is a significant but not yet fully resolved scientific problem. Chaos game representation (CGR) investigates the patterns hidden in protein sequences, and visually reveals previously unknown structure. Fractal dimensions (FD) are good tools to measure sizes of complex, highly irregular geometric objects. In order to extract the intrinsic correlation with DNA-binding property from protein sequences, the CGR algorithm, fractal dimension and amino acid composition are applied to formulate the numerical features of protein samples in this paper. Seven groups of features are extracted, which can be computed directly from the primary sequence, and each group is evaluated by the 10-fold cross-validation test and the jackknife test. Comparing the results of numerical experiments, the group of amino acid composition and fractal dimension (21-dimension vector) gives the best result, with an average accuracy of 81.82% and an average Matthews correlation coefficient (MCC) of 0.6017. The resulting predictor is also compared with the existing method DNA-Prot and shows better performance. © 2013 The Authors. Published by Elsevier Ltd. All rights reserved.

  15. Are Charge-State Distributions a Reliable Tool Describing Molecular Ensembles of Intrinsically Disordered Proteins by Native MS?

    NASA Astrophysics Data System (ADS)

    Natalello, Antonino; Santambrogio, Carlo; Grandori, Rita

    2017-01-01

    Native mass spectrometry (MS) has become a central tool of structural proteomics, but its applicability to the peculiar class of intrinsically disordered proteins (IDPs) is still a matter of debate. IDPs lack an ordered three-dimensional structure and are characterized by high conformational plasticity. Since they represent valuable targets for cancer and neurodegeneration research, there is an urgent need for methodological advances in describing the conformational ensembles populated by these proteins in solution. However, structural rearrangements during electrospray ionization (ESI) or after the transfer to the gas phase could affect data obtained by native ESI-MS. In particular, charge-state distributions (CSDs) are affected by protein conformation inside ESI droplets, while ion mobility (IM) reflects protein conformation in the gas phase. This review focuses on the available evidence relating IDP solution ensembles with CSDs, trying to summarize cases of apparent consistency or discrepancy. The protein-specificity of ionization patterns and their responses to ligands and buffer conditions suggest that CSDs are imprinted by protein structural features also in the case of IDPs. Nevertheless, it seems that these proteins are more easily affected by electrospray conditions, leading in some cases to rearrangements of the conformational ensembles.

  16. Are Charge-State Distributions a Reliable Tool Describing Molecular Ensembles of Intrinsically Disordered Proteins by Native MS?

    PubMed

    Natalello, Antonino; Santambrogio, Carlo; Grandori, Rita

    2017-01-01

    Native mass spectrometry (MS) has become a central tool of structural proteomics, but its applicability to the peculiar class of intrinsically disordered proteins (IDPs) is still a matter of debate. IDPs lack an ordered three-dimensional structure and are characterized by high conformational plasticity. Since they represent valuable targets for cancer and neurodegeneration research, there is an urgent need for methodological advances in describing the conformational ensembles populated by these proteins in solution. However, structural rearrangements during electrospray ionization (ESI) or after the transfer to the gas phase could affect data obtained by native ESI-MS. In particular, charge-state distributions (CSDs) are affected by protein conformation inside ESI droplets, while ion mobility (IM) reflects protein conformation in the gas phase. This review focuses on the available evidence relating IDP solution ensembles with CSDs, trying to summarize cases of apparent consistency or discrepancy. The protein-specificity of ionization patterns and their responses to ligands and buffer conditions suggest that CSDs are imprinted by protein structural features also in the case of IDPs. Nevertheless, it seems that these proteins are more easily affected by electrospray conditions, leading in some cases to rearrangements of the conformational ensembles.

  17. Binding ligand prediction for proteins using partial matching of local surface patches.

    PubMed

    Sael, Lee; Kihara, Daisuke

    2010-01-01

    Functional elucidation of uncharacterized protein structures is an important task in bioinformatics. We report our new approach for structure-based function prediction which captures local surface features of ligand binding pockets. Function of proteins, specifically, binding ligands of proteins, can be predicted by finding similar local surface regions of known proteins. To enable partial comparison of binding sites in proteins, a weighted bipartite matching algorithm is used to match pairs of surface patches. The surface patches are encoded with the 3D Zernike descriptors. Unlike the existing methods which compare global characteristics of the protein fold or the global pocket shape, the local surface patch method can find functional similarity between non-homologous proteins and binding pockets for flexible ligand molecules. The proposed method improves prediction results over global pocket shape-based method which was previously developed by our group.

  18. Binding Ligand Prediction for Proteins Using Partial Matching of Local Surface Patches

    PubMed Central

    Sael, Lee; Kihara, Daisuke

    2010-01-01

    Functional elucidation of uncharacterized protein structures is an important task in bioinformatics. We report our new approach for structure-based function prediction which captures local surface features of ligand binding pockets. Function of proteins, specifically, binding ligands of proteins, can be predicted by finding similar local surface regions of known proteins. To enable partial comparison of binding sites in proteins, a weighted bipartite matching algorithm is used to match pairs of surface patches. The surface patches are encoded with the 3D Zernike descriptors. Unlike the existing methods which compare global characteristics of the protein fold or the global pocket shape, the local surface patch method can find functional similarity between non-homologous proteins and binding pockets for flexible ligand molecules. The proposed method improves prediction results over global pocket shape-based method which was previously developed by our group. PMID:21614188

  19. In silico modeling of the yeast protein and protein family interaction network

    NASA Astrophysics Data System (ADS)

    Goh, K.-I.; Kahng, B.; Kim, D.

    2004-03-01

    Understanding how the protein interaction networks of living organisms have evolved and are organized can be a first stepping stone toward unveiling how life works at a fundamental level. Here we introduce an in silico "coevolutionary" model for the protein interaction network and the protein family network. The essential ingredients of the model include the protein family identity and its robustness under evolution, as well as the three previously proposed ones: gene duplication, divergence, and mutation. This model produces a prototypical feature of complex networks in a wide range of parameter space, following the generalized Pareto distribution in connectivity. Moreover, we investigate other structural properties of our model in detail with some specific values of parameters relevant to the yeast Saccharomyces cerevisiae, showing excellent agreement with the empirical data. Our model indicates that the physical constraints encoded via the domain structure of proteins play a crucial role in protein interactions.

  20. System and methods for predicting transmembrane domains in membrane proteins and mining the genome for recognizing G-protein coupled receptors

    DOEpatents

    Trabanino, Rene J; Vaidehi, Nagarajan; Hall, Spencer E; Goddard, William A; Floriano, Wely

    2013-02-05

    The invention provides computer-implemented methods and apparatus implementing a hierarchical protocol using multiscale molecular dynamics and molecular modeling methods to predict the presence of transmembrane regions in proteins, such as G-Protein Coupled Receptors (GPCR), and protein structural models generated according to the protocol. The protocol features a coarse grain sampling method, such as hydrophobicity analysis, to provide a fast and accurate procedure for predicting transmembrane regions. Methods and apparatus of the invention are useful to screen protein or polynucleotide databases for encoded proteins with transmembrane regions, such as GPCRs.

  1. A carrot leucine-rich-repeat protein that inhibits ice recrystallization.

    PubMed

    Worrall, D; Elias, L; Ashford, D; Smallwood, M; Sidebottom, C; Lillford, P; Telford, J; Holt, C; Bowles, D

    1998-10-02

    Many organisms adapted to live at subzero temperatures express antifreeze proteins that improve their tolerance to freezing. Although structurally diverse, all antifreeze proteins interact with ice surfaces, depress the freezing temperature of aqueous solutions, and inhibit ice crystal growth. A protein purified from carrot shares these functional features with antifreeze proteins of fish. Expression of the carrot complementary DNA in tobacco resulted in the accumulation of antifreeze activity in the apoplast of plants grown at greenhouse temperatures. The sequence of carrot antifreeze protein is similar to that of polygalacturonase inhibitor proteins and contains leucine-rich repeats.

  2. Electrostatic Similarities between Protein and Small Molecule Ligands Facilitate the Design of Protein-Protein Interaction Inhibitors

    PubMed Central

    Zhang, Kam Y. J.

    2013-01-01

    One of the underlying principles in drug discovery is that a biologically active compound is complementary in shape and molecular recognition features to its receptor. This principle implies that molecules binding to the same receptor may share some common features. Here, we have investigated whether electrostatic similarity can be used for the discovery of small molecule protein-protein interaction inhibitors (SMPPIIs). We have developed a method that can be used to evaluate the similarity of electrostatic potentials between small molecules and known protein ligands. This method was implemented in a software package called EleKit. Analyses of all available (at the time of research) SMPPII structures indicate that SMPPIIs bear some similarities of electrostatic potential with the ligand proteins of the same receptor. This is especially true for the more polar SMPPIIs. Retrospective analysis of several successful SMPPIIs has shown the applicability of EleKit in the design of new SMPPIIs. PMID:24130741

  3. Crystal Structure of the GRAS Domain of SCARECROW-LIKE7 in Oryza sativa

    PubMed Central

    Li, Shengping; Zhao, Yanhe; Zhao, Zheng; Wu, Xiuling; Sun, Lifang; Liu, Qingsong; Wu, Yunkun

    2016-01-01

    GRAS proteins belong to a plant-specific protein family with many members and play essential roles in plant growth and development, functioning primarily in transcriptional regulation. Proteins in the family are minimally defined as containing the conserved GRAS domain. Here, we determined the structure of the GRAS domain of Os-SCL7 from rice (Oryza sativa) to 1.82 Å. The structure includes cap and core subdomains and elucidates the features of the conserved GRAS LRI, VHIID, LRII, PFYRE, and SAW motifs. The structure is a dimer, with a clear groove to accommodate double-stranded DNA. Docking a DNA segment into the groove to generate an Os-SCL7/DNA complex provides insight into the DNA binding mechanism of GRAS proteins. Furthermore, the in vitro DNA binding property of Os-SCL7 and model-defined recognition residues are assessed by electrophoretic mobility shift analysis and mutagenesis assays. These studies reveal the structure and preliminary DNA interaction mechanisms of GRAS proteins and open the door to in-depth investigation and understanding of the individual pathways in which they play important roles. PMID:27081181

  4. Criteria to Extract High-Quality Protein Data Bank Subsets for Structure Users.

    PubMed

    Carugo, Oliviero; Djinović-Carugo, Kristina

    2016-01-01

    It is often necessary to build subsets of the Protein Data Bank to extract structural trends and average values. For this purpose it is mandatory that the subsets are non-redundant and of high quality. The first problem can be solved relatively easily at the sequence level or at the structural level. The second, on the contrary, needs special attention. It is not sufficient, in fact, to consider the crystallographic resolution, and other features must be taken into account: the absence of strings of residues from the electron density maps and from the files deposited in the Protein Data Bank; the B-factor values; the appropriate validation of the structural models; the quality of the electron density maps, which is not uniform; and the temperature of the diffraction experiments. More stringent criteria produce smaller subsets, which can be enlarged with more tolerant selection criteria. The incessant growth of the Protein Data Bank and especially of the number of high-resolution structures is allowing the use of more stringent selection criteria, with a consequent improvement of the quality of the subsets of the Protein Data Bank.

  5. Crystal Structure of Cockroach Allergen Bla g 2, an Unusual Zinc Binding Aspartic Protease with a Novel Mode of Self-inhibition

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gustchina, Alla; Li, Mi; Wunschmann, Sabina

    2010-07-19

    The crystal structure of Bla g 2 was solved in order to investigate the structural basis for the allergenic properties of this unusual protein. This is the first structure of an aspartic protease in which conserved glycine residues, in two canonical DTG triads, are substituted by different amino acid residues. Another unprecedented feature revealed by the structure is the single phenylalanine residue insertion on the tip of the flap, with the side-chain occupying the S1 binding pocket. This and other important amino acid substitutions in the active site region of Bla g 2 modify the interactions in the vicinity of the catalytic aspartate residues, increasing the distance between them to ~4 Å and establishing unique direct contacts between the flap and the catalytic residues. We attribute the absence of substantial catalytic activity in Bla g 2 to these unusual features of the active site. Five disulfide bridges and a Zn-binding site confer stability to the protein, which may contribute to sensitization at lower levels of exposure than other allergens.

  6. 2.4 Å resolution crystal structure of human TRAP1 NM, the Hsp90 paralog in the mitochondrial matrix

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sung, Nuri; Lee, Jungsoon; Kim, Ji-Hyun

    2016-07-13

    TRAP1 is an organelle-specific Hsp90 paralog that is essential for neoplastic growth. As a member of the Hsp90 family, TRAP1 is presumed to be a general chaperone facilitating the late-stage folding of Hsp90 client proteins in the mitochondrial matrix. Interestingly, TRAP1 cannot replace cytosolic Hsp90 in protein folding, and none of the known Hsp90 co-chaperones are found in mitochondria. Thus, the three-dimensional structure of TRAP1 must feature regulatory elements that are essential to the ATPase activity and chaperone function of TRAP1. Here, the crystal structure of a human TRAP1 NM dimer is presented, featuring an intact N-domain and M-domain structure, bound to adenosine 5'-β,γ-imidotriphosphate (ADPNP). The crystal structure together with epitope-mapping results shows that the TRAP1 M-domain loop 1 contacts the neighboring subunit and forms a previously unobserved third dimer interface that mediates the specific interaction with mitochondrial Hsp70.

  7. ProBiS-2012: web server and web services for detection of structurally similar binding sites in proteins.

    PubMed

    Konc, Janez; Janezic, Dusanka

    2012-07-01

    The ProBiS web server is a web server for detection of structurally similar binding sites in the PDB and for local pairwise alignment of protein structures. In this article, we present a new version of the ProBiS web server that is 10 times faster than earlier versions, due to the efficient parallelization of the ProBiS algorithm, which now allows significantly faster comparison of a protein query against the PDB and reduces the calculation time for scanning the entire PDB from hours to minutes. It also features new web services, and an improved user interface. In addition, the new web server is united with the ProBiS-Database and thus provides instant access to pre-calculated protein similarity profiles for over 29 000 non-redundant protein structures. The ProBiS web server is particularly adept at detection of secondary binding sites in proteins. It is freely available at http://probis.cmm.ki.si/old-version, and the new ProBiS web server is at http://probis.cmm.ki.si.

  8. ProBiS-2012: web server and web services for detection of structurally similar binding sites in proteins

    PubMed Central

    Konc, Janez; Janežič, Dušanka

    2012-01-01

    The ProBiS web server is a web server for detection of structurally similar binding sites in the PDB and for local pairwise alignment of protein structures. In this article, we present a new version of the ProBiS web server that is 10 times faster than earlier versions, due to the efficient parallelization of the ProBiS algorithm, which now allows significantly faster comparison of a protein query against the PDB and reduces the calculation time for scanning the entire PDB from hours to minutes. It also features new web services, and an improved user interface. In addition, the new web server is united with the ProBiS-Database and thus provides instant access to pre-calculated protein similarity profiles for over 29 000 non-redundant protein structures. The ProBiS web server is particularly adept at detection of secondary binding sites in proteins. It is freely available at http://probis.cmm.ki.si/old-version, and the new ProBiS web server is at http://probis.cmm.ki.si. PMID:22600737

  9. FRAGSION: ultra-fast protein fragment library generation by IOHMM sampling.

    PubMed

    Bhattacharya, Debswapna; Adhikari, Badri; Li, Jilong; Cheng, Jianlin

    2016-07-01

    The speed, accuracy and robustness of building a protein fragment library have important implications in de novo protein structure prediction, since fragment-based methods are among the most successful approaches in template-free modeling (FM). The majority of existing fragment detection methods rely on database-driven search strategies to identify candidate fragments, which are inherently time-consuming and often hinder the possibility of locating longer fragments due to the limited sizes of databases. Also, it is difficult to alleviate the effect of noisy sequence-based predicted features, such as secondary structures, on the quality of fragments. Here, we present FRAGSION, a database-free method to efficiently generate a protein fragment library by sampling from an Input-Output Hidden Markov Model. FRAGSION offers some unique features compared to existing approaches in that it (i) is lightning-fast, consuming only a few seconds of CPU time to generate a fragment library for a protein of typical length (300 residues); (ii) can generate dynamic-size fragments of any length (even for the whole protein sequence) and (iii) offers ways to handle noise in predicted secondary structure during fragment sampling. On an FM dataset from the most recent Critical Assessment of Structure Prediction, we demonstrate that FRAGSION provides advantages over the state-of-the-art fragment picking protocol of the ROSETTA suite by speeding up computation by several orders of magnitude while achieving comparable performance in fragment quality. Source code and executable versions of FRAGSION for Linux and MacOS are freely available to non-commercial users at http://sysbio.rnet.missouri.edu/FRAGSION/. It is bundled with a manual and example data. Contact: chengji@missouri.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.

  10. Using non-invasive molecular spectroscopic techniques to detect unique aspects of protein Amide functional groups and chemical properties of modeled forage from different sourced-origins

    NASA Astrophysics Data System (ADS)

    Ji, Cuiying; Zhang, Xuewei; Yu, Peiqiang

    2016-03-01

    The non-invasive molecular spectroscopic technique FT/IR is capable of detecting the molecular structure spectral features that are associated with biological, nutritional and biodegradation functions. However, to date, few studies have used these non-invasive molecular spectroscopic techniques to study forage internal protein structures associated with biodegradation and biological functions. The objectives of this study were to detect unique aspects and association of protein Amide functional groups in terms of protein Amide I and II spectral profiles and chemical properties in the alfalfa forage (Medicago sativa L.) from different sourced-origins. In this study, alfalfa hay with two different origins was used as modeled forage for molecular structure and chemical property study. In each forage origin, five to seven sources were analyzed. The molecular spectral profiles were determined using FT/IR non-invasive molecular spectroscopy. The parameters of protein spectral profiles included functional groups of Amide I, Amide II and Amide I to II ratio. The results show that the modeled forage Amide I and Amide II were centered at 1653 cm⁻¹ and 1545 cm⁻¹, respectively. The Amide I spectral height and area intensities were from 0.02 to 0.03 and 2.67 to 3.36 AI, respectively. The Amide II spectral height and area intensities were from 0.01 to 0.02 and 0.71 to 0.93 AI, respectively. The Amide I to II spectral peak height and area ratios were from 1.86 to 1.88 and 3.68 to 3.79, respectively. Our results show that the non-invasive molecular spectroscopic techniques are capable of detecting forage internal protein structure features which are associated with forage chemical properties.

  11. Teaching structure: student use of software tools for understanding macromolecular structure in an undergraduate biochemistry course.

    PubMed

    Jaswal, Sheila S; O'Hara, Patricia B; Williamson, Patrick L; Springer, Amy L

    2013-01-01

    Because understanding the structure of biological macromolecules is critical to understanding their function, students of biochemistry should become familiar not only with viewing, but also with generating and manipulating structural representations. We report a strategy from a one-semester undergraduate biochemistry course to integrate use of structural representation tools into both laboratory and homework activities. First, early in the course we introduce the use of readily available open-source software for visualizing protein structure, coincident with modules on amino acid and peptide bond properties. Second, we use these same software tools in lectures and incorporate images and other structure representations in homework tasks. Third, we require a capstone project in which teams of students examine a protein-nucleic acid complex and then use the software tools to illustrate for their classmates the salient features of the structure, relating how the structure helps explain biological function. To ensure engagement with a range of software and database features, we generated a detailed template file that can be used to explore any structure, and that guides students through specific applications of many of the software tools. In presentations, students demonstrate that they are successfully interpreting structural information, and using representations to illustrate particular points relevant to function. Thus, over the semester students integrate information about structural features of biological macromolecules into the larger discussion of the chemical basis of function. Together these assignments provide an accessible introduction to structural representation tools, allowing students to add these methods to their biochemical toolboxes early in their scientific development. © 2013 by The International Union of Biochemistry and Molecular Biology.

  12. Protein-RNA interface residue prediction using machine learning: an assessment of the state of the art.

    PubMed

    Walia, Rasna R; Caragea, Cornelia; Lewis, Benjamin A; Towfic, Fadi; Terribilini, Michael; El-Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant

    2012-05-10

    RNA molecules play diverse functional and structural roles in cells. They function as messengers for transferring genetic information from DNA to proteins, as the primary genetic material in many viruses, as catalysts (ribozymes) important for protein synthesis and RNA processing, and as essential and ubiquitous regulators of gene expression in living organisms. Many of these functions depend on precisely orchestrated interactions between RNA molecules and specific proteins in cells. Understanding the molecular mechanisms by which proteins recognize and bind RNA is essential for comprehending the functional implications of these interactions, but the recognition 'code' that mediates interactions between proteins and RNA is not yet understood. Success in deciphering this code would dramatically impact the development of new therapeutic strategies for intervening in devastating diseases such as AIDS and cancer. Because of the high cost of experimental determination of protein-RNA interfaces, there is an increasing reliance on statistical machine learning methods for training predictors of RNA-binding residues in proteins. However, because of differences in the choice of datasets, performance measures, and data representations used, it has been difficult to obtain an accurate assessment of the current state of the art in protein-RNA interface prediction. We provide a review of published approaches for predicting RNA-binding residues in proteins and a systematic comparison and critical assessment of protein-RNA interface residue predictors trained using these approaches on three carefully curated non-redundant datasets. We directly compare two widely used machine learning algorithms (Naïve Bayes (NB) and Support Vector Machine (SVM)) using three different data representations in which features are encoded using either sequence- or structure-based windows. 
    Our results show that (i) sequence-based classifiers that use a position-specific scoring matrix (PSSM)-based representation (PSSMSeq) outperform those that use an amino acid identity based representation (IDSeq) or a smoothed PSSM (SmoPSSMSeq); (ii) structure-based classifiers that use smoothed PSSM representation (SmoPSSMStr) outperform those that use PSSM (PSSMStr) as well as sequence identity based representation (IDStr). PSSMSeq classifiers, when tested on an independent test set of 44 proteins, achieve performance that is comparable to that of three state-of-the-art structure-based predictors (including those that exploit geometric features) in terms of Matthews Correlation Coefficient (MCC), although the structure-based methods achieve substantially higher Specificity (albeit at the expense of Sensitivity) compared to sequence-based methods. We also find that the expected performance of the classifiers on a residue level can be markedly different from that on a protein level. Our experiments show that the classifiers trained on three different non-redundant protein-RNA interface datasets achieve comparable cross-validation performance. However, we find that the results are significantly affected by differences in the distance threshold used to define interface residues. Our results demonstrate that protein-RNA interface residue predictors that use a PSSM-based encoding of sequence windows outperform classifiers that use other encodings of sequence windows. While structure-based methods that exploit geometric features can yield significant increases in the Specificity of protein-RNA interface residue predictions, such increases are offset by decreases in Sensitivity.
    These results underscore the importance of comparing alternative methods using rigorous statistical procedures, multiple performance measures, and datasets that are constructed based on several alternative definitions of interface residues and redundancy cutoffs as well as including evaluations on independent test sets into the comparisons.

  13. Polymeric assembly of gluten proteins in an aqueous ethanol solvent.

    PubMed

    Dahesh, Mohsen; Banc, Amélie; Duri, Agnès; Morel, Marie-Hélène; Ramos, Laurence

    2014-09-25

    The supramolecular organization of wheat gluten proteins is largely unknown due to the intrinsic complexity of this family of proteins and their insolubility in water. We fractionate gluten in a water/ethanol mixture (50/50 v/v) and obtain a protein extract which is depleted in gliadin, the monomeric part of wheat gluten proteins, and enriched in glutenin, the polymeric part of wheat gluten proteins. We investigate the structure of the proteins in the solvent used for extraction over a wide range of concentration, by combining X-ray scattering and multiangle static and dynamic light scattering. Our data show that, in the ethanol/water mixture, the proteins display features characteristic of flexible polymer chains in a good solvent. In the dilute regime, the proteins form very loose structures of characteristic size 150 nm, with an internal dynamics which is quantitatively similar to that of branched polymer coils. In more concentrated regimes, data highlight a hierarchical structure with one characteristic length scale of the order of a few nm, which displays the scaling with concentration expected for a semidilute polymer in good solvent, and a fractal arrangement at a much larger length scale. This structure is strikingly similar to that of polymeric gels, thus providing some factual knowledge to rationalize the viscoelastic properties of wheat gluten proteins and their assemblies.

  14. Single helically folded aromatic oligoamides that mimic the charge surface of double-stranded B-DNA

    NASA Astrophysics Data System (ADS)

    Ziach, Krzysztof; Chollet, Céline; Parissi, Vincent; Prabhakaran, Panchami; Marchivie, Mathieu; Corvaglia, Valentina; Bose, Partha Pratim; Laxmi-Reddy, Katta; Godde, Frédéric; Schmitter, Jean-Marie; Chaignepain, Stéphane; Pourquier, Philippe; Huc, Ivan

    2018-05-01

    Numerous essential biomolecular processes require the recognition of DNA surface features by proteins. Molecules mimicking these features could potentially act as decoys and interfere with pharmacologically or therapeutically relevant protein-DNA interactions. Although naturally occurring DNA-mimicking proteins have been described, synthetic tunable molecules that mimic the charge surface of double-stranded DNA are not known. Here, we report the design, synthesis and structural characterization of aromatic oligoamides that fold into single helical conformations and display a double helical array of negatively charged residues in positions that match the phosphate moieties in B-DNA. These molecules were able to inhibit several enzymes possessing non-sequence-selective DNA-binding properties, including topoisomerase 1 and HIV-1 integrase, presumably through specific foldamer-protein interactions, whereas sequence-selective enzymes were not inhibited. Such modular and synthetically accessible DNA mimics provide a versatile platform to design novel inhibitors of protein-DNA interactions.

  15. Structural Prediction and In Silico Physicochemical Characterization for Mouse Caltrin I and Bovine Caltrin Proteins

    PubMed Central

    Grasso, Ernesto J.; Sottile, Adolfo E.; Coronel, Carlos E.

    2016-01-01

    It is known that caltrin (calcium transport inhibitor) protein binds to sperm cells during ejaculation and inhibits extracellular Ca2+ uptake. Although the sequence and some biological features of mouse caltrin I and bovine caltrin are known, their physicochemical properties and tertiary structure are mainly unknown. We predicted the 3D structures of mouse caltrin I and bovine caltrin by molecular homology modeling and threading. Surface electrostatic potentials and electric fields were calculated using the Poisson–Boltzmann equation. Several different bioinformatics tools and available web servers were used to thoroughly analyze the physicochemical characteristics of both proteins, such as their Kyte and Doolittle hydropathy scores and helical wheel projections. The results presented in this work significantly aid further understanding of the molecular mechanisms of caltrin proteins modulating physiological processes associated with fertilization. PMID:27812283
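
    The abstract names Kyte and Doolittle hydropathy scores and helical wheel projections without showing how they are computed. The following is a minimal sketch of both, not the authors' pipeline. The function names, the window size of 9, the 100° step per residue (the usual value for an ideal α-helix) and the example peptide are all assumptions made for illustration; only the hydropathy scale itself is the published Kyte-Doolittle scale.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return the sliding-window mean hydropathy along a sequence.

    The window size is an illustrative default, not a value from the record.
    """
    sequence = sequence.upper()
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD_SCALE[residue] for residue in sequence]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def helical_wheel_angles(sequence, step_degrees=100.0):
    """Assign each residue its angular position on a helical wheel.

    Successive residues are rotated by step_degrees, assuming an ideal
    alpha-helix with 3.6 residues per turn.
    """
    return [
        (residue, (index * step_degrees) % 360.0)
        for index, residue in enumerate(sequence.upper())
    ]


if __name__ == "__main__":
    peptide = "LKELLKKLLEELLK"  # hypothetical peptide, not a caltrin sequence
    print(hydropathy_profile(peptide, window=5))
    print(helical_wheel_angles(peptide)[:5])
```

    Positive window averages indicate hydrophobic stretches. On the wheel, hydrophobic residues clustering within a narrow range of angles would suggest an amphipathic helix.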

  16. Evolution of the arginase fold and functional diversity

    PubMed Central

    Dowling, Daniel P.; Costanzo, Luigi Di; Gennadios, Heather A.; Christianson, David W.

    2009-01-01

    The large number of protein structures deposited in the Protein Data Bank allows for the identification of novel structural superfamilies based on conservation of fold in addition to conservation of amino acid sequence. Since sequence diverges more rapidly than fold in protein evolution, proteins with little or no significant sequence identity are occasionally observed to adopt similar folds, thereby reflecting unanticipated evolutionary relationships. Here, we review the unique α/β fold first observed in the manganese metalloenzyme rat liver arginase, consisting of a parallel eight-stranded β-sheet surrounded by several helices, and its evolutionary relationship with the zinc-requiring and/or iron-requiring histone deacetylases and acetylpolyamine amidohydrolases. Structural comparisons reveal key features of the core α/β fold that contribute to the divergent metal ion specificity and stoichiometry required for the chemical and biological functions of these enzymes. PMID:18360740

  17. Gi- and Gs-coupled GPCRs show different modes of G-protein binding.

    PubMed

    Van Eps, Ned; Altenbach, Christian; Caro, Lydia N; Latorraca, Naomi R; Hollingsworth, Scott A; Dror, Ron O; Ernst, Oliver P; Hubbell, Wayne L

    2018-03-06

    More than two decades ago, the activation mechanism for the membrane-bound photoreceptor and prototypical G protein-coupled receptor (GPCR) rhodopsin was uncovered. Upon light-induced changes in ligand-receptor interaction, movement of specific transmembrane helices within the receptor opens a crevice at the cytoplasmic surface, allowing for coupling of heterotrimeric guanine nucleotide-binding proteins (G proteins). The general features of this activation mechanism are conserved across the GPCR superfamily. Nevertheless, GPCRs have selectivity for distinct G-protein family members, but the mechanism of selectivity remains elusive. Structures of GPCRs in complex with the stimulatory G protein, Gs, and an accessory nanobody to stabilize the complex have been reported, providing information on the intermolecular interactions. However, to reveal the structural selectivity filters, it will be necessary to determine GPCR-G protein structures involving other G-protein subtypes. In addition, it is important to obtain structures in the absence of a nanobody that may influence the structure. Here, we present a model for a rhodopsin-G protein complex derived from intermolecular distance constraints between the activated receptor and the inhibitory G protein, Gi, using electron paramagnetic resonance spectroscopy and spin-labeling methodologies. Molecular dynamics simulations demonstrated the overall stability of the modeled complex. In the rhodopsin-Gi complex, Gi engages rhodopsin in a manner distinct from previous GPCR-Gs structures, providing insight into specificity determinants. Copyright © 2018 the Author(s). Published by PNAS.

  18. Crystal Structure of Menin Reveals Binding Site for Mixed Lineage Leukemia (MLL) Protein

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Murai, Marcelo J.; Chruszcz, Maksymilian; Reddy, Gireesh

    2014-10-02

    Menin is a tumor suppressor protein that is encoded by the MEN1 (multiple endocrine neoplasia 1) gene and controls cell growth in endocrine tissues. Importantly, menin also serves as a critical oncogenic cofactor of MLL (mixed lineage leukemia) fusion proteins in acute leukemias. Direct association of menin with MLL fusion proteins is required for MLL fusion protein-mediated leukemogenesis in vivo, and this interaction has been validated as a new potential therapeutic target for development of novel anti-leukemia agents. Here, we report the first crystal structure of menin homolog from Nematostella vectensis. Due to a very high sequence similarity, the Nematostella menin is a close homolog of human menin, and these two proteins likely have very similar structures. Menin is predominantly an α-helical protein with the protein core comprising three tetratricopeptide motifs that are flanked by two α-helical bundles and covered by a β-sheet motif. A very interesting feature of menin structure is the presence of a large central cavity that is highly conserved between Nematostella and human menin. By employing site-directed mutagenesis, we have demonstrated that this cavity constitutes the binding site for MLL. Our data provide a structural basis for understanding the role of menin as a tumor suppressor protein and as an oncogenic co-factor of MLL fusion proteins. It also provides essential structural information for development of inhibitors targeting the menin-MLL interaction as a novel therapeutic strategy in MLL-related leukemias.

  19. Nicked apomyoglobin: a noncovalent complex of two polypeptide fragments comprising the entire protein chain.

    PubMed

    Musi, Valeria; Spolaore, Barbara; Picotti, Paola; Zambonin, Marcello; De Filippis, Vincenzo; Fontana, Angelo

    2004-05-25

    Limited proteolysis of the 153-residue chain of horse apomyoglobin (apoMb) by thermolysin results in the selective cleavage of the peptide bond Pro88-Leu89. The N-terminal (residues 1-88) and C-terminal (residues 89-153) fragments of apoMb were isolated to homogeneity and their conformational and association properties investigated in detail. Far-UV circular dichroism (CD) measurements revealed that both fragments in isolation acquire a high content of helical secondary structure, while near-UV CD indicated the absence of tertiary structure. A 1:1 mixture of the fragments leads to a tight noncovalent protein complex (1-88/89-153, nicked apoMb), characterized by secondary and tertiary structures similar to those of intact apoMb. The apoMb complex binds heme in a nativelike manner, as given by CD measurements in the Soret region. Second-derivative absorption spectra in the 250-300 nm region provided evidence that the degree of exposure of Tyr residues in the nicked species is similar to that of the intact protein at neutral pH. Also, the microenvironment of Trp residues, located in positions 7 and 14 of the 153-residue chain of the protein, is similar in both protein species, as given by fluorescence emission data. Moreover, in analogy to intact apoMb, the nicked protein binds the hydrophobic dye 1-anilinonaphthalene-8-sulfonate (ANS). Taken together, our results indicate that the two proteolytic fragments 1-88 and 89-153 of apoMb adopt partly folded states characterized by sufficiently nativelike conformational features that promote their specific association and mutual stabilization into a nicked protein species much resembling in its structural features intact apoMb. It is suggested that the formation of a noncovalent complex upon fragment complementation can mimic the protein folding process of the entire protein chain, with the difference that the folding of the complementary fragments is an intermolecular process. In particular, this study emphasizes the importance of interactions between marginally stable elements of secondary structure in promoting the tertiary contacts of a native protein. Considering that apoMb has been extensively used as a paradigm in protein folding studies for the past few decades, the novel fragment complementing system of apoMb here described appears to be very useful for investigating the initial as well as late events in protein folding.

  20. Phasins, Multifaceted Polyhydroxyalkanoate Granule-Associated Proteins

    PubMed Central

    Mezzina, Mariela P.; Pettinari, M. Julia

    2016-01-01

    Phasins are the major polyhydroxyalkanoate (PHA) granule-associated proteins. They promote bacterial growth and PHA synthesis and affect the number, size, and distribution of the granules. These proteins can be classified in 4 families with distinctive characteristics. Low-resolution structural studies and in silico predictions were performed in order to elucidate the structure of different phasins. Most of these proteins share some common structural features, such as a preponderant α-helix composition, the presence of disordered regions that provide flexibility to the protein, and coiled-coil interacting regions that form oligomerization domains. Due to their amphiphilic nature, these proteins play an important structural function, forming an interphase between the hydrophobic content of PHA granules and the hydrophilic cytoplasm content. Phasins have been observed to affect both PHA accumulation and utilization. Apart from their role as granule structural proteins, phasins have a remarkable variety of additional functions. Different phasins have been determined to (i) activate PHA depolymerization, (ii) increase the expression and activity of PHA synthases, (iii) participate in PHA granule segregation, and (iv) have both in vivo and in vitro chaperone activities. These properties suggest that phasins might play an active role in PHA-related stress protection and fitness enhancement. Due to their granule binding capacity and structural flexibility, several biotechnological applications have been developed using different phasins, increasing the interest in the study of these remarkable proteins. PMID:27287326

  1. Phasins, Multifaceted Polyhydroxyalkanoate Granule-Associated Proteins.

    PubMed

    Mezzina, Mariela P; Pettinari, M Julia

    2016-09-01

    Phasins are the major polyhydroxyalkanoate (PHA) granule-associated proteins. They promote bacterial growth and PHA synthesis and affect the number, size, and distribution of the granules. These proteins can be classified in 4 families with distinctive characteristics. Low-resolution structural studies and in silico predictions were performed in order to elucidate the structure of different phasins. Most of these proteins share some common structural features, such as a preponderant α-helix composition, the presence of disordered regions that provide flexibility to the protein, and coiled-coil interacting regions that form oligomerization domains. Due to their amphiphilic nature, these proteins play an important structural function, forming an interphase between the hydrophobic content of PHA granules and the hydrophilic cytoplasm content. Phasins have been observed to affect both PHA accumulation and utilization. Apart from their role as granule structural proteins, phasins have a remarkable variety of additional functions. Different phasins have been determined to (i) activate PHA depolymerization, (ii) increase the expression and activity of PHA synthases, (iii) participate in PHA granule segregation, and (iv) have both in vivo and in vitro chaperone activities. These properties suggest that phasins might play an active role in PHA-related stress protection and fitness enhancement. Due to their granule binding capacity and structural flexibility, several biotechnological applications have been developed using different phasins, increasing the interest in the study of these remarkable proteins. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  2. Structural features of LC8-induced self-association of swallow.

    PubMed

    Kidane, Ariam I; Song, Yujuan; Nyarko, Afua; Hall, Justin; Hare, Michael; Löhr, Frank; Barbar, Elisar

    2013-09-03

    Cell functions depend on the collective activity of protein networks within which a few proteins, called hubs, participate in a large number of interactions. Dynein light chain LC8, first discovered as a subunit of the motor protein dynein, is considered to have a role broader than that of dynein, and its participation in diverse systems fits the description of a hub. Among its partners is Swallow with which LC8 is essential for proper localization of bicoid mRNA at the anterior cortex of Drosophila oocytes. Why LC8 is essential in this process is not clear, but emerging evidence suggests that LC8 functions by promoting self-association and/or structural organization of its diverse binding partners. This work addresses the energetics and structural features of LC8-induced Swallow self-association distant from LC8 binding. Mutational design based on a hypothetical helical wheel, intermonomer nuclear Overhauser effects assigned to residues expected at interface positions, and circular dichroism spectral characteristics indicate that the LC8-promoted dimer of Swallow is a coiled coil. Secondary chemical shifts and (15)N backbone relaxation identify the boundaries and distinguishing structural features of the coiled coil. Thermodynamic analysis of Swallow polypeptides designed to decouple self-association from LC8 binding reveals that the higher binding affinity of the engineered bivalent Swallow is of purely entropic origin and that the linker separating the coiled coil from the LC8 binding site remains disordered. We speculate that the LC8-promoted coiled coil is critical for bicoid mRNA localization because it favors structural organization of Swallow, which except for the central LC8-promoted coiled coil is primarily disordered.

  3. Structural Features of LC8-Induced Self Association of Swallow†

    PubMed Central

    Kidane, Ariam I.; Song, Yujuan; Nyarko, Afua; Hall, Justin; Hare, Michael; Löhr, Frank; Barbar, Elisar

    2013-01-01

    Cell function depends on the collective activity of protein networks within which a few proteins, called hubs, participate in a large number of interactions. Dynein light chain LC8, first discovered as a subunit of the motor protein dynein, is considered to have a role broader than dynein and its participation in diverse systems fits the description of a hub. Among its partners is Swallow with which LC8 is essential for proper localization of bicoid mRNA at the anterior cortex of Drosophila oocytes. Why LC8 is essential in this process is not clear, but emerging evidence suggests that LC8 functions by promoting self-association and/or structural organization of its diverse binding partners. This work addresses the mechanistic and structural features of LC8-induced Swallow self-association distant from LC8 binding. Mutational design based on a hypothetical helical wheel, inter-monomer NOEs assigned to residues expected at interface positions and circular dichroism spectral characteristics indicate that the LC8-promoted dimer of Swallow is a coiled-coil. Secondary chemical shifts and 15N backbone relaxation identify the boundaries and distinguishing structural features of the coiled-coil. Thermodynamic analysis of Swallow polypeptides designed to decouple self-association from LC8 binding reveals that the higher binding affinity of the engineered bivalent Swallow is of purely entropic origin and that the linker separating the coiled-coil from the LC8 binding site remains disordered. We speculate that the LC8-promoted coiled-coil is critical for bicoid mRNA localization because it could induce structural organization of Swallow, which except for the central LC8-promoted coiled-coil is primarily disordered. PMID:23914803

  4. Animal Mitochondrial DNA Replication

    PubMed Central

    Ciesielski, Grzegorz L.; Oliveira, Marcos T.; Kaguni, Laurie S.

    2016-01-01

    Recent advances in the field of mitochondrial DNA (mtDNA) replication highlight the diversity of both the mechanisms utilized and the structural and functional organization of the proteins at the mtDNA replication fork, despite the simplicity of the animal mtDNA genome. DNA polymerase γ, mtDNA helicase and mitochondrial single-stranded DNA-binding protein, the key replisome proteins, have evolved distinct structural features and biochemical properties. These appear to be correlated with mtDNA genomic features in different metazoan taxa and with their modes of DNA replication, although substantial integrative research is warranted to firmly establish these links. To date, several modes of mtDNA replication have been described for animals: rolling circle, theta, strand-displacement, and RITOLS/bootlace. Resolution of a continuing controversy relevant to mtDNA replication in mammals/vertebrates will have a direct impact on the mechanistic interpretation of mtDNA-related human diseases. Here we review these subjects, integrating earlier and recent data to provide a perspective on the major challenges for future research. PMID:27241933

  5. Amyloid β-sheet mimics that antagonize protein aggregation and reduce amyloid toxicity

    NASA Astrophysics Data System (ADS)

    Cheng, Pin-Nan; Liu, Cong; Zhao, Minglei; Eisenberg, David; Nowick, James S.

    2012-11-01

    The amyloid protein aggregation associated with diseases such as Alzheimer's, Parkinson's and type II diabetes (among many others) features a bewildering variety of β-sheet-rich structures in transition from native proteins to ordered oligomers and fibres. The variation in the amino-acid sequences of the β-structures presents a challenge to developing a model system of β-sheets for the study of various amyloid aggregates. Here, we introduce a family of robust β-sheet macrocycles that can serve as a platform to display a variety of heptapeptide sequences from different amyloid proteins. We have tailored these amyloid β-sheet mimics (ABSMs) to antagonize the aggregation of various amyloid proteins, thereby reducing the toxicity of amyloid aggregates. We describe the structures and inhibitory properties of ABSMs containing amyloidogenic peptides from the amyloid-β peptide associated with Alzheimer's disease, β2-microglobulin associated with dialysis-related amyloidosis, α-synuclein associated with Parkinson's disease, islet amyloid polypeptide associated with type II diabetes, human and yeast prion proteins, and Tau, which forms neurofibrillary tangles.

  6. Heat of supersaturation-limited amyloid burst directly monitored by isothermal titration calorimetry.

    PubMed

    Ikenoue, Tatsuya; Lee, Young-Ho; Kardos, József; Yagi, Hisashi; Ikegami, Takahisa; Naiki, Hironobu; Goto, Yuji

    2014-05-06

    Amyloid fibrils form in supersaturated solutions via a nucleation and growth mechanism. Although the structural features of amyloid fibrils have become increasingly clearer, knowledge on the thermodynamics of fibrillation is limited. Furthermore, protein aggregation is not a target of calorimetry, one of the most powerful approaches used to study proteins. Here, with β2-microglobulin, a protein responsible for dialysis-related amyloidosis, we show direct heat measurements of the formation of amyloid fibrils using isothermal titration calorimetry (ITC). The spontaneous fibrillation after a lag phase was accompanied by exothermic heat. The thermodynamic parameters of fibrillation obtained under various protein concentrations and temperatures were consistent with the main-chain dominated structural model of fibrils, in which overall packing was less than that of the native structures. We also characterized the thermodynamics of amorphous aggregation, enabling the comparison of protein folding, amyloid fibrillation, and amorphous aggregation. These results indicate that ITC will become a promising approach for clarifying comprehensively the thermodynamics of protein folding and misfolding.

  7. The design and characterization of protein based block polymers

    NASA Astrophysics Data System (ADS)

    Haghpanah, Jennifer Shorah

    Over the past decades, protein engineering has provided noteworthy advances in basic science as well as in medicine and industry. Protein engineers are currently focusing their efforts on developing elementary rules to design proteins with a specific structure and function. Proteins derived from natural sources have been used to generate a plethora of materials with remarkable structural and functional properties. In the first chapter, we show how we can fabricate protein polymers comprised of two different self-assembling domains (SADs). From our studies, we discover that SADs in different orientations have a large impact on their overall microscopic and macroscopic features. In the second chapter, we explore the impact of cellulose (Tc) on the diblocks EC and CE. We discover that Tc is able to selectively impact the mechanical properties of CE because CE has smaller particle sizes and more E domain exposed on its surface at RT. In the third chapter, we appended an extra C domain to CE to generate CEC with improved mechanical properties, structure and small molecule recognition.

  8. How protein materials balance strength, robustness, and adaptability

    PubMed Central

    Buehler, Markus J.; Yung, Yu Ching

    2010-01-01

    Proteins form the basis of a wide range of biological materials such as hair, skin, bone, spider silk, or cells, which play an important role in providing key functions to biological systems. The focus of this article is to discuss how protein materials are capable of balancing multiple, seemingly incompatible properties such as strength, robustness, and adaptability. To illustrate this, we review bottom-up materiomics studies focused on the mechanical behavior of protein materials at multiple scales, from nano to macro. We focus on alpha-helix based intermediate filament proteins as a model system to explain why the utilization of hierarchical structural features is vital to their ability to combine strength, robustness, and adaptability. Experimental studies demonstrating the activation of angiogenesis, the growth of new blood vessels, are presented as an example of how adaptability of structure in biological tissue is achieved through changes in gene expression that result in an altered material structure. We analyze the concepts in light of the universality and diversity of the structural makeup of protein materials and discuss the findings in the context of potential fundamental evolutionary principles that control their nanoscale structure. We conclude with a discussion of multiscale science in biology and de novo materials design. PMID:20676305

  9. Interrogation of Mammalian Protein Complex Structure, Function, and Membership Using Genome-Scale Fitness Screens.

    PubMed

    Pan, Joshua; Meyers, Robin M; Michel, Brittany C; Mashtalir, Nazar; Sizemore, Ann E; Wells, Jonathan N; Cassel, Seth H; Vazquez, Francisca; Weir, Barbara A; Hahn, William C; Marsh, Joseph A; Tsherniak, Aviad; Kadoch, Cigall

    2018-05-23

    Protein complexes are assemblies of subunits that have co-evolved to execute one or many coordinated functions in the cellular environment. Functional annotation of mammalian protein complexes is critical to understanding biological processes, as well as disease mechanisms. Here, we used genetic co-essentiality derived from genome-scale RNAi- and CRISPR-Cas9-based fitness screens performed across hundreds of human cancer cell lines to assign measures of functional similarity. From these measures, we systematically built and characterized functional similarity networks that recapitulate known structural and functional features of well-studied protein complexes and resolve novel functional modules within complexes lacking structural resolution, such as the mammalian SWI/SNF complex. Finally, by integrating functional networks with large protein-protein interaction networks, we discovered novel protein complexes involving recently evolved genes of unknown function. Taken together, these findings demonstrate the utility of genetic perturbation screens alone, and in combination with large-scale biophysical data, to enhance our understanding of mammalian protein complexes in normal and disease states. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
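
    The co-essentiality idea in this abstract can be illustrated with a small sketch: genes whose fitness effects rise and fall together across cell lines are linked in a similarity network. This is not the authors' method. Using the Pearson correlation as the similarity measure, the 0.8 threshold, the function names, and the gene names and scores in the toy data are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def coessentiality_edges(profiles, threshold=0.8):
    """Return (gene_a, gene_b, r) for every gene pair with r >= threshold.

    profiles maps a gene name to its fitness scores, one per cell line,
    with all genes measured over the same cell lines in the same order.
    """
    genes = sorted(profiles)
    edges = []
    for i, gene_a in enumerate(genes):
        for gene_b in genes[i + 1:]:
            r = pearson(profiles[gene_a], profiles[gene_b])
            if r >= threshold:
                edges.append((gene_a, gene_b, r))
    return edges


if __name__ == "__main__":
    # Hypothetical fitness scores across four cell lines (made-up values).
    toy_profiles = {
        "GENE_A": [-1.0, -0.2, -0.8, 0.0],
        "GENE_B": [-0.9, -0.1, -0.7, 0.1],
        "GENE_C": [0.0, -0.8, -0.2, -1.0],
    }
    for gene_a, gene_b, r in coessentiality_edges(toy_profiles):
        print(gene_a, gene_b, round(r, 3))
```

    In this toy input GENE_A and GENE_B share a profile shape and are linked, while GENE_C is anti-correlated with both and stays unconnected.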

  10. Topology of membrane proteins-predictions, limitations and variations.

    PubMed

    Tsirigos, Konstantinos D; Govindarajan, Sudha; Bassot, Claudio; Västermark, Åke; Lamb, John; Shu, Nanjiang; Elofsson, Arne

    2017-10-26

    Transmembrane proteins perform a variety of important biological functions necessary for the survival and growth of the cells. Membrane proteins are built up by transmembrane segments that span the lipid bilayer. The segments can either be in the form of hydrophobic alpha-helices or beta-sheets which create a barrel. A fundamental aspect of the structure of transmembrane proteins is the membrane topology, that is, the number of transmembrane segments, their position in the protein sequence and their orientation in the membrane. Along these lines, many predictive algorithms for the prediction of the topology of alpha-helical and beta-barrel transmembrane proteins exist. The newest algorithms obtain an accuracy close to 80% both for alpha-helical and beta-barrel transmembrane proteins. However, lately it has been shown that the simplified picture presented when describing a protein family by its topology is limited. To demonstrate this, we highlight examples where the topology is either not conserved in a protein superfamily or where the structure cannot be described solely by the topology of a protein. The prediction of these non-standard features from sequence alone was not successful until the recent revolutionary progress in 3D-structure prediction of proteins. Copyright © 2017 Elsevier Ltd. All rights reserved.

  11. Solid-phase synthesis and screening of N-acylated polyamine (NAPA) combinatorial libraries for protein binding.

    PubMed

    Iera, Jaclyn A; Jenkins, Lisa M Miller; Kajiyama, Hiroshi; Kopp, Jeffrey B; Appella, Daniel H

    2010-11-15

    Inhibitors for protein-protein interactions are challenging to design, in part due to the unique and complex architectures of each protein's interaction domain. Most approaches to develop inhibitors for these interactions rely on rational design, which requires prior structural knowledge of the target and its ligands. In the absence of structural information, a combinatorial approach may be the best alternative to finding inhibitors of a protein-protein interaction. Current chemical libraries, however, consist mostly of molecules designed to inhibit enzymes. In this manuscript, we report the synthesis and screening of a library based on an N-acylated polyamine (NAPA) scaffold that we designed to have specific molecular features necessary to inhibit protein-protein interactions. Screens of the library identified a member with favorable binding properties to the HIV viral protein R (Vpr), a regulatory protein from HIV, that is involved in numerous interactions with other proteins critical for viral replication. Published by Elsevier Ltd.

  12. Protein disulfide isomerase a multifunctional protein with multiple physiological roles

    NASA Astrophysics Data System (ADS)

    Ali Khan, Hyder; Mutus, Bulent

    2014-08-01

    Protein disulfide isomerase (PDI) is a member of the thioredoxin superfamily of redox proteins. PDI has three catalytic activities: thiol-disulfide oxidoreductase, disulfide isomerase and redox-dependent chaperone. Originally, PDI was identified in the lumen of the endoplasmic reticulum and subsequently detected at additional locations, such as cell surfaces and the cytosol. This review will provide an overview of the recent advances in relating the structural features of PDI to its multiple catalytic roles as well as its physiological and pathophysiological functions related to redox regulation and protein folding.

  13. Structure and Function in Homodimeric Enzymes: Simulations of Cooperative and Independent Functional Motions.

    PubMed

    Wells, Stephen A; van der Kamp, Marc W; McGeagh, John D; Mulholland, Adrian J

    2015-01-01

    Large-scale conformational change is a common feature in the catalytic cycles of enzymes. Many enzymes function as homodimers with active sites that contain elements from both chains. Symmetric and anti-symmetric cooperative motions in homodimers can potentially lead to correlated active site opening and/or closure, likely to be important for ligand binding and release. Here, we examine such motions in two different domain-swapped homodimeric enzymes: the DcpS scavenger decapping enzyme and citrate synthase. We use and compare two types of all-atom simulations: conventional molecular dynamics simulations to identify physically meaningful conformational ensembles, and rapid geometric simulations of flexible motion, biased along normal mode directions, to identify relevant motions encoded in the protein structure. The results indicate that the opening/closure motions are intrinsic features of both unliganded enzymes. In DcpS, conformational change is dominated by an anti-symmetric cooperative motion, causing one active site to close as the other opens; however a symmetric motion is also significant. In CS, we identify that both symmetric (suggested by crystallography) and asymmetric motions are features of the protein structure, and as a result the behaviour in solution is largely non-cooperative. The agreement between two modelling approaches using very different levels of theory indicates that the behaviours are indeed intrinsic to the protein structures. Geometric simulations correctly identify and explore large amplitudes of motion, while molecular dynamics simulations indicate the ranges of motion that are energetically feasible. Together, the simulation approaches are able to reveal unexpected functionally relevant motions, and highlight differences between enzymes.

  14. Structure and Function in Homodimeric Enzymes: Simulations of Cooperative and Independent Functional Motions

    PubMed Central

    Wells, Stephen A.; van der Kamp, Marc W.; McGeagh, John D.; Mulholland, Adrian J.

    2015-01-01

    Large-scale conformational change is a common feature in the catalytic cycles of enzymes. Many enzymes function as homodimers with active sites that contain elements from both chains. Symmetric and anti-symmetric cooperative motions in homodimers can potentially lead to correlated active site opening and/or closure, likely to be important for ligand binding and release. Here, we examine such motions in two different domain-swapped homodimeric enzymes: the DcpS scavenger decapping enzyme and citrate synthase. We use and compare two types of all-atom simulations: conventional molecular dynamics simulations to identify physically meaningful conformational ensembles, and rapid geometric simulations of flexible motion, biased along normal mode directions, to identify relevant motions encoded in the protein structure. The results indicate that the opening/closure motions are intrinsic features of both unliganded enzymes. In DcpS, conformational change is dominated by an anti-symmetric cooperative motion, causing one active site to close as the other opens; however a symmetric motion is also significant. In CS, we identify that both symmetric (suggested by crystallography) and asymmetric motions are features of the protein structure, and as a result the behaviour in solution is largely non-cooperative. The agreement between two modelling approaches using very different levels of theory indicates that the behaviours are indeed intrinsic to the protein structures. Geometric simulations correctly identify and explore large amplitudes of motion, while molecular dynamics simulations indicate the ranges of motion that are energetically feasible. Together, the simulation approaches are able to reveal unexpected functionally relevant motions, and highlight differences between enzymes. PMID:26241964

  15. A feature-based approach to modeling protein–protein interaction hot spots

    PubMed Central

    Cho, Kyu-il; Kim, Dongsup; Lee, Doheon

    2009-01-01

    Identifying features that effectively represent the energetic contribution of an individual interface residue to the interactions between proteins remains problematic. Here, we present several new features and show that they are more effective than conventional features. By combining the proposed features with conventional features, we develop a predictive model for interaction hot spots. Initially, 54 multifaceted features, composed of different levels of information including structure, sequence and molecular interaction information, are quantified. Then, to identify the best subset of features for predicting hot spots, feature selection is performed using a decision tree. Based on the selected features, a predictive model for hot spots is created using support vector machine (SVM) and tested on an independent test set. Our model shows better overall predictive accuracy than previous methods such as the alanine scanning methods Robetta and FOLDEF, and the knowledge-based method KFC. Subsequent analysis yields several findings about hot spots. As expected, hot spots have a larger relative surface area burial and are more hydrophobic than other residues. Unexpectedly, however, residue conservation displays a rather complicated tendency depending on the types of protein complexes, indicating that this feature is not good for identifying hot spots. Of the selected features, the weighted atomic packing density, relative surface area burial and weighted hydrophobicity are the top 3, with the weighted atomic packing density proving to be the most effective feature for predicting hot spots. Notably, we find that hot spots are closely related to π–related interactions, especially π · · · π interactions. PMID:19273533

  16. Structural insights into conserved L-arabinose metabolic enzymes reveal the substrate binding site of a thermophilic L-arabinose isomerase.

    PubMed

    Lee, Yong-Jik; Lee, Sang-Jae; Kim, Seong-Bo; Lee, Sang Jun; Lee, Sung Haeng; Lee, Dong-Woo

    2014-03-18

    Structural genomics demonstrates that despite low levels of structural similarity of proteins comprising a metabolic pathway, their substrate binding regions are likely to be conserved. Herein, based on the 3D structures of the α/β-fold proteins involved in the ara operon, we attempted to predict the substrate binding residues of thermophilic Geobacillus stearothermophilus L-arabinose isomerase (GSAI), for which no 3D structure is available. Comparison of the structures of L-arabinose catabolic enzymes revealed a conserved feature forming the substrate-binding modules, which can be extended to predict the substrate binding site of GSAI (i.e., D195, E261 and E333). Moreover, these data imply that proteins in the L-arabinose metabolic pathway might retain their substrate binding niches as a modular structure through conserved molecular evolution, even with totally different structural scaffolds. Copyright © 2014 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.

  17. ZifBASE: a database of zinc finger proteins and associated resources.

    PubMed

    Jayakanthan, Mannu; Muthukumaran, Jayaraman; Chandrasekar, Sanniyasi; Chawla, Konika; Punetha, Ankita; Sundar, Durai

    2009-09-09

    Information on the occurrence of zinc finger protein motifs in genomes is crucial to the developing field of molecular genome engineering. Knowledge of their target DNA-binding sequences is vital for developing chimeric proteins for targeted genome engineering and site-specific gene correction. There is a need for a computational resource of zinc finger proteins (ZFPs) to identify potential binding sites and their locations, which would reduce the time spent on in vivo tasks and overcome the difficulties in selecting the specific type of zinc finger protein and the target site in the DNA sequence. ZifBASE provides an extensive collection of various natural and engineered ZFPs. It uses standard names and a genetic and structural classification scheme to present data retrieved from UniProtKB, GenBank, Protein Data Bank, ModBase, Protein Model Portal and the literature. It also incorporates specialized features of ZFPs including finger sequences and positions, number of fingers, physicochemical properties, classes, framework, PubMed citations with links to experimental structures (PDB, if available) and modeled structures of natural zinc finger proteins. ZifBASE provides information on zinc finger proteins (both natural and engineered ones), the number of finger units in each of the zinc finger proteins (with multiple fingers), the synergy between the adjacent fingers and their positions. Additionally, it gives the individual finger sequences and the target DNA sites to which they bind, for a better and clearer understanding of the interactions of adjacent fingers. The current version of ZifBASE contains 139 entries, of which 89 are engineered ZFPs containing 3-7F, totaling 296 fingers. There are 50 natural zinc finger protein entries ranging from 2-13F, totaling 307 fingers. It has sequences and structures from the literature, Protein Data Bank, ModBase and Protein Model Portal. The interface is cross-linked to other public databases such as UniProtKB, PDB, ModBase, Protein Model Portal and PubMed to make it more informative. A database is established to maintain the information on sequence features, including the class, framework, number of fingers, residues, position, recognition site and physicochemical properties (molecular weight, isoelectric point) of both natural and engineered zinc finger proteins, and the dissociation constants of a few. ZifBASE can provide a more effective and efficient way of accessing zinc finger protein sequences and their target binding sites, with links to their three-dimensional structures. All the data and functions are available at the advanced web-based search interface http://web.iitd.ac.in/~sundar/zifbase.

  18. Conserved and variable domains of RNase MRP RNA.

    PubMed

    Dávila López, Marcela; Rosenblad, Magnus Alm; Samuelsson, Tore

    2009-01-01

    Ribonuclease MRP is a eukaryotic ribonucleoprotein complex consisting of one RNA molecule and 7-10 protein subunits. One important function of MRP is to catalyze an endonucleolytic cleavage during processing of rRNA precursors. RNase MRP is evolutionarily related to RNase P, which is critical for tRNA processing. A large number of MRP RNA sequences that are now available have been used to identify conserved primary and secondary structure features of the molecule. MRP RNA has structural features in common with P RNA, such as a conserved catalytic core, but it also has unique features and is characterized by a domain that is highly variable between species. Information regarding primary and secondary structure features is of interest not only in basic studies of the function of MRP RNA, but also because mutations in the RNA give rise to human genetic diseases such as cartilage-hair hypoplasia.

  19. S-Layer Protein Self-Assembly

    PubMed Central

    Pum, Dietmar; Toca-Herrera, Jose Luis; Sleytr, Uwe B.

    2013-01-01

    Crystalline S(urface)-layers are the most commonly observed cell surface structures in prokaryotic organisms (bacteria and archaea). S-layers are highly porous protein meshworks with unit cell sizes in the range of 3 to 30 nm, and thicknesses of ~10 nm. One of the key features of S-layer proteins is their intrinsic capability to form self-assembled mono- or double layers in solution and at interfaces. Basic research on S-layer proteins laid the foundation for making use of the unique self-assembly properties of native and, in particular, genetically functionalized S-layer protein lattices in a broad range of applications in the life and non-life sciences. This contribution briefly summarizes the knowledge about the structure, genetics, chemistry, morphogenesis, and function of S-layer proteins and pays particular attention to self-assembly in solution and at differently functionalized solid supports. PMID:23354479

  20. Quantifying side-chain conformational variations in protein structure

    PubMed Central

    Miao, Zhichao; Cao, Yang

    2016-01-01

    Protein side-chain conformation is closely related to biological function. Side-chain prediction is a key step in protein design, protein docking and structure optimization. However, side-chain polymorphism is widespread in proteins, takes various forms, and has long been overlooked in side-chain prediction. Such conformational variations have not been quantitatively studied, and the correlations between these variations and residue features are vague. Here, we performed statistical analyses on large-scale data sets and found that side-chain conformational flexibility is closely related to solvent exposure, degrees of freedom and hydrophilicity. These analyses allowed us to quantify different types of side-chain variability in the PDB. The results underscore that protein side-chain conformation prediction is not a single-answer problem, leading us to reconsider the assessment approaches of side-chain prediction programs. PMID:27845406

  1. Quantifying side-chain conformational variations in protein structure

    NASA Astrophysics Data System (ADS)

    Miao, Zhichao; Cao, Yang

    2016-11-01

    Protein side-chain conformation is closely related to biological function. Side-chain prediction is a key step in protein design, protein docking and structure optimization. However, side-chain polymorphism is widespread in proteins, takes various forms, and has long been overlooked in side-chain prediction. Such conformational variations have not been quantitatively studied, and the correlations between these variations and residue features are vague. Here, we performed statistical analyses on large-scale data sets and found that side-chain conformational flexibility is closely related to solvent exposure, degrees of freedom and hydrophilicity. These analyses allowed us to quantify different types of side-chain variability in the PDB. The results underscore that protein side-chain conformation prediction is not a single-answer problem, leading us to reconsider the assessment approaches of side-chain prediction programs.

  2. Quantifying side-chain conformational variations in protein structure.

    PubMed

    Miao, Zhichao; Cao, Yang

    2016-11-15

    Protein side-chain conformation is closely related to biological function. Side-chain prediction is a key step in protein design, protein docking and structure optimization. However, side-chain polymorphism is widespread in proteins, takes various forms, and has long been overlooked in side-chain prediction. Such conformational variations have not been quantitatively studied, and the correlations between these variations and residue features are vague. Here, we performed statistical analyses on large-scale data sets and found that side-chain conformational flexibility is closely related to solvent exposure, degrees of freedom and hydrophilicity. These analyses allowed us to quantify different types of side-chain variability in the PDB. The results underscore that protein side-chain conformation prediction is not a single-answer problem, leading us to reconsider the assessment approaches of side-chain prediction programs.

  3. Discrete structural features among interface residue-level classes.

    PubMed

    Sowmya, Gopichandran; Ranganathan, Shoba

    2015-01-01

    Protein-protein interaction (PPI) is essential for molecular functions in biological cells. Investigation on protein interfaces of known complexes is an important step towards deciphering the driving forces of PPIs. Each PPI complex is specific, sensitive and selective to binding. Therefore, we have estimated the relative difference in percentage of polar residues between surface and the interface for each complex in a non-redundant heterodimer dataset of 278 complexes to understand the predominant forces driving binding. Our analysis showed ~60% of protein complexes with surface polarity greater than interface polarity (designated as class A). However, a considerable number of complexes (~40%) have interface polarity greater than surface polarity, (designated as class B), with a significantly different p-value of 1.66E-45 from class A. Comprehensive analyses of protein complexes show that interface features such as interface area, interface polarity abundance, solvation free energy gain upon interface formation, binding energy and the percentage of interface charged residue abundance distinguish among class A and class B complexes, while electrostatic visualization maps also help differentiate interface classes among complexes. Class A complexes are classical with abundant non-polar interactions at the interface; however class B complexes have abundant polar interactions at the interface, similar to protein surface characteristics. Five physicochemical interface features analyzed from the protein heterodimer dataset are discriminatory among the interface residue-level classes. These novel observations find application in developing residue-level models for protein-protein binding prediction, protein-protein docking studies and interface inhibitor design as drugs.
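
    The class A / class B split described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the set of polar residues, the function names, and the handling of ties are assumptions made for illustration, since the abstract does not specify them.

```python
# Assumed set of polar (including charged) residues, one-letter codes.
# The abstract does not give the exact definition used in the study.
POLAR = set("STNQYCHKRDE")


def percent_polar(residues):
    """Return the percentage of residues that belong to POLAR."""
    if not residues:
        raise ValueError("residue list is empty")
    return 100.0 * sum(r in POLAR for r in residues) / len(residues)


def interface_class(surface, interface):
    """Class A: surface polarity > interface polarity; otherwise class B.

    Ties are assigned to class B here, which is an arbitrary choice.
    """
    diff = percent_polar(surface) - percent_polar(interface)
    return "A" if diff > 0 else "B"


# Toy residue lists (hypothetical, not taken from the dataset).
print(interface_class("SKDEALV", "ALVIFS"))  # polar surface, non-polar interface
print(interface_class("ALVAS", "SKDEA"))     # non-polar surface, polar interface
```

    In practice, the surface and interface residue lists would come from a solvent-accessibility calculation on each complex, a step this sketch leaves out.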

  4. Discrete structural features among interface residue-level classes

    PubMed Central

    2015-01-01

    Background: Protein-protein interaction (PPI) is essential for molecular functions in biological cells. Investigation on protein interfaces of known complexes is an important step towards deciphering the driving forces of PPIs. Each PPI complex is specific, sensitive and selective to binding. Therefore, we have estimated the relative difference in percentage of polar residues between surface and the interface for each complex in a non-redundant heterodimer dataset of 278 complexes to understand the predominant forces driving binding. Results: Our analysis showed ~60% of protein complexes with surface polarity greater than interface polarity (designated as class A). However, a considerable number of complexes (~40%) have interface polarity greater than surface polarity, (designated as class B), with a significantly different p-value of 1.66E-45 from class A. Comprehensive analyses of protein complexes show that interface features such as interface area, interface polarity abundance, solvation free energy gain upon interface formation, binding energy and the percentage of interface charged residue abundance distinguish among class A and class B complexes, while electrostatic visualization maps also help differentiate interface classes among complexes. Conclusions: Class A complexes are classical with abundant non-polar interactions at the interface; however class B complexes have abundant polar interactions at the interface, similar to protein surface characteristics. Five physicochemical interface features analyzed from the protein heterodimer dataset are discriminatory among the interface residue-level classes. These novel observations find application in developing residue-level models for protein-protein binding prediction, protein-protein docking studies and interface inhibitor design as drugs. PMID:26679043

  5. Computational Insight into Protein Tyrosine Phosphatase 1B Inhibition: A Case Study of the Combined Ligand- and Structure-Based Approach.

    PubMed

    Zhang, Xiangyu; Jiang, Hailun; Li, Wei; Wang, Jian; Cheng, Maosheng

    2017-01-01

    Protein tyrosine phosphatase 1B (PTP1B) is an attractive target for treating cancer, obesity, and type 2 diabetes. In our work, a combined ligand- and structure-based approach was applied to analyze the characteristics of the PTP1B enzyme and its interaction with competitive inhibitors. First, a pharmacophore model of PTP1B inhibitors was built based on the common features of sixteen compounds. It was found that the pharmacophore model consisted of five chemical features: one aromatic ring (R) region, two hydrophobic (H) groups, and two hydrogen bond acceptors (A). To further elucidate the binding modes of these inhibitors with PTP1B active sites, four docking programs (AutoDock 4.0, AutoDock Vina 1.0, standard precision (SP) Glide 9.7, and extra precision (XP) Glide 9.7) were used. The characteristics of the active sites were then described by the conformations of the docking results. In conclusion, a combination of various pharmacophore features and integrated structure-activity relationship (SAR) information can be used to design novel potent PTP1B inhibitors.

  6. Text Mining for Protein Docking

    PubMed Central

    Badal, Varsha D.; Kundrotas, Petras J.; Vakser, Ilya A.

    2015-01-01

    The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to the existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which still can be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for ~25% of the complexes in the dataset. The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound benchmark set, significantly increasing the docking success rate. PMID:26650466

  7. Maintenance of a Protein Structure in the Dynamic Evolution of TIMPs over 600 Million Years

    PubMed Central

    Nicosia, Aldo; Maggio, Teresa; Costa, Salvatore; Salamone, Monica; Tagliavia, Marcello; Mazzola, Salvatore; Gianguzza, Fabrizio; Cuttitta, Angela

    2016-01-01

    Deciphering the events leading to protein evolution represents a challenge, especially for protein families showing complex evolutionary history. Among them, TIMPs represent an ancient eukaryotic protein family widely distributed in the animal kingdom. They are known to control the turnover of the extracellular matrix and are considered to have arisen early during metazoan evolution, arguably tuning essential features of tissue and epithelial organization. To probe the structure and molecular evolution of TIMPs within metazoans, we report the mining and structural characterization of a large data set of TIMPs over approximately 600 Myr. The TIMPs repertoire was explored starting from the Cnidaria phylum, coeval with the origins of connective tissue, to great apes and humans. Despite dramatic sequence differences compared with highest metazoans, the ancestral proteins displayed the canonical TIMP fold. Only small structural changes, represented by an α-helix located in the N-domain, have occurred over the course of evolution. Both the occurrence of such secondary structure elements and the relative solvent accessibility of the corresponding residues in the three-dimensional structures raise the possibility that these sites represent unconserved elements prone to accept variations. PMID:26957029

  8. The application of 3D Zernike moments for the description of "model-free" molecular structure, functional motion, and structural reliability.

    PubMed

    Grandison, Scott; Roberts, Carl; Morris, Richard J

    2009-03-01

    Protein structures are not static entities consisting of equally well-determined atomic coordinates. Proteins undergo continuous motion, and as catalytic machines, these movements can be of high relevance for understanding function. In addition to this strong biological motivation for considering shape changes is the necessity to correctly capture different levels of detail and error in protein structures. Some parts of a structural model are often poorly defined, and the atomic displacement parameters provide an excellent means to characterize the confidence in an atom's spatial coordinates. A mathematical framework for studying these shape changes, and handling positional variance is therefore of high importance. We present an approach for capturing various protein structure properties in a concise mathematical framework that allows us to compare features in a highly efficient manner. We demonstrate how three-dimensional Zernike moments can be employed to describe functions, not only on the surface of a protein but throughout the entire molecule. A number of proof-of-principle examples are given which demonstrate how this approach may be used in practice for the representation of movement and uncertainty.

  9. Second harmonic generation microscopy differentiates collagen type I and type III in COPD

    NASA Astrophysics Data System (ADS)

    Suzuki, Masaru; Kayra, Damian; Elliott, W. Mark; Hogg, James C.; Abraham, Thomas

    2012-03-01

    The structural remodeling of extracellular matrix proteins in the peripheral lung region is an important feature in chronic obstructive pulmonary disease (COPD). Multiphoton microscopy is capable of inducing a specific second harmonic generation (SHG) signal from non-centrosymmetric structural proteins such as fibrillar collagens. In this study, SHG microscopy was used to examine structural remodeling of the fibrillar collagens in human lungs undergoing emphysematous destruction (n=2). The SHG signals originating from these diseased lung thin sections from base to apex (n=16) were captured simultaneously in both forward and backward directions. We found that the SHG images detected in the forward direction showed well-developed and well-structured thick collagen fibers, while the SHG images detected in the backward direction showed strikingly different morphological features, which included the diffused pattern of forward detected structures plus other forms of collagen structures. Comparison of these images with the well-established immunohistochemical staining indicated that the structures detected in the forward direction are primarily the thick collagen type I fibers and the structures identified in the backward direction are diffusive structures of forward detected collagen type I plus collagen type III. In conclusion, we here demonstrate the feasibility of SHG microscopy in differentiating fibrillar collagen subtypes and understanding their remodeling in diseased lung tissues.

  10. Exploring the Sequence-based Prediction of Folding Initiation Sites in Proteins.

    PubMed

    Raimondi, Daniele; Orlando, Gabriele; Pancsa, Rita; Khan, Taushif; Vranken, Wim F

    2017-08-18

    Protein folding is a complex process that can lead to disease when it fails. Especially poorly understood are the very early stages of protein folding, which are likely defined by intrinsic local interactions between amino acids close to each other in the protein sequence. We here present EFoldMine, a method that predicts, from the primary amino acid sequence of a protein, which amino acids are likely involved in early folding events. The method is based on early folding data from hydrogen deuterium exchange (HDX) data from NMR pulsed labelling experiments, and uses backbone and sidechain dynamics as well as secondary structure propensities as features. The EFoldMine predictions give insights into the folding process, as illustrated by a qualitative comparison with independent experimental observations. Furthermore, on a quantitative proteome scale, the predicted early folding residues tend to become the residues that interact the most in the folded structure, and they are often residues that display evolutionary covariation. The connection of the EFoldMine predictions with both folding pathway data and the folded protein structure suggests that the initial statistical behavior of the protein chain with respect to local structure formation has a lasting effect on its subsequent states.

  11. Defining the conserved internal architecture of a protein kinase.

    PubMed

    Kornev, Alexandr P; Taylor, Susan S

    2010-03-01

    Protein kinases constitute a large protein family of important regulators in all eukaryotic cells. All of the protein kinases have a similar bilobal fold, and their key structural features have been well studied. However, the recent discovery of non-contiguous hydrophobic ensembles inside the protein kinase core shed new light on the internal organization of these molecules. Two hydrophobic "spines" traverse both lobes of the protein kinase molecule, providing a firm but flexible connection between its key elements. The spine model introduces a useful framework for analysis of intramolecular communications, molecular dynamics, and drug design. Published by Elsevier B.V.

  12. Comparative analyses of quaternary arrangements in homo-oligomeric proteins in superfamilies: Functional implications.

    PubMed

    Sudha, Govindarajan; Srinivasan, Narayanaswamy

    2016-09-01

    A comprehensive analysis of the quaternary features of distantly related homo-oligomeric proteins is the focus of the current study. This study has been performed at the levels of quaternary state, symmetry, and quaternary structure. Quaternary state and quaternary structure refer to the number of subunits and the spatial arrangement of subunits, respectively. Using a large dataset of available 3D structures of biologically relevant assemblies, we show that only 53% of the distantly related homo-oligomeric proteins have the same quaternary state. Considering these homologous homo-oligomers with the same quaternary state, conservation of quaternary structures is observed in only 38% of the pairs. In 36% of the pairs of distantly related homo-oligomers with different quaternary states, the larger assembly in a pair shows high structural similarity with the entire quaternary structure of the related protein with lower quaternary state; this is referred to as the "Russian doll effect." The differences in quaternary state and structure have been suggested to contribute to functional diversity. Detailed investigations show that even though the gross functions of many distantly related homo-oligomers are the same, finer level differences in molecular functions are manifested by differences in quaternary states and structures. Comparison of structures of biological assemblies in distantly and closely related homo-oligomeric proteins throughout the study differentiates the effects of sequence divergence on the quaternary structures and function. Knowledge inferred from this study can provide insights for improved protein structure classification and function prediction of homo-oligomers. Proteins 2016; 84:1190-1202. © 2016 Wiley Periodicals, Inc.

  13. A Method for Predicting Protein Complexes from Dynamic Weighted Protein-Protein Interaction Networks.

    PubMed

    Liu, Lizhen; Sun, Xiaowu; Song, Wei; Du, Chao

    2018-06-01

    Predicting protein complexes from protein-protein interaction (PPI) networks is of great significance for understanding the structure and function of cells. A protein may interact with different proteins at different times or under different conditions. Existing approaches utilize only static PPI network data, which may lose much temporal biological information. First, this article proposes a novel method that combines gene expression data at different time points with a traditional static PPI network to construct different dynamic subnetworks. Second, to further filter out data noise, semantic similarity based on gene ontology is used as the network weight, together with principal component analysis, which is introduced to handle the weights computed by three traditional methods. Third, after building a dynamic PPI network, a protein complex prediction algorithm based on the "core-attachment" structural feature is applied to detect complexes from each dynamic subnetwork. Finally, the experimental results show that the proposed method performs well in detecting protein complexes from dynamic weighted PPI networks.
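
    The first step of this abstract, combining time-course expression data with a static PPI network to obtain dynamic subnetworks, might be sketched in Python as follows. This is only one plausible reading: the fixed expression threshold, the function name, and the data layout are assumptions, and the gene-ontology weighting, principal component analysis, and core-attachment steps are not covered.

```python
def dynamic_subnetworks(edges, expression, threshold):
    """Split a static PPI network into one subnetwork per time point.

    edges:      list of (protein_a, protein_b) pairs from the static network
    expression: dict mapping protein -> list of expression values,
                one value per time point (all lists the same length)
    threshold:  assumed cutoff; a protein counts as active at time t
                when its expression at t is >= threshold
    """
    n_times = len(next(iter(expression.values())))
    subnetworks = []
    for t in range(n_times):
        active = {p for p, values in expression.items() if values[t] >= threshold}
        # Keep an interaction only if both partners are active at time t.
        subnetworks.append([(a, b) for a, b in edges if a in active and b in active])
    return subnetworks


# Toy data (hypothetical proteins and expression values).
edges = [("A", "B"), ("B", "C")]
expression = {"A": [1.0, 0.1], "B": [0.9, 0.8], "C": [0.2, 0.7]}
for t, net in enumerate(dynamic_subnetworks(edges, expression, 0.5)):
    print(t, net)
```

    A single global cutoff is the simplest choice; a per-gene threshold would be a natural alternative, but the abstract does not say which rule the authors used.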

  14. Hum-mPLoc 3.0: prediction enhancement of human protein subcellular localization through modeling the hidden correlations of gene ontology and functional domain features.

    PubMed

    Zhou, Hang; Yang, Yang; Shen, Hong-Bin

    2017-03-15

    Protein subcellular localization prediction has been an important research topic in computational biology over the last decade. Various automatic methods have been proposed to predict locations for large scale protein datasets, where statistical machine learning algorithms are widely used for model construction. A key step in these predictors is encoding the amino acid sequences into feature vectors. Many studies have shown that features extracted from biological domains, such as gene ontology and functional domains, can be very useful for improving the prediction accuracy. However, domain knowledge usually results in redundant features and high-dimensional feature spaces, which may degrade the performance of machine learning models. In this paper, we propose a new amino acid sequence-based human protein subcellular location prediction approach, Hum-mPLoc 3.0, which covers 12 human subcellular localizations. The sequences are represented by multi-view complementary features, i.e. context vocabulary annotation-based gene ontology (GO) terms, peptide-based functional domains, and residue-based statistical features. To systematically reflect the structural hierarchy of the domain knowledge bases, we propose a novel feature representation protocol denoted as HCM (Hidden Correlation Modeling), which creates more compact and discriminative feature vectors by modeling the hidden correlations between annotation terms. Experimental results on four benchmark datasets show that HCM improves prediction accuracy by 5-11% and F1 by 8-19% compared with conventional GO-based methods. A large-scale application of Hum-mPLoc 3.0 on the whole human proteome reveals protein co-localization preferences in the cell. Availability: www.csbio.sjtu.edu.cn/bioinf/Hum-mPLoc3/. Contact: hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  15. Conservation and divergence of C-terminal domain structure in the retinoblastoma protein family

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liban, Tyler J.; Medina, Edgar M.; Tripathi, Sarvind

    The retinoblastoma protein (Rb) and the homologous pocket proteins p107 and p130 negatively regulate cell proliferation by binding and inhibiting members of the E2F transcription factor family. The structural features that distinguish Rb from other pocket proteins have been unclear but are critical for understanding their functional diversity and determining why Rb has unique tumor suppressor activities. We describe here important differences in how the Rb and p107 C-terminal domains (CTDs) associate with the coiled-coil and marked-box domains (CMs) of E2Fs. We find that although CTD–CM binding is conserved across protein families, Rb and p107 CTDs show clear preferences for different E2Fs. A crystal structure of the p107 CTD bound to E2F5 and its dimer partner DP1 reveals the molecular basis for pocket protein–E2F binding specificity and how cyclin-dependent kinases differentially regulate pocket proteins through CTD phosphorylation. Our structural and biochemical data together with phylogenetic analyses of Rb and E2F proteins support the conclusion that Rb evolved specific structural motifs that confer its unique capacity to bind with high affinity those E2Fs that are the most potent activators of the cell cycle.

  16. NMR relaxation studies on the hydrate layer of intrinsically unstructured proteins.

    PubMed

    Bokor, Mónika; Csizmók, Veronika; Kovács, Dénes; Bánki, Péter; Friedrich, Peter; Tompa, Peter; Tompa, Kálmán

    2005-03-01

    Intrinsically unstructured/disordered proteins (IUPs) exist in a disordered and largely solvent-exposed, still functional, structural state under physiological conditions. As their function is often directly linked with structural disorder, understanding their structure-function relationship in detail is a great challenge to structural biology. In particular, their hydration and residual structure, both closely linked with their mechanism of action, require close attention. Here we demonstrate that the hydration of IUPs can be adequately approached by a technique so far unexplored with respect to IUPs, solid-state NMR relaxation measurements. This technique provides quantitative information on various features of hydrate water bound to these proteins. By freezing nonhydrate (bulk) water out, we have been able to measure free induction decays pertaining to protons of bound water from which the amount of hydrate water, its activation energy, and correlation times could be calculated. Thus, for three IUPs, the first inhibitory domain of calpastatin, microtubule-associated protein 2c, and plant dehydrin early responsive to dehydration 10, we demonstrate that they bind a significantly larger amount of water than globular proteins, whereas their suboptimal hydration and relaxation parameters are correlated with their differing modes of function. The theoretical treatment and experimental approach presented in this article may have general utility in characterizing proteins that belong to this novel structural class.

  17. GeneSilico protein structure prediction meta-server.

    PubMed

    Kurowski, Michal A; Bujnicki, Janusz M

    2003-07-01

    Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta.

  18. GeneSilico protein structure prediction meta-server

    PubMed Central

    Kurowski, Michal A.; Bujnicki, Janusz M.

    2003-01-01

    Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta. PMID:12824313

  19. Structure-function analysis of the auxilin J-domain reveals an extended Hsc70 interaction interface.

    PubMed

    Jiang, Jianwen; Taylor, Alexander B; Prasad, Kondury; Ishikawa-Brush, Yumiko; Hart, P John; Lafer, Eileen M; Sousa, Rui

    2003-05-20

    J-domains are widespread protein interaction modules involved in recruiting and stimulating the activity of Hsp70 family chaperones. We have determined the crystal structure of the J-domain of auxilin, a protein which is involved in uncoating clathrin-coated vesicles. Comparison to the known structures of J-domains from four other proteins reveals that the auxilin J-domain is the most divergent of all J-domain structures described to date. In addition to the canonical J-domain features described previously, the auxilin J-domain contains an extra N-terminal helix and a long loop inserted between helices I and II. The latter loop extends the positively charged surface which forms the Hsc70 binding site, and is shown by directed mutagenesis and surface plasmon resonance to contain side chains important for binding to Hsc70.

  20. Exploration of the relationship between topology and designability of conformations

    NASA Astrophysics Data System (ADS)

    Leelananda, Sumudu P.; Towfic, Fadi; Jernigan, Robert L.; Kloczkowski, Andrzej

    2011-06-01

    Protein structures are evolutionarily more conserved than sequences, and sequences with very low sequence identity frequently share the same fold. This leads to the concept of protein designability. Some folds are more designable than others, in that many sequences can assume them. Elucidating the relationship between protein sequence and the three-dimensional (3D) structure that the sequence folds into is an important problem in computational structural biology. Lattice models have been utilized in numerous studies to model protein folds and predict the designability of certain folds. In this study, all possible compact conformations within a set of two-dimensional and 3D lattice spaces are explored. Complementary interaction graphs are then generated for each conformation and are described using a set of graph features. The full HP sequence space for each lattice model is generated and contact energies are calculated by threading each sequence onto all the possible conformations. The unique conformation giving the minimum energy is identified for each sequence, and the number of sequences folding to each conformation (designability) is obtained. Machine learning algorithms are used to predict the designability of each conformation. We find that the highly designable structures can be distinguished from other non-designable conformations based on certain graphical geometric features of the interactions. This finding confirms that the topology of a conformation is an important determinant of the extent of its designability and suggests that the interactions themselves are important for determining the designability.
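
    The enumeration-and-threading procedure described in this abstract can be illustrated on the smallest interesting case. The sketch below is not the authors' code: it assumes a 3x3 square lattice, the standard HP energy of -1 per non-bonded H-H contact, and conformations identified by their contact maps (so rotations and reflections of a walk coincide). All function names are hypothetical.

```python
from itertools import product


def compact_walks(n):
    """Yield every compact self-avoiding walk (Hamiltonian path) on an n x n grid."""
    total = n * n

    def extend(path, visited):
        if len(path) == total:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            if 0 <= nxt[0] < n and 0 <= nxt[1] < n and nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                yield from extend(path, visited)
                path.pop()
                visited.remove(nxt)

    for start in product(range(n), repeat=2):
        yield from extend([start], {start})


def contact_map(path):
    """Return non-bonded lattice contacts as chain-index pairs (i, j) with j > i + 1."""
    index = {site: i for i, site in enumerate(path)}
    contacts = set()
    for (x, y), i in index.items():
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index.get(neighbour)
            if j is not None and abs(i - j) > 1:
                contacts.add((min(i, j), max(i, j)))
    return frozenset(contacts)


def designability(n=3):
    """Count, per contact map, the HP sequences whose unique ground state it is."""
    maps = sorted({contact_map(p) for p in compact_walks(n)}, key=sorted)
    counts = {m: 0 for m in maps}
    for seq in product("HP", repeat=n * n):
        # Assumed energy model: -1 for each H-H contact, 0 otherwise.
        energies = [
            -sum(1 for i, j in m if seq[i] == "H" and seq[j] == "H") for m in maps
        ]
        best = min(energies)
        if energies.count(best) == 1:  # skip sequences with degenerate ground states
            counts[maps[energies.index(best)]] += 1
    return counts


counts = designability(3)
for cmap, n_seq in counts.items():
    print(sorted(cmap), n_seq)
```

    Identifying conformations by contact map is a simplification chosen here for brevity; the study itself describes each conformation with a richer set of graph features.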

  1. Toward a unified nomenclature for mammalian ADP-ribosyltransferases.

    PubMed

    Hottiger, Michael O; Hassa, Paul O; Lüscher, Bernhard; Schüler, Herwig; Koch-Nolte, Friedrich

    2010-04-01

    ADP-ribosylation is a post-translational modification of proteins catalyzed by ADP-ribosyltransferases. It comprises the transfer of the ADP-ribose moiety from NAD+ to specific amino acid residues on substrate proteins or to ADP-ribose itself. Currently, 22 human genes encoding proteins that possess an ADP-ribosyltransferase catalytic domain are known. Recent structural and enzymological evidence on poly(ADP-ribose)polymerase (PARP) family members demonstrates that earlier proposed names and classifications of these proteins are no longer accurate. Here we summarize these new findings and propose a new consensus nomenclature for all ADP-ribosyltransferases (ARTs) based on the catalyzed reaction and on structural features. A unified nomenclature would facilitate communication between researchers both inside and outside the ADP-ribosylation field. 2009 Elsevier Ltd. All rights reserved.

  2. A Predictive Model of Intein Insertion Site for Use in the Engineering of Molecular Switches

    PubMed Central

    Apgar, James; Ross, Mary; Zuo, Xiao; Dohle, Sarah; Sturtevant, Derek; Shen, Binzhang; de la Vega, Humberto; Lessard, Philip; Lazar, Gabor; Raab, R. Michael

    2012-01-01

    Inteins are intervening protein domains with self-splicing ability that can be used as molecular switches to control activity of their host protein. Successfully engineering an intein into a host protein requires identifying an insertion site that permits intein insertion and splicing while allowing for proper folding of the mature protein post-splicing. By analyzing sequence and structure based properties of native intein insertion sites we have identified four features that showed significant correlation with the location of the intein insertion sites, and therefore may be useful in predicting insertion sites in other proteins that provide native-like intein function. Three of these properties, the distance to the active site and dimer interface site, the SVM score of the splice site cassette, and the sequence conservation of the site showed statistically significant correlation and strong predictive power, with area under the curve (AUC) values of 0.79, 0.76, and 0.73 respectively, while the distance to secondary structure/loop junction showed significance but with less predictive power (AUC of 0.54). In a case study of 20 insertion sites in the XynB xylanase, two features of native insertion sites showed correlation with the splice sites and demonstrated predictive value in selecting non-native splice sites. Structural modeling of intein insertions at two sites highlighted the role that the insertion site location could play on the ability of the intein to modulate activity of the host protein. These findings can be used to enrich the selection of insertion sites capable of supporting intein splicing and hosting an intein switch. PMID:22649521
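
    For reference, an AUC such as those quoted above can be read as the probability that a randomly chosen native insertion site scores higher than a randomly chosen non-site, so 0.5 corresponds to chance and 0.54 is only slightly better than that. A minimal sketch of this rank-based calculation follows; the scores and labels are made up for illustration (not data from the study), and the function name is hypothetical. For a feature where smaller values indicate a site (a distance, for example), the scores would be negated first.

```python
def auc(scores, labels):
    """Rank-based AUC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical feature scores for candidate sites; label 1 marks a native insertion site.
example_scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]
example_labels = [1, 0, 1, 1, 0, 0]
print(auc(example_scores, example_labels))
```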

  3. Raman Spectroscopy Adds Complementary Detail to the High-Resolution X-Ray Crystal Structure of Photosynthetic PsbP from Spinacia oleracea

    PubMed Central

    Lapkouski, Mikalai; Hofbauerova, Katerina; Sovova, Zofie; Ettrichova, Olga; González-Pérez, Sergio; Dulebo, Alexander; Kaftan, David; Kuta Smatanova, Ivana; Revuelta, Jose L.; Arellano, Juan B.; Carey, Jannette; Ettrich, Rüdiger

    2012-01-01

    Raman microscopy permits structural analysis of protein crystals in situ in hanging drops, allowing for comparison with Raman measurements in solution. Nevertheless, the two methods sometimes reveal subtle differences in structure that are often ascribed to the water layer surrounding the protein. The novel method of drop-coating deposition Raman spectroscopy (DCDR) exploits an intermediate phase that, although nominally “dry,” has been shown to preserve protein structural features present in solution. The potential of this new approach to bridge the structural gap between proteins in solution and in crystals is explored here with extrinsic protein PsbP of photosystem II from Spinacia oleracea. In the high-resolution (1.98 Å) x-ray crystal structure of PsbP reported here, several segments of the protein chain are present but unresolved. Analysis of the three kinds of Raman spectra of PsbP suggests that most of the subtle differences can indeed be attributed to the water envelope, which is shown here to have a similar Raman intensity in glassy and crystal states. Using molecular dynamics simulations cross-validated by Raman solution data, two unresolved segments of the PsbP crystal structure were modeled as loops, and the amino terminus was inferred to contain an additional beta segment. The complete PsbP structure was compared with that of the PsbP-like protein CyanoP, which plays a more peripheral role in photosystem II function. The comparison suggests possible interaction surfaces of PsbP with higher-plant photosystem II. This work provides the first complete structural picture of this key protein, and it represents the first systematic comparison of Raman data from solution, glassy, and crystalline states of a protein. PMID:23071614

  4. Assessment of nonenzymatic glycation in protein by FTIR spectroscopy

    NASA Astrophysics Data System (ADS)

    Otero de Joshi, Virginia; Joshi, Narahari V.; Gil, Herminia; Velasquez, William; Contreras, Silvia; Marquez, Glevis

    1999-04-01

    Detection of nonenzymatically glycated proteins is significant in diabetes, aging and related diseases. We have therefore carried out an FTIR spectroscopic study of glycated and native proteins such as γ-globulin and human serum albumin. For this purpose, commercially available proteins were glycated by a standard procedure and their FTIR spectra were recorded together with those of the native proteins. To follow the changes over time, γ-globulin was glycated for 1, 2, 3, 5 and 8 weeks and the spectra were recorded. Direct verification was obtained by examining a model unit in which the -NH2 group was attached to glucose. The spectrum shows a strong peak at 3500 cm⁻¹, confirming the variation observed in the time-dependent spectra. The general features of the spectra are very similar, and there was no additional structure or change in the peaks. This is understandable, as only a small fraction of the lysine residues are glycated. Glucose is attached to the ε-amino group of lysine to form Amadori products, and therefore the vibrational modes corresponding to the ε-NH2 unit of lysine are expected to be altered. This region lies exactly in the Amide I region of protein structure. Careful investigation of this part indeed shows a complex structure originating from alterations of the -NH2 group. Thus, the present investigation indicates that an optical approach could be a rapid and effective method to identify the nonenzymatic glycation process.

  5. Crysalis: an integrated server for computational analysis and design of protein crystallization.

    PubMed

    Wang, Huilin; Feng, Liubin; Zhang, Ziding; Webb, Geoffrey I; Lin, Donghai; Song, Jiangning

    2016-02-24

    The failure of multi-step experimental procedures to yield diffraction-quality crystals is a major bottleneck in protein structure determination. Accordingly, several bioinformatics methods have been successfully developed and employed to select crystallizable proteins. Unfortunately, the majority of existing in silico methods only allow the prediction of crystallization propensity, seldom enabling computational design of protein mutants that can be targeted for enhancing protein crystallizability. Here, we present Crysalis, an integrated crystallization analysis tool that builds on support-vector regression (SVR) models to facilitate computational protein crystallization prediction, analysis, and design. More specifically, the functionality of this new tool includes: (1) rapid selection of target crystallizable proteins at the proteome level, (2) identification of site non-optimality for protein crystallization and systematic analysis of all potential single-point mutations that might enhance protein crystallization propensity, and (3) annotation of target protein based on predicted structural properties. We applied the design mode of Crysalis to identify site non-optimality for protein crystallization on a proteome-scale, focusing on proteins currently classified as non-crystallizable. Our results revealed that site non-optimality is based on biases related to residues, predicted structures, physicochemical properties, and sequence loci, which provides in-depth understanding of the features influencing protein crystallization. Crysalis is freely available at http://nmrcen.xmu.edu.cn/crysalis/.

  6. Crysalis: an integrated server for computational analysis and design of protein crystallization

    PubMed Central

    Wang, Huilin; Feng, Liubin; Zhang, Ziding; Webb, Geoffrey I.; Lin, Donghai; Song, Jiangning

    2016-01-01

    The failure of multi-step experimental procedures to yield diffraction-quality crystals is a major bottleneck in protein structure determination. Accordingly, several bioinformatics methods have been successfully developed and employed to select crystallizable proteins. Unfortunately, the majority of existing in silico methods only allow the prediction of crystallization propensity, seldom enabling computational design of protein mutants that can be targeted for enhancing protein crystallizability. Here, we present Crysalis, an integrated crystallization analysis tool that builds on support-vector regression (SVR) models to facilitate computational protein crystallization prediction, analysis, and design. More specifically, the functionality of this new tool includes: (1) rapid selection of target crystallizable proteins at the proteome level, (2) identification of site non-optimality for protein crystallization and systematic analysis of all potential single-point mutations that might enhance protein crystallization propensity, and (3) annotation of target protein based on predicted structural properties. We applied the design mode of Crysalis to identify site non-optimality for protein crystallization on a proteome-scale, focusing on proteins currently classified as non-crystallizable. Our results revealed that site non-optimality is based on biases related to residues, predicted structures, physicochemical properties, and sequence loci, which provides in-depth understanding of the features influencing protein crystallization. Crysalis is freely available at http://nmrcen.xmu.edu.cn/crysalis/. PMID:26906024

  7. Biological features of hepatitis B virus isolates from patients based on full-length genomic analysis.

    PubMed

    Wen, Yu-Mei; Wang, Yong-Xiang

    2009-01-01

    The mechanisms for HBV persistence and the pathogenesis of chronic HB have been shown to be mainly due to defects in host immune responses. However, HBV isolates with different biological features may also contribute to different clinical outcomes and epidemiological implications in viral hepatitis B (HB). This review presents interesting biological features of HBV isolates based on the structural and functional analysis of full-length HBV isolates from various patients. Among isolates from children after failure of HB vaccination, 129L mutant at the 'a' determinant was found with normal binding efficiency to anti-HBs, but with reduced immunogenicity, which could initiate persistent HBV infections. Isolates from fulminant hepatitis (FH) B patients were not all highly replicative, but differences in capacities of anti-HBs induction could be involved in the pathogenesis of FH. The high replicative competency of isolates from hepatocellular carcinoma (HCC) patients could result in enhanced immune-mediated cytopathic effects against HBV viral proteins, and increased transactivating activity by the X protein. The mechanism of a double-spliced variant in enhancing replication of the wild-type virus is presented. The importance of integrating structural and functional analysis to reveal biological features of HBV isolates in viral pathogenesis is discussed.

  8. Structural Similarities and Differences between Two Functionally Distinct SecA Proteins, Mycobacterium tuberculosis SecA1 and SecA2

    PubMed Central

    Swanson, Stephanie; Ioerger, Thomas R.; Rigel, Nathan W.; Miller, Brittany K.; Braunstein, Miriam

    2015-01-01

    ABSTRACT While SecA is the ATPase component of the major bacterial secretory (Sec) system, mycobacteria and some Gram-positive pathogens have a second paralog, SecA2. In bacteria with two SecA paralogs, each SecA is functionally distinct, and they cannot compensate for one another. Compared to SecA1, SecA2 exports a distinct and smaller set of substrates, some of which have roles in virulence. In the mycobacterial system, some SecA2-dependent substrates lack a signal peptide, while others contain a signal peptide but possess features in the mature protein that necessitate a role for SecA2 in their export. It is unclear how SecA2 functions in protein export, and one open question is whether SecA2 works with the canonical SecYEG channel to export proteins. In this study, we report the structure of Mycobacterium tuberculosis SecA2 (MtbSecA2), which is the first structure of any SecA2 protein. A high level of structural similarity is observed between SecA2 and SecA1. The major structural difference is the absence of the helical wing domain, which is likely to play a role in how MtbSecA2 recognizes its unique substrates. Importantly, structural features critical to the interaction between SecA1 and SecYEG are preserved in SecA2. Furthermore, suppressor mutations of a dominant-negative secA2 mutant map to the surface of SecA2 and help identify functional regions of SecA2 that may promote interactions with SecYEG or the translocating polypeptide substrate. These results support a model in which the mycobacterial SecA2 works with SecYEG. IMPORTANCE SecA2 is a paralog of SecA1, which is the ATPase of the canonical bacterial Sec secretion system. SecA2 has a nonredundant function with SecA1, and SecA2 exports a distinct and smaller set of substrates than SecA1. This work reports the crystal structure of SecA2 of Mycobacterium tuberculosis (the first SecA2 structure reported for any organism). 
Many of the structural features of SecA1 are conserved in the SecA2 structure, including putative contacts with the SecYEG channel. Several structural differences are also identified that could relate to the unique function and selectivity of SecA2. Suppressor mutations of a secA2 mutant map to the surface of SecA2 and help identify functional regions of SecA2 that may promote interactions with SecYEG. PMID:26668263

  9. Protein Information Resource: a community resource for expert annotation of protein data

    PubMed Central

    Barker, Winona C.; Garavelli, John S.; Hou, Zhenglin; Huang, Hongzhan; Ledley, Robert S.; McGarvey, Peter B.; Mewes, Hans-Werner; Orcutt, Bruce C.; Pfeiffer, Friedhelm; Tsugita, Akira; Vinayaka, C. R.; Xiao, Chunlin; Yeh, Lai-Su L.; Wu, Cathy

    2001-01-01

    The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high quality annotation and promote database interoperability, the PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200 000 non-redundant protein sequences, which are classified into families and superfamilies and their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-International databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP. PMID:11125041

  10. Genetics Home Reference: mevalonate kinase deficiency

    MedlinePlus

    ... cell maturation (differentiation), formation of the cell's structural framework (the cytoskeleton), gene activity (expression), and protein production ... Group. Long-term follow-up, clinical features, and quality of life in a series of 103 patients ...

  11. Protein Tyrosine Nitration: Biochemical Mechanisms and Structural Basis of its Functional Effects

    PubMed Central

    Radi, Rafael

    2012-01-01

    CONSPECTUS The nitration of protein tyrosine residues to 3-nitrotyrosine represents an oxidative posttranslational modification that unveils the disruption of nitric oxide (•NO) signaling and metabolism towards pro-oxidant processes. Indeed, excess levels of reactive oxygen species in the presence of •NO or •NO-derived metabolites lead to the formation of nitrating species such as peroxynitrite. Thus, protein 3-nitrotyrosine has been established as a biomarker of cell, tissue and systemic “nitroxidative stress”. Moreover, tyrosine nitration modifies key properties of the amino acid (i.e. phenol group pKa, redox potential, hydrophobicity and volume). Thus, the incorporation of a nitro group (−NO2) to protein tyrosines can lead to profound structural and functional changes, some of which contribute to altered cell and tissue homeostasis. In this Account, I describe our current efforts to define 1) biologically-relevant mechanisms of protein tyrosine nitration and 2) how this modification can cause changes in protein structure and function at the molecular level. First, the relevance of protein tyrosine nitration via free radical-mediated reactions (in both peroxynitrite-dependent and -independent pathways) involving the intermediacy of tyrosyl radical (Tyr•) will be underscored. This feature of the nitration process becomes critical as Tyr• can take variable fates, including the formation of 3-nitrotyrosine. Fast kinetic techniques, electron paramagnetic resonance (EPR) studies, bioanalytical methods and kinetic simulations have altogether assisted to characterize and fingerprint the reactions of tyrosine with peroxynitrite and one-electron oxidants and its further evolution to 3-nitrotyrosine. Recent findings show that nitration of tyrosines in proteins associated with biomembranes is linked to the lipid peroxidation process via a connecting reaction that involves the one-electron oxidation of tyrosine by lipid peroxyl radicals (LOO•).
Second, immunochemical and proteomic-based studies indicate that protein tyrosine nitration is a selective process in vitro and in vivo, preferentially directed to a subset of proteins, and within those proteins, typically one or two tyrosine residues are site-specifically modified. The nature and site(s) of formation of the proximal oxidizing/nitrating species, the physico-chemical characteristics of the local microenvironment and also structural features of the protein account for part of this selectivity. Then, how this relatively subtle chemical modification in one tyrosine residue can sometimes cause dramatic changes in protein activity has remained elusive. Herein, I will analyze recent structural biology data of two pure and homogeneously nitrated mitochondrial proteins (i.e. cytochrome c and MnSOD) to illustrate regio-selectivity and structural effects of tyrosine nitration, and subsequent impact on protein loss- or even gain-of-function. PMID:23157446

  12. How the Sequence of a Gene Specifies Structural Symmetry in Proteins

    PubMed Central

    Shen, Xiaojuan; Huang, Tongcheng; Wang, Guanyu; Li, Guanglin

    2015-01-01

    Internal symmetry is commonly observed in the majority of fundamental protein folds. Meanwhile, sufficient evidence suggests that nascent polypeptide chains of proteins have the potential to start the co-translational folding process and this process allows mRNA to contain additional information on protein structure. In this paper, we study the relationship between gene sequences and protein structures from the viewpoint of symmetry to explore how gene sequences code for structural symmetry in proteins. We found that, for a set of two-fold symmetric proteins from left-handed beta-helix fold, intragenic symmetry always exists in their corresponding gene sequences. Meanwhile, codon usage bias and local mRNA structure might be involved in modulating translation speed for the formation of structural symmetry: a major decrease of local codon usage bias in the middle of the codon sequence can be identified as a common feature; and major or consecutive decreases in local mRNA folding energy near the boundaries of the symmetric substructures can also be observed. The results suggest that gene duplication and fusion may be an evolutionarily conserved process for this protein fold. In addition, the usage of rare codons and the formation of higher order of secondary structure near the boundaries of symmetric substructures might have coevolved as conserved mechanisms to slow down translation elongation and to facilitate effective folding of symmetric substructures. These findings provide valuable insights into our understanding of the mechanisms of translation and its evolution, as well as the design of proteins via symmetric modules. PMID:26641668

  13. Structural test of the parameterized-backbone method for protein design.

    PubMed

    Plecs, Joseph J; Harbury, Pehr B; Kim, Peter S; Alber, Tom

    2004-09-03

    Designing new protein folds requires a method for simultaneously optimizing the conformation of the backbone and the side-chains. One approach to this problem is the use of a parameterized backbone, which allows the systematic exploration of families of structures. We report the crystal structure of RH3, a right-handed, three-helix coiled coil that was designed using a parameterized backbone and detailed modeling of core packing. This crystal structure was determined using another rationally designed feature, a metal-binding site that permitted experimental phasing of the X-ray data. RH3 adopted the intended fold, which has not been observed previously in biological proteins. Unanticipated structural asymmetry in the trimer was a principal source of variation within the RH3 structure. The sequence of RH3 differs from that of a previously characterized right-handed tetramer, RH4, at only one position in each 11 amino acid sequence repeat. This close similarity indicates that the design method is sensitive to the core packing interactions that specify the protein structure. Comparison of the structures of RH3 and RH4 indicates that both steric overlap and cavity formation provide strong driving forces for oligomer specificity.

  14. Universal features of fluctuations in globular proteins.

    PubMed

    Erman, Burak

    2016-06-01

    Using data from 2000 non-homologous protein crystal structures, we show that the distribution of residue B factors of proteins collapses onto a single master curve. We show by maximum entropy arguments that this curve is a Gamma function whose order and dispersion are obtained from experimental data. The distribution for any given specific protein can be generated from the master curve by a linear transformation. Any perturbation of the B factor distribution of a protein, imposed at constant energy, causes a decrease in the entropy of the protein relative to that of the reference state. Proteins 2016; 84:721-725. © 2016 Wiley Periodicals, Inc.
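
    The statement that each protein's distribution follows from a master curve by a linear transformation can be illustrated with a simple standardization. This sketch assumes the transformation is a shift and rescale (a z-score) of per-residue B factors, which the abstract does not specify. The B-factor values and the function name are invented for illustration.

```python
from statistics import fmean, pstdev


def standardize(b_factors):
    """Shift and rescale B factors to zero mean and unit standard deviation."""
    mu = fmean(b_factors)
    sigma = pstdev(b_factors)
    if sigma == 0:
        raise ValueError("B factors are constant; cannot rescale")
    return [(b - mu) / sigma for b in b_factors]


# Hypothetical per-residue B factors for one protein ...
protein_a = [12.0, 15.5, 20.1, 33.4, 18.2, 14.9]
# ... and a second set related to the first by a linear map (scale 2, offset 5).
protein_b = [2.0 * b + 5.0 for b in protein_a]

z_a = standardize(protein_a)
z_b = standardize(protein_b)
# After standardization the two sets coincide up to rounding error.
print(max(abs(x - y) for x, y in zip(z_a, z_b)))
```

    Under this assumption, B-factor sets that differ only by a positive scale and an offset collapse onto the same standardized values, which is the sense in which a single curve could describe many proteins.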

  15. Ultrafast protein structure-based virtual screening with Panther

    NASA Astrophysics Data System (ADS)

    Niinivehmas, Sanna P.; Salokas, Kari; Lätti, Sakari; Raunio, Hannu; Pentikäinen, Olli T.

    2015-10-01

    Molecular docking is by far the most common method used in protein structure-based virtual screening. This paper presents Panther, a novel ultrafast multipurpose docking tool. In Panther, a simple shape-electrostatic model of the ligand-binding area of the protein is created by utilizing the protein crystal structure. The features of the possible ligands are then compared to the model by using a similarity search algorithm. On average, one ligand can be processed in a few minutes by using classical docking methods, whereas using Panther processing takes <1 s. The presented Panther protocol can be used in several applications, such as speeding up the early phases of drug discovery projects, reducing the number of failures in the clinical phase of the drug development process, and estimating the environmental toxicity of chemicals. Panther-code is available in our web pages (http://www.jyu.fi/panther) free of charge after registration.

  16. Ultrafast protein structure-based virtual screening with Panther.

    PubMed

    Niinivehmas, Sanna P; Salokas, Kari; Lätti, Sakari; Raunio, Hannu; Pentikäinen, Olli T

    2015-10-01

    Molecular docking is by far the most common method used in protein structure-based virtual screening. This paper presents Panther, a novel ultrafast multipurpose docking tool. In Panther, a simple shape-electrostatic model of the ligand-binding area of the protein is created by utilizing the protein crystal structure. The features of the possible ligands are then compared to the model by using a similarity search algorithm. On average, one ligand can be processed in a few minutes by using classical docking methods, whereas using Panther processing takes <1 s. The presented Panther protocol can be used in several applications, such as speeding up the early phases of drug discovery projects, reducing the number of failures in the clinical phase of the drug development process, and estimating the environmental toxicity of chemicals. Panther-code is available in our web pages (http://www.jyu.fi/panther) free of charge after registration.

  17. KFC Server: interactive forecasting of protein interaction hot spots

    PubMed Central

    Darnell, Steven J.; LeGault, Laura; Mitchell, Julie C.

    2008-01-01

    The KFC Server is a web-based implementation of the KFC (Knowledge-based FADE and Contacts) model, a machine learning approach for the prediction of binding hot spots, or the subset of residues that account for most of a protein interface's binding free energy. The server facilitates the automated analysis of a user submitted protein–protein or protein–DNA interface and the visualization of its hot spot predictions. For each residue in the interface, the KFC Server characterizes its local structural environment, compares that environment to the environments of experimentally determined hot spots and predicts if the interface residue is a hot spot. After the computational analysis, the user can visualize the results using an interactive job viewer able to quickly highlight predicted hot spots and surrounding structural features within the protein structure. The KFC Server is accessible at http://kfc.mitchell-lab.org. PMID:18539611

  18. Automated Interpretation of Subcellular Patterns in Fluorescence Microscope Images for Location Proteomics

    PubMed Central

    Chen, Xiang; Velliste, Meel; Murphy, Robert F.

    2010-01-01

    Proteomics, the large scale identification and characterization of many or all proteins expressed in a given cell type, has become a major area of biological research. In addition to information on protein sequence, structure and expression levels, knowledge of a protein’s subcellular location is essential to a complete understanding of its functions. Currently subcellular location patterns are routinely determined by visual inspection of fluorescence microscope images. We review here research aimed at creating systems for automated, systematic determination of location. These employ numerical feature extraction from images, feature reduction to identify the most useful features, and various supervised learning (classification) and unsupervised learning (clustering) methods. These methods have been shown to perform significantly better than human interpretation of the same images. When coupled with technologies for tagging large numbers of proteins and high-throughput microscope systems, the computational methods reviewed here enable the new subfield of location proteomics. This subfield will make critical contributions in two related areas. First, it will provide structured, high-resolution information on location to enable Systems Biology efforts to simulate cell behavior from the gene level on up. Second, it will provide tools for Cytomics projects aimed at characterizing the behaviors of all cell types before, during and after the onset of various diseases. PMID:16752421

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gupta, Preeti; Deep, Shashank, E-mail: sdeep@chemistry.iitd.ac.in

    Highlights: • HCAII forms amyloid-like aggregates at moderate concentration of trifluoroethanol. • Protein adopts a state between β-sheet and α-helix at moderate % of TFE. • Hydrophobic surface(s) of partially structured conformation forms amyloid. • High % of TFE induces stable α-helical state preventing aggregation. - Abstract: In the present work, we examined the correlation between 2,2,2-trifluoroethanol (TFE)-induced conformational transitions of human carbonic anhydrase II (HCAII) and its aggregation propensity. Circular dichroism data indicates that protein undergoes a transition from β-sheet to α-helix on addition of TFE. The protein was found to aggregate maximally at moderate concentration of TFE at which it exists somewhere between β-sheet and α-helix, probably in extended non-native β-sheet conformation. Thioflavin-T (ThT) and Congo-Red (CR) assays along with fluorescence microscopy and transmission electron microscopy (TEM) data suggest that the protein aggregates induced by TFE possess amyloid-like features. Anilino-8-naphthalene sulfonate (ANS) binding studies reveal that the exposure of hydrophobic surface(s) was maximum in intermediate conformation. Our study suggests that the exposed hydrophobic surface and/or the disruption of the structural features protecting a β-sheet protein might be the major reason(s) for the high aggregation propensity of non-native intermediate conformation of HCAII.

  20. Cryo-EM of dynamic protein complexes in eukaryotic DNA replication.

    PubMed

    Sun, Jingchuan; Yuan, Zuanning; Bai, Lin; Li, Huilin

    2017-01-01

    DNA replication in Eukaryotes is a highly dynamic process that involves several dozens of proteins. Some of these proteins form stable complexes that are amenable to high-resolution structure determination by cryo-EM, thanks to the recent advent of the direct electron detector and powerful image analysis algorithm. But many of these proteins associate only transiently and flexibly, precluding traditional biochemical purification. We found that direct mixing of the component proteins followed by 2D and 3D image sorting can capture some very weakly interacting complexes. Even at 2D average level and at low resolution, EM images of these flexible complexes can provide important biological insights. It is often necessary to positively identify the feature-of-interest in a low resolution EM structure. We found that systematically fusing or inserting maltose binding protein (MBP) to selected proteins is highly effective in these situations. In this chapter, we describe the EM studies of several protein complexes involved in the eukaryotic DNA replication over the past decade or so. We suggest that some of the approaches used in these studies may be applicable to structural analysis of other biological systems. © 2016 The Protein Society.

  1. PDBe: towards reusable data delivery infrastructure at protein data bank in Europe.

    PubMed

    Mir, Saqib; Alhroub, Younes; Anyango, Stephen; Armstrong, David R; Berrisford, John M; Clark, Alice R; Conroy, Matthew J; Dana, Jose M; Deshpande, Mandar; Gupta, Deepti; Gutmanas, Aleksandras; Haslam, Pauline; Mak, Lora; Mukhopadhyay, Abhik; Nadzirin, Nurul; Paysan-Lafosse, Typhaine; Sehnal, David; Sen, Sanchayita; Smart, Oliver S; Varadi, Mihaly; Kleywegt, Gerard J; Velankar, Sameer

    2018-01-04

    The Protein Data Bank in Europe (PDBe, pdbe.org) is actively engaged in the deposition, annotation, remediation, enrichment and dissemination of macromolecular structure data. This paper describes new developments and improvements at PDBe addressing three challenging areas: data enrichment, data dissemination and functional reusability. New features of the PDBe Web site are discussed, including a context dependent menu providing links to raw experimental data and improved presentation of structures solved by hybrid methods. The paper also summarizes the features of the LiteMol suite, which is a set of services enabling fast and interactive 3D visualization of structures, with associated experimental maps, annotations and quality assessment information. We introduce a library of Web components which can be easily reused to port data and functionality available at PDBe to other services. We also introduce updates to the SIFTS resource which maps PDB data to other bioinformatics resources, and the PDBe REST API. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. Binding Mechanisms of Intrinsically Disordered Proteins: Theory, Simulation, and Experiment

    PubMed Central

    Mollica, Luca; Bessa, Luiza M.; Hanoulle, Xavier; Jensen, Malene Ringkjøbing; Blackledge, Martin; Schneider, Robert

    2016-01-01

    In recent years, protein science has been revolutionized by the discovery of intrinsically disordered proteins (IDPs). In contrast to the classical paradigm that a given protein sequence corresponds to a defined structure and an associated function, we now know that proteins can be functional in the absence of a stable three-dimensional structure. In many cases, disordered proteins or protein regions become structured, at least locally, upon interacting with their physiological partners. Many, sometimes conflicting, hypotheses have been put forward regarding the interaction mechanisms of IDPs and the potential advantages of disorder for protein-protein interactions. Whether disorder may increase, as proposed, e.g., in the “fly-casting” hypothesis, or decrease binding rates, increase or decrease binding specificity, or what role pre-formed structure might play in interactions involving IDPs (conformational selection vs. induced fit), are subjects of intense debate. Experimentally, these questions remain difficult to address. Here, we review experimental studies of binding mechanisms of IDPs using NMR spectroscopy and transient kinetic techniques, as well as the underlying theoretical concepts and numerical methods that can be applied to describe these interactions at the atomic level. The available literature suggests that the kinetic and thermodynamic parameters characterizing interactions involving IDPs can vary widely and that there may be no single common mechanism that can explain the different binding modes observed experimentally. Rather, disordered proteins appear to make combined use of features such as pre-formed structure and flexibility, depending on the individual system and the functional context. PMID:27668217

  3. deepNF: Deep network fusion for protein function prediction.

    PubMed

    Gligorijevic, Vladimir; Barot, Meet; Bonneau, Richard

    2018-06-01

    The prevalence of high-throughput experimental methods has resulted in an abundance of large-scale molecular and functional interaction networks. The connectivity of these networks provides a rich source of information for inferring functional annotations for genes and proteins. An important challenge has been to develop methods for combining these heterogeneous networks to extract useful protein feature representations for function prediction. Most of the existing approaches for network integration use shallow models that encounter difficulty in capturing complex and highly-nonlinear network structures. Thus, we propose deepNF, a network fusion method based on Multimodal Deep Autoencoders to extract high-level features of proteins from multiple heterogeneous interaction networks. We apply this method to combine STRING networks to construct a common low-dimensional representation containing high-level protein features. We use separate layers for different network types in the early stages of the multimodal autoencoder, later connecting all the layers into a single bottleneck layer from which we extract features to predict protein function. We compare the cross-validation and temporal holdout predictive performance of our method with state-of-the-art methods, including the recently proposed method Mashup. Our results show that our method outperforms previous methods for both human and yeast STRING networks. We also show substantial improvement in the performance of our method in predicting GO terms of varying type and specificity. deepNF is freely available at: https://github.com/VGligorijevic/deepNF. Contact: vgligorijevic@flatironinstitute.org, rb133@nyu.edu. Supplementary data are available at Bioinformatics online.

  4. Protein-based materials, toward a new level of structural control.

    PubMed

    van Hest, J C; Tirrell, D A

    2001-10-07

    Through billions of years of evolution nature has created and refined structural proteins for a wide variety of specific purposes. Amino acid sequences and their associated folding patterns combine to create elastic, rigid or tough materials. In many respects, nature's intricately designed products provide challenging examples for materials scientists, but translation of natural structural concepts into bio-inspired materials requires a level of control of macromolecular architecture far higher than that afforded by conventional polymerization processes. An increasingly important approach to this problem has been to use biological systems for production of materials. Through protein engineering, artificial genes can be developed that encode protein-based materials with desired features. Structural elements found in nature, such as beta-sheets and alpha-helices, can be combined with great flexibility, and can be outfitted with functional elements such as cell binding sites or enzymatic domains. The possibility of incorporating non-natural amino acids increases the versatility of protein engineering still further. It is expected that such methods will have large impact in the field of materials science, and especially in biomedical materials science, in the future.

  5. Atomic view of the histidine environment stabilizing higher-pH conformations of pH-dependent proteins

    PubMed Central

    Valéry, Céline; Deville-Foillard, Stéphanie; Lefebvre, Christelle; Taberner, Nuria; Legrand, Pierre; Meneau, Florian; Meriadec, Cristelle; Delvaux, Camille; Bizien, Thomas; Kasotakis, Emmanouil; Lopez-Iglesias, Carmen; Gall, Andrew; Bressanelli, Stéphane; Le Du, Marie-Hélène; Paternostre, Maïté; Artzner, Franck

    2015-01-01

    External stimuli are powerful tools that naturally control protein assemblies and functions. For example, during viral entry and exit changes in pH are known to trigger large protein conformational changes. However, the molecular features stabilizing the higher pH structures remain unclear. Here we elucidate the conformational change of a self-assembling peptide that forms either small or large nanotubes dependent on the pH. The sub-angstrom high-pH peptide structure reveals a globular conformation stabilized through a strong histidine-serine H-bond and a tight histidine-aromatic packing. Lowering the pH induces histidine protonation, disrupts these interactions and triggers a large change to an extended β-sheet-based conformation. Re-visiting available structures of proteins with pH-dependent conformations reveals both histidine-containing aromatic pockets and histidine-serine proximity as key motifs in higher pH structures. The mechanism discovered in this study may thus be generally used by pH-dependent proteins and opens new prospects in the field of nanomaterials. PMID:26190377

  6. Probing Protein Structure in Vivo with FRET

    PubMed Central

    Davis, Trisha; Muller, Eric

    2012-01-01

    Fluorescence resonance energy transfer (FRET) is widely used to construct probes for cellular activities and to complement two-hybrid results that predict protein-protein interactions. The Yeast Resource Center promotes an underutilized potential of FRET as an in vivo tool to position proteins within low resolution structures derived from electron microscopy. The success of this approach using widefield microscopy depends upon the choice of filter sets, standardized image acquisition, a robust metric and controls matched to the structure under investigation. A comparison of various CFP and YFP filter combinations from Chroma and Semrock demonstrated the strength of the Chroma filters when coupled with our FRET metric, termed FretR. Coupling CFP and YFP to a selection of proteins of known structure allowed us to create a standard curve of FretR versus distance. How well other FRET metrics conform was also evaluated. Finally FretR was linked to an approximation of the efficiency of energy transfer. Together this feature set has allowed us to contribute to our understanding of the organization of the yeast spindle pole body, cohesin complex and gamma-tubulin complex.
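
    The link between transfer efficiency and distance mentioned in this abstract rests on the standard Förster relation, E = 1 / (1 + (r / R0)^6). The sketch below shows only that textbook relation; it does not compute the authors' FretR metric, whose definition is not given in the abstract. The function names are hypothetical, and the default R0 of 5.0 nm is a placeholder of the order often quoted for CFP/YFP pairs, not a value from the record.

```python
def fret_efficiency(distance_nm, r0_nm=5.0):
    """Förster transfer efficiency for a donor-acceptor separation.

    r0_nm is the Förster radius (distance at which efficiency is 0.5).
    The 5.0 nm default is an assumed placeholder, not a measured value.
    """
    if distance_nm < 0 or r0_nm <= 0:
        raise ValueError("distance must be >= 0 and r0 must be > 0")
    return 1.0 / (1.0 + (distance_nm / r0_nm) ** 6)


def distance_from_efficiency(efficiency, r0_nm=5.0):
    """Invert the Förster relation to estimate a distance from efficiency."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must lie in (0, 1]")
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


# Efficiency falls off steeply around R0 because of the sixth-power term.
for r in (2.5, 5.0, 7.5, 10.0):
    print(r, round(fret_efficiency(r), 3))
```

    The sixth-power dependence is why FRET is informative mainly for separations within roughly a factor of two of R0, and why a calibration curve against proteins of known structure is useful in practice.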

  7. Feline coronavirus: Insights into viral pathogenesis based on the spike protein structure and function.

    PubMed

    Jaimes, Javier A; Whittaker, Gary R

    2018-04-01

    Feline coronavirus (FCoV) is an etiological agent that causes a benign enteric illness and the fatal systemic disease feline infectious peritonitis (FIP). The FCoV spike (S) protein is considered the viral regulator for binding and entry to the cell. This protein is also involved in FCoV tropism and virulence, as well as in the switch from enteric disease to FIP. This regulation is carried out by spike's major functions: receptor binding and virus-cell membrane fusion. In this review, we address important aspects in FCoV genetics, replication and pathogenesis, focusing on the role of S. To better understand this, FCoV S protein models were constructed, based on the human coronavirus NL63 (HCoV-NL63) S structure. We describe the specific structural characteristics of the FCoV S, in comparison with other coronavirus spikes. We also revise the biochemical events needed for FCoV S activation and its relation to the structural features of the protein. Copyright © 2018 Elsevier Inc. All rights reserved.

  8. Investigating the importance of Delaunay-based definition of atomic interactions in scoring of protein-protein docking results.

    PubMed

    Jafari, Rahim; Sadeghi, Mehdi; Mirzaie, Mehdi

    2016-05-01

    The approaches taken to represent and describe structural features of the macromolecules are of major importance when developing computational methods for studying and predicting their structures and interactions. This study attempts to explore the significance of Delaunay tessellation for the definition of atomic interactions by evaluating its impact on the performance of scoring protein-protein docking prediction. Two sets of knowledge-based scoring potentials are extracted from a training dataset of native protein-protein complexes. The potential of the first set is derived using atomic interactions extracted from Delaunay tessellated structures. The potential of the second set is calculated conventionally, that is, using atom pairs whose interactions were determined by their separation distances. The scoring potentials were tested against two different docking decoy sets and their performances were compared. The results show that, if properly optimized, the Delaunay-based scoring potentials can achieve higher success rate than the usual scoring potentials. These results and the results of a previous study on the use of Delaunay-based potentials in protein fold recognition, all point to the fact that Delaunay tessellation of protein structure can provide a more realistic definition of atomic interaction, and therefore, if appropriately utilized, may be able to improve the accuracy of pair potentials. Copyright © 2016 Elsevier Inc. All rights reserved.

  9. 3dRPC: a web server for 3D RNA-protein structure prediction.

    PubMed

    Huang, Yangyu; Li, Haotian; Xiao, Yi

    2018-04-01

    RNA-protein interactions occur in many biological processes. To understand the mechanism of these interactions one needs to know three-dimensional (3D) structures of RNA-protein complexes. 3dRPC is an algorithm for prediction of 3D RNA-protein complex structures and consists of a docking algorithm RPDOCK and a scoring function 3dRPC-Score. RPDOCK is used to sample possible complex conformations of an RNA and a protein by calculating the geometric and electrostatic complementarities and stacking interactions at the RNA-protein interface according to the features of atom packing of the interface. 3dRPC-Score is a knowledge-based potential that uses the conformations of nucleotide-amino-acid pairs as statistical variables and that is used to choose the near-native complex-conformations obtained from the docking method above. Recently, we built a web server for 3dRPC. The users can easily use 3dRPC without installing it locally. RNA and protein structures in PDB (Protein Data Bank) format are the only needed input files. It can also incorporate the information of interface residues or residue-pairs obtained from experiments or theoretical predictions to improve the prediction. The address of 3dRPC web server is http://biophy.hust.edu.cn/3dRPC. Contact: yxiao@hust.edu.cn.

  10. Protein structure and the sequential structure of mRNA: alpha-helix and beta-sheet signals at the nucleotide level.

    PubMed

    Brunak, S; Engelbrecht, J

    1996-06-01

    A direct comparison of experimentally determined protein structures and their corresponding protein coding mRNA sequences has been performed. We examine whether real world data support the hypothesis that clusters of rare codons correlate with the location of structural units in the resulting protein. The degeneracy of the genetic code allows for a biased selection of codons which may control the translational rate of the ribosome, and may thus in vivo have a catalyzing effect on the folding of the polypeptide chain. A complete search for GenBank nucleotide sequences coding for structural entries in the Brookhaven Protein Data Bank produced 719 protein chains with matching mRNA sequence, amino acid sequence, and secondary structure assignment. By neural network analysis, we found strong signals in mRNA sequence regions surrounding helices and sheets. These signals do not originate from the clustering of rare codons, but from the similarity of codons coding for very abundant amino acid residues at the N- and C-termini of helices and sheets. No correlation between the positioning of rare codons and the location of structural units was found. The mRNA signals were also compared with conserved nucleotide features of 16S-like ribosomal RNA sequences and related to mechanisms for maintaining the correct reading frame by the ribosome.

  11. Building protein-protein interaction networks for Leishmania species through protein structural information.

    PubMed

    Dos Santos Vasconcelos, Crhisllane Rafaele; de Lima Campos, Túlio; Rezende, Antonio Mauro

    2018-03-06

    Systematic analysis of a parasite interactome is a key approach to understanding different biological processes. It makes it possible to elucidate disease mechanisms, to predict protein functions and to select promising targets for drug development. Currently, several approaches for protein interaction prediction for non-model species incorporate only small fractions of the entire proteomes and their interactions. Based on this perspective, this study presents an integration of computational methodologies, protein network predictions and comparative analysis of the protozoan species Leishmania braziliensis and Leishmania infantum. These parasites cause Leishmaniasis, a neglected disease distributed worldwide, with limited treatment options using currently available drugs. The predicted interactions were obtained from a meta-approach, applying rigid body docking tests and template-based docking on protein structures predicted by different comparative modeling techniques. In addition, we trained a machine-learning algorithm (Gradient Boosting) using docking information performed on a curated set of positive and negative protein interaction data. Our final model obtained an AUC = 0.88, with recall = 0.69, specificity = 0.88 and precision = 0.83. Using this approach, it was possible to confidently predict 681 protein structures and 6198 protein interactions for L. braziliensis, and 708 protein structures and 7391 protein interactions for L. infantum. The predicted networks were integrated with protein interaction data already available, analyzed using several topological features and used to classify proteins as essential for network stability. The present study demonstrates the importance of integrating different interaction-prediction methodologies to increase the coverage of the protein interactions of the studied protocols; it also makes available protein structures and interactions not previously reported.
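
    For readers unfamiliar with the metrics reported in this abstract, recall, specificity and precision all follow from the four cells of a binary confusion matrix. The sketch below uses the standard definitions. The counts in the example are invented purely so that the outputs land near the reported values; they are not data from the study, and the function name is hypothetical. AUC is not covered, since it needs the ranked prediction scores rather than a single confusion matrix.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return recall, specificity and precision from confusion-matrix counts.

    tp/fn: true interactions predicted as interacting / not interacting.
    tn/fp: non-interactions predicted as not interacting / interacting.
    """
    if min(tp, fp, tn, fn) < 0:
        raise ValueError("counts must be non-negative")
    return {
        "recall": tp / (tp + fn),        # fraction of real positives found
        "specificity": tn / (tn + fp),   # fraction of real negatives rejected
        "precision": tp / (tp + fp),     # fraction of predictions that are right
    }


# Invented counts for illustration only (not from the paper).
metrics = classification_metrics(tp=69, fp=14, tn=103, fn=31)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

    A recall well below the specificity, as in the abstract, is the profile of a conservative classifier: it misses some true interactions but rarely accepts false ones.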

  12. Atomic Structure of GRK5 Reveals Distinct Structural Features Novel for G Protein-coupled Receptor Kinases.

    PubMed

    Komolov, Konstantin E; Bhardwaj, Anshul; Benovic, Jeffrey L

    2015-08-21

    G protein-coupled receptor kinases (GRKs) are members of the protein kinase A, G, and C families (AGC) and play a central role in mediating G protein-coupled receptor phosphorylation and desensitization. One member of the family, GRK5, has been implicated in several human pathologies, including heart failure, hypertension, cancer, diabetes, and Alzheimer disease. To gain mechanistic insight into GRK5 function, we determined a crystal structure of full-length human GRK5 at 1.8 Å resolution. GRK5 in complex with the ATP analog 5'-adenylyl β,γ-imidodiphosphate or the nucleoside sangivamycin crystallized as a monomer. The C-terminal tail (C-tail) of AGC kinase domains is a highly conserved feature that is divided into three segments as follows: the C-lobe tether, the active-site tether (AST), and the N-lobe tether (NLT). This domain is fully resolved in GRK5 and reveals novel interactions with the nucleotide and N-lobe. Similar to other AGC kinases, the GRK5 AST is an integral part of the nucleotide-binding pocket, a feature not observed in other GRKs. The AST also mediates contact between the kinase N- and C-lobes facilitating closure of the kinase domain. The GRK5 NLT is largely displaced from its previously observed position in other GRKs. Moreover, although the autophosphorylation sites in the NLT are >20 Å away from the catalytic cleft, they are capable of rapid cis-autophosphorylation suggesting high mobility of this region. In summary, we provide a snapshot of GRK5 in a partially closed state, where structural elements of the kinase domain C-tail are aligned to form novel interactions to the nucleotide and N-lobe not previously observed in other GRKs. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.

  13. Integrative topological analysis of mass spectrometry data reveals molecular features with clinical relevance in esophageal squamous cell carcinoma

    PubMed Central

    Gao, She-Gan; Liu, Rui-Min; Zhao, Yun-Gang; Wang, Pei; Ward, Douglas G.; Wang, Guang-Chao; Guo, Xiang-Qian; Gu, Juan; Niu, Wan-Bin; Zhang, Tian; Martin, Ashley; Guo, Zhi-Peng; Feng, Xiao-Shan; Qi, Yi-Jun; Ma, Yuan-Fang

    2016-01-01

    Combining MS-based proteomic data with network and topological features of such network would identify more clinically relevant molecules and meaningfully expand the repertoire of proteins derived from MS analysis. The integrative topological indexes representing 95.96% information of seven individual topological measures of node proteins were calculated within a protein-protein interaction (PPI) network, built using 244 differentially expressed proteins (DEPs) identified by iTRAQ 2D-LC-MS/MS. Compared with DEPs, differentially expressed genes (DEGs) and comprehensive features (CFs), structurally dominant nodes (SDNs) based on integrative topological index distribution produced comparable classification performance in three different clinical settings using five independent gene expression data sets. The signature molecules of SDN-based classifier for distinction of early from late clinical TNM stages were enriched in biological traits of protein synthesis, intracellular localization and ribosome biogenesis, which suggests that ribosome biogenesis represents a promising therapeutic target for treating ESCC. In addition, ITGB1 expression selected exclusively by integrative topological measures correlated with clinical stages and prognosis, which was further validated with two independent cohorts of ESCC samples. Thus the integrative topological analysis of PPI networks proposed in this study provides an alternative approach to identify potential biomarkers and therapeutic targets from MS/MS data with functional insights in ESCC. PMID:26898710

  14. Extensive Evolution of Cereal Ribosome-Inactivating Proteins Translates into Unique Structural Features, Activation Mechanisms, and Physiological Roles

    PubMed Central

    De Zaeytijd, Jeroen; Van Damme, Els J. M.

    2017-01-01

    Ribosome-inactivating proteins (RIPs) are a class of cytotoxic enzymes that can depurinate rRNAs thereby inhibiting protein translation. Although these proteins have also been detected in bacteria, fungi, and even some insects, they are especially prevalent in the plant kingdom. This review focuses on the RIPs from cereals. Studies on the taxonomical distribution and evolution of plant RIPs suggest that cereal RIPs have evolved at an enhanced rate giving rise to a large and heterogeneous RIP gene family. Furthermore, several cereal RIP genes are characterized by a unique domain architecture and the lack of a signal peptide. This advanced evolution of cereal RIPs translates into distinct structures, activation mechanisms, and physiological roles. Several cereal RIPs are characterized by activation mechanisms that include the proteolytic removal of internal peptides from the N-glycosidase domain, a feature not documented for non-cereal RIPs. Besides their role in defense against pathogenic fungi or herbivorous insects, cereal RIPs are also involved in endogenous functions such as adaptation to abiotic stress, storage, induction of senescence, and reprogramming of the translational machinery. The unique properties of cereal RIPs are discussed in this review paper. PMID:28353660

  15. Histone Variants and Composition in the Developing Brain: Should MeCP2 Care?

    PubMed

    Zago, Valentina; Pinar-CabezaDeVaca, Cristina; Vincent, John B; Ausio, Juan

    2017-01-01

    Specific compositional chromatin features distinguish brain/neuronal chromatin from that of other tissues and are critical to this organ and cell type development and neuroplasticity. These features include a significant turnover of the major constitutive chromosomal proteins, including the (canonical) replication-dependent histones, the replication-independent replacement histone variants, as well as the chromatin associated transcriptional regulator MeCP2 (methyl CpG binding protein 2). Alterations of histones and MeCP2 have already been implicated in many brain disorders. Despite the relevance of histone variants to chromatin structure and function, only recently has some exciting literature started to re-emerge that directly relates them to neuron plasticity and cognition. However, the amount of information available on the functional role of these histones is still very limited. The purpose of this review is to focus attention to this important group of chromatin proteins, which, in the brain, possess overlapping structural and functional roles with the highly abundant presence of MeCP2. There is an imperative need to understand how all these proteins communicate with each other, and future research will hopefully provide us with answers.

  16. Automated Glycan Assembly of Oligosaccharides Related to Arabinogalactan Proteins.

    PubMed

    Bartetzko, Max P; Schuhmacher, Frank; Hahm, Heung Sik; Seeberger, Peter H; Pfrengle, Fabian

    2015-09-04

    Arabinogalactan proteins are heavily glycosylated proteoglycans in plants. Their glycan portion consists of type-II arabinogalactan polysaccharides whose heterogeneity hampers the assignment of the arabinogalactan protein function. Synthetic chemistry is key to the procurement of molecular probes for plant biologists. Described is the automated glycan assembly of 14 oligosaccharides from four monosaccharide building blocks. These linear and branched glycans represent key structural features of natural type-II arabinogalactans and will serve as tools for arabinogalactan biology.

  17. Improved data visualization techniques for analyzing macromolecule structural changes

    PubMed Central

    Kim, Jae Hyun; Iyer, Vidyashankara; Joshi, Sangeeta B; Volkin, David B; Middaugh, C Russell

    2012-01-01

    The empirical phase diagram (EPD) is a colored representation of overall structural integrity and conformational stability of macromolecules in response to various environmental perturbations. Numerous proteins and macromolecular complexes have been analyzed by EPDs to summarize results from large data sets from multiple biophysical techniques. The current EPD method suffers from a number of deficiencies including lack of a meaningful relationship between color and actual molecular features, difficulties in identifying contributions from individual techniques, and a limited ability to be interpreted by color-blind individuals. In this work, three improved data visualization approaches are proposed as techniques complementary to the EPD. The secondary, tertiary, and quaternary structural changes of multiple proteins as a function of environmental stress were first measured using circular dichroism, intrinsic fluorescence spectroscopy, and static light scattering, respectively. Data sets were then visualized as (1) RGB colors using three-index EPDs, (2) equiangular polygons using radar charts, and (3) human facial features using Chernoff face diagrams. Data as a function of temperature and pH for bovine serum albumin, aldolase, and chymotrypsin as well as candidate protein vaccine antigens including a serine threonine kinase protein (SP1732) and surface antigen A (SP1650) from S. pneumoniae and hemagglutinin from an H1N1 influenza virus are used to illustrate the advantages and disadvantages of each type of data visualization technique. PMID:22898970
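
    The three-index mapping described above can be illustrated with a minimal Python sketch. It assumes that each technique yields one scalar per environmental condition and that each series is min-max scaled to a 0-255 color channel. The function name `three_index_rgb`, the channel assignment (circular dichroism to red, fluorescence to green, light scattering to blue), and the scaling rule are all hypothetical choices made for illustration; the abstract does not specify them.

```python
def three_index_rgb(cd, fluorescence, sls):
    """Map three per-condition measurement series to RGB triples (0-255).

    Assumption: each series is min-max scaled independently, so a color
    channel reflects the relative change seen by one technique only.
    """
    def scale(values):
        lo, hi = min(values), max(values)
        if hi == lo:
            # A flat signal carries no structural change; map it to 0.
            return [0 for _ in values]
        return [round(255 * (v - lo) / (hi - lo)) for v in values]

    # One (R, G, B) triple per condition (e.g. per temperature/pH point).
    return list(zip(scale(cd), scale(fluorescence), scale(sls)))


# Hypothetical readings at three conditions.
colors = three_index_rgb(
    cd=[0.0, 1.0, 4.0],
    fluorescence=[10.0, 10.0, 10.0],
    sls=[5.0, 0.0, 5.0],
)
```

    Keeping one technique per channel is what lets a reader trace a color back to a specific structural level, which is the deficiency of the original EPD that the abstract highlights.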

  18. Peroxisome protein import: a complex journey.

    PubMed

    Baker, Alison; Lanyon-Hogg, Thomas; Warriner, Stuart L

    2016-06-15

    The import of proteins into peroxisomes possesses many unusual features such as the ability to import folded proteins, and a surprising diversity of targeting signals with differing affinities that can be recognized by the same receptor. As understanding of the structure and function of many components of the protein import machinery has grown, an increasingly complex network of factors affecting each step of the import pathway has emerged. Structural studies have revealed the presence of additional interactions between cargo proteins and the PEX5 receptor that affect import potential, with a subtle network of cargo-induced conformational changes in PEX5 being involved in the import process. Biochemical studies have also indicated an interdependence of receptor-cargo import with release of unloaded receptor from the peroxisome. Here, we provide an update on recent literature concerning mechanisms of protein import into peroxisomes. © 2016 The Author(s).

  19. A TALE-inspired computational screen for proteins that contain approximate tandem repeats.

    PubMed

    Perycz, Malgorzata; Krwawicz, Joanna; Bochtler, Matthias

    2017-01-01

    TAL (transcription activator-like) effectors (TALEs) are bacterial proteins that are secreted from bacteria to plant cells to act as transcriptional activators. TALEs and related proteins (RipTALs, BurrH, MOrTL1 and MOrTL2) contain approximate tandem repeats that differ in conserved positions that define specificity. Using PERL, we screened ~47 million protein sequences for TALE-like architecture characterized by approximate tandem repeats (between 30 and 43 amino acids in length) and sequence variability in conserved positions, without requiring sequence similarity to TALEs. Candidate proteins were scored according to their propensity for nuclear localization, secondary structure, repeat sequence complexity, as well as covariation and predicted structural proximity of variable residues. Biological context was tentatively inferred from co-occurrence of other domains and interactome predictions. Approximate repeats with TALE-like features that merit experimental characterization were found in a protein of chestnut blight fungus, a eukaryotic plant pathogen.
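
    The core idea of the screen, finding adjacent repeat units of 30 to 43 residues that are nearly but not exactly identical, can be sketched in Python as follows. This is not the authors' Perl code: the function names, the identity-based scoring, and the example sequence are assumptions made for illustration, and the nuclear localization, secondary structure, and covariation scores mentioned in the abstract are not modeled.

```python
def best_repeat_period(seq, min_period=30, max_period=43):
    """Return (period, start, identity) for the best adjacent unit pair.

    For each candidate period, a unit is compared with the unit that
    immediately follows it; identity is the fraction of matching residues.
    """
    best = (None, None, 0.0)
    for period in range(min_period, max_period + 1):
        for start in range(0, len(seq) - 2 * period + 1):
            unit_a = seq[start:start + period]
            unit_b = seq[start + period:start + 2 * period]
            matches = sum(a == b for a, b in zip(unit_a, unit_b))
            identity = matches / period
            if identity > best[2]:
                best = (period, start, identity)
    return best


def variable_positions(seq, period, start=0):
    """List the columns (0-based) where consecutive repeat units differ."""
    units = [seq[i:i + period]
             for i in range(start, len(seq) - period + 1, period)]
    return [col for col in range(period)
            if len({unit[col] for unit in units}) > 1]


# Hypothetical 34-residue unit repeated three times, with two variable
# residues per unit (marked by the {} placeholder).
template = "LTPEQVVAIAS{}GGKQALETVQRLLPVLCQAHG"
example = "".join(template.format(pair) for pair in ("NG", "HD", "NI"))

period, start, identity = best_repeat_period(example)
variable = variable_positions(example, period, start)
```

    In this toy example the repeats are identical except at two columns, so a high identity combined with a small set of variable columns is the kind of pattern the abstract calls TALE-like.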

  20. A TALE-inspired computational screen for proteins that contain approximate tandem repeats

    PubMed Central

    Krwawicz, Joanna

    2017-01-01

    TAL (transcription activator-like) effectors (TALEs) are bacterial proteins that are secreted from bacteria to plant cells to act as transcriptional activators. TALEs and related proteins (RipTALs, BurrH, MOrTL1 and MOrTL2) contain approximate tandem repeats that differ in conserved positions that define specificity. Using PERL, we screened ~47 million protein sequences for TALE-like architecture characterized by approximate tandem repeats (between 30 and 43 amino acids in length) and sequence variability in conserved positions, without requiring sequence similarity to TALEs. Candidate proteins were scored according to their propensity for nuclear localization, secondary structure, repeat sequence complexity, as well as covariation and predicted structural proximity of variable residues. Biological context was tentatively inferred from co-occurrence of other domains and interactome predictions. Approximate repeats with TALE-like features that merit experimental characterization were found in a protein of chestnut blight fungus, a eukaryotic plant pathogen. PMID:28617832
