LOCALIZER: subcellular localization prediction of both plant and effector proteins in the plant cell
Sperschneider, Jana; Catanzariti, Ann-Maree; DeBoer, Kathleen; Petre, Benjamin; Gardiner, Donald M.; Singh, Karam B.; Dodds, Peter N.; Taylor, Jennifer M.
2017-01-01
Pathogens secrete effector proteins and many operate inside plant cells to enable infection. Some effectors have been found to enter subcellular compartments by mimicking host targeting sequences. Although many computational methods exist to predict plant protein subcellular localization, they perform poorly for effectors. We introduce LOCALIZER for predicting plant and effector protein localization to chloroplasts, mitochondria, and nuclei. LOCALIZER shows greater prediction accuracy for chloroplast and mitochondrial targeting compared to other methods for 652 plant proteins. For 107 eukaryotic effectors, LOCALIZER outperforms other methods and predicts a previously unrecognized chloroplast transit peptide for the ToxA effector, which we show translocates into tobacco chloroplasts. Secretome-wide predictions and confocal microscopy reveal that rust fungi might have evolved multiple effectors that target chloroplasts or nuclei. LOCALIZER is the first method for predicting effector localisation in plants and is a valuable tool for prioritizing effector candidates for functional investigations. LOCALIZER is available at http://localizer.csiro.au/. PMID:28300209
PLPD: reliable protein localization prediction from imbalanced and overlapped datasets
Lee, KiYoung; Kim, Dae-Won; Na, DoKyun; Lee, Kwang H.; Lee, Doheon
2006-01-01
Subcellular localization is one of the key functional characteristics of proteins. An automatic and efficient method for predicting protein subcellular localization is needed for large-scale genome analysis. From a machine learning point of view, a protein localization dataset has several characteristics: it has many classes (there are more than 10 localizations in a cell), it is multi-label (a protein may occur in several different subcellular locations), and it is highly imbalanced (the number of proteins in each localization differs markedly). Although many methods have been proposed for predicting protein subcellular localization, none of them effectively addresses all of these characteristics at the same time. Thus, a new computational method for protein localization is needed for more reliable outcomes. To address the issue, we present a protein localization predictor based on D-SVDD (PLPD), which can find the likelihood of a specific localization of a protein more easily and more correctly. Moreover, we introduce three measurements for the more precise evaluation of a protein localization predictor. In tests on various datasets derived from the experiments of Huh et al. (2003), the proposed PLPD method represents a different approach that might play a complementary role to existing methods, such as the Nearest Neighbor method and discriminate covariant method. Finally, after finding a good boundary for each localization using the 5184 classified proteins as training data, we predicted 138 proteins whose subcellular localizations could not be clearly observed by the experiments of Huh et al. (2003). PMID:16966337
Khan, Abdul Arif; Khan, Zakir; Kalam, Mohd Abul; Khan, Azmat Ali
2018-01-01
Microbial pathogenesis involves several aspects of host-pathogen interactions, including microbial proteins targeting host subcellular compartments and subsequent effects on host physiology. Such studies are supported by experimental data, but detecting bacterial protein localization with computational eukaryotic subcellular protein targeting prediction tools has recently also come into practice. We evaluated the inter-kingdom prediction certainty of these tools. Bacterial proteins experimentally known to target host subcellular compartments were predicted with eukaryotic subcellular targeting prediction tools, and prediction certainty was assessed. The results indicate that these tools alone are not sufficient for inter-kingdom protein targeting prediction. Correct prediction of the subcellular targeting of a pathogen's protein depends on several factors, including the presence of a localization signal, transmembrane domain, and molecular weight, in addition to the approach used for subcellular targeting prediction. Detecting protein targeting in the endomembrane system is comparatively difficult, as the proteins in this location are channelized to different compartments. In addition, the high specificity of the training data set also leads to low inter-kingdom prediction accuracy. The current data can help to suggest a strategy for correct prediction of the subcellular localization of bacterial proteins in the host cell. © The Author 2016. Published by Oxford University Press. All rights reserved.
Yu, Bin; Li, Shan; Qiu, Wen-Ying; Chen, Cheng; Chen, Rui-Xin; Wang, Lei; Wang, Ming-Hui; Zhang, Yan
2017-12-08
Information on the subcellular localization of apoptosis proteins is very important for understanding the mechanism of programmed cell death and for the development of drugs. Predicting the subcellular localization of an apoptosis protein is still a challenging task, and it can help to understand the protein's function and its role in metabolic processes. In this paper, we propose a novel method for protein subcellular localization prediction. First, the features of the protein sequence are extracted by combining Chou's pseudo amino acid composition (PseAAC) and pseudo-position specific scoring matrix (PsePSSM). Then, the extracted feature information is denoised by two-dimensional (2-D) wavelet denoising. Finally, the optimal feature vectors are input to the SVM classifier to predict the subcellular location of apoptosis proteins. Quite promising predictions are obtained using the jackknife test on three widely used datasets and compared with other state-of-the-art methods. The results indicate that the method proposed in this paper can remarkably improve the prediction accuracy of apoptosis protein subcellular localization, which will be a supplementary tool for future proteomics research.
Chen, Cheng; Chen, Rui-Xin; Wang, Lei; Wang, Ming-Hui; Zhang, Yan
2017-01-01
Information on the subcellular localization of apoptosis proteins is very important for understanding the mechanism of programmed cell death and for the development of drugs. Predicting the subcellular localization of an apoptosis protein is still a challenging task, and it can help to understand the protein's function and its role in metabolic processes. In this paper, we propose a novel method for protein subcellular localization prediction. First, the features of the protein sequence are extracted by combining Chou's pseudo amino acid composition (PseAAC) and pseudo-position specific scoring matrix (PsePSSM). Then, the extracted feature information is denoised by two-dimensional (2-D) wavelet denoising. Finally, the optimal feature vectors are input to the SVM classifier to predict the subcellular location of apoptosis proteins. Quite promising predictions are obtained using the jackknife test on three widely used datasets and compared with other state-of-the-art methods. The results indicate that the method proposed in this paper can remarkably improve the prediction accuracy of apoptosis protein subcellular localization, which will be a supplementary tool for future proteomics research. PMID:29296195
Protein subcellular localization prediction using artificial intelligence technology.
Nair, Rajesh; Rost, Burkhard
2008-01-01
Proteins perform many important tasks in living organisms, such as catalysis of biochemical reactions, transport of nutrients, and recognition and transmission of signals. The plethora of aspects of the role of any particular protein is referred to as its "function." One aspect of protein function that has been the target of intensive research by computational biologists is its subcellular localization. Proteins must be localized in the same subcellular compartment to cooperate toward a common physiological function. Aberrant subcellular localization of proteins can result in several diseases, including kidney stones, cancer, and Alzheimer's disease. To date, sequence homology remains the most widely used method for inferring the function of a protein. However, the application of advanced artificial intelligence (AI)-based techniques in recent years has resulted in significant improvements in our ability to predict the subcellular localization of a protein. The prediction accuracy has risen steadily over the years, in large part due to the application of AI-based methods such as hidden Markov models (HMMs), neural networks (NNs), and support vector machines (SVMs), although the availability of larger experimental datasets has also played a role. Automatic methods that mine textual information from the biological literature and molecular biology databases have considerably sped up the process of annotation for proteins for which some information regarding function is available in the literature. State-of-the-art methods based on NNs and HMMs can predict the presence of N-terminal sorting signals extremely accurately. Ab initio methods that predict subcellular localization for any protein sequence using only the native amino acid sequence and features predicted from the native sequence have shown the most remarkable improvements. The prediction accuracy of these methods has increased by over 30% in the past decade. 
The accuracy of these methods is now on par with high-throughput methods for predicting localization, and they are beginning to play an important role in directing experimental research. In this chapter, we review some of the most important methods for the prediction of subcellular localization.
Chikenji, George; Fujitsuka, Yoshimi; Takada, Shoji
2006-02-28
Predicting protein tertiary structure by folding-like simulations is one of the most stringent tests of how much we understand the principle of protein folding. Currently, the most successful method for folding-based structure prediction is the fragment assembly (FA) method. Here, we address why the FA method is so successful and its lesson for the folding problem. To do so, using the FA method, we designed a structure prediction test of "chimera proteins." In the chimera proteins, local structural preference is specific to the target sequences, whereas nonlocal interactions are only sequence-independent compaction forces. We find that these chimera proteins can find the native folds of the intact sequences with high probability indicating dominant roles of the local interactions. We further explore roles of local structural preference by exact calculation of the HP lattice model of proteins. From these results, we suggest principles of protein folding: For small proteins, compact structures that are fully compatible with local structural preference are few, one of which is the native fold. These local biases shape up the funnel-like energy landscape.
Shaping up the protein folding funnel by local interaction: Lesson from a structure prediction study
Chikenji, George; Fujitsuka, Yoshimi; Takada, Shoji
2006-01-01
Predicting protein tertiary structure by folding-like simulations is one of the most stringent tests of how much we understand the principle of protein folding. Currently, the most successful method for folding-based structure prediction is the fragment assembly (FA) method. Here, we address why the FA method is so successful and its lesson for the folding problem. To do so, using the FA method, we designed a structure prediction test of “chimera proteins.” In the chimera proteins, local structural preference is specific to the target sequences, whereas nonlocal interactions are only sequence-independent compaction forces. We find that these chimera proteins can find the native folds of the intact sequences with high probability indicating dominant roles of the local interactions. We further explore roles of local structural preference by exact calculation of the HP lattice model of proteins. From these results, we suggest principles of protein folding: For small proteins, compact structures that are fully compatible with local structural preference are few, one of which is the native fold. These local biases shape up the funnel-like energy landscape. PMID:16488978
Protein Sub-Nuclear Localization Prediction Using SVM and Pfam Domain Information
Kumar, Ravindra; Jain, Sohni; Kumari, Bandana; Kumar, Manish
2014-01-01
The nucleus is the largest and most highly organized organelle of eukaryotic cells. Within the nucleus exist a number of pseudo-compartments, which are not separated by any membrane, yet each of them contains only a specific set of proteins. Understanding protein sub-nuclear localization can hence be an important step towards understanding the biological functions of the nucleus. Here we describe a method, SubNucPred, that we developed for predicting the sub-nuclear localization of proteins. This method predicts protein localization for 10 different sub-nuclear locations sequentially by combining the presence or absence of unique Pfam domains with an amino acid composition based SVM model. The prediction accuracy during leave-one-out cross-validation was 85.05% for centromeric proteins, 76.85% for chromosomal proteins, 81.27% for nuclear speckle proteins, 81.79% for nucleolar proteins, 79.37% for nuclear envelope proteins, 77.78% for nuclear matrix proteins, 76.98% for nucleoplasm proteins, 88.89% for nuclear pore complex proteins, 75.40% for PML body proteins and 83.33% for telomeric proteins. Comparison with other reported methods showed that SubNucPred performs better than existing methods. A web-server for predicting protein sub-nuclear localization named SubNucPred has been established at http://14.139.227.92/mkumar/subnucpred/. A standalone version of SubNucPred can also be downloaded from the web-server. PMID:24897370
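The amino acid composition feature mentioned in the abstract above can be illustrated with a minimal sketch. This is not SubNucPred's code: the function name `amino_acid_composition`, the alphabet ordering, and the choice to skip non-standard residue symbols are all assumptions made for illustration. The sketch only shows how a sequence becomes a 20-dimensional vector of residue fractions that an SVM could take as input.

```python
# Hypothetical helper, not part of SubNucPred: builds a 20-dimensional
# amino acid composition vector from a protein sequence.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 standard residues


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for residue in sequence.upper():
        # Assumption: non-standard symbols (X, B, Z, gaps) are ignored.
        if residue in counts:
            counts[residue] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    vector = amino_acid_composition("MKKAAC")
    for aa, fraction in zip(AMINO_ACIDS, vector):
        if fraction > 0:
            print(aa, round(fraction, 3))
```

Because the vector has a fixed length regardless of sequence length, proteins of different sizes can be compared directly, which is one reason composition features are a common baseline for SVM-based predictors.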
Binding ligand prediction for proteins using partial matching of local surface patches.
Sael, Lee; Kihara, Daisuke
2010-01-01
Functional elucidation of uncharacterized protein structures is an important task in bioinformatics. We report our new approach for structure-based function prediction which captures local surface features of ligand binding pockets. Function of proteins, specifically, binding ligands of proteins, can be predicted by finding similar local surface regions of known proteins. To enable partial comparison of binding sites in proteins, a weighted bipartite matching algorithm is used to match pairs of surface patches. The surface patches are encoded with the 3D Zernike descriptors. Unlike the existing methods which compare global characteristics of the protein fold or the global pocket shape, the local surface patch method can find functional similarity between non-homologous proteins and binding pockets for flexible ligand molecules. The proposed method improves prediction results over global pocket shape-based method which was previously developed by our group.
Binding Ligand Prediction for Proteins Using Partial Matching of Local Surface Patches
Sael, Lee; Kihara, Daisuke
2010-01-01
Functional elucidation of uncharacterized protein structures is an important task in bioinformatics. We report our new approach for structure-based function prediction which captures local surface features of ligand binding pockets. Function of proteins, specifically, binding ligands of proteins, can be predicted by finding similar local surface regions of known proteins. To enable partial comparison of binding sites in proteins, a weighted bipartite matching algorithm is used to match pairs of surface patches. The surface patches are encoded with the 3D Zernike descriptors. Unlike the existing methods which compare global characteristics of the protein fold or the global pocket shape, the local surface patch method can find functional similarity between non-homologous proteins and binding pockets for flexible ligand molecules. The proposed method improves prediction results over global pocket shape-based method which was previously developed by our group. PMID:21614188
Semi-supervised protein subcellular localization.
Xu, Qian; Hu, Derek Hao; Xue, Hong; Yu, Weichuan; Yang, Qiang
2009-01-30
Protein subcellular localization is concerned with predicting the location of a protein within a cell using computational methods. The location information can indicate key functionalities of proteins. Accurate predictions of the subcellular localizations of proteins can aid the prediction of protein function and genome annotation, as well as the identification of drug targets. Computational methods based on machine learning, such as support vector machine approaches, have already been widely used in the prediction of protein subcellular localization. However, a major drawback of these machine learning-based approaches is that a large amount of data must be labeled for the prediction system to learn a classifier with good generalization ability. In real-world cases, it is laborious, expensive and time-consuming to experimentally determine the subcellular localization of a protein and prepare instances of labeled data. In this paper, we present an approach based on a new learning framework, semi-supervised learning, which can use far fewer labeled instances to construct a high-quality prediction model. We first construct an initial classifier using a small set of labeled examples, and then use unlabeled instances to refine the classifier for future predictions. Experimental results show that our methods can effectively reduce the workload for labeling data by using the unlabeled data. Our method is shown to enhance the state-of-the-art prediction results of SVM classifiers by more than 10%.
Mills, Evan; Truong, Kevin
2009-06-01
Protein localization is an important regulatory mechanism in many cell signaling pathways such as cytoskeletal organization and genetic regulation. The specific mechanism of protein localization determines the kinetics and morphological constraints of protein translocation, and thus affects the rate and extent of localization. To investigate the effect of localization kinetics and morphology on protein localization, we designed a protein localization system based on Ca(2+)-calmodulin and Src homology 3 domain binding peptides that can translocate between specific localizations in response to a Ca(2+) signal. We used a stochastic biomolecular simulator to predict that such a protein localization system will exhibit slower and less complete translocations when the association kinetics of a binding domain and peptide are reduced. We also predicted that increasing the diffusion resistance by manipulating the morphology of the system would similarly impair translocation speed and completeness. We then constructed a network of synthetic fusion proteins and showed that these predictions could be qualitatively confirmed in vitro. This work provides a basis for explaining the different characteristics (rate and extent) of protein transport and localization in cells as a consequence of the kinetics and morphology of the transport mechanism.
Protein (multi-)location prediction: using location inter-dependencies in a probabilistic framework
2014-01-01
Motivation: Knowing the location of a protein within the cell is important for understanding its function, role in biological processes, and potential use as a drug target. Much progress has been made in developing computational methods that predict single locations for proteins. Most such methods are based on the over-simplifying assumption that proteins localize to a single location. However, it has been shown that proteins localize to multiple locations. While a few recent systems attempt to predict multiple locations of proteins, their performance leaves much room for improvement. Moreover, they typically treat locations as independent and do not attempt to utilize possible inter-dependencies among locations. Our hypothesis is that directly incorporating inter-dependencies among locations into both the classifier-learning and the prediction process can improve location prediction performance. Results: We present a new method and a preliminary system we have developed that directly incorporates inter-dependencies among locations into the location-prediction process of multiply-localized proteins. Our method is based on a collection of Bayesian network classifiers, where each classifier is used to predict a single location. Learning the structure of each Bayesian network classifier takes into account inter-dependencies among locations, and the prediction process uses estimates involving multiple locations. We evaluate our system on a dataset of single- and multi-localized proteins (the most comprehensive protein multi-localization dataset currently available, derived from the DBMLoc dataset). Our results, obtained by incorporating inter-dependencies, are significantly higher than those obtained by classifiers that do not use inter-dependencies. The performance of our system on multi-localized proteins is comparable to a top performing system (YLoc+), without being restricted only to location-combinations present in the training set. PMID:24646119
Protein (multi-)location prediction: using location inter-dependencies in a probabilistic framework.
Simha, Ramanuja; Shatkay, Hagit
2014-03-19
Knowing the location of a protein within the cell is important for understanding its function, role in biological processes, and potential use as a drug target. Much progress has been made in developing computational methods that predict single locations for proteins. Most such methods are based on the over-simplifying assumption that proteins localize to a single location. However, it has been shown that proteins localize to multiple locations. While a few recent systems attempt to predict multiple locations of proteins, their performance leaves much room for improvement. Moreover, they typically treat locations as independent and do not attempt to utilize possible inter-dependencies among locations. Our hypothesis is that directly incorporating inter-dependencies among locations into both the classifier-learning and the prediction process can improve location prediction performance. We present a new method and a preliminary system we have developed that directly incorporates inter-dependencies among locations into the location-prediction process of multiply-localized proteins. Our method is based on a collection of Bayesian network classifiers, where each classifier is used to predict a single location. Learning the structure of each Bayesian network classifier takes into account inter-dependencies among locations, and the prediction process uses estimates involving multiple locations. We evaluate our system on a dataset of single- and multi-localized proteins (the most comprehensive protein multi-localization dataset currently available, derived from the DBMLoc dataset). Our results, obtained by incorporating inter-dependencies, are significantly higher than those obtained by classifiers that do not use inter-dependencies. The performance of our system on multi-localized proteins is comparable to a top performing system (YLoc+), without being restricted only to location-combinations present in the training set.
A Particle Swarm Optimization-Based Approach with Local Search for Predicting Protein Folding.
Yang, Cheng-Hong; Lin, Yu-Shiun; Chuang, Li-Yeh; Chang, Hsueh-Wei
2017-10-01
The hydrophobic-polar (HP) model is commonly used for predicting protein folding structures and hydrophobic interactions. This study developed a particle swarm optimization (PSO)-based algorithm combined with local search algorithms; specifically, the high exploration PSO (HEPSO) algorithm (which can execute global search processes) was combined with three local search algorithms (hill-climbing algorithm, greedy algorithm, and Tabu table), yielding the proposed HE-L-PSO algorithm. By using 20 known protein structures, we evaluated the performance of the HE-L-PSO algorithm in predicting protein folding in the HP model. The proposed HE-L-PSO algorithm exhibited favorable performance in predicting both short and long amino acid sequences with high reproducibility and stability, compared with seven reported algorithms. The HE-L-PSO algorithm yielded optimal solutions for all predicted protein folding structures. All HE-L-PSO-predicted protein folding structures possessed a hydrophobic core that is similar to normal protein folding.
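The objective that search algorithms optimize in the HP model can be made concrete with a small sketch. It assumes the common 2D square-lattice convention in which each pair of hydrophobic (H) residues that are lattice neighbours but not chain neighbours contributes -1 to the energy. The function name `hp_energy` and the coordinate representation are illustrative choices, and the sketch does not reproduce the HE-L-PSO algorithm itself.

```python
# Illustrative energy function for the 2D square-lattice HP model.
# Assumed convention: -1 per H-H contact between residues that are
# adjacent on the lattice but not consecutive in the chain.

def hp_energy(sequence, coords):
    """Return the HP energy of a conformation.

    sequence: string over {'H', 'P'}
    coords:   list of (x, y) lattice positions, one per residue
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")

    index_at = {xy: i for i, xy in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each contact is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


if __name__ == "__main__":
    # A four-residue chain folded into a unit square: residues 0 and 3 touch.
    print(hp_energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)]))
```

A global search such as PSO, or a local search such as hill climbing, would repeatedly propose new coordinate lists and keep those with lower energy; more H-H contacts correspond to a more compact hydrophobic core.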
Hasan, Md Al Mehedi; Ahmad, Shamim; Molla, Md Khademul Islam
2017-03-28
Predicting the subcellular locations of proteins can provide useful hints that reveal their functions, increase our understanding of the mechanisms of some diseases, and ultimately aid in the development of novel drugs. The number of newly discovered proteins has been growing exponentially, which in turn makes subcellular localization prediction by purely laboratory tests prohibitively laborious and expensive. In this context, computational methods are being developed as an alternative to aid biologists in selecting target proteins and designing related experiments. However, protein subcellular localization prediction is still a complicated and challenging problem, particularly when query proteins have multi-label characteristics, i.e., if they exist simultaneously in more than one subcellular location or if they move between two or more different subcellular locations. To date, several types of subcellular localization prediction methods with different levels of accuracy have been proposed to address this problem. The support vector machine (SVM) has been employed to provide potential solutions to the protein subcellular localization prediction problem. However, the practicability of an SVM is affected by the challenges of selecting an appropriate kernel and selecting the parameters of the selected kernel. To address this difficulty, in this study, we aimed to develop an efficient multi-label protein subcellular localization prediction system, named MKLoc, by introducing multiple kernel learning (MKL) based SVM. We evaluated MKLoc using a combined dataset containing 5447 single-localized proteins (originally published as part of the Höglund dataset) and 3056 multi-localized proteins (originally published as part of the DBMLoc set). Note that this dataset was used by Briesemeister et al. in their extensive comparison of multi-localization prediction systems. 
Finally, our experimental results indicate that MKLoc not only achieves higher accuracy than a single kernel based SVM system but also shows significantly better results than those obtained from other top systems (MDLoc, BNCs, YLoc+). Moreover, MKLoc requires less computation time to tune and train the system than that required for BNCs and single kernel based SVM.
Predicting residue-wise contact orders in proteins by support vector regression.
Song, Jiangning; Burrage, Kevin
2006-10-03
The residue-wise contact order (RWCO) describes the sequence separations between the residues of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered as a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable important information to reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. 
Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance with the CC to 0.57 and an RMSE of 0.79. In addition, combining the predicted secondary structure by PSIPRED was found to significantly improve the prediction performance and could yield the best prediction accuracy with a CC of 0.60 and RMSE of 0.78, which provided at least comparable performance compared with the other existing methods. The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.
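The quantities in the abstract above can be sketched in a few lines. The block below is a hedged illustration, not the authors' implementation: `residue_wise_contact_order` sums sequence separations over contacting residues without normalization, and the 8.0 distance cutoff and minimum separation of 3 are assumed values that the abstract does not specify. `pearson_cc` and `rmse` are the standard definitions of the two evaluation measures the abstract reports.

```python
import math

# Assumptions (not stated in the abstract): contacts are defined by a
# distance cutoff between per-residue coordinates, pairs closer than
# `min_separation` along the chain are skipped, and no normalization
# is applied to the summed separations.

def residue_wise_contact_order(coords, cutoff=8.0, min_separation=3):
    """Sum of sequence separations |i - j| over residues j in contact with i."""
    n = len(coords)
    rwco = [0] * n
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                separation = j - i
                rwco[i] += separation
                rwco[j] += separation
    return rwco


def pearson_cc(predicted, observed):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


def rmse(predicted, observed):
    """Root mean square error between two equal-length lists."""
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


if __name__ == "__main__":
    toy_coords = [(0, 0, 0), (10, 0, 0), (10, 10, 0), (0, 5, 0)]
    print(residue_wise_contact_order(toy_coords))
    print(pearson_cc([1, 2, 3], [2, 4, 6]), rmse([1, 2, 3], [1, 2, 5]))
```

In a regression setting such as SVR, the observed RWCO values computed from known structures would serve as training targets, and the two metrics would compare predicted against observed profiles.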
DeepLoc: prediction of protein subcellular localization using deep learning.
Almagro Armenteros, José Juan; Sønderby, Casper Kaae; Sønderby, Søren Kaae; Nielsen, Henrik; Winther, Ole
2017-11-01
The prediction of eukaryotic protein subcellular localization is a well-studied topic in bioinformatics due to its relevance in proteomics research. Many machine learning methods have been successfully applied in this task, but in most of them, predictions rely on annotation of homologues from knowledge databases. For novel proteins where no annotated homologues exist, and for predicting the effects of sequence variants, it is desirable to have methods for predicting protein properties from sequence information only. Here, we present a prediction algorithm using deep neural networks to predict protein subcellular localization relying only on sequence information. At its core, the prediction model uses a recurrent neural network that processes the entire protein sequence and an attention mechanism identifying protein regions important for the subcellular localization. The model was trained and tested on a protein dataset extracted from one of the latest UniProt releases, in which experimentally annotated proteins follow more stringent criteria than previously. We demonstrate that our model achieves a good accuracy (78% for 10 categories; 92% for membrane-bound or soluble), outperforming current state-of-the-art algorithms, including those relying on homology information. The method is available as a web server at http://www.cbs.dtu.dk/services/DeepLoc. Example code is available at https://github.com/JJAlmagro/subcellular_localization. The dataset is available at http://www.cbs.dtu.dk/services/DeepLoc/data.php. Contact: jjalma@dtu.dk. © The Author 2017. Published by Oxford University Press. All rights reserved.
LocTree2 predicts localization for all domains of life
Goldberg, Tatyana; Hamp, Tobias; Rost, Burkhard
2012-01-01
Motivation: Subcellular localization is one aspect of protein function. Despite advances in high-throughput imaging, localization maps remain incomplete. Several methods accurately predict localization, but many challenges remain to be tackled. Results: In this study, we introduced a framework to predict localization in life's three domains, including globular and membrane proteins (3 classes for archaea; 6 for bacteria and 18 for eukaryota). The resulting method, LocTree2, works well even for protein fragments. It uses a hierarchical system of support vector machines that imitates the cascading mechanism of cellular sorting. The method reaches high levels of sustained performance (eukaryota: Q18=65%, bacteria: Q6=84%). LocTree2 also accurately distinguishes membrane and non-membrane proteins. In our hands, it compared favorably with top methods when tested on new data. Availability: Online through PredictProtein (predictprotein.org); as standalone version at http://www.rostlab.org/services/loctree2. Contact: localization@rostlab.org Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:22962467
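The hierarchical arrangement of binary classifiers mentioned in this abstract can be sketched as a decision tree in which each inner node holds one binary decision function. In the sketch below, the decision functions, feature names, thresholds, and the three-class tree are hypothetical stand-ins for trained support vector machines; they are not LocTree2's actual hierarchy.

```python
class Node:
    """A leaf carries a label; an inner node carries a binary decision function."""
    def __init__(self, label=None, decide=None, negative=None, positive=None):
        self.label = label
        self.decide = decide
        self.negative = negative
        self.positive = positive

def classify(node, features):
    """Walk from the root to a leaf, following the sign of each decision value."""
    while node.label is None:
        node = node.positive if node.decide(features) >= 0 else node.negative
    return node.label

# Hypothetical decision functions standing in for trained binary SVMs.
def secretory_pathway(f):
    return f["signal_score"] - 0.5

def nuclear(f):
    return f["nls_score"] - 0.5

tree = Node(
    decide=secretory_pathway,
    positive=Node(label="secreted"),
    negative=Node(decide=nuclear,
                  positive=Node(label="nucleus"),
                  negative=Node(label="cytoplasm")),
)
```

Calling `classify(tree, {"signal_score": 0.1, "nls_score": 0.8})` follows the negative branch at the root and the positive branch at the second node, ending at the "nucleus" leaf.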
Mei, Suyu
2012-10-07
Recent years have witnessed much progress in computational modeling for protein subcellular localization. However, there are far fewer computational models for predicting plant protein subcellular multi-localization. In this paper, we propose a multi-label multi-kernel transfer learning model for predicting multiple subcellular locations of plant proteins (MLMK-TLM). The method proposes a multi-label confusion matrix and adapts one-against-all multi-class probabilistic outputs to the multi-label learning scenario, based on which we further extend our published work MK-TLM (multi-kernel transfer learning based on Chou's PseAAC formulation for protein submitochondria localization) to plant protein subcellular multi-localization. By proper homolog knowledge transfer, MLMK-TLM is applicable to novel plant protein subcellular localization in the multi-label learning scenario. Experiments on a plant protein benchmark dataset show that MLMK-TLM outperforms the baseline model. Unlike the existing models, MLMK-TLM also reports its misleading tendency, which is important for a comprehensive survey of the model's multi-labeling performance. Copyright © 2012 Elsevier Ltd. All rights reserved.
Protein (multi-)location prediction: utilizing interdependencies via a generative model.
Simha, Ramanuja; Briesemeister, Sebastian; Kohlbacher, Oliver; Shatkay, Hagit
2015-06-15
Proteins are responsible for a multitude of vital tasks in all living organisms. Given that a protein's function and role are strongly related to its subcellular location, protein location prediction is an important research area. While proteins move from one location to another and can localize to multiple locations, most existing location prediction systems assign only a single location per protein. A few recent systems attempt to predict multiple locations for proteins; however, their performance leaves much room for improvement. Moreover, such systems do not capture dependencies among locations and usually consider locations as independent. We hypothesize that a multi-location predictor that captures location inter-dependencies can improve location predictions for proteins. We introduce a probabilistic generative model for protein localization, and develop a system based on it, which we call MDLoc, that utilizes inter-dependencies among locations to predict multiple locations for proteins. The model captures location inter-dependencies using Bayesian networks and represents dependency between features and locations using a mixture model. We use iterative processes for learning model parameters and for estimating protein locations. We evaluate our classifier, MDLoc, on a dataset of single- and multi-localized proteins derived from the DBMLoc dataset, which is the most comprehensive protein multi-localization dataset currently available. Our results, obtained by using MDLoc, significantly improve upon results obtained by an initial simpler classifier, as well as on results reported by other top systems. MDLoc is available at: http://www.eecis.udel.edu/~compbio/mdloc. © The Author 2015. Published by Oxford University Press.
SChloro: directing Viridiplantae proteins to six chloroplastic sub-compartments.
Savojardo, Castrense; Martelli, Pier Luigi; Fariselli, Piero; Casadio, Rita
2017-02-01
Chloroplasts are organelles found in plants and involved in several important cell processes. Similarly to other compartments in the cell, chloroplasts have an internal structure comprising several sub-compartments, where different proteins are targeted to perform their functions. Given the relation between protein function and localization, the availability of effective computational tools to predict protein sub-organelle localizations is crucial for large-scale functional studies. In this paper we present SChloro, a novel machine-learning approach to predict protein sub-chloroplastic localization, based on targeting signal detection and membrane protein information. The proposed approach performs multi-label predictions discriminating six chloroplastic sub-compartments: inner membrane, outer membrane, stroma, thylakoid lumen, plastoglobule and thylakoid membrane. In comparative benchmarks, the proposed method outperforms current state-of-the-art methods in both single- and multi-compartment predictions, with an overall multi-label accuracy of 74%. The results demonstrate the relevance of the approach, which is a good candidate for integration into more general large-scale annotation pipelines of protein subcellular localization. The method is available as a web server at http://schloro.biocomp.unibo.it. Contact: gigi@biocomp.unibo.it.
Protein (multi-)location prediction: utilizing interdependencies via a generative model
Shatkay, Hagit
2015-01-01
Motivation: Proteins are responsible for a multitude of vital tasks in all living organisms. Given that a protein’s function and role are strongly related to its subcellular location, protein location prediction is an important research area. While proteins move from one location to another and can localize to multiple locations, most existing location prediction systems assign only a single location per protein. A few recent systems attempt to predict multiple locations for proteins; however, their performance leaves much room for improvement. Moreover, such systems do not capture dependencies among locations and usually consider locations as independent. We hypothesize that a multi-location predictor that captures location inter-dependencies can improve location predictions for proteins. Results: We introduce a probabilistic generative model for protein localization, and develop a system based on it, which we call MDLoc, that utilizes inter-dependencies among locations to predict multiple locations for proteins. The model captures location inter-dependencies using Bayesian networks and represents dependency between features and locations using a mixture model. We use iterative processes for learning model parameters and for estimating protein locations. We evaluate our classifier, MDLoc, on a dataset of single- and multi-localized proteins derived from the DBMLoc dataset, which is the most comprehensive protein multi-localization dataset currently available. Our results, obtained by using MDLoc, significantly improve upon results obtained by an initial simpler classifier, as well as on results reported by other top systems. Availability and implementation: MDLoc is available at: http://www.eecis.udel.edu/~compbio/mdloc. Contact: shatkay@udel.edu. PMID:26072505
BUSCA: an integrative web server to predict subcellular localization of proteins.
Savojardo, Castrense; Martelli, Pier Luigi; Fariselli, Piero; Profiti, Giuseppe; Casadio, Rita
2018-04-30
Here, we present BUSCA (http://busca.biocomp.unibo.it), a novel web server that integrates different computational tools for predicting protein subcellular localization. BUSCA combines methods for identifying signal and transit peptides (DeepSig and TPpred3), GPI-anchors (PredGPI) and transmembrane domains (ENSEMBLE3.0 and BetAware) with tools for discriminating subcellular localization of both globular and membrane proteins (BaCelLo, MemLoci and SChloro). Outcomes from the different tools are processed and integrated for annotating subcellular localization of both eukaryotic and bacterial protein sequences. We benchmark BUSCA against protein targets derived from recent CAFA experiments and other specific data sets, reporting state-of-the-art performance. BUSCA scores better than all other evaluated methods on 2732 targets from CAFA2, with an F1 value equal to 0.49, and is among the best methods when predicting targets from CAFA3. We propose BUSCA as an integrated and accurate resource for the annotation of protein subcellular localization.
Salvatore, M; Shu, N; Elofsson, A
2018-01-01
SubCons is a recently developed method that predicts the subcellular localization of a protein. It combines predictions from four predictors using a Random Forest classifier. Here, we present the user-friendly web-interface implementation of SubCons. Starting from a protein sequence, the server rapidly predicts the subcellular localizations of an individual protein. In addition, the server accepts the submission of sets of proteins, either by uploading files or programmatically by using command line WSDL API scripts. This makes SubCons ideal for proteome-wide analyses, allowing the user to scan a whole proteome in a few days. From the web page, it is also possible to download precalculated predictions for several eukaryotic organisms. To evaluate the performance of SubCons we present a benchmark of LocTree3 and SubCons using two recent mass-spectrometry based datasets of mouse and drosophila proteins. The server is available at http://subcons.bioinfo.se/. © 2017 The Protein Society.
Sparse regressions for predicting and interpreting subcellular localization of multi-label proteins.
Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan
2016-02-24
Predicting protein subcellular localization is indispensable for inferring protein functions. Recent studies have been focusing on predicting not only single-location proteins, but also multi-location proteins. Almost all of the high performing predictors proposed recently use gene ontology (GO) terms to construct feature vectors for classification. Despite their high performance, their prediction decisions are difficult to interpret because of the large number of GO terms involved. This paper proposes using sparse regressions to exploit GO information for both predicting and interpreting subcellular localization of single- and multi-location proteins. Specifically, we compared two multi-label sparse regression algorithms, namely multi-label LASSO (mLASSO) and multi-label elastic net (mEN), for large-scale predictions of protein subcellular localization. Both algorithms can yield sparse and interpretable solutions. By using the one-vs-rest strategy, mLASSO and mEN identified 87 and 429 out of more than 8,000 GO terms, respectively, which play essential roles in determining subcellular localization. More interestingly, many of the GO terms selected by mEN are from the biological process and molecular function categories, suggesting that the GO terms of these categories also play vital roles in the prediction. With these essential GO terms, not only where a protein locates can be decided, but also why it resides there can be revealed. Experimental results show that the output of both mEN and mLASSO are interpretable and they perform significantly better than existing state-of-the-art predictors. Moreover, mEN selects more features and performs better than mLASSO on a stringent human benchmark dataset. For readers' convenience, an online server called SpaPredictor for both mLASSO and mEN is available at http://bioinfo.eie.polyu.edu.hk/SpaPredictorServer/.
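The sparsity that lets LASSO and the elastic net pick out a small set of GO terms comes from the L1 penalty. A common way to see this is the soft-thresholding operator used in textbook coordinate-descent solvers for standardized features, sketched below. This is the generic operator, shown under the assumption of unit-norm features; it is not claimed to be the solver used for mLASSO or mEN, and the function names are illustrative.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def elastic_net_update(z, l1, l2):
    """Per-coefficient update for the (naive) elastic net with unit-norm features.

    The L1 part zeroes out weak coefficients; the L2 part rescales the survivors.
    Setting l2 = 0 recovers the LASSO update.
    """
    return soft_threshold(z, l1) / (1.0 + l2)
```

Coefficients whose unpenalized value lies within the threshold are set exactly to zero, so the corresponding features drop out of the model, which is what makes the remaining terms interpretable.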
A Method for WD40 Repeat Detection and Secondary Structure Prediction
Wang, Yang; Jiang, Fan; Zhuo, Zhu; Wu, Xian-Hui; Wu, Yun-Dong
2013-01-01
WD40-repeat proteins (WD40s), one of the largest protein families in eukaryotes, play vital roles in assembling protein-protein/DNA/RNA complexes. WD40s fold into similar β-propeller structures despite diversified sequences. A program, WDSP (WD40 repeat protein Structure Predictor), has been developed to accurately identify WD40 repeats and predict their secondary structures. The method is designed specifically for WD40 proteins by incorporating both local residue information and non-local family-specific structural features. It overcomes the problem of highly diversified protein sequences and variable loops. In addition, WDSP achieves better identification of multiple WD40-domain proteins by taking the global combination of repeats into consideration. In secondary structure prediction, the average Q3 accuracy of WDSP in the jack-knife test reaches 93.7%. The disease-related protein LRRK2 was used as a representative example to demonstrate the structure prediction. PMID:23776530
Zahiri, Javad; Mohammad-Noori, Morteza; Ebrahimpour, Reza; Saadat, Samaneh; Bozorgmehr, Joseph H; Goldberg, Tatyana; Masoudi-Nejad, Ali
2014-12-01
Protein-protein interaction (PPI) detection is one of the central goals of functional genomics and systems biology. Knowledge about the nature of PPIs can help fill the widening gap between sequence information and functional annotations. Although experimental methods have produced valuable PPI data, they also suffer from significant limitations. Computational PPI prediction methods have attracted tremendous attention. Despite considerable efforts, PPI prediction is still in its infancy in complex multicellular organisms such as humans. Here, we propose a novel ensemble learning method, LocFuse, which is useful in human PPI prediction. This method uses eight different genomic and proteomic features along with four different types of classifiers. The prediction performance of this classifier selection method was found to be considerably better than that of methods employed hitherto. This confirms the complex nature of the PPI prediction problem and also the necessity of using biological information for classifier fusion. LocFuse is available at: http://lbb.ut.ac.ir/Download/LBBsoft/LocFuse. The results revealed that if we divide proteome space according to the cellular localization of proteins, then the utility of some classifiers in PPI prediction can be improved. Therefore, to predict the interaction for any given protein pair, we can select the most accurate classifier with regard to the cellular localization information. Based on the results, the importance of different features for PPI prediction varies between differently localized proteins; in general, however, our novel features, which were extracted from position-specific scoring matrices (PSSMs), are the most important ones, and the Random Forest (RF) classifier performs best in most cases. LocFuse was developed with a user-friendly graphical interface and is freely available for Linux, Mac OSX and MS Windows operating systems. Copyright © 2014 Elsevier Inc. All rights reserved.
Zhou, Jiyun; Wang, Hongpeng; Zhao, Zhishan; Xu, Ruifeng; Lu, Qin
2018-05-08
Protein secondary structure is the three-dimensional form of local segments of proteins, and its prediction is an important problem in protein tertiary structure prediction. Developing computational approaches for protein secondary structure prediction is becoming increasingly urgent. We present a novel deep learning-based model, referred to as CNNH_PSS, that uses a multi-scale CNN with highway connections. In CNNH_PSS, any two neighboring convolutional layers have a highway that delivers information from the current layer to the output of the next one to keep local contexts. As lower layers extract local context while higher layers extract long-range interdependencies, the highways between neighboring layers allow CNNH_PSS to extract both local contexts and long-range interdependencies. We evaluate CNNH_PSS on two commonly used datasets: CB6133 and CB513. CNNH_PSS outperforms the multi-scale CNN without highway by at least 0.010 Q8 accuracy and also performs better than CNF, DeepCNF and SSpro8, which cannot extract long-range interdependencies, by at least 0.020 Q8 accuracy, demonstrating that both local contexts and long-range interdependencies are indeed useful for prediction. Furthermore, CNNH_PSS also performs better than GSM and DCRNN, which need extra complex models to extract long-range interdependencies. This demonstrates that CNNH_PSS not only costs fewer computing resources but also achieves better prediction performance. CNNH_PSS is able to extract both local contexts and long-range interdependencies by combining a multi-scale CNN and a highway network. The evaluations on common datasets and comparisons with state-of-the-art methods indicate that CNNH_PSS is a useful and efficient tool for protein secondary structure prediction.
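The idea of a highway that carries a layer's input past the next transformation can be illustrated with the general highway-network gating rule, y = T * H(x) + (1 - T) * x, where T is a sigmoid gate. The sketch below applies that rule element-wise. How CNNH_PSS wires its highways between convolutional layers is not specified in this abstract, so this is an assumption about the general mechanism only; the function names are illustrative.

```python
import math

def sigmoid(v):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def highway(x, h, gate_logits):
    """Element-wise highway combination.

    x: layer input (the local context to be carried forward)
    h: transformed output H(x) of the next layer
    gate_logits: pre-activation values of the transform gate T
    """
    out = []
    for x_i, h_i, g_i in zip(x, h, gate_logits):
        t = sigmoid(g_i)
        out.append(t * h_i + (1.0 - t) * x_i)
    return out
```

With a gate near 0 the input passes through unchanged, and with a gate near 1 only the transformed value is kept, so the network can mix lower-layer and higher-layer features per element.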
Laine, Elodie; Carbone, Alessandra
2015-01-01
Protein-protein interactions (PPIs) are essential to all biological processes and they represent increasingly important therapeutic targets. Here, we present a new method for accurately predicting protein-protein interfaces, understanding their properties, origins and binding to multiple partners. Contrary to machine learning approaches, our method combines in a rational and very straightforward way three sequence- and structure-based descriptors of protein residues: evolutionary conservation, physico-chemical properties and local geometry. The implemented strategy yields very precise predictions for a wide range of protein-protein interfaces and discriminates them from small-molecule binding sites. Beyond its predictive power, the approach permits to dissect interaction surfaces and unravel their complexity. We show how the analysis of the predicted patches can foster new strategies for PPIs modulation and interaction surface redesign. The approach is implemented in JET2, an automated tool based on the Joint Evolutionary Trees (JET) method for sequence-based protein interface prediction. JET2 is freely available at www.lcqb.upmc.fr/JET2. PMID:26690684
SubCellProt: predicting protein subcellular localization using machine learning approaches.
Garg, Prabha; Sharma, Virag; Chaudhari, Pradeep; Roy, Nilanjan
2009-01-01
High-throughput genome sequencing projects continue to churn out enormous amounts of raw sequence data. However, most of this raw sequence data is unannotated and, hence, not very useful. Among the various approaches to decipher the function of a protein, one is to determine its localization. Experimental approaches for proteome annotation including determination of a protein's subcellular localizations are very costly and labor intensive. Besides the available experimental methods, in silico methods present alternative approaches to accomplish this task. Here, we present two machine learning approaches for prediction of the subcellular localization of a protein from the primary sequence information. Two machine learning algorithms, k Nearest Neighbor (k-NN) and Probabilistic Neural Network (PNN) were used to classify an unknown protein into one of the 11 subcellular localizations. The final prediction is made on the basis of a consensus of the predictions made by two algorithms and a probability is assigned to it. The results indicate that the primary sequence derived features like amino acid composition, sequence order and physicochemical properties can be used to assign subcellular localization with a fair degree of accuracy. Moreover, with the enhanced accuracy of our approach and the definition of a prediction domain, this method can be used for proteome annotation in a high throughput manner. SubCellProt is available at www.databases.niper.ac.in/SubCellProt.
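One of the sequence-derived features named in this abstract, amino acid composition, and a k-nearest-neighbor vote over such features can be sketched as follows. The distance measure, the value of k, and the example labels are assumptions for illustration; they are not taken from SubCellProt.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """20-dimensional amino acid composition (fraction of each standard residue)."""
    counts = Counter(seq)
    n = len(seq)
    return [counts[a] / n for a in AMINO_ACIDS]

def knn_predict(query, training, k=3):
    """Majority label among the k training vectors closest to the query.

    training: list of (feature_vector, label) pairs
    Squared Euclidean distance is used here as an illustrative choice.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(training, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Composition discards residue order, which is why the abstract pairs it with sequence-order and physicochemical features.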
Local backbone structure prediction of proteins
De Brevern, Alexandre G.; Benros, Cristina; Gautier, Romain; Valadié, Hélène; Hazout, Serge; Etchebest, Catherine
2004-01-01
Summary: A statistical analysis of the PDB structures has led us to define a new set of small 3D structural prototypes called Protein Blocks (PBs). This structural alphabet includes 16 PBs, each defined by the (φ, Ψ) dihedral angles of 5 consecutive residues. The amino acid distributions observed in sequence windows encompassing these PBs are used to predict, by a Bayesian approach, the local 3D structure of proteins from their sequences alone. LocPred is software that allows users to submit a protein sequence and performs a prediction in terms of PBs. The prediction results are given both textually and graphically. PMID:15724288
Protein location prediction using atomic composition and global features of the amino acid sequence
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cherian, Betsy Sheena; Nair, Achuthsankar S.
2010-01-22
The subcellular location of a protein is constructive information for determining its function, screening for drug candidates, vaccine design, annotation of gene products and selecting relevant proteins for further studies. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. For a computational localization prediction method to be more accurate, it should exploit all possible relevant biological features that contribute to the subcellular localization. In this work, we extracted the biological features from the full-length protein sequence to incorporate more biological information. A new biological feature, the distribution of atomic composition, is effectively used with multiple physiochemical properties, amino acid composition, three-part amino acid composition, and sequence similarity for predicting the subcellular location of the protein. Support Vector Machines are designed for four modules and the prediction is made by a weighted voting system. Our system makes predictions with an accuracy of 100, 82.47 and 88.81 for the self-consistency test, jackknife test and independent data test, respectively. Our results provide evidence that prediction based on biological features derived from the full-length amino acid sequence gives better accuracy than prediction based on features derived from the N-terminal alone. Considering the features as a distribution within the entire sequence brings out the underlying property distribution in greater detail and enhances the prediction accuracy.
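The final step described in this abstract, combining the outputs of several classifier modules by weighted voting, can be sketched as below. The module names, weights, and labels are hypothetical; the abstract does not state how the weights were chosen.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: module name -> predicted label
    weights: module name -> non-negative weight of that module
    """
    totals = defaultdict(float)
    for module, label in predictions.items():
        totals[label] += weights[module]
    return max(totals, key=totals.get)

# Hypothetical outputs of four modules for one protein.
example_predictions = {
    "atomic_composition": "nucleus",
    "physicochemical": "cytoplasm",
    "amino_acid_composition": "nucleus",
    "sequence_similarity": "cytoplasm",
}
# Hypothetical module weights.
example_weights = {
    "atomic_composition": 0.3,
    "physicochemical": 0.2,
    "amino_acid_composition": 0.3,
    "sequence_similarity": 0.2,
}
```

With these made-up numbers the two modules voting for "nucleus" carry a total weight of 0.6 against 0.4, so `weighted_vote(example_predictions, example_weights)` selects "nucleus" even though the vote count is tied.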
Characterization of essential proteins based on network topology in proteins interaction networks
NASA Astrophysics Data System (ADS)
Bakar, Sakhinah Abu; Taheri, Javid; Zomaya, Albert Y.
2014-06-01
The identification of essential proteins is theoretically and practically important as (1) it is essential to understanding the minimal requirements for cellular survival, and (2) it provides a foundation for drug development. As experimental studies to identify essential proteins are both time and resource consuming, here we present a computational approach to predicting them based on network topology properties from protein-protein interaction networks of Saccharomyces cerevisiae. The proposed method, namely EP3NN (Essential Proteins Prediction using Probabilistic Neural Network), employs a machine learning algorithm called Probabilistic Neural Network as a classifier to identify essential proteins of the organism of interest; it uses the degree centrality, closeness centrality, local assortativity and local clustering coefficient of each protein in the network for such predictions. Results show that EP3NN successfully predicts essential proteins with an accuracy of 95% for our studied organism. Results also show that most of the essential proteins are close to other proteins, show assortative behavior and form clusters/sub-graphs in the network.
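Two of the topology features named in this abstract, degree centrality and the local clustering coefficient, can be computed from an adjacency map as in the following sketch. It uses the standard definitions for undirected graphs; the tiny example network and its node names are made up for illustration.

```python
def degree_centrality(adj, node):
    """Degree divided by the maximum possible degree (n - 1)."""
    return len(adj[node]) / (len(adj) - 1)

def local_clustering(adj, node):
    """Fraction of pairs of neighbors that are themselves connected."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))

# Toy undirected network: a triangle A-B-C with a pendant node D attached to A.
toy_network = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
```

In the toy network, node A touches every other node but only one of its three neighbor pairs is linked, so it has high degree centrality and a low clustering coefficient.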
Song, Jiangning; Burrage, Kevin; Yuan, Zheng; Huber, Thomas
2006-03-09
The majority of peptide bonds in proteins are found to occur in the trans conformation. However, for proline residues, a considerable fraction of Prolyl peptide bonds adopt the cis form. Proline cis/trans isomerization is known to play a critical role in protein folding, splicing, cell signaling and transmembrane active transport. Accurate prediction of proline cis/trans isomerization in proteins would have many important applications towards the understanding of protein structure and function. In this paper, we propose a new approach to predict the proline cis/trans isomerization in proteins using support vector machine (SVM). The preliminary results indicated that using Radial Basis Function (RBF) kernels could lead to better prediction performance than that of polynomial and linear kernel functions. We used single sequence information of different local window sizes, amino acid compositions of different local sequences, multiple sequence alignment obtained from PSI-BLAST and the secondary structure information predicted by PSIPRED. We explored these different sequence encoding schemes in order to investigate their effects on the prediction performance. The training and testing of this approach was performed on a newly enlarged dataset of 2424 non-homologous proteins determined by X-Ray diffraction method using 5-fold cross-validation. Selecting the window size 11 provided the best performance for determining the proline cis/trans isomerization based on the single amino acid sequence. It was found that using multiple sequence alignments in the form of PSI-BLAST profiles could significantly improve the prediction performance, the prediction accuracy increased from 62.8% with single sequence to 69.8% and Matthews Correlation Coefficient (MCC) improved from 0.26 with single local sequence to 0.40. 
Furthermore, when coupled with the secondary structure information predicted by PSIPRED, our method yielded a prediction accuracy of 71.5% and an MCC of 0.43, which are 9% and 0.17 higher, respectively, than the values achieved with single sequence information. A new method has been developed to predict proline cis/trans isomerization in proteins based on a support vector machine, which used the single amino acid sequence with different local window sizes, the amino acid compositions of local sequences flanking the central proline residues, the position-specific scoring matrices (PSSMs) extracted by PSI-BLAST and the predicted secondary structures generated by PSIPRED. The successful application of the SVM approach in this study reinforces that SVM is a powerful tool for predicting proline cis/trans isomerization in proteins and for biological sequence analysis.
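The local-window encoding and the Matthews Correlation Coefficient mentioned in this abstract can be sketched as follows. The window extraction assumes an odd window size centered on each proline and pads sequence ends with a placeholder character; the padding symbol and function names are assumptions, not details from the study. The MCC formula is the standard one for binary classification.

```python
import math

def proline_windows(seq, size=11, pad="X"):
    """Return (position, window) for each proline, with the proline at the center.

    size must be odd; sequence ends are padded with the placeholder character.
    """
    half = size // 2
    padded = pad * half + seq + pad * half
    return [(i, padded[i:i + size]) for i, aa in enumerate(seq) if aa == "P"]

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

MCC is informative here because cis prolines are a minority class, so plain accuracy can look high even for a predictor that rarely predicts cis.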
Yu, Dongjun; Wu, Xiaowei; Shen, Hongbin; Yang, Jian; Tang, Zhenmin; Qi, Yong; Yang, Jingyu
2012-12-01
Membrane proteins are encoded by about 30% of the genes in a genome and play important roles in living organisms. Previous studies have revealed that the structures and functions of membrane proteins show obvious cell organelle-specific properties. Hence, it is highly desirable to predict a membrane protein's subcellular location from its primary sequence, considering the extreme difficulty of wet-lab studies of membrane proteins. Although many models have been developed for predicting protein subcellular locations, only a few are specific to membrane proteins. Existing prediction approaches were constructed based on statistical machine learning algorithms with serial combination of multi-view features, i.e., different feature vectors are simply serially combined to form a super feature vector. However, such simple combination of features simultaneously increases the information redundancy, which can, in turn, degrade the final prediction accuracy. That is why prediction success rates in the serial super space were often found to be even lower than those in a single-view space. The purpose of this paper is to investigate a proper method for fusing multiple multi-view protein sequential features for subcellular location prediction. Instead of the serial strategy, we propose a novel parallel framework for fusing multiple membrane protein multi-view attributes that represents protein samples in complex spaces. We also propose generalized principal component analysis (GPCA) for feature reduction in the complex geometry. The experimental results obtained with different machine learning algorithms on benchmark membrane protein subcellular localization datasets demonstrate that the newly proposed parallel strategy outperforms the traditional serial approach. We also demonstrate the efficacy of the parallel strategy on a soluble protein subcellular localization dataset, indicating that the parallel technique is flexible enough to suit other computational biology problems. 
The software and datasets are available at: http://www.csbio.sjtu.edu.cn/bioinf/mpsp.
Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan
2015-03-15
Proteins located in appropriate cellular compartments are of paramount importance to exert their biological functions. Prediction of protein subcellular localization by computational methods is required in the post-genomic era. Recent studies have been focusing on predicting not only single-location proteins but also multi-location proteins. However, most of the existing predictors are far from effective for tackling the challenges of multi-label proteins. This article proposes an efficient multi-label predictor, namely mPLR-Loc, based on penalized logistic regression and adaptive decisions for predicting both single- and multi-location proteins. Specifically, for each query protein, mPLR-Loc exploits the information from the Gene Ontology (GO) database by using its accession number (AC) or the ACs of its homologs obtained via BLAST. The frequencies of GO occurrences are used to construct feature vectors, which are then classified by an adaptive decision-based multi-label penalized logistic regression classifier. Experimental results based on two recent stringent benchmark datasets (virus and plant) show that mPLR-Loc remarkably outperforms existing state-of-the-art multi-label predictors. In addition to being able to rapidly and accurately predict subcellular localization of single- and multi-label proteins, mPLR-Loc can also provide probabilistic confidence scores for the prediction decisions. For readers' convenience, the mPLR-Loc server is available online (http://bioinfo.eie.polyu.edu.hk/mPLRLocServer). Copyright © 2014 Elsevier Inc. All rights reserved.
Zhang, Li; Liao, Bo; Li, Dachao; Zhu, Wen
2009-07-21
Apoptosis, or programmed cell death, plays an important role in the development of an organism. Obtaining information on the subcellular location of apoptosis proteins is very helpful for understanding the apoptosis mechanism. In this paper, based on the concept that the position distribution information of amino acids is closely related to the structure and function of proteins, we introduce the concept of distance frequency [Matsuda, S., Vert, J.P., Ueda, N., Toh, H., Akutsu, T., 2005. A novel representation of protein sequences for prediction of subcellular location using support vector machines. Protein Sci. 14, 2804-2813] and propose a novel way to calculate distance frequencies. In order to calculate the local features, each protein sequence is separated into p parts of the same length. Then we use the novel representation of protein sequences and adopt a support vector machine to predict subcellular location. The overall prediction accuracy in the jackknife test is significantly improved.
Functional classification of protein structures by local structure matching in graph representation.
Mills, Caitlyn L; Garg, Rohan; Lee, Joslynn S; Tian, Liang; Suciu, Alexandru; Cooperman, Gene; Beuning, Penny J; Ondrechen, Mary Jo
2018-03-31
As a result of high-throughput protein structure initiatives, over 14,400 protein structures have been solved by structural genomics (SG) centers and participating research groups. While the totality of SG data represents a tremendous contribution to genomics and structural biology, reliable functional information for these proteins is generally lacking. Better functional predictions for SG proteins will add substantial value to the structural information already obtained. Our method described herein, Graph Representation of Active Sites for Prediction of Function (GRASP-Func), predicts quickly and accurately the biochemical function of proteins by representing residues at the predicted local active site as graphs rather than in Cartesian coordinates. We compare the GRASP-Func method to our previously reported method, structurally aligned local sites of activity (SALSA), using the ribulose phosphate binding barrel (RPBB), 6-hairpin glycosidase (6-HG), and Concanavalin A-like Lectins/Glucanase (CAL/G) superfamilies as test cases. In each of the superfamilies, SALSA and the much faster method GRASP-Func yield similar correct classification of previously characterized proteins, providing a validated benchmark for the new method. In addition, we analyzed SG proteins using our SALSA and GRASP-Func methods to predict function. Forty-one SG proteins in the RPBB superfamily, nine SG proteins in the 6-HG superfamily, and one SG protein in the CAL/G superfamily were successfully classified into one of the functional families in their respective superfamily by both methods. This improved, faster, validated computational method can yield more reliable predictions of function that can be used for a wide variety of applications by the community. © 2018 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
Liang, Yunyun; Liu, Sanyang; Zhang, Shengli
2016-12-01
Apoptosis, or programmed cell death, plays a central role in the development and homeostasis of an organism. Information on the subcellular location of apoptosis proteins is very helpful for understanding the apoptosis mechanism. Predicting the subcellular localization of an apoptosis protein is still a challenging task, and existing methods are mainly based on protein primary sequences. In this paper, we introduce a new position-specific scoring matrix (PSSM)-based method that uses the detrended cross-correlation (DCCA) coefficient of non-overlapping windows. A 190-dimensional (190D) feature vector is then constructed on two widely used datasets, CL317 and ZD98, and a support vector machine is adopted as the classifier. To evaluate the proposed method, objective and rigorous jackknife cross-validation tests are performed on the two datasets. The results show that our approach offers a novel and reliable PSSM-based tool for prediction of apoptosis protein subcellular localization. Copyright © 2016 Elsevier Inc. All rights reserved.
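One way to read the 190-dimensional vector is as one DCCA coefficient per unordered pair of the 20 PSSM columns (20 × 19 / 2 = 190). The sketch below follows that reading using the standard DCCA coefficient with linear detrending in non-overlapping windows. The pairing interpretation, the window length `s`, the function names, and the random toy PSSM are all assumptions for illustration, not details confirmed by the abstract.

```python
import math
import random
from itertools import combinations


def _profile(series):
    """Cumulative sum of deviations from the mean (the integrated series)."""
    mean = sum(series) / len(series)
    total, out = 0.0, []
    for value in series:
        total += value - mean
        out.append(total)
    return out


def _linear_residuals(window):
    """Residuals of `window` after removing its least-squares line."""
    n = len(window)
    t_mean = (n - 1) / 2
    y_mean = sum(window) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(window))
    slope = sxy / sxx
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(window)]


def dcca_coefficient(x, y, s):
    """DCCA coefficient of two equal-length series, window length s >= 3."""
    px, py = _profile(x), _profile(y)
    f_xy = f_xx = f_yy = 0.0
    for start in range(0, len(px) - s + 1, s):  # non-overlapping windows
        rx = _linear_residuals(px[start:start + s])
        ry = _linear_residuals(py[start:start + s])
        f_xy += sum(a * b for a, b in zip(rx, ry))
        f_xx += sum(a * a for a in rx)
        f_yy += sum(b * b for b in ry)
    denom = math.sqrt(f_xx * f_yy)
    return f_xy / denom if denom > 0 else 0.0


def pssm_dcca_features(pssm, s=5):
    """One coefficient per unordered pair of PSSM columns (190 for 20 columns)."""
    columns = [list(col) for col in zip(*pssm)]
    return [dcca_coefficient(columns[i], columns[j], s)
            for i, j in combinations(range(len(columns)), 2)]


# Toy PSSM: 40 residues x 20 amino-acid columns of made-up scores.
rng = random.Random(0)
pssm = [[rng.uniform(-5, 5) for _ in range(20)] for _ in range(40)]
features = pssm_dcca_features(pssm, s=5)
```

The resulting list could then be passed to any support vector machine implementation as a fixed-length feature vector.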
A protein-dependent side-chain rotamer library.
Bhuyan, Md Shariful Islam; Gao, Xin
2011-12-14
Protein side-chain packing problem has remained one of the key open problems in bioinformatics. The three main components of protein side-chain prediction methods are a rotamer library, an energy function and a search algorithm. Rotamer libraries summarize the existing knowledge of the experimentally determined structures quantitatively. Depending on how much contextual information is encoded, there are backbone-independent rotamer libraries and backbone-dependent rotamer libraries. Backbone-independent libraries only encode sequential information, whereas backbone-dependent libraries encode both sequential and locally structural information. However, side-chain conformations are determined by spatially local information, rather than sequentially local information. Since in the side-chain prediction problem, the backbone structure is given, spatially local information should ideally be encoded into the rotamer libraries. In this paper, we propose a new type of backbone-dependent rotamer library, which encodes structural information of all the spatially neighboring residues. We call it protein-dependent rotamer libraries. Given any rotamer library and a protein backbone structure, we first model the protein structure as a Markov random field. Then the marginal distributions are estimated by the inference algorithms, without doing global optimization or search. The rotamers from the given library are then re-ranked and associated with the updated probabilities. Experimental results demonstrate that the proposed protein-dependent libraries significantly outperform the widely used backbone-dependent libraries in terms of the side-chain prediction accuracy and the rotamer ranking ability. Furthermore, without global optimization/search, the side-chain prediction power of the protein-dependent library is still comparable to the global-search-based side-chain prediction methods.
Ligand Binding Site Detection by Local Structure Alignment and Its Performance Complementarity
Lee, Hui Sun; Im, Wonpil
2013-01-01
Accurate determination of potential ligand binding sites (BS) is a key step for protein function characterization and structure-based drug design. Despite promising results of template-based BS prediction methods using global structure alignment (GSA), there is room to improve performance by properly incorporating local structure alignment (LSA), because BS are local structures and are often similar for proteins with dissimilar global folds. We present a template-based ligand BS prediction method using G-LoSA, our LSA tool. A large benchmark set validation shows that G-LoSA predicts drug-like ligands’ positions in single-chain protein targets more precisely than TM-align, a GSA-based method, while the overall success rate of TM-align is better. G-LoSA is particularly efficient for accurate detection of local structures conserved across proteins with diverse global topologies. Recognizing the performance complementarity of G-LoSA to TM-align and to a non-template geometry-based method, fpocket, we developed a robust consensus scoring method, CMCS-BSP (Complementary Methods and Consensus Scoring for ligand Binding Site Prediction), which improves prediction accuracy. The G-LoSA source code is freely available at http://im.bioinformatics.ku.edu/GLoSA. PMID:23957286
NASA Astrophysics Data System (ADS)
Liang, Yunyun; Liu, Sanyang; Zhang, Shengli
2017-02-01
Apoptosis is a fundamental process controlling normal tissue homeostasis by regulating the balance between cell proliferation and death. Predicting the subcellular location of apoptosis proteins is very helpful for understanding the mechanism of programmed cell death. Prediction of apoptosis protein subcellular location is still a challenging and complicated task, and existing methods are mainly based on protein primary sequences. In this paper, we propose a new position-specific scoring matrix (PSSM)-based model that uses the Geary autocorrelation function and the detrended cross-correlation coefficient (DCCA coefficient). A 270-dimensional (270D) feature vector is then constructed on three widely used datasets, ZD98, ZW225 and CL317, and a support vector machine is adopted as the classifier. A rigorous jackknife test shows that the overall prediction accuracies are significantly improved. The results show that our model offers a reliable and effective PSSM-based tool for prediction of apoptosis protein subcellular localization.
Knowledge-based prediction of protein backbone conformation using a structural alphabet.
Vetrivel, Iyanar; Mahajan, Swapnil; Tyagi, Manoj; Hoffmann, Lionel; Sanejouand, Yves-Henri; Srinivasan, Narayanaswamy; de Brevern, Alexandre G; Cadet, Frédéric; Offmann, Bernard
2017-01-01
Libraries of structural prototypes that abstract protein local structures are known as structural alphabets and have proven to be very useful in various aspects of protein structure analysis and prediction. One such library, Protein Blocks, is composed of 16 standard structural prototypes, each five residues long. This form of analysis involves describing a protein structure as a string of Protein Blocks. Predicting the local structure of a protein in terms of Protein Blocks is the general objective of this work. A new approach, PB-kPRED, is proposed towards this aim. It involves (i) organizing the structural knowledge in the form of a database of pentapeptide fragments extracted from all protein structures in the PDB and (ii) applying a knowledge-based algorithm that does not rely on any secondary structure predictions and/or sequence alignment profiles to scan this database and predict the most probable backbone conformations for the protein local structures. PB-kPRED does, however, preferentially use structural information from homologues when it is available. The predictions were evaluated rigorously on 15,544 query proteins representing a non-redundant subset of the PDB filtered at a 30% sequence identity cut-off. We have shown that the kPRED method was able to achieve mean accuracies ranging from 40.8% to 66.3% depending on the availability of homologues. The impact of the different strategies for scanning the database on the prediction was evaluated and is discussed. Our results highlight the usefulness of the method in the context of proteins without any known structural homologues. A scoring function that gives a good estimate of the accuracy of prediction was further developed. This score estimates the accuracy of the algorithm very well (R² of 0.82). An online version of the tool is provided freely for non-commercial usage at http://www.bo-protscience.fr/kpred/.
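The database-scanning idea in this abstract can be illustrated with a toy lookup: index every pentapeptide by the Protein Block observed at its central residue, then predict the most frequent block for each pentapeptide window of a query. This is only a sketch under simplifying assumptions. The function names, the three made-up training pairs, the `z` placeholder for unassigned termini, and the `?` marker for unknown positions are hypothetical, and the preferential use of homologues described above is not modelled.

```python
from collections import Counter, defaultdict


def build_fragment_db(examples):
    """Map each pentapeptide to counts of the block at its central residue."""
    db = defaultdict(Counter)
    for sequence, blocks in examples:
        for i in range(len(sequence) - 4):
            db[sequence[i:i + 5]][blocks[i + 2]] += 1
    return db


def predict_blocks(sequence, db, unknown="?"):
    """Predict one block letter per residue; unseen windows stay `unknown`."""
    out = [unknown] * len(sequence)
    for i in range(len(sequence) - 4):
        counts = db.get(sequence[i:i + 5])
        if counts:
            out[i + 2] = counts.most_common(1)[0][0]
    return "".join(out)


# Made-up (sequence, block string) pairs; real entries would come from the PDB.
examples = [
    ("ACDEFGH", "zzmmdzz"),
    ("ACDEFGW", "zzmmdzz"),
    ("ACDEFPP", "zzkmmzz"),
]
db = build_fragment_db(examples)
prediction = predict_blocks("TACDEFGW", db)
```

The first and last two residues never sit at the centre of a pentapeptide, so they are left unassigned here.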
Yu, Nancy Y; Wagner, James R; Laird, Matthew R; Melli, Gabor; Rey, Sébastien; Lo, Raymond; Dao, Phuong; Sahinalp, S Cenk; Ester, Martin; Foster, Leonard J; Brinkman, Fiona S L
2010-07-01
PSORTb has remained the most precise bacterial protein subcellular localization (SCL) predictor since it was first made available in 2003. However, its recall needs to be improved, and no accurate SCL predictors yet make predictions for archaea or differentiate important localization subcategories, such as proteins targeted to a host cell or bacterial hyperstructures/organelles. Such improvements should preferably be encompassed in a freely available web-based predictor that can also be used as a standalone program. We developed PSORTb version 3.0 with improved recall, higher proteome-scale prediction coverage, and new refined localization subcategories. It is the first SCL predictor specifically geared for all prokaryotes, including archaea and bacteria with atypical membrane/cell wall topologies. It features an improved standalone program, with a new batch results delivery system complementing its web interface. We evaluated the most accurate SCL predictors using 5-fold cross validation and performed an independent proteomics analysis, showing that PSORTb 3.0 is the most accurate but can benefit from being complemented by Proteome Analyst predictions. Availability: http://www.psort.org/psortb (download open source software or use the web interface). Contact: psort-mail@sfu.ca. Supplementary data are available at Bioinformatics online.
Saini, Harsh; Raicar, Gaurav; Dehzangi, Abdollah; Lal, Sunil; Sharma, Alok
2015-12-07
Protein subcellular localization is an important topic in proteomics since it is related to a protein's overall function and helps in the understanding of metabolic pathways and in drug design and discovery. In this paper, a basic approximation technique from natural language processing called the linear interpolation smoothing model is applied to predicting protein subcellular localizations. The proposed approach extracts features from syntactical information in protein sequences to build probabilistic profiles using dependency models, which are used in linear interpolation to determine how likely a sequence is to belong to a particular subcellular location. This technique builds a statistical model based on maximum likelihood. It is able to deal effectively with the high dimensionality that hinders other traditional classifiers, such as Support Vector Machines or k-Nearest Neighbours, without sacrificing performance. This approach has been evaluated by predicting subcellular localizations of Gram-positive and Gram-negative bacterial proteins. Copyright © 2015 Elsevier Ltd. All rights reserved.
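Linear interpolation smoothing, as used in language modelling, mixes a higher-order maximum-likelihood estimate with a lower-order one. The sketch below applies it to amino-acid bigrams and unigrams and classifies a sequence by the location whose model gives the highest log-likelihood. It is a simplified stand-in for the approach in the abstract: the class name, the mixing weight `lam`, the probability floor, the restriction to bigrams, and the toy training sequences are assumptions for illustration.

```python
import math
from collections import Counter


class InterpolatedModel:
    """Bigram model smoothed by linear interpolation with unigram estimates."""

    def __init__(self, sequences, lam=0.7):
        self.lam = lam
        self.unigrams = Counter()
        self.bigrams = Counter()
        self.contexts = Counter()
        for seq in sequences:
            self.unigrams.update(seq)
            for a, b in zip(seq, seq[1:]):
                self.bigrams[a, b] += 1
                self.contexts[a] += 1
        self.total = sum(self.unigrams.values())

    def log_likelihood(self, seq, floor=1e-12):
        """Sum of log P(b | a) over adjacent residue pairs of `seq`."""
        score = 0.0
        for a, b in zip(seq, seq[1:]):
            p_uni = self.unigrams[b] / self.total
            p_bi = self.bigrams[a, b] / self.contexts[a] if self.contexts[a] else 0.0
            # Interpolate the two maximum-likelihood estimates; the floor
            # only guards against log(0) for residues never seen in training.
            score += math.log(max(self.lam * p_bi + (1 - self.lam) * p_uni, floor))
        return score


def classify(seq, models):
    """Return the location whose model scores `seq` highest."""
    return max(models, key=lambda loc: models[loc].log_likelihood(seq))


# Toy training data; real models would be trained on annotated proteins.
models = {
    "cytoplasm": InterpolatedModel(["AAAAGAAA", "AAGAAAAA"]),
    "periplasm": InterpolatedModel(["KKKRKKKK", "KRKKKKKK"]),
}
```

For example, `classify("AAGA", models)` selects the model trained on alanine-rich sequences, since every residue pair in the query is unseen by the other model.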
Cao, Renzhi; Wang, Zheng; Cheng, Jianlin
2014-04-15
Protein model quality assessment is an essential component of generating and using protein structural models. During the Tenth Critical Assessment of Techniques for Protein Structure Prediction (CASP10), we developed and tested four automated methods (MULTICOM-REFINE, MULTICOM-CLUSTER, MULTICOM-NOVEL, and MULTICOM-CONSTRUCT) that predicted both local and global quality of protein structural models. MULTICOM-REFINE was a clustering approach that used the average pairwise structural similarity between models to measure the global quality and the average Euclidean distance between a model and several top ranked models to measure the local quality. MULTICOM-CLUSTER and MULTICOM-NOVEL were two new support vector machine-based methods of predicting both the local and global quality of a single protein model. MULTICOM-CONSTRUCT was a new weighted pairwise model comparison (clustering) method that used the weighted average similarity between models in a pool to measure the global model quality. Our experiments showed that the pairwise model assessment methods worked better when a large portion of models in the pool were of good quality, whereas single-model quality assessment methods performed better on some hard targets when only a small portion of models in the pool were of reasonable quality. Since digging out a few good models from a large pool of low-quality models is a major challenge in protein structure prediction, single model quality assessment methods appear to be poised to make important contributions to protein structure modeling. The other interesting finding was that single-model quality assessment scores could be used to weight the models by the consensus pairwise model comparison method to improve its accuracy. PMID:24731387
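The pairwise (clustering) idea can be sketched as follows: a model's global quality is estimated as the average of its structural similarity to every other model in the pool, optionally weighted by a per-model score. This is a minimal illustration, not the MULTICOM code. The function name, the precomputed similarity matrix (for instance GDT-TS-like values in [0, 1]), and the use of single-model scores as weights are assumptions for the example.

```python
def pairwise_global_quality(similarity, weights=None):
    """Estimate each model's global quality from pairwise similarities.

    similarity[i][j] is the structural similarity between models i and j.
    With weights=None every other model counts equally; otherwise model j
    contributes in proportion to weights[j].
    """
    n = len(similarity)
    if weights is None:
        weights = [1.0] * n
    scores = []
    for i in range(n):
        num = sum(weights[j] * similarity[i][j] for j in range(n) if j != i)
        den = sum(weights[j] for j in range(n) if j != i)
        scores.append(num / den if den else 0.0)
    return scores


# Toy pool of three models: models 0 and 1 resemble each other, model 2 is an outlier.
similarity = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
plain = pairwise_global_quality(similarity)
# Hypothetical single-model quality scores used as weights.
weighted = pairwise_global_quality(similarity, weights=[0.9, 0.8, 0.1])
```

Down-weighting the outlier raises the scores of the two mutually similar models, which is consistent with the abstract's observation that single-model scores can improve consensus-based assessment.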
Khan, Abdul Arif
2014-06-01
The potential role of Escherichia coli in the development of colorectal carcinoma (CRC) has been investigated in many studies. Although the exact mechanism is not clear, chronic inflammation caused by E. coli and other related events are suggested as possible causes behind E. coli-induced colon cancer. It has been found that CRC cells, but not normal cells, are colonized by an intracellular form of E. coli. We predicted nuclear targeting of bacterial proteins in the host cell using two computational tools, nuclear localization signal (NLS) mapper and the balanced subcellular localization predictor (BaCelLo). During intracellular E. coli residence, such targeting is highly likely and may have a role in colon cancer etiology. We observed that several gene expression-associated proteins of E. coli can migrate to the host nucleus during intracellular infections. This situation provides an opportunity for competitive interaction of host and pathogen proteins with similar cellular substrates, thereby increasing the chances of development of colon cancer. Moreover, the results indicated that proteins localized in the membrane of E. coli mostly act as secretory proteins in host cells. No exact correlation was observed between NLS prediction and nuclear localization prediction by BaCelLo. There are several reasons for this, including that only 30% of nuclear proteins carry an NLS and that proteins with a molecular weight below 40 kDa can passively enter the host nucleus. This study concludes that detection of gene expression-specific E. coli proteins and their targeting of the nucleus may have a profound impact on CRC etiology.
Cloud Prediction of Protein Structure and Function with PredictProtein for Debian
Kaján, László; Yachdav, Guy; Vicedo, Esmeralda; Steinegger, Martin; Mirdita, Milot; Angermüller, Christof; Böhm, Ariane; Domke, Simon; Ertl, Julia; Mertes, Christian; Reisinger, Eva; Staniewski, Cedric; Rost, Burkhard
2013-01-01
We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome. PMID:23971032
MITOPRED: a web server for the prediction of mitochondrial proteins
Guda, Chittibabu; Guda, Purnima; Fahy, Eoin; Subramaniam, Shankar
2004-01-01
MITOPRED web server enables prediction of nucleus-encoded mitochondrial proteins in all eukaryotic species. Predictions are made using a new algorithm based primarily on Pfam domain occurrence patterns in mitochondrial and non-mitochondrial locations. Pre-calculated predictions are instantly accessible for proteomes of Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila, Homo sapiens, Mus musculus and Arabidopsis species as well as all the eukaryotic sequences in the Swiss-Prot and TrEMBL databases. Queries, at different confidence levels, can be made through four distinct options: (i) entering Swiss-Prot/TrEMBL accession numbers; (ii) uploading a local file with such accession numbers; (iii) entering protein sequences; (iv) uploading a local file containing protein sequences in FASTA format. Automated updates are scheduled for the pre-calculated prediction database so as to provide access to the most current data. The server, its documentation and the data are available from http://mitopred.sdsc.edu. PMID:15215413
MultiP-Apo: A Multilabel Predictor for Identifying Subcellular Locations of Apoptosis Proteins
Wang, Xiao; Li, Hui; Wang, Rong; Zhang, Qiuwen; Zhang, Weiwei; Gan, Yong
2017-01-01
Apoptosis proteins play an important role in the mechanism of programmed cell death. Predicting the subcellular localization of apoptosis proteins is an essential step in understanding their functions and identifying drug targets. Many computational prediction methods have been developed for apoptosis protein subcellular localization. However, these existing works focus only on proteins that have one location; proteins with multiple locations are either not considered or assumed not to exist when constructing prediction models, so these methods cannot predict all the locations of apoptosis proteins with multiple locations. To address this problem, this paper proposes a novel multilabel predictor named MultiP-Apo, which can predict not only apoptosis proteins with a single subcellular location but also those with multiple subcellular locations. Specifically, given a query protein, a GO-based feature extraction method is used to extract its feature vector. Subsequently, the GO feature vector is classified by a new multilabel classifier based on label-specific features. It is the first multilabel predictor ever established for identifying subcellular locations of multilocation apoptosis proteins. As an initial study, MultiP-Apo achieves an overall accuracy of 58.49% by jackknife test, which indicates that our proposed predictor may become a very useful high-throughput tool in this area. PMID:28744305
Shi, Ruijia; Xu, Cunshuan
2011-06-01
The study of rat proteins is an indispensable task in experimental medicine and drug development. The function of a rat protein is closely related to its subcellular location. Based on this concept, we construct a benchmark rat protein dataset and develop a combined approach for predicting the subcellular localization of rat proteins. From the protein primary sequence, multiple sequential features are obtained using discrete Fourier analysis, a position conservation scoring function and increment of diversity, and these sequential features are selected as input parameters of the support vector machine. By the jackknife test, the overall success rate of prediction is 95.6% on the rat protein dataset. When our method is applied to the apoptosis protein dataset and the Gram-negative bacterial protein dataset with the jackknife test, the overall success rates are 89.9% and 96.4%, respectively. These results indicate that our proposed method is quite promising and may play a complementary role to the existing predictors in this area.
Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan
2014-01-01
Protein subcellular localization prediction, as an essential step to elucidate the functions of proteins in vivo and identify drug targets, has been extensively studied in previous decades. Instead of only determining the subcellular localization of single-label proteins, recent studies have focused on predicting both single- and multi-location proteins. Computational methods based on Gene Ontology (GO) have been demonstrated to be superior to methods based on other features. However, existing GO-based methods focus on the occurrences of GO terms and disregard their relationships. This paper proposes a multi-label subcellular-localization predictor, namely HybridGO-Loc, that leverages not only the GO term occurrences but also the inter-term relationships. This is achieved by hybridizing the GO frequencies of occurrences and the semantic similarity between GO terms. Given a protein, a set of GO terms is retrieved by searching against the gene ontology database, using the accession numbers of homologous proteins obtained via BLAST search as the keys. The frequency of GO occurrences and the semantic similarity (SS) between GO terms are used to formulate frequency vectors and semantic similarity vectors, respectively, which are subsequently hybridized to construct fusion vectors. An adaptive-decision based multi-label support vector machine (SVM) classifier is proposed to classify the fusion vectors. Experimental results based on recent benchmark datasets and a new dataset containing novel proteins show that the proposed hybrid-feature predictor significantly outperforms predictors based on individual GO features as well as other state-of-the-art predictors. For readers' convenience, the HybridGO-Loc server, which is for predicting virus or plant proteins, is available online at http://bioinfo.eie.polyu.edu.hk/HybridGoServer/.
Camproux, A C; Tufféry, P
2005-08-05
Understanding and predicting protein structures depend on the complexity and the accuracy of the models used to represent them. We have recently set up a Hidden Markov Model to optimally compress protein three-dimensional conformations into a one-dimensional series of letters of a structural alphabet. Such a model learns simultaneously the shape of representative structural letters describing the local conformation and the logic of their connections, i.e. the transition matrix between the letters. Here, we move one step further and report some evidence that such a model of protein local architecture also captures some accurate amino acid features. All the letters have specific and distinct amino acid distributions. Moreover, we show that words of amino acids can have significant propensities for some letters. Perspectives point towards the prediction of the series of letters describing the structure of a protein from its amino acid sequence.
Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan
2016-12-02
In the postgenomic era, the number of unreviewed protein sequences is remarkably larger and grows tremendously faster than that of reviewed ones. However, existing methods for protein subchloroplast localization often ignore the information from these unlabeled proteins. This paper proposes a multi-label predictor based on ensemble linear neighborhood propagation (LNP), namely, LNP-Chlo, which leverages hybrid sequence-based feature information from both labeled and unlabeled proteins for predicting localization of both single- and multi-label chloroplast proteins. Experimental results on a stringent benchmark dataset and a novel independent dataset suggest that LNP-Chlo performs at least 6% (absolute) better than state-of-the-art predictors. This paper also demonstrates that ensemble LNP significantly outperforms LNP based on individual features. For readers' convenience, the online Web server LNP-Chlo is freely available at http://bioinfo.eie.polyu.edu.hk/LNPChloServer/ .
Exploring the Sequence-based Prediction of Folding Initiation Sites in Proteins.
Raimondi, Daniele; Orlando, Gabriele; Pancsa, Rita; Khan, Taushif; Vranken, Wim F
2017-08-18
Protein folding is a complex process that can lead to disease when it fails. Especially poorly understood are the very early stages of protein folding, which are likely defined by intrinsic local interactions between amino acids close to each other in the protein sequence. We here present EFoldMine, a method that predicts, from the primary amino acid sequence of a protein, which amino acids are likely involved in early folding events. The method is based on early folding data from hydrogen deuterium exchange (HDX) data from NMR pulsed labelling experiments, and uses backbone and sidechain dynamics as well as secondary structure propensities as features. The EFoldMine predictions give insights into the folding process, as illustrated by a qualitative comparison with independent experimental observations. Furthermore, on a quantitative proteome scale, the predicted early folding residues tend to become the residues that interact the most in the folded structure, and they are often residues that display evolutionary covariation. The connection of the EFoldMine predictions with both folding pathway data and the folded protein structure suggests that the initial statistical behavior of the protein chain with respect to local structure formation has a lasting effect on its subsequent states.
Chen, Mingchen; Lin, Xingcheng; Zheng, Weihua; Onuchic, José N; Wolynes, Peter G
2016-08-25
The associative memory, water mediated, structure and energy model (AWSEM) is a coarse-grained force field with transferable tertiary interactions that incorporates local in sequence energetic biases using bioinformatically derived structural information about peptide fragments with locally similar sequences that we call memories. The memory information from the protein data bank (PDB) database guides proper protein folding. The structural information about available sequences in the database varies in quality and can sometimes lead to frustrated free energy landscapes locally. One way out of this difficulty is to construct the input fragment memory information from all-atom simulations of portions of the complete polypeptide chain. In this paper, we investigate this approach first put forward by Kwac and Wolynes in a more complete way by studying the structure prediction capabilities of this approach for six α-helical proteins. This scheme which we call the atomistic associative memory, water mediated, structure and energy model (AAWSEM) amounts to an ab initio protein structure prediction method that starts from the ground up without using bioinformatic input. The free energy profiles from AAWSEM show that atomistic fragment memories are sufficient to guide the correct folding when tertiary forces are included. AAWSEM combines the efficiency of coarse-grained simulations on the full protein level with the local structural accuracy achievable from all-atom simulations of only parts of a large protein. The results suggest that a hybrid use of atomistic fragment memory and database memory in structural predictions may well be optimal for many practical applications.
He, Jianjun; Gu, Hong; Liu, Wenqi
2012-01-01
It is well known that an important step toward understanding the functions of a protein is to determine its subcellular location. Although numerous prediction algorithms have been developed, most of them have focused on proteins with only one location. In recent years, researchers have begun to pay attention to subcellular localization prediction for proteins with multiple sites. However, almost all the existing approaches fail to take into account the correlations among locations induced by proteins with multiple sites, which may be important information for improving prediction accuracy for such proteins. In this paper, a new algorithm that can effectively exploit the correlations among locations is proposed, using a Gaussian process model. In addition, the algorithm can realize an optimal linear combination of various feature extraction techniques and can be robust to imbalanced data sets. Experimental results on a human protein data set show that the proposed algorithm is valid and can achieve better performance than the existing approaches.
APOLLO: a quality assessment service for single and multiple protein models.
Wang, Zheng; Eickholt, Jesse; Cheng, Jianlin
2011-06-15
We built a web server named APOLLO, which can evaluate the absolute global and local qualities of a single protein model using machine learning methods or the global and local qualities of a pool of models using a pair-wise comparison approach. Based on our evaluations on 107 CASP9 (Critical Assessment of Techniques for Protein Structure Prediction) targets, the predicted quality scores generated from our machine learning and pair-wise methods have an average per-target correlation of 0.671 and 0.917, respectively, with the true model quality scores. Based on our test on 92 CASP9 targets, our predicted absolute local qualities have an average difference of 2.60 Å with the actual distances to native structure. The server is available at http://sysbio.rnet.missouri.edu/apollo/. Single and pair-wise global quality assessment software is also available at the site.
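The "average per-target correlation" reported above can be illustrated with a short sketch. It assumes the usual reading: for each target, compute the Pearson correlation between predicted and true quality scores over that target's models, then average over targets. The function names, target ids and toy numbers are hypothetical and are not taken from APOLLO.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_per_target_correlation(targets):
    """Average the per-target Pearson correlations.

    `targets` maps a target id to (predicted_scores, true_scores),
    one score per structural model of that target.
    """
    values = [pearson(pred, true) for pred, true in targets.values()]
    return sum(values) / len(values)


# Hypothetical toy data: two targets with three models each.
TOY_TARGETS = {
    "target_a": ([0.2, 0.5, 0.9], [0.1, 0.4, 0.8]),  # predictions track truth
    "target_b": ([0.3, 0.6, 0.4], [0.6, 0.3, 0.5]),  # predictions invert truth
}

if __name__ == "__main__":
    print(average_per_target_correlation(TOY_TARGETS))
```

Averaging per target, rather than pooling all models, keeps easy targets with many good models from dominating the score.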
Discriminative structural approaches for enzyme active-site prediction.
Kato, Tsuyoshi; Nagano, Nozomi
2011-02-15
Predicting enzyme active-sites in proteins is an important issue not only for protein sciences but also for a variety of practical applications such as drug design. Because enzyme reaction mechanisms are based on the local structures of enzyme active-sites, various template-based methods that compare local structures in proteins have been developed to date. In comparing such local sites, a simple measurement, RMSD, has been used so far. This paper introduces new machine learning algorithms that refine the similarity/deviation for comparison of local structures. The similarity/deviation is applied to two types of applications, single template analysis and multiple template analysis. In the single template analysis, a single template is used as a query to search proteins for active sites, whereas a protein structure is examined as a query to discover the possible active-sites using a set of templates in the multiple template analysis. This paper experimentally illustrates that the machine learning algorithms effectively improve the similarity/deviation measurements for both the analyses.
Kwasigroch, Jean Marc; Rooman, Marianne
2006-07-15
Prelude&Fugue are bioinformatics tools aiming at predicting the local 3D structure of a protein from its amino acid sequence in terms of seven backbone torsion angle domains, using database-derived potentials. Prelude(&Fugue) computes all lowest free energy conformations of a protein or protein region, ranked by increasing energy, and possibly satisfying some interresidue distance constraints specified by the user. (Prelude&)Fugue detects sequence regions whose predicted structure is significantly preferred relative to other conformations in the absence of tertiary interactions. These programs can be used for predicting secondary structure, tertiary structure of short peptides, flickering early folding sequences and peptides that adopt a preferred conformation in solution. They can also be used for detecting structural weaknesses, i.e. sequence regions that are not optimal with respect to the tertiary fold. http://babylone.ulb.ac.be/Prelude_and_Fugue.
Multi-Label Learning via Random Label Selection for Protein Subcellular Multi-Locations Prediction.
Wang, Xiao; Li, Guo-Zheng
2013-03-12
Prediction of protein subcellular localization is an important but challenging problem, particularly when proteins may simultaneously exist at, or move between, two or more different subcellular location sites. Most of the existing protein subcellular localization methods are only used to deal with the single-location proteins. In the past few years, only a few methods have been proposed to tackle proteins with multiple locations. However, they only adopt a simple strategy, that is, transforming the multi-location proteins to multiple proteins with single location, which doesn't take correlations among different subcellular locations into account. In this paper, a novel method named RALS (multi-label learning via RAndom Label Selection), is proposed to learn from multi-location proteins in an effective and efficient way. Through five-fold cross validation test on a benchmark dataset, we demonstrate our proposed method with consideration of label correlations obviously outperforms the baseline BR method without consideration of label correlations, indicating correlations among different subcellular locations really exist and contribute to improvement of prediction performance. Experimental results on two benchmark datasets also show that our proposed methods achieve significantly higher performance than some other state-of-the-art methods in predicting subcellular multi-locations of proteins. The prediction web server is available at http://levis.tongji.edu.cn:8080/bioinfo/MLPred-Euk/ for the public usage.
Real-Time Ligand Binding Pocket Database Search Using Local Surface Descriptors
Chikhi, Rayan; Sael, Lee; Kihara, Daisuke
2010-01-01
Due to the increasing number of structures of unknown function accumulated by ongoing structural genomics projects, there is an urgent need for computational methods for characterizing protein tertiary structures. As functions of many of these proteins are not easily predicted by conventional sequence database searches, a legitimate strategy is to utilize structure information in function characterization. Of particular interest is prediction of ligand binding to a protein, as ligand molecule recognition is a major part of molecular function of proteins. Predicting whether a ligand molecule binds a protein is a complex problem due to the physical nature of protein-ligand interactions and the flexibility of both binding sites and ligand molecules. However, geometric and physicochemical complementarity is observed between the ligand and its binding site in many cases. Therefore, ligand molecules which bind to a local surface site in a protein can be predicted by finding similar local pockets of known binding ligands in the structure database. Here, we present two representations of ligand binding pockets and utilize them for ligand binding prediction by pocket shape comparison. These representations are based on mapping of surface properties of binding pockets, which are compactly described either by the two dimensional pseudo-Zernike moments or the 3D Zernike descriptors. These compact representations allow a fast real-time pocket searching against a database. Thorough benchmark studies employing two different datasets show that our representations are competitive with the other existing methods. Limitations and potentials of the shape-based methods as well as possible improvements are discussed. PMID:20455259
Real-time ligand binding pocket database search using local surface descriptors.
Chikhi, Rayan; Sael, Lee; Kihara, Daisuke
2010-07-01
Because of the increasing number of structures of unknown function accumulated by ongoing structural genomics projects, there is an urgent need for computational methods for characterizing protein tertiary structures. As functions of many of these proteins are not easily predicted by conventional sequence database searches, a legitimate strategy is to utilize structure information in function characterization. Of particular interest is prediction of ligand binding to a protein, as ligand molecule recognition is a major part of molecular function of proteins. Predicting whether a ligand molecule binds a protein is a complex problem due to the physical nature of protein-ligand interactions and the flexibility of both binding sites and ligand molecules. However, geometric and physicochemical complementarity is observed between the ligand and its binding site in many cases. Therefore, ligand molecules which bind to a local surface site in a protein can be predicted by finding similar local pockets of known binding ligands in the structure database. Here, we present two representations of ligand binding pockets and utilize them for ligand binding prediction by pocket shape comparison. These representations are based on mapping of surface properties of binding pockets, which are compactly described either by the two-dimensional pseudo-Zernike moments or the three-dimensional Zernike descriptors. These compact representations allow a fast real-time pocket searching against a database. Thorough benchmark studies employing two different datasets show that our representations are competitive with the other existing methods. Limitations and potentials of the shape-based methods as well as possible improvements are discussed.
Jiang, Xiaoying; Wei, Rong; Zhao, Yanjun; Zhang, Tongliang
2008-05-01
Knowledge of subnuclear localization in eukaryotic cells is essential for understanding the function of the nucleus. Because of the special characteristics of the cell nucleus, developing prediction methods and tools for protein subnuclear localization has become an important research field in protein science. In this study, a novel approach is proposed to predict protein subnuclear localization. Each protein sample is represented by a Pseudo Amino Acid (PseAA) composition based on the approximate entropy (ApEn) concept, which reflects the complexity of a time series. A novel ensemble classifier is designed that incorporates three AdaBoost classifiers. The base classifier algorithms in the three AdaBoost classifiers are decision stumps, a fuzzy K nearest neighbors classifier, and radial basis support vector machines, respectively. Different PseAA compositions are used as input data for the different AdaBoost classifiers in the ensemble. A genetic algorithm is used to optimize the dimension and weight factor of the PseAA composition. Two datasets often used in published works are used to validate the performance of the proposed approach. The results of the jackknife cross-validation test are higher and more balanced than those of other methods on the same datasets. The promising results indicate that the proposed approach is effective and practical, and it might become a useful tool in protein subnuclear localization. The software in Matlab and supplementary materials are freely available by contacting the corresponding author.
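Approximate entropy, the complexity measure mentioned above, can be sketched in a few lines. This follows the standard textbook definition for a numeric series (template length `m`, tolerance `r`, self-matches included). How the authors turn a protein sequence into a numeric series and fold ApEn into the PseAA composition is not stated in the abstract, so that step is left out; the parameter defaults here are illustrative assumptions.

```python
from math import log


def _phi(series, m, r):
    """Average log-frequency of templates of length m that match within r."""
    n_templates = len(series) - m + 1
    templates = [series[i:i + m] for i in range(n_templates)]
    total = 0.0
    for a in templates:
        # Chebyshev distance between templates; the self-match keeps matches >= 1.
        matches = sum(
            1 for b in templates
            if max(abs(x - y) for x, y in zip(a, b)) <= r
        )
        total += log(matches / n_templates)
    return total / n_templates


def approximate_entropy(series, m=2, r=0.2):
    """Standard approximate entropy ApEn(m, r) of a numeric series.

    Values near zero indicate a regular, predictable series;
    larger values indicate a more irregular one.
    """
    if len(series) <= m + 1:
        raise ValueError("series is too short for the chosen m")
    if r < 0:
        raise ValueError("r must be non-negative")
    return _phi(series, m, r) - _phi(series, m + 1, r)


if __name__ == "__main__":
    constant = [5.0] * 12
    periodic = [1.0, 2.0] * 10
    print(approximate_entropy(constant, m=2, r=0.5))
    print(approximate_entropy(periodic, m=2, r=0.5))
```

In practice `r` is often chosen as a fraction of the standard deviation of the series, so that the measure does not depend on the scale of the underlying property.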
Li, Min; Li, Wenkai; Wu, Fang-Xiang; Pan, Yi; Wang, Jianxin
2018-06-14
Essential proteins are important participants in various life activities and play a vital role in the survival and reproduction of living organisms. Identification of essential proteins from protein-protein interaction (PPI) networks has great significance to facilitate the study of human complex diseases, the design of drugs and the development of bioinformatics and computational science. Studies have shown that highly connected proteins in a PPI network tend to be essential. A series of computational methods have been proposed to identify essential proteins by analyzing topological structures of PPI networks. However, the high noise in the PPI data can degrade the accuracy of essential protein prediction. Moreover, proteins must be located in the appropriate subcellular location to perform their functions, and proteins can interact with each other only when they are located in the same subcellular location. In this paper, we propose a new network-based essential protein discovery method based on sub-network partition and prioritization by integrating subcellular localization information, named SPP. The proposed method SPP was tested on two different yeast PPI networks obtained from the DIP database and the BioGRID database. The experimental results show that SPP can effectively reduce the effect of false positives in PPI networks and predict essential proteins more accurately than the existing computational methods DC, BC, CC, SC, EC, IC and NC. Copyright © 2018 Elsevier Ltd. All rights reserved.
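The idea of partitioning a PPI network by subcellular location before scoring proteins can be illustrated with a minimal sketch. This is not the SPP algorithm, whose partition and prioritization steps are not described in the abstract. It only shows the simplest version of the idea: keep an interaction in a compartment's sub-network when both partners are annotated to that compartment, then compute degree centrality (DC) inside each sub-network. All names and data are hypothetical.

```python
from collections import defaultdict


def partition_by_compartment(edges, location):
    """Split a PPI edge list into one sub-network per compartment.

    An edge (u, v) is kept in the sub-network of every compartment
    to which both u and v are annotated; edges between proteins that
    share no compartment are dropped as likely false positives.
    """
    subnets = defaultdict(set)
    for u, v in edges:
        shared = location.get(u, set()) & location.get(v, set())
        for compartment in shared:
            subnets[compartment].add(frozenset((u, v)))
    return dict(subnets)


def degree_centrality(edge_set):
    """Number of distinct interaction partners of each protein in a sub-network."""
    degree = defaultdict(int)
    for edge in edge_set:
        if len(edge) == 2:  # ignore self-interactions
            for node in edge:
                degree[node] += 1
    return dict(degree)


# Hypothetical toy network and annotations.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("A", "D")]
LOCATION = {
    "A": {"nucleus"},
    "B": {"nucleus"},
    "C": {"nucleus", "cytoplasm"},
    "D": {"cytoplasm"},
}

if __name__ == "__main__":
    for compartment, edge_set in partition_by_compartment(EDGES, LOCATION).items():
        print(compartment, degree_centrality(edge_set))
```

In the toy data the A-D interaction is discarded because the two proteins share no compartment, which is the kind of noise reduction the abstract attributes to localization information.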
Qin, Chao; Sun, Yongqi; Dong, Yadong
2017-01-01
Essential proteins are the proteins that are indispensable to the survival and development of an organism. Deleting a single essential protein will cause lethality or infertility. Identifying and analysing essential proteins are key to understanding the molecular mechanisms of living cells. There are two types of methods for predicting essential proteins: experimental methods, which require considerable time and resources, and computational methods, which overcome the shortcomings of experimental methods. However, the prediction accuracy of computational methods for essential proteins requires further improvement. In this paper, we propose a new computational strategy named CoTB for identifying essential proteins based on a combination of topological properties, subcellular localization information and orthologous protein information. First, we introduce several topological properties of the protein-protein interaction (PPI) network. Second, we propose new methods for measuring orthologous information and subcellular localization and a new computational strategy that uses a random forest prediction model to obtain a probability score for the proteins being essential. Finally, we conduct experiments on four different Saccharomyces cerevisiae datasets. The experimental results demonstrate that our strategy for identifying essential proteins outperforms traditional computational methods and the most recently developed method, SON. In particular, our strategy improves the prediction accuracy to 89, 78, 79, and 85 percent on the YDIP, YMIPS, YMBD and YHQ datasets at the top 100 level, respectively.
Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G.; Gelly, Jean-Christophe
2016-01-01
Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation —with Protein Blocks—, which give an accurate description of the local protein structure. ORION has recently been improved, increasing by 5% the quality of its results. The ORION web server accepts a single protein sequence as input and searches homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the ‘Hard’ category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/. PMID:27319297
Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G; Gelly, Jean-Christophe
2016-06-20
Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation -with Protein Blocks-, which give an accurate description of the local protein structure. ORION has recently been improved, increasing by 5% the quality of its results. The ORION web server accepts a single protein sequence as input and searches homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the 'Hard' category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/.
Improving prediction of heterodimeric protein complexes using combination with pairwise kernel.
Ruan, Peiying; Hayashida, Morihiro; Akutsu, Tatsuya; Vert, Jean-Philippe
2018-02-19
Since many proteins become functional only after they interact with their partner proteins and form protein complexes, it is essential to identify the sets of proteins that form complexes. Therefore, several computational methods have been proposed to predict complexes from the topology and structure of experimental protein-protein interaction (PPI) networks. These methods work well to predict complexes involving at least three proteins, but generally fail at identifying complexes involving only two different proteins, called heterodimeric complexes or heterodimers. There is however an urgent need for efficient methods to predict heterodimers, since the majority of known protein complexes are precisely heterodimers. In this paper, we use three promising kernel functions, Min kernel and two pairwise kernels, which are Metric Learning Pairwise Kernel (MLPK) and Tensor Product Pairwise Kernel (TPPK). We also consider the normalization forms of Min kernel. Then, we combine Min kernel or its normalization form and one of the pairwise kernels by plugging. We applied kernels based on PPI, domain, phylogenetic profile, and subcellular localization properties to predicting heterodimers. Then, we evaluate our method by employing C-Support Vector Classification (C-SVC), carrying out 10-fold cross-validation, and calculating the average F-measures. The results suggest that the combination of normalized-Min-kernel and MLPK leads to the best F-measure and improved the performance of our previous work, which had been the best existing method so far. We propose new methods to predict heterodimers, using a machine learning-based approach. We train a support vector machine (SVM) to discriminate interacting vs non-interacting protein pairs, based on information extracted from PPI, domain, phylogenetic profiles and subcellular localization. We evaluate in detail new kernel functions to encode these data, and report prediction performance that outperforms the state-of-the-art.
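The kernels named above have standard closed forms, sketched here under the assumption that "plugging" means using the (normalized) Min kernel as the base kernel between single proteins inside a pairwise kernel between protein pairs. The feature vectors are hypothetical; the abstract does not say how PPI, domain, phylogenetic profile and localization properties are encoded.

```python
from math import sqrt


def min_kernel(x, y):
    """Min kernel: sum of element-wise minima of two non-negative vectors."""
    return sum(min(a, b) for a, b in zip(x, y))


def normalized_min_kernel(x, y):
    """Min kernel scaled so that k(x, x) == 1 for any non-zero x."""
    denom = sqrt(min_kernel(x, x) * min_kernel(y, y))
    return min_kernel(x, y) / denom if denom > 0 else 0.0


def tppk(k, a, b, c, d):
    """Tensor Product Pairwise Kernel between pairs (a, b) and (c, d)."""
    return k(a, c) * k(b, d) + k(a, d) * k(b, c)


def mlpk(k, a, b, c, d):
    """Metric Learning Pairwise Kernel between pairs (a, b) and (c, d)."""
    return (k(a, c) - k(a, d) - k(b, c) + k(b, d)) ** 2


# Hypothetical non-negative feature vectors for four proteins.
P1, P2, P3, P4 = [1, 0, 2], [0, 1, 1], [2, 1, 0], [1, 1, 1]

if __name__ == "__main__":
    print(tppk(min_kernel, P1, P2, P3, P4))
    print(mlpk(normalized_min_kernel, P1, P2, P3, P4))
```

Both pairwise kernels are symmetric in the order of proteins within a pair, which is what makes them suitable for unordered pairs such as candidate heterodimers. A Gram matrix built from either one could be passed to an SVM implementation that accepts precomputed kernels.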
Winokur, S T; Shiang, R
1998-11-01
The TCOF1 gene product, treacle, responsible for the craniofacial disorder Treacher Collins syndrome, has been predicted to be a member of a class of nucleolar phosphoproteins based on its primary amino acid sequence. Treacle is a low complexity protein with ten repeating units of acidic and basic residues, each of which contains a large number of putative casein kinase 2 and protein kinase C phosphorylation sites. In addition, the C-terminus of treacle contains multiple putative nuclear localization signals. The overall structure of treacle, as well as sequence similarity to several nucleolar phosphoproteins, predicts that treacle is a member of this class of proteins. Using green fluorescent protein fusion constructs with the full-length and deleted domains of the murine homolog of treacle, we demonstrate that the cellular localization of treacle is nucleolar. This localization is mediated by the last 41 residues of the C-terminus (residues 1262-1302). At least two functional nuclear localization signals have been identified in the protein, one between residues 1176 and 1270 and the second within the last 32 residues of the protein (1271-1302). The nucleolar localization signal is disrupted by two constructs that split the C-terminal region between residues 1270 and 1271. This study provides the first direct analysis of treacle and demonstrates that the protein involved in TCOF1 is a nucleolar protein.
WegoLoc: accurate prediction of protein subcellular localization using weighted Gene Ontology terms.
Chi, Sang-Mun; Nam, Dougu
2012-04-01
We present an accurate and fast web server, WegoLoc for predicting subcellular localization of proteins based on sequence similarity and weighted Gene Ontology (GO) information. A term weighting method in the text categorization process is applied to GO terms for a support vector machine classifier. As a result, WegoLoc surpasses the state-of-the-art methods for previously used test datasets. WegoLoc supports three eukaryotic kingdoms (animals, fungi and plants) and provides human-specific analysis, and covers several sets of cellular locations. In addition, WegoLoc provides (i) multiple possible localizations of input protein(s) as well as their corresponding probability scores, (ii) weights of GO terms representing the contribution of each GO term in the prediction, and (iii) a BLAST E-value for the best hit with GO terms. If the similarity score does not meet a given threshold, an amino acid composition-based prediction is applied as a backup method. WegoLoc and User's guide are freely available at the website http://www.btool.org/WegoLoc. Contact: smchiks@ks.ac.kr; dougnam@unist.ac.kr. Supplementary data is available at http://www.btool.org/WegoLoc.
Xie, Dan; Li, Ao; Wang, Minghui; Fan, Zhewen; Feng, Huanqing
2005-01-01
Subcellular location of a protein is one of the key functional characters as proteins must be localized correctly at the subcellular level to have normal biological function. In this paper, a novel method named LOCSVMPSI has been introduced, which is based on the support vector machine (SVM) and the position-specific scoring matrix generated from profiles of PSI-BLAST. With a jackknife test on the RH2427 data set, LOCSVMPSI achieved a high overall prediction accuracy of 90.2%, which is higher than the prediction results by SubLoc and ESLpred on this data set. In addition, prediction performance of LOCSVMPSI was evaluated with 5-fold cross validation test on the PK7579 data set and the prediction results were consistently better than the previous method based on several SVMs using composition of both amino acids and amino acid pairs. Further test on the SWISSPROT new-unique data set showed that LOCSVMPSI also performed better than some widely used prediction methods, such as PSORTII, TargetP and LOCnet. All these results indicate that LOCSVMPSI is a powerful tool for the prediction of eukaryotic protein subcellular localization. An online web server (current version is 1.3) based on this method has been developed and is freely available to both academic and commercial users, which can be accessed by at . PMID:15980436
ClubSub-P: Cluster-Based Subcellular Localization Prediction for Gram-Negative Bacteria and Archaea
Paramasivam, Nagarajan; Linke, Dirk
2011-01-01
The subcellular localization (SCL) of proteins provides important clues to their function in a cell. In our efforts to predict useful vaccine targets against Gram-negative bacteria, we noticed that misannotated start codons frequently lead to wrongly assigned SCLs. This and other problems in SCL prediction, such as the relatively high false-positive and false-negative rates of some tools, can be avoided by applying multiple prediction tools to groups of homologous proteins. Here we present ClubSub-P, an online database that combines existing SCL prediction tools into a consensus pipeline from more than 600 proteomes of fully sequenced microorganisms. On top of the consensus prediction at the level of single sequences, the tool uses clusters of homologous proteins from Gram-negative bacteria and from Archaea to eliminate false-positive and false-negative predictions. ClubSub-P can assign the SCL of proteins from Gram-negative bacteria and Archaea with high precision. The database is searchable, and can easily be expanded using either new bacterial genomes or new prediction tools as they become available. This will further improve the performance of the SCL prediction, as well as the detection of misannotated start codons and other annotation errors. ClubSub-P is available online at http://toolkit.tuebingen.mpg.de/clubsubp/ PMID:22073040
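The two-level consensus described above (across tools for one sequence, then across a cluster of homologs) can be sketched as a pair of majority votes. This is an illustration only; the actual ClubSub-P pipeline, its tools and its decision rules are not specified in the abstract, and every name and label below is hypothetical.

```python
from collections import Counter


def consensus(labels):
    """Most frequent label; ties go to the label encountered first."""
    return Counter(labels).most_common(1)[0][0]


def cluster_consensus(cluster, tool_predictions):
    """Majority vote per protein, then majority vote across the cluster.

    `cluster` is a list of homologous protein ids and `tool_predictions`
    maps each id to the list of localizations predicted by individual tools.
    Returns the per-protein calls and the cluster-level call.
    """
    per_protein = {p: consensus(tool_predictions[p]) for p in cluster}
    cluster_label = consensus([per_protein[p] for p in cluster])
    return per_protein, cluster_label


# Hypothetical cluster: p3 disagrees with its homologs, for example
# because a misannotated start codon hides its signal peptide.
CLUSTER = ["p1", "p2", "p3"]
PREDICTIONS = {
    "p1": ["outer membrane", "outer membrane", "periplasm"],
    "p2": ["outer membrane", "outer membrane", "outer membrane"],
    "p3": ["cytoplasm", "cytoplasm", "outer membrane"],
}

if __name__ == "__main__":
    calls, label = cluster_consensus(CLUSTER, PREDICTIONS)
    outliers = [p for p, call in calls.items() if call != label]
    print(label, outliers)
```

Proteins whose own call differs from the cluster call are natural candidates for a closer look at their annotation.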
Gao, Yujuan; Wang, Sheng; Deng, Minghua; Xu, Jinbo
2018-05-08
Protein dihedral angles provide a detailed description of protein local conformation. Predicted dihedral angles can be used to narrow down the conformational space of the whole polypeptide chain significantly, thus aiding protein tertiary structure prediction. However, direct angle prediction from sequence alone is challenging. In this article, we present a novel method (named RaptorX-Angle) to predict real-valued angles by combining clustering and deep learning. Tested on a subset of PDB25 and the targets in the latest two Critical Assessment of protein Structure Prediction (CASP), our method outperforms the existing state-of-art method SPIDER2 in terms of Pearson Correlation Coefficient (PCC) and Mean Absolute Error (MAE). Our result also shows approximately linear relationship between the real prediction errors and our estimated bounds. That is, the real prediction error can be well approximated by our estimated bounds. Our study provides an alternative and more accurate prediction of dihedral angles, which may facilitate protein structure prediction and functional study.
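When dihedral angle predictions are scored by Mean Absolute Error, the periodicity of angles has to be handled, since 179° and -179° are only 2° apart. The sketch below shows the common wrap-around convention; it is an assumption that the evaluation above uses exactly this convention, and the function names are illustrative.

```python
def angular_error(pred_deg, true_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)


def mean_absolute_angular_error(preds, trues):
    """Mean of wrap-around angular errors over paired predictions."""
    if len(preds) != len(trues) or not preds:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [angular_error(p, t) for p, t in zip(preds, trues)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical predicted and native angles for two residues.
    print(mean_absolute_angular_error([179.0, -60.0], [-179.0, -45.0]))
```

Without the wrap-around step the first residue would contribute an error of 358° instead of 2°, which would badly distort the average.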
2014-01-01
Background: It is important to predict the quality of a protein structural model before its native structure is known. The method that can predict the absolute local quality of individual residues in a single protein model is rare, yet particularly needed for using, ranking and refining protein models. Results: We developed a machine learning tool (SMOQ) that can predict the distance deviation of each residue in a single protein model. SMOQ uses support vector machines (SVM) with protein sequence and structural features (i.e. basic feature set), including amino acid sequence, secondary structures, solvent accessibilities, and residue-residue contacts to make predictions. We also trained a SVM model with two new additional features (profiles and SOV scores) on 20 CASP8 targets and found that including them can only improve the performance when real deviations between native and model are higher than 5Å. The SMOQ tool finally released uses the basic feature set trained on 85 CASP8 targets. Moreover, SMOQ implemented a way to convert predicted local quality scores into a global quality score. SMOQ was tested on the 84 CASP9 single-domain targets. The average difference between the residue-specific distance deviation predicted by our method and the actual distance deviation on the test data is 2.637Å. The global quality prediction accuracy of the tool is comparable to other good tools on the same benchmark. Conclusion: SMOQ is a useful tool for protein single model quality assessment. Its source code and executable are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/. PMID:24776231
Cao, Renzhi; Wang, Zheng; Wang, Yiheng; Cheng, Jianlin
2014-04-28
It is important to predict the quality of a protein structural model before its native structure is known. The method that can predict the absolute local quality of individual residues in a single protein model is rare, yet particularly needed for using, ranking and refining protein models. We developed a machine learning tool (SMOQ) that can predict the distance deviation of each residue in a single protein model. SMOQ uses support vector machines (SVM) with protein sequence and structural features (i.e. basic feature set), including amino acid sequence, secondary structures, solvent accessibilities, and residue-residue contacts to make predictions. We also trained a SVM model with two new additional features (profiles and SOV scores) on 20 CASP8 targets and found that including them can only improve the performance when real deviations between native and model are higher than 5Å. The SMOQ tool finally released uses the basic feature set trained on 85 CASP8 targets. Moreover, SMOQ implemented a way to convert predicted local quality scores into a global quality score. SMOQ was tested on the 84 CASP9 single-domain targets. The average difference between the residue-specific distance deviation predicted by our method and the actual distance deviation on the test data is 2.637Å. The global quality prediction accuracy of the tool is comparable to other good tools on the same benchmark. SMOQ is a useful tool for protein single model quality assessment. Its source code and executable are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/.
PredictProtein—an open resource for online prediction of protein structural and functional features
Yachdav, Guy; Kloppmann, Edda; Kajan, Laszlo; Hecht, Maximilian; Goldberg, Tatyana; Hamp, Tobias; Hönigschmid, Peter; Schafferhans, Andrea; Roos, Manfred; Bernhofer, Michael; Richter, Lothar; Ashkenazy, Haim; Punta, Marco; Schlessinger, Avner; Bromberg, Yana; Schneider, Reinhard; Vriend, Gerrit; Sander, Chris; Ben-Tal, Nir; Rost, Burkhard
2014-01-01
PredictProtein is a meta-service for sequence analysis that has been predicting structural and functional features of proteins since 1992. Queried with a protein sequence it returns: multiple sequence alignments, predicted aspects of structure (secondary structure, solvent accessibility, transmembrane helices (TMSEG) and strands, coiled-coil regions, disulfide bonds and disordered regions) and function. The service incorporates analysis methods for the identification of functional regions (ConSurf), homology-based inference of Gene Ontology terms (metastudent), comprehensive subcellular localization prediction (LocTree3), protein–protein binding sites (ISIS2), protein–polynucleotide binding sites (SomeNA) and predictions of the effect of point mutations (non-synonymous SNPs) on protein function (SNAP2). Our goal has always been to develop a system optimized to meet the demands of experimentalists not highly experienced in bioinformatics. To this end, the PredictProtein results are presented as both text and a series of intuitive, interactive and visually appealing figures. The web server and sources are available at http://ppopen.rostlab.org. PMID:24799431
Protein docking prediction using predicted protein-protein interface.
Li, Bin; Kihara, Daisuke
2012-01-10
Many important cellular processes are carried out by protein complexes. To provide physical pictures of interacting proteins, many computational protein-protein prediction methods have been developed in the past. However, it is still difficult to identify the correct docking complex structure within top ranks among alternative conformations. We present a novel protein docking algorithm that utilizes imperfect protein-protein binding interface prediction for guiding protein docking. Since the accuracy of protein binding site prediction varies depending on cases, the challenge is to develop a method which does not deteriorate but improves docking results by using a binding site prediction which may not be 100% accurate. The algorithm, named PI-LZerD (using Predicted Interface with Local 3D Zernike descriptor-based Docking algorithm), is based on a pair wise protein docking prediction algorithm, LZerD, which we have developed earlier. PI-LZerD starts from performing docking prediction using the provided protein-protein binding interface prediction as constraints, which is followed by the second round of docking with updated docking interface information to further improve docking conformation. Benchmark results on bound and unbound cases show that PI-LZerD consistently improves the docking prediction accuracy as compared with docking without using binding site prediction or using the binding site prediction as post-filtering. We have developed PI-LZerD, a pairwise docking algorithm, which uses imperfect protein-protein binding interface prediction to improve docking accuracy. PI-LZerD consistently showed better prediction accuracy over alternative methods in the series of benchmark experiments including docking using actual docking interface site predictions as well as unbound docking cases.
Walz, Alexander; Mujer, Cesar V; Connolly, Joseph P; Alefantis, Tim; Chafin, Ryan; Dake, Clarissa; Whittington, Jessica; Kumar, Srikanta P; Khan, Akbar S; DelVecchio, Vito G
2007-07-27
The secretion time course of Bacillus anthracis strain RA3R (pXO1+/pXO2-) during early, mid, and late log phase were investigated under conditions that simulate those encountered in the host. All of the identified proteins were analyzed by different software algorithms to characterize their predicted mode of secretion and cellular localization. In addition, immunogenic proteins were identified using sera from humans with cutaneous anthrax. A total of 275 extracellular proteins were identified by a combination of LC MS/MS and MALDI-TOF MS. All of the identified proteins were analyzed by SignalP, SecretomeP, PSORT, LipoP, TMHMM, and PROSITE to characterize their predicted mode of secretion, cellular localization, and protein domains. Fifty-three proteins were predicted by SignalP to harbor the cleavable N-terminal signal peptides and were therefore secreted via the classical Sec pathway. Twenty-three proteins were predicted by SecretomeP for secretion by the alternative Sec pathway characterized by the lack of typical export signal. In contrast to SignalP and SecretomeP predictions, PSORT predicted 171 extracellular proteins, 7 cell wall-associated proteins, and 6 cytoplasmic proteins. Moreover, 51 proteins were predicted by LipoP to contain putative Sec signal peptides (38 have SpI sites), lipoprotein signal peptides (13 have SpII sites), and N-terminal membrane helices (9 have transmembrane helices). The TMHMM algorithm predicted 25 membrane-associated proteins with one to ten transmembrane helices. Immunogenic proteins were also identified using sera from patients who have recovered from anthrax. The charge variants (83 and 63 kDa) of protective antigen (PA) were the most immunodominant secreted antigens, followed by charge variants of enolase and transketolase. This is the first description of the time course of protein secretion for the pathogen Bacillus anthracis. 
Time course studies of protein secretion and accumulation may be relevant in elucidation of the progression of pathogenicity, identification of therapeutics and diagnostic markers, and vaccine development. This study also adds to the continuously growing list of identified Bacillus anthracis secretome proteins.
Walz, Alexander; Mujer, Cesar V; Connolly, Joseph P; Alefantis, Tim; Chafin, Ryan; Dake, Clarissa; Whittington, Jessica; Kumar, Srikanta P; Khan, Akbar S; DelVecchio, Vito G
2007-01-01
Background The secretion time course of Bacillus anthracis strain RA3R (pXO1+/pXO2-) during early, mid, and late log phase were investigated under conditions that simulate those encountered in the host. All of the identified proteins were analyzed by different software algorithms to characterize their predicted mode of secretion and cellular localization. In addition, immunogenic proteins were identified using sera from humans with cutaneous anthrax. Results A total of 275 extracellular proteins were identified by a combination of LC MS/MS and MALDI-TOF MS. All of the identified proteins were analyzed by SignalP, SecretomeP, PSORT, LipoP, TMHMM, and PROSITE to characterize their predicted mode of secretion, cellular localization, and protein domains. Fifty-three proteins were predicted by SignalP to harbor the cleavable N-terminal signal peptides and were therefore secreted via the classical Sec pathway. Twenty-three proteins were predicted by SecretomeP for secretion by the alternative Sec pathway characterized by the lack of typical export signal. In contrast to SignalP and SecretomeP predictions, PSORT predicted 171 extracellular proteins, 7 cell wall-associated proteins, and 6 cytoplasmic proteins. Moreover, 51 proteins were predicted by LipoP to contain putative Sec signal peptides (38 have SpI sites), lipoprotein signal peptides (13 have SpII sites), and N-terminal membrane helices (9 have transmembrane helices). The TMHMM algorithm predicted 25 membrane-associated proteins with one to ten transmembrane helices. Immunogenic proteins were also identified using sera from patients who have recovered from anthrax. The charge variants (83 and 63 kDa) of protective antigen (PA) were the most immunodominant secreted antigens, followed by charge variants of enolase and transketolase. Conclusion This is the first description of the time course of protein secretion for the pathogen Bacillus anthracis. 
Time course studies of protein secretion and accumulation may be relevant in elucidation of the progression of pathogenicity, identification of therapeutics and diagnostic markers, and vaccine development. This study also adds to the continuously growing list of identified Bacillus anthracis secretome proteins. PMID:17662140
Consistent prediction of GO protein localization.
Spetale, Flavio E; Arce, Debora; Krsticevic, Flavia; Bulacio, Pilar; Tapia, Elizabeth
2018-05-17
The GO-Cellular Component (GO-CC) ontology provides a controlled vocabulary for the consistent description of the subcellular compartments or macromolecular complexes where proteins may act. Current machine learning-based methods used for the automated GO-CC annotation of proteins suffer from the inconsistency of individual GO-CC term predictions. Here, we present FGGA-CC+, a class of hierarchical graph-based classifiers for the consistent GO-CC annotation of protein coding genes at the subcellular compartment or macromolecular complex levels. To boost the accuracy of GO-CC predictions, we make use of the protein localization knowledge in the GO-Biological Process (GO-BP) annotations. As a result, FGGA-CC+ classifiers are built from annotation data in both the GO-CC and GO-BP ontologies. Due to their graph-based design, FGGA-CC+ classifiers are fully interpretable and their predictions amenable to expert analysis. Promising results on protein annotation data from five model organisms were obtained. Additionally, successful validation results were obtained in the annotation of a challenging subset of tandem duplicated genes in tomato, a non-model organism. Overall, these results suggest that FGGA-CC+ classifiers can indeed be useful for satisfying the huge demand for GO-CC annotation arising from ubiquitous high-throughput sequencing and proteomic projects.
Compressed learning and its applications to subcellular localization.
Zheng, Zhong-Long; Guo, Li; Jia, Jiong; Xie, Chen-Mao; Zeng, Wen-Cai; Yang, Jie
2011-09-01
One of the main challenges faced by biological applications is to predict protein subcellular localization automatically and accurately. To achieve this, a wide variety of machine learning methods have been proposed in recent years. Most of them focus on finding the optimal classification scheme, and few of them take simplifying the complexity of biological systems into account. Traditionally, such bio-data are analyzed by first performing feature selection before classification. Motivated by CS (Compressed Sensing) theory, we propose a methodology that performs compressed learning with a sparseness criterion such that feature selection and dimension reduction are merged into one analysis. The proposed methodology decreases the complexity of the biological system while increasing protein subcellular localization accuracy. Experimental results are encouraging, indicating that the aforementioned sparse methods are promising in dealing with complicated biological problems, such as predicting the subcellular localization of Gram-negative bacterial proteins.
Semiempirical prediction of protein folds
NASA Astrophysics Data System (ADS)
Fernández, Ariel; Colubri, Andrés; Appignanesi, Gustavo
2001-08-01
We introduce a semiempirical approach to predict ab initio expeditious pathways and native backbone geometries of proteins that fold under in vitro renaturation conditions. The algorithm is engineered to incorporate a discrete codification of local steric hindrances that constrain the movements of the peptide backbone throughout the folding process. Thus, the torsional state of the chain is assumed to be conditioned by the fact that hopping from one basin of attraction to another in the Ramachandran map (local potential energy surface) of each residue is energetically more costly than the search for a specific (Φ, Ψ) torsional state within a single basin. A combinatorial procedure is introduced to evaluate coarsely defined torsional states of the chain defined ``modulo basins'' and translate them into meaningful patterns of long range interactions. Thus, an algorithm for structure prediction is designed based on the fact that local contributions to the potential energy may be subsumed into time-evolving conformational constraints defining sets of restricted backbone geometries whereupon the patterns of nonbonded interactions are constructed. The predictive power of the algorithm is assessed by (a) computing ab initio folding pathways for mammalian ubiquitin that ultimately yield a stable structural pattern reproducing all of its native features, (b) determining the nucleating event that triggers the hydrophobic collapse of the chain, and (c) comparing coarse predictions of the stable folds of moderately large proteins (N~100) with structural information extracted from the protein data bank.
An improved stochastic fractal search algorithm for 3D protein structure prediction.
Zhou, Changjun; Sun, Chuan; Wang, Bin; Wang, Xiaojun
2018-05-03
Protein structure prediction (PSP) is a significant area for biological information research, disease treatment, and drug development. In this paper, three-dimensional structures of proteins are predicted from known amino acid sequences, and the structure prediction problem is transformed into a typical NP problem by an AB off-lattice model. This work applies a novel improved Stochastic Fractal Search algorithm (ISFS) to solve the problem. The Stochastic Fractal Search algorithm (SFS) is an effective evolutionary algorithm that performs well in exploring the search space but sometimes falls into local minima. To avoid this weakness, Lévy flight and internal feedback information are introduced in ISFS. In the experiments, simulations are conducted with the ISFS algorithm on Fibonacci sequences and real peptide sequences. Experimental results show that ISFS is more efficient and robust in finding the global minimum and avoiding getting stuck in local minima.
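The Fibonacci test sequences mentioned in the abstract above can be illustrated with a short sketch. It assumes the convention commonly used in the AB off-lattice literature (S0 = "A", S1 = "B", and each later sequence is the concatenation of the two before it); the abstract itself does not spell out the rule, and the function name `fibonacci_ab` is hypothetical.

```python
def fibonacci_ab(n):
    """Return the n-th Fibonacci AB sequence.

    Assumed convention (not stated in the abstract):
    S0 = "A", S1 = "B", S(i+1) = S(i-1) + S(i) by string concatenation.
    "A" stands for a hydrophobic residue and "B" for a hydrophilic one.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    prev, curr = "A", "B"
    if n == 0:
        return prev
    for _ in range(n - 1):
        # Concatenate the two previous sequences to get the next one.
        prev, curr = curr, prev + curr
    return curr


if __name__ == "__main__":
    for i in range(2, 8):
        seq = fibonacci_ab(i)
        print(i, len(seq), seq)
```

Under this convention the sequence lengths follow the Fibonacci numbers (13, 21, 34, 55, ...), which are the chain lengths typically used as benchmarks for such models.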
Wang, Xiao; Zhang, Jun; Li, Guo-Zheng
2015-01-01
Predicting bacterial protein subcellular locations using computational methods has become an important and challenging task. Although many prediction methods exist for bacterial proteins, the majority of these methods can only deal with single-location proteins. Unfortunately, many bacterial proteins are located at multiple sites in the cell. Moreover, multi-location proteins have special biological functions that may help the development of new drugs. It is therefore necessary to develop new computational methods for accurately predicting subcellular locations of multi-location bacterial proteins. In this article, two efficient multi-label predictors, Gpos-ECC-mPLoc and Gneg-ECC-mPLoc, are developed to predict the subcellular locations of multi-label gram-positive and gram-negative bacterial proteins respectively. The two multi-label predictors construct the GO vectors by using the GO terms of homologous proteins of query proteins and then adopt a powerful multi-label ensemble classifier to make the final multi-label prediction. The two multi-label predictors have the following advantages: (1) they improve the prediction performance of multi-label proteins by taking the correlations among different labels into account; (2) they ensemble multiple CC classifiers and further generate better prediction results by ensemble learning; and (3) they construct the GO vectors by using the frequency of occurrences of GO terms in the typical homologous set instead of using 0/1 values. Experimental results show that Gpos-ECC-mPLoc and Gneg-ECC-mPLoc can efficiently predict the subcellular locations of multi-label gram-positive and gram-negative bacterial proteins respectively. Gpos-ECC-mPLoc and Gneg-ECC-mPLoc can efficiently improve prediction accuracy of subcellular localization of multi-location gram-positive and gram-negative bacterial proteins respectively.
The online web servers for Gpos-ECC-mPLoc and Gneg-ECC-mPLoc predictors are freely accessible at http://biomed.zzuli.edu.cn/bioinfo/gpos-ecc-mploc/ and http://biomed.zzuli.edu.cn/bioinfo/gneg-ecc-mploc/ respectively.
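Point (3) of the abstract above, GO vectors built from term frequencies rather than 0/1 values, can be made concrete with a minimal sketch. The normalization used here (the fraction of homologs annotated with each term) is an assumption, since the abstract does not give the exact formula; the function names and the example GO identifiers are purely illustrative.

```python
from collections import Counter


def go_frequency_vector(homolog_go_terms, vocabulary):
    """Frequency-valued GO vector for one query protein.

    homolog_go_terms: one list of GO term IDs per homologous protein.
    vocabulary: ordered list of GO term IDs defining the vector positions.
    Assumed normalization: fraction of homologs carrying each term.
    """
    n = len(homolog_go_terms)
    # Count each term at most once per homolog.
    counts = Counter(t for terms in homolog_go_terms for t in set(terms))
    return [counts[t] / n if n else 0.0 for t in vocabulary]


def go_binary_vector(homolog_go_terms, vocabulary):
    """Conventional 0/1 GO vector, shown only for comparison."""
    seen = {t for terms in homolog_go_terms for t in terms}
    return [1 if t in seen else 0 for t in vocabulary]


if __name__ == "__main__":
    vocab = ["GO:0005737", "GO:0005886", "GO:0005576"]
    homologs = [["GO:0005737", "GO:0005886"], ["GO:0005737"]]
    print(go_frequency_vector(homologs, vocab))
    print(go_binary_vector(homologs, vocab))
```

The frequency vector keeps the information that one term is supported by every homolog while another is supported by only half of them, which the binary encoding discards.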
Zhou, Hang; Yang, Yang; Shen, Hong-Bin
2017-03-15
Protein subcellular localization prediction has been an important research topic in computational biology over the last decade. Various automatic methods have been proposed to predict locations for large scale protein datasets, where statistical machine learning algorithms are widely used for model construction. A key step in these predictors is encoding the amino acid sequences into feature vectors. Many studies have shown that features extracted from biological domains, such as gene ontology and functional domains, can be very useful for improving the prediction accuracy. However, domain knowledge usually results in redundant features and high-dimensional feature spaces, which may degrade the performance of machine learning models. In this paper, we propose a new amino acid sequence-based human protein subcellular location prediction approach, Hum-mPLoc 3.0, which covers 12 human subcellular localizations. The sequences are represented by multi-view complementary features, i.e. context vocabulary annotation-based gene ontology (GO) terms, peptide-based functional domains, and residue-based statistical features. To systematically reflect the structural hierarchy of the domain knowledge bases, we propose a novel feature representation protocol denoted as HCM (Hidden Correlation Modeling), which creates more compact and discriminative feature vectors by modeling the hidden correlations between annotation terms. Experimental results on four benchmark datasets show that HCM improves prediction accuracy by 5-11% and F1 by 8-19% compared with conventional GO-based methods. A large-scale application of Hum-mPLoc 3.0 on the whole human proteome reveals protein co-localization preferences in the cell. Availability: www.csbio.sjtu.edu.cn/bioinf/Hum-mPLoc3/. Contact: hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online.
LOCATE: a mouse protein subcellular localization database
Fink, J. Lynn; Aturaliya, Rajith N.; Davis, Melissa J.; Zhang, Fasheng; Hanson, Kelly; Teasdale, Melvena S.; Kai, Chikatoshi; Kawai, Jun; Carninci, Piero; Hayashizaki, Yoshihide; Teasdale, Rohan D.
2006-01-01
We present here LOCATE, a curated, web-accessible database that houses data describing the membrane organization and subcellular localization of proteins from the FANTOM3 Isoform Protein Sequence set. Membrane organization is predicted by the high-throughput, computational pipeline MemO. The subcellular locations of selected proteins from this set were determined by a high-throughput, immunofluorescence-based assay and by manually reviewing >1700 peer-reviewed publications. LOCATE represents the first effort to catalogue the experimentally verified subcellular location and membrane organization of mammalian proteins using a high-throughput approach and provides localization data for ∼40% of the mouse proteome. It is available at . PMID:16381849
Minimalist ensemble algorithms for genome-wide protein localization prediction.
Lin, Jhih-Rong; Mondal, Ananda Mohan; Liu, Rong; Hu, Jianjun
2012-07-03
Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. 
We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi.
Minimalist ensemble algorithms for genome-wide protein localization prediction
2012-01-01
Background Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. Results This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. 
Conclusions We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi. PMID:22759391
SELF-BLM: Prediction of drug-target interactions via self-training SVM.
Keum, Jongsoo; Nam, Hojung
2017-01-01
Predicting drug-target interactions is important for the development of novel drugs and the repositioning of drugs. To predict such interactions, there are a number of methods based on drug and target protein similarity. Although these methods, such as the bipartite local model (BLM), show promise, they often categorize unknown interactions as negative interactions. Therefore, these methods are not ideal for finding potential drug-target interactions that have not yet been validated as positive interactions. Thus, here we propose a method that integrates machine learning techniques, such as self-training support vector machine (SVM) and BLM, to develop a self-training bipartite local model (SELF-BLM) that facilitates the identification of potential interactions. The method first categorizes unlabeled interactions and negative interactions among unknown interactions using a clustering method. Then, using the BLM method and self-training SVM, the unlabeled interactions are self-trained and final local classification models are constructed. When applied to four classes of proteins that include enzymes, G-protein coupled receptors (GPCRs), ion channels, and nuclear receptors, SELF-BLM showed the best performance for predicting not only known interactions but also potential interactions in three protein classes compared to other related studies. The implemented software and supporting data are available at https://github.com/GIST-CSBL/SELF-BLM.
Marrero-Ponce, Yovani; Contreras-Torres, Ernesto; García-Jacas, César R; Barigye, Stephen J; Cubillán, Néstor; Alvarado, Ysaías J
2015-06-07
In the present study, we introduce novel 3D protein descriptors based on the bilinear algebraic form in the ℝⁿ space on the coulombic matrix. For the calculation of these descriptors, macromolecular vectors belonging to ℝⁿ space, whose components represent certain amino acid side-chain properties, were used as weighting schemes. Generalization approaches for the calculation of inter-amino acidic residue spatial distances based on Minkowski metrics are proposed. The simple- and double-stochastic schemes were defined as approaches to normalize the coulombic matrix. The local-fragment indices for both amino acid-types and amino acid-groups are presented in order to permit characterizing fragments of interest in proteins. In addition, to take into account specific interactions among amino acids in global or local indices, geometric and topological cut-offs are defined. To assess the utility of global and local indices, a classification model for the prediction of the four major protein structural classes was built with the Linear Discriminant Analysis (LDA) technique. The developed LDA-model correctly classifies 92.6% and 92.7% of the proteins in the training and test sets, respectively. The obtained model showed high values of the generalized square correlation coefficient (GC²) on both the training and test series. The statistical parameters derived from the internal and external validation procedures demonstrate the robustness, stability and high predictive power of the proposed model. The performance of the LDA-model demonstrates the capability of the proposed indices not only to codify relevant biochemical information related to the structural classes of proteins, but also to yield suitable interpretability. It is anticipated that the current method will benefit the prediction of other protein attributes or functions.
Multilabel learning via random label selection for protein subcellular multilocations prediction.
Wang, Xiao; Li, Guo-Zheng
2013-01-01
Prediction of protein subcellular localization is an important but challenging problem, particularly when proteins may simultaneously exist at, or move between, two or more different subcellular location sites. Most of the existing protein subcellular localization methods only deal with single-location proteins. In the past few years, only a few methods have been proposed to tackle proteins with multiple locations. However, they adopt a simple strategy, that is, transforming the multilocation proteins into multiple proteins with a single location, which does not take correlations among different subcellular locations into account. In this paper, a novel method named random label selection (RALS) (multilabel learning via RALS), which extends the simple binary relevance (BR) method, is proposed to learn from multilocation proteins in an effective and efficient way. RALS does not explicitly find the correlations among labels, but rather implicitly attempts to learn the label correlations from data by augmenting the original feature space with randomly selected labels as additional input features. Through a fivefold cross-validation test on a benchmark data set, we demonstrate that our proposed method, which considers label correlations, clearly outperforms the baseline BR method, which does not, indicating that correlations among different subcellular locations really exist and contribute to improved prediction performance. Experimental results on two benchmark data sets also show that our proposed methods achieve significantly higher performance than some other state-of-the-art methods in predicting subcellular multilocations of proteins. The prediction web server is available at http://levis.tongji.edu.cn:8080/bioinfo/MLPred-Euk/ for public use.
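The feature-augmentation idea described in the abstract above can be sketched as follows. This is only an illustration of the training-side step under stated assumptions: the function name `augment_for_label`, the parameter `k` (how many other labels to append) and the use of Python lists are all hypothetical, and the abstract does not say how the appended labels are obtained at prediction time, so that part is deliberately left out.

```python
import random


def augment_for_label(X, Y, target, k, rng):
    """Build the training set for one binary classifier (one location).

    X: list of feature vectors (lists of floats), one per protein.
    Y: list of 0/1 label vectors, one per protein.
    target: index of the label this classifier should predict.
    k: number of other labels to append as extra features (assumption).
    rng: a random.Random instance, so the selection is reproducible.
    """
    n_labels = len(Y[0])
    others = [j for j in range(n_labels) if j != target]
    # Randomly select k of the remaining labels.
    chosen = sorted(rng.sample(others, k))
    # Append the selected label values to each feature vector.
    X_aug = [list(x) + [y[j] for j in chosen] for x, y in zip(X, Y)]
    y_target = [y[target] for y in Y]
    return X_aug, y_target, chosen


if __name__ == "__main__":
    X = [[0.1, 0.2], [0.3, 0.4]]
    Y = [[1, 0, 1], [0, 1, 1]]
    X_aug, y0, chosen = augment_for_label(X, Y, target=0, k=1,
                                          rng=random.Random(0))
    print(chosen, X_aug, y0)
```

In plain binary relevance each classifier would see only `X`; here each one additionally sees a random subset of the other labels, which is how label correlations can be picked up implicitly.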
Pan, Xiaoyong; Shen, Hong-Bin
2018-05-02
RNA-binding proteins (RBPs) account for 5-10% of the eukaryotic proteome and play key roles in many biological processes, e.g. gene regulation. Experimental detection of RBP binding sites is still time-intensive and costly. Instead, computational prediction of RBP binding sites using patterns learned from existing annotation knowledge is a fast approach. From the biological point of view, the local structure context derived from local sequences will be recognized by specific RBPs. However, in computational modeling using deep learning, to the best of our knowledge, only global representations of entire RNA sequences are employed. So far, the local sequence information is ignored in the deep model construction process. In this study, we present a computational method, iDeepE, to predict RNA-protein binding sites from RNA sequences by combining global and local convolutional neural networks (CNNs). For the global CNN, we pad the RNA sequences to the same length. For the local CNN, we split an RNA sequence into multiple overlapping fixed-length subsequences, where each subsequence is a signal channel of the whole sequence. Next, we train deep CNNs for multiple subsequences and the padded sequences to learn high-level features, respectively. Finally, the outputs from local and global CNNs are combined to improve the prediction. iDeepE demonstrates better performance than state-of-the-art methods on two large-scale datasets derived from CLIP-seq. We also find that the local CNN runs 1.8 times faster than the global CNN with comparable performance when using GPUs. Our results show that iDeepE has captured experimentally verified binding motifs. Availability: https://github.com/xypan1232/iDeepE. Contact: xypan172436@gmail.com or hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online.
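The two input preparations described in the abstract above, padding for the global CNN and overlapping fixed-length windows for the local CNN, can be sketched as below. This is not taken from the iDeepE code base: the function names, the pad symbol "N", the truncation of over-long sequences and the window and overlap values in the example are all assumptions made for illustration.

```python
def pad_sequence(seq, max_len, pad_char="N"):
    """Pad (or truncate) a sequence to exactly max_len characters.

    Truncating longer sequences is an assumption of this sketch.
    """
    return seq[:max_len] + pad_char * max(0, max_len - len(seq))


def split_overlapping(seq, window, overlap, pad_char="N"):
    """Split a sequence into fixed-length windows sharing `overlap` bases.

    The last window is padded so that every window has the same length.
    """
    step = window - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than window")
    chunks = []
    # Stop once the remaining tail is already covered by the last window.
    for start in range(0, max(len(seq) - overlap, 1), step):
        chunk = seq[start:start + window]
        chunks.append(chunk + pad_char * (window - len(chunk)))
    return chunks


if __name__ == "__main__":
    rna = "ACGUACGUA"
    print(pad_sequence(rna, 12))
    print(split_overlapping(rna, window=4, overlap=2))
```

Each window would then be encoded (for example one-hot) and fed to the local network, while the padded full-length sequence goes to the global network.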
A Survey of Computational Intelligence Techniques in Protein Function Prediction
Tiwari, Arvind Kumar; Srivastava, Rajeev
2014-01-01
Knowledge of previously unknown proteins has grown massively with the advancement of high-throughput microarray technologies. Protein function prediction is the most challenging problem in bioinformatics. In the past, homology-based approaches were used to predict protein function, but they failed when a new protein was different from the previous ones. Therefore, to alleviate the problems associated with traditional homology-based approaches, numerous computational intelligence techniques have been proposed in the recent past. This paper presents a comprehensive state-of-the-art review of computational intelligence techniques for protein function prediction using sequence, structure, protein-protein interaction network, and gene expression data, covering applications such as prediction of DNA and RNA binding sites, subcellular localization, enzyme functions, signal peptides, catalytic residues, nuclear/G-protein coupled receptors, membrane proteins, and pathway analysis from gene expression datasets. This paper also summarizes the results obtained by many researchers who addressed these problems using computational intelligence techniques with appropriate datasets to improve prediction performance. The summary shows that ensemble classifiers and integration of multiple heterogeneous data are useful for protein function prediction. PMID:25574395
Jian, Jhih-Wei; Elumalai, Pavadai; Pitti, Thejkiran; Wu, Chih Yuan; Tsai, Keng-Chang; Chang, Jeng-Yih; Peng, Hung-Pin; Yang, An-Suei
2016-01-01
Predicting ligand binding sites (LBSs) on protein structures, which are obtained either from experimental or computational methods, is a useful first step in functional annotation or structure-based drug design for the protein structures. In this work, the structure-based machine learning algorithm ISMBLab-LIG was developed to predict LBSs on protein surfaces with input attributes derived from the three-dimensional probability density maps of interacting atoms, which were reconstructed on the query protein surfaces and were relatively insensitive to local conformational variations of the tentative ligand binding sites. The prediction accuracy of the ISMBLab-LIG predictors is comparable to that of the best LBS predictors benchmarked on several well-established testing datasets. More importantly, the ISMBLab-LIG algorithm has substantial tolerance to the prediction uncertainties of computationally derived protein structure models. As such, the method is particularly useful for predicting LBSs not only on experimental protein structures without known LBS templates in the database but also on computationally predicted model protein structures with structural uncertainties in the tentative ligand binding sites. PMID:27513851
Verma, Ruchi; Varshney, Grish C; Raghava, G P S
2010-06-01
The rate of human death due to malaria is increasing day by day. Thus the malaria-causing parasite Plasmodium falciparum (PF) remains a cause for concern. With the wealth of data now available, it is imperative to understand protein localization in order to gain deeper insight into the functional roles of its proteins. In this study, we describe a method for predicting mitochondrial proteins of the malaria parasite using a machine-learning technique. All models were trained and tested on 175 proteins (40 mitochondrial and 135 non-mitochondrial proteins) and evaluated using five-fold cross validation. We developed a Support Vector Machine (SVM) model for predicting mitochondrial proteins of P. falciparum using amino acid and dipeptide composition and achieved maximum MCC of 0.38 and 0.51, respectively. In this study, split amino acid composition (SAAC) is used, where the composition of the N-terminus, the C-terminus, and the rest of the protein is computed separately. The performance of the SVM model improved significantly, from MCC 0.38 to 0.73, when SAAC instead of simple amino acid composition was used as input. In addition, an SVM model has been developed using the composition of the PSSM profile, with MCC 0.75 and accuracy 91.38%. We achieved maximum MCC 0.81 with accuracy 92% using a hybrid model, which combines PSSM profile and SAAC. When evaluated on an independent dataset, our method performs better than existing methods. A web server, PFMpred, has been developed for predicting mitochondrial proteins of malaria parasites (http://www.imtech.res.in/raghava/pfmpred/).
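Split amino acid composition, as described in the abstract above, can be illustrated with a minimal sketch. The lengths of the terminal segments (25 residues each here) and the order of the three blocks in the output vector are assumptions, because the abstract gives neither; the function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Fraction of each standard amino acid in seq (20 values)."""
    n = len(seq)
    return [seq.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def split_composition(seq, n_term=25, c_term=25):
    """Split amino acid composition: N-terminus, C-terminus, remainder.

    n_term and c_term are assumed segment lengths, not values from the
    abstract. Returns a 60-dimensional vector (3 blocks of 20).
    """
    # Guard so that the segments never overlap on short sequences.
    cut = max(n_term, len(seq) - c_term)
    head = seq[:n_term]
    tail = seq[cut:]
    middle = seq[n_term:cut]
    return composition(head) + composition(tail) + composition(middle)


if __name__ == "__main__":
    protein = "M" * 25 + "A" * 10 + "K" * 25
    vec = split_composition(protein)
    print(len(vec), vec[AMINO_ACIDS.index("M")])
```

Compared with a single 20-dimensional composition, the split version keeps terminal signals (such as an N-terminal targeting peptide) from being averaged out over the whole protein, which is consistent with the MCC gain reported in the abstract.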
Zou, Lingyun; Wang, Zhengzhi; Huang, Jiaomin
2007-12-01
Subcellular location is one of the key biological characteristics of proteins. Position-specific profiles (PSP) are introduced in this article as important characteristics of proteins. To obtain position-specific profiles, the Position Specific Iterative-Basic Local Alignment Search Tool (PSI-BLAST) was used to search for protein sequences in a database. Position-specific scoring matrices are extracted from the profiles as one class of characteristics. Four-part amino acid compositions and 1st-7th order dipeptide compositions are calculated as the other two classes of characteristics. In total, twelve characteristic vectors are extracted from each of the protein sequences. Next, the characteristic vectors are weighted by a simple weighting function and input into a BP neural network predictor named PSP-Weighted Neural Network (PSP-WNN). The Levenberg-Marquardt algorithm, instead of the error back propagation algorithm, is employed to adjust the weight matrices and thresholds during network training. With a jackknife test on the RH2427 dataset, PSP-WNN achieved an overall prediction accuracy of 88.4%, higher than the results obtained by the general BP neural network, Markov model, and fuzzy k-nearest neighbors algorithm on this dataset. In addition, the prediction performance of PSP-WNN was evaluated with a five-fold cross validation test on the PK7579 dataset, and the prediction results were consistently better than those of the previous method based on several support vector machines using compositions of both amino acids and amino acid pairs. These results indicate that PSP-WNN is a powerful tool for subcellular localization prediction. At the end of the article, the influence of different weighting proportions among the three characteristic vector categories on prediction accuracy is discussed, and an appropriate proportion is chosen by maximizing the prediction accuracy.
Global, quantitative and dynamic mapping of protein subcellular localization.
Itzhak, Daniel N; Tyanova, Stefka; Cox, Jürgen; Borner, Georg Hh
2016-06-09
Subcellular localization critically influences protein function, and cells control protein localization to regulate biological processes. We have developed and applied Dynamic Organellar Maps, a proteomic method that allows global mapping of protein translocation events. We initially used maps statically to generate a database with localization and absolute copy number information for over 8700 proteins from HeLa cells, approaching comprehensive coverage. All major organelles were resolved, with exceptional prediction accuracy (estimated at >92%). Combining spatial and abundance information yielded an unprecedented quantitative view of HeLa cell anatomy and organellar composition, at the protein level. We subsequently demonstrated the dynamic capabilities of the approach by capturing translocation events following EGF stimulation, which we integrated into a quantitative model. Dynamic Organellar Maps enable the proteome-wide analysis of physiological protein movements, without requiring any reagents specific to the investigated process, and will thus be widely applicable in cell biology.
Cheng, Xiang; Xiao, Xuan; Chou, Kuo-Chen
2018-05-01
For an in-depth understanding of the functions of proteins in a cell, knowledge of their subcellular localization is indispensable. The current study is focused on human protein subcellular location prediction based on the sequence information alone. Although considerable efforts have been made in this regard, the problem is far from being solved yet. Most existing methods can be used to deal with single-location proteins only. Actually, proteins with multi-locations may have some special biological functions that are particularly important for both basic research and drug design. Using the multi-label theory, we present a new predictor called 'pLoc-mHum' by extracting the crucial GO (Gene Ontology) information into the general PseAAC (Pseudo Amino Acid Composition). Rigorous cross-validations on the same stringent benchmark dataset have indicated that the proposed pLoc-mHum predictor is remarkably superior to iLoc-Hum, the state-of-the-art method in predicting the human protein subcellular localization. To maximize the convenience of most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc-mHum/, by which users can easily get their desired results without the need to go through the complicated mathematics involved.
Lee, Hui Sun; Im, Wonpil
2016-04-01
Molecular recognition by protein mostly occurs in a local region on the protein surface. Thus, an efficient computational method for accurate characterization of protein local structural conservation is necessary to better understand biology and drug design. We present a novel local structure alignment tool, G-LoSA. G-LoSA aligns protein local structures in a sequence order independent way and provides a GA-score, a chemical feature-based and size-independent structure similarity score. Our benchmark validation shows the robust performance of G-LoSA to the local structures of diverse sizes and characteristics, demonstrating its universal applicability to local structure-centric comparative biology studies. In particular, G-LoSA is highly effective in detecting conserved local regions on the entire surface of a given protein. In addition, the applications of G-LoSA to identifying template ligands and predicting ligand and protein binding sites illustrate its strong potential for computer-aided drug design. We hope that G-LoSA can be a useful computational method for exploring interesting biological problems through large-scale comparison of protein local structures and facilitating drug discovery research and development. G-LoSA is freely available to academic users at http://im.compbio.ku.edu/GLoSA/.
Cheng, Xiang; Xiao, Xuan; Chou, Kuo-Chen
2017-10-06
Information on the subcellular localization of proteins is crucially important for revealing their biological functions in a cell, the basic unit of life. With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to develop computational tools for timely identifying their subcellular locations based on the sequence information alone. The current study is focused on the Gram-negative bacterial proteins. Although considerable efforts have been made in protein subcellular prediction, the problem is far from being solved yet. One reason is that mounting evidence indicates that many Gram-negative bacterial proteins exist in two or more location sites, whereas most existing methods can deal with single-location proteins only. Actually, proteins with multi-locations may have some special biological functions important for both basic research and drug design. In this study, by using the multi-label theory, we developed a new predictor called "pLoc-mGneg" for predicting the subcellular localization of Gram-negative bacterial proteins with both single and multiple locations. Rigorous cross-validation on a high quality benchmark dataset indicated that the proposed predictor is remarkably superior to "iLoc-Gneg", the state-of-the-art predictor for the same purpose. For the convenience of most experimental scientists, a user-friendly web-server for the novel predictor has been established at http://www.jci-bioinfo.cn/pLoc-mGneg/, by which users can easily get their desired results without the need to go through the complicated mathematics involved.
Prediction of protein subcellular localization by weighted gene ontology terms.
Chi, Sang-Mun
2010-08-27
We develop a new weighting approach of gene ontology (GO) terms for predicting protein subcellular localization. The weights of individual GO terms, corresponding to their contribution to the prediction algorithm, are determined by the term-weighting methods used in text categorization. We evaluate several term-weighting methods, which are based on inverse document frequency, information gain, gain ratio, odds ratio, and chi-square and its variants. Additionally, we propose a new term-weighting method based on the logarithmic transformation of chi-square. The proposed term-weighting method performs better than other term-weighting methods, and also outperforms state-of-the-art subcellular prediction methods. Our proposed method achieves 98.1%, 99.3%, 98.1%, 98.1%, and 95.9% overall accuracies for the animal BaCelLo independent dataset (IDS), fungal BaCelLo IDS, animal Höglund IDS, fungal Höglund IDS, and PLOC dataset, respectively. Furthermore, the close correlation between high-weighted GO terms and subcellular localizations suggests that our proposed method appropriately weights GO terms according to their relevance to the localizations.
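To make the chi-square term weighting described in this abstract concrete, the following Python sketch scores each GO term against one localization class using the standard 2x2 chi-square statistic from text categorization. The abstract does not give the exact form of the logarithmic transformation, so `log(1 + chi2)` is used here purely as an assumption. The function names and the toy data layout are likewise hypothetical.

```python
import math


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: proteins in the class that carry the term
    b: proteins outside the class that carry the term
    c: proteins in the class that lack the term
    d: proteins outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def term_weights(proteins, location):
    """Weight every GO term for one location.

    proteins: list of (set_of_go_terms, location_label) pairs.
    The log(1 + chi2) transform is an assumed stand-in for the
    paper's logarithmic transformation.
    """
    terms = set().union(*(go for go, _ in proteins))
    weights = {}
    for term in terms:
        a = sum(1 for go, loc in proteins if term in go and loc == location)
        b = sum(1 for go, loc in proteins if term in go and loc != location)
        c = sum(1 for go, loc in proteins if term not in go and loc == location)
        d = sum(1 for go, loc in proteins if term not in go and loc != location)
        weights[term] = math.log1p(chi_square(a, b, c, d))
    return weights
```

A term that appears in every protein gets weight zero under this scheme, while a term found only in one location gets the largest weight, which matches the intuition behind the correlation reported in the abstract.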
Large-scale binding ligand prediction by improved patch-based method Patch-Surfer2.0
Zhu, Xiaolei; Xiong, Yi; Kihara, Daisuke
2015-01-01
Motivation: Ligand binding is a key aspect of the function of many proteins. Thus, binding ligand prediction provides important insight in understanding the biological function of proteins. Binding ligand prediction is also useful for drug design and examining potential drug side effects. Results: We present a computational method named Patch-Surfer2.0, which predicts binding ligands for a protein pocket. By representing and comparing pockets at the level of small local surface patches that characterize physicochemical properties of the local regions, the method can identify binding pockets of the same ligand even if they do not share globally similar shapes. Properties of local patches are represented by an efficient mathematical representation, 3D Zernike Descriptor. Patch-Surfer2.0 has significant technical improvements over our previous prototype, which includes a new feature that captures approximate patch position with a geodesic distance histogram. Moreover, we constructed a large comprehensive database of ligand binding pockets that will be searched against by a query. The benchmark shows better performance of Patch-Surfer2.0 over existing methods. Availability and implementation: http://kiharalab.org/patchsurfer2.0/ Contact: dkihara@purdue.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25359888
Sael, Lee; Kihara, Daisuke
2012-01-01
Functional elucidation of proteins is one of the essential tasks in biology. Function of a protein, specifically, small ligand molecules that bind to a protein, can be predicted by finding similar local surface regions in binding sites of known proteins. Here, we developed an alignment free local surface comparison method for predicting a ligand molecule which binds to a query protein. The algorithm, named Patch-Surfer, represents a binding pocket as a combination of segmented surface patches, each of which is characterized by its geometrical shape, the electrostatic potential, the hydrophobicity, and the concaveness. Representing a pocket by a set of patches is effective to absorb difference of global pocket shape while capturing local similarity of pockets. The shape and the physicochemical properties of surface patches are represented using the 3D Zernike descriptor, which is a series expansion of mathematical 3D function. Two pockets are compared using a modified weighted bipartite matching algorithm, which matches similar patches from the two pockets. Patch-Surfer was benchmarked on three datasets, which consist in total of 390 proteins that bind to one of 21 ligands. Patch-Surfer showed superior performance to existing methods including a global pocket comparison method, Pocket-Surfer, which we have previously introduced. Particularly, as intended, the accuracy showed large improvement for flexible ligand molecules, which bind to pockets in different conformations. PMID:22275074
DOE Office of Scientific and Technical Information (OSTI.GOV)
Woolston, Caroline M.; Al-Attar, Ahmad; Storr, Sarah J.
2011-04-01
Purpose: Early-stage invasive breast cancer patients have commonly undergone breast-conserving surgery and radiotherapy. In a large majority of these patients, the treatment is effective; however, a proportion will develop local recurrence. Deregulated redox systems provide cancer cells protection from increased oxidative stress, such as that induced by ionizing radiation. Therefore, the expression of redox proteins was examined in tumor specimens from this defined cohort to determine whether such expression could predict response. Methods and Materials: The nuclear and cytoplasmic expression of nine redox proteins (glutathione, glutathione reductase, glutaredoxin, glutathione peroxidase 1, 3, and 4, and glutathione S-transferase-θ, -π, and -α) was assessed using conventional immunohistochemistry on a tissue microarray of 224 tumors. Results: A high cytoplasmic expression of glutathione S-transferase-θ significantly correlated with a greater risk of local recurrence (p = .008) and, when combined with a low nuclear expression (p = .009), became an independent predictive factor (p = .002) for local recurrence. High cytoplasmic expression of glutathione S-transferase-θ also correlated with a worse overall survival (p = .009). Low nuclear and cytoplasmic expression of glutathione peroxidase 3 (p = .002) correlated with a greater risk of local recurrence and was an independent predictive factor (p = .005). These proteins did not correlate with tumor grade, suggesting their function might be specific to the regulation of oxidative stress rather than alterations of tumor phenotype. Only nuclear (p = .005) and cytoplasmic (p = .001) expression of glutathione peroxidase 4 correlated with the tumor grade. Conclusions: Our results support the use of redox protein expression, namely glutathione S-transferase-θ and glutathione peroxidase 3, to predict the response to radiotherapy in early-stage breast cancer patients. If incorporated into routine diagnostic tests, they have the potential to aid clinicians in their stratification of patients into more tailored treatment regimens. Future targeted therapies to these systems might improve the efficacy of reactive oxygen species-inducing therapies, such as radiotherapy.
Predicting protein interactions by Brownian dynamics simulations.
Meng, Xuan-Yu; Xu, Yu; Zhang, Hong-Xing; Mezei, Mihaly; Cui, Meng
2012-01-01
We present a newly adapted Brownian-Dynamics (BD)-based protein docking method for predicting native protein complexes. The approach includes global BD conformational sampling, compact complex selection, and local energy minimization. In order to reduce the computational costs for energy evaluations, a shell-based grid force field was developed to represent the receptor protein and solvation effects. The performance of this BD protein docking approach has been evaluated on a test set of 24 crystal protein complexes. Reproduction of experimental structures in the test set indicates adequate conformational sampling and accurate scoring by this BD protein docking approach. Furthermore, we have developed an approach to account for the flexibility of proteins, which has been successfully applied to reproduce the experimental complex structure from the structures of two unbound proteins. These results indicate that this adapted BD protein docking approach can be useful for the prediction of protein-protein interactions.
Stekhoven, Daniel J; Omasits, Ulrich; Quebatte, Maxime; Dehio, Christoph; Ahrens, Christian H
2014-03-17
Proteomics data provide unique insights into biological systems, including the predominant subcellular localization (SCL) of proteins, which can reveal important clues about their functions. Here we analyzed data of a complete prokaryotic proteome expressed under two conditions mimicking interaction of the emerging pathogen Bartonella henselae with its mammalian host. Normalized spectral count data from cytoplasmic, total membrane, inner and outer membrane fractions allowed us to identify the predominant SCL for 82% of the identified proteins. The spectral count proportion of total membrane versus cytoplasmic fractions indicated the propensity of cytoplasmic proteins to co-fractionate with the inner membrane, and enabled us to distinguish cytoplasmic, peripheral inner membrane and bona fide inner membrane proteins. Principal component analysis and k-nearest neighbor classification training on selected marker proteins or predominantly localized proteins, allowed us to determine an extensive catalog of at least 74 expressed outer membrane proteins, and to extend the SCL assignment to 94% of the identified proteins, including 18% where in silico methods gave no prediction. Suitable experimental proteomics data combined with straightforward computational approaches can thus identify the predominant SCL on a proteome-wide scale. Finally, we present a conceptual approach to identify proteins potentially changing their SCL in a condition-dependent fashion. The work presented here describes the first prokaryotic proteome-wide subcellular localization (SCL) dataset for the emerging pathogen B. henselae (Bhen). The study indicates that suitable subcellular fractionation experiments combined with straight-forward computational analysis approaches assessing the proportion of spectral counts observed in different subcellular fractions are powerful for determining the predominant SCL of a large percentage of the experimentally observed proteins. 
This includes numerous cases where in silico prediction methods do not provide any prediction. When treatment with harsh conditions is avoided, cytoplasmic proteins tend to co-fractionate with proteins of the inner membrane fraction, indicative of close functional interactions. The spectral count proportion (SCP) of total membrane versus cytoplasmic fractions allowed us to obtain a good indication about the relative proximity of individual protein complex members to the inner membrane. Using principal component analysis and k-nearest neighbor approaches, we were able to extend the percentage of proteins with a predominant experimental localization to over 90% of all expressed proteins and identified a set of at least 74 outer membrane (OM) proteins. In general, OM proteins represent a rich source of candidates for the development of urgently needed new therapeutics to combat the resurgence of infectious disease and multi-drug resistant bacteria. Finally, by comparing the data from two infection biology relevant conditions, we conceptually explore methods to identify and visualize potential candidates that may partially change their SCL in these different conditions. The data are made available to researchers as a SCL compendium for Bhen and as an assistance in further improving in silico SCL prediction algorithms.
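The spectral count proportion and k-nearest neighbor steps described in this abstract can be sketched in Python as follows. The abstract does not give the exact SCP formula, so the ratio membrane / (membrane + cytoplasmic) is an assumption, as are the function names and the two-value feature vectors used in the usage note.

```python
import math
from collections import Counter


def spectral_count_proportion(membrane, cytoplasmic):
    """Share of a protein's spectral counts found in the total membrane fraction.

    Assumed definition: membrane / (membrane + cytoplasmic).
    Returns None when the protein was not observed in either fraction.
    """
    total = membrane + cytoplasmic
    if total == 0:
        return None
    return membrane / total


def knn_predict(query, markers, k=3):
    """Assign the majority label among the k marker proteins nearest to query.

    markers: list of (feature_vector, label) pairs, where each feature vector
    holds normalized spectral counts per fraction (hypothetical layout).
    """
    ranked = sorted(markers, key=lambda m: math.dist(query, m[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

For example, with marker proteins described by hypothetical (membrane, cytoplasmic) proportions, `knn_predict((0.88, 0.12), markers)` would assign the label shared by most of the three closest markers.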
2016-01-01
Molecular recognition by protein mostly occurs in a local region on the protein surface. Thus, an efficient computational method for accurate characterization of protein local structural conservation is necessary to better understand biology and drug design. We present a novel local structure alignment tool, G-LoSA. G-LoSA aligns protein local structures in a sequence order independent way and provides a GA-score, a chemical feature-based and size-independent structure similarity score. Our benchmark validation shows the robust performance of G-LoSA to the local structures of diverse sizes and characteristics, demonstrating its universal applicability to local structure-centric comparative biology studies. In particular, G-LoSA is highly effective in detecting conserved local regions on the entire surface of a given protein. In addition, the applications of G-LoSA to identifying template ligands and predicting ligand and protein binding sites illustrate its strong potential for computer-aided drug design. We hope that G-LoSA can be a useful computational method for exploring interesting biological problems through large-scale comparison of protein local structures and facilitating drug discovery research and development. G-LoSA is freely available to academic users at http://im.compbio.ku.edu/GLoSA/. PMID:26813336
G-LoSA for Prediction of Protein-Ligand Binding Sites and Structures.
Lee, Hui Sun; Im, Wonpil
2017-01-01
Recent advances in high-throughput structure determination and computational protein structure prediction have significantly enriched the universe of protein structure. However, there is still a large gap between the number of available protein structures and that of proteins with annotated function in high accuracy. Computational structure-based protein function prediction has emerged to reduce this knowledge gap. The identification of a ligand binding site and its structure is critical to the determination of a protein's molecular function. We present a computational methodology for predicting small molecule ligand binding site and ligand structure using G-LoSA, our protein local structure alignment and similarity measurement tool. All the computational procedures described here can be easily implemented using G-LoSA Toolkit, a package of standalone software programs and preprocessed PDB structure libraries. G-LoSA and G-LoSA Toolkit are freely available to academic users at http://compbio.lehigh.edu/GLoSA . We also illustrate a case study to show the potential of our template-based approach harnessing G-LoSA for protein function prediction.
Yang, Fan; Xu, Ying-Ying; Shen, Hong-Bin
2014-01-01
Human protein subcellular location prediction can provide critical knowledge for understanding a protein's function. Since significant progress has been made on digital microscopy, automated image-based protein subcellular location classification is urgently needed. In this paper, we aim to investigate more representative image features that can be effectively used for dealing with the multilabel subcellular image samples. We prepared a large multilabel immunohistochemistry (IHC) image benchmark from the Human Protein Atlas database and tested the performance of different local texture features, including completed local binary pattern, local tetra pattern, and the standard local binary pattern feature. According to our experimental results from binary relevance multilabel machine learning models, the completed local binary pattern, and local tetra pattern are more discriminative for describing IHC images when compared to the traditional local binary pattern descriptor. The combination of these two novel local pattern features and the conventional global texture features is also studied. The enhanced performance of final binary relevance classification model trained on the combined feature space demonstrates that different features are complementary to each other and thus capable of improving the accuracy of classification.
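As a point of reference for the texture features compared in this abstract, here is a minimal Python sketch of the standard local binary pattern on a 3x3 neighborhood. It covers only the baseline descriptor, not the completed local binary pattern or the local tetra pattern. The bit ordering, the `>=` comparison, and the function names are conventions assumed for illustration.

```python
# Neighbor offsets, clockwise starting from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """Return the 8-bit local binary pattern code of one interior pixel.

    image is a list of rows of gray levels. A neighbor contributes a 1 bit
    when its value is greater than or equal to the center value.
    """
    center = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Histogram of LBP codes over all interior pixels (256 bins)."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist
```

The 256-bin histogram is the kind of feature vector that could feed one binary classifier per location in a binary relevance multilabel setup.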
Localized mRNA translation and protein association
NASA Astrophysics Data System (ADS)
Zhdanov, Vladimir P.
2014-08-01
Recent direct observations of localization of mRNAs and proteins both in prokaryotic and eukaryotic cells can be related to slowdown of diffusion of these species due to macromolecular crowding and their ability to aggregate and form immobile or slowly mobile complexes. Here, a generic kinetic model describing both these factors is presented and comprehensively analyzed. Although the model is non-linear, an accurate self-consistent analytical solution of the corresponding reaction-diffusion equation has been constructed, the types of localized protein distributions have been explicitly shown, and the predicted kinetic regimes of gene expression have been classified.
Protein Structure Prediction by Protein Threading
NASA Astrophysics Data System (ADS)
Xu, Ying; Liu, Zhijie; Cai, Liming; Xu, Dong
The seminal work of Bowie, Lüthy, and Eisenberg (Bowie et al., 1991) on "the inverse protein folding problem" laid the foundation of protein structure prediction by protein threading. By using simple measures for fitness of different amino acid types to local structural environments defined in terms of solvent accessibility and protein secondary structure, the authors derived a simple and yet profoundly novel approach to assessing if a protein sequence fits well with a given protein structural fold. Their follow-up work (Elofsson et al., 1996; Fischer and Eisenberg, 1996; Fischer et al., 1996a,b) and the work by Jones, Taylor, and Thornton (Jones et al., 1992) on protein fold recognition led to the development of a new brand of powerful tools for protein structure prediction, which we now term "protein threading." These computational tools have played a key role in extending the utility of all the experimentally solved structures by X-ray crystallography and nuclear magnetic resonance (NMR), providing structural models and functional predictions for many of the proteins encoded in the hundreds of genomes that have been sequenced up to now.
KFC Server: interactive forecasting of protein interaction hot spots.
Darnell, Steven J; LeGault, Laura; Mitchell, Julie C
2008-07-01
The KFC Server is a web-based implementation of the KFC (Knowledge-based FADE and Contacts) model, a machine learning approach for the prediction of binding hot spots, or the subset of residues that account for most of a protein interface's binding free energy. The server facilitates the automated analysis of a user submitted protein-protein or protein-DNA interface and the visualization of its hot spot predictions. For each residue in the interface, the KFC Server characterizes its local structural environment, compares that environment to the environments of experimentally determined hot spots and predicts if the interface residue is a hot spot. After the computational analysis, the user can visualize the results using an interactive job viewer able to quickly highlight predicted hot spots and surrounding structural features within the protein structure. The KFC Server is accessible at http://kfc.mitchell-lab.org.
Exploring the potential of 3D Zernike descriptors and SVM for protein-protein interface prediction.
Daberdaku, Sebastian; Ferrari, Carlo
2018-02-06
The correct determination of protein-protein interaction interfaces is important for understanding disease mechanisms and for rational drug design. To date, several computational methods for the prediction of protein interfaces have been developed, but the interface prediction problem is still not fully understood. Experimental evidence suggests that the location of binding sites is imprinted in the protein structure, but there are major differences among the interfaces of the various protein types: the characterising properties can vary a lot depending on the interaction type and function. The selection of an optimal set of features characterising the protein interface and the development of an effective method to represent and capture the complex protein recognition patterns are of paramount importance for this task. In this work we investigate the potential of a novel local surface descriptor based on 3D Zernike moments for the interface prediction task. Descriptors invariant to roto-translations are extracted from circular patches of the protein surface enriched with physico-chemical properties from the HQI8 amino acid index set, and are used as samples for a binary classification problem. Support Vector Machines are used as a classifier to distinguish interface local surface patches from non-interface ones. The proposed method was validated on 16 classes of proteins extracted from the Protein-Protein Docking Benchmark 5.0 and compared to other state-of-the-art protein interface predictors (SPPIDER, PrISE and NPS-HomPPI). The 3D Zernike descriptors are able to capture the similarity among patterns of physico-chemical and biochemical properties mapped on the protein surface arising from the various spatial arrangements of the underlying residues, and their usage can be easily extended to other sets of amino acid properties. 
The results suggest that the choice of a proper set of features characterising the protein interface is crucial for the interface prediction task, and that optimality strongly depends on the class of proteins whose interface we want to characterise. We postulate that different protein classes should be treated separately and that it is necessary to identify an optimal set of features for each protein class.
Global, quantitative and dynamic mapping of protein subcellular localization
Itzhak, Daniel N; Tyanova, Stefka; Cox, Jürgen; Borner, Georg HH
2016-01-01
Subcellular localization critically influences protein function, and cells control protein localization to regulate biological processes. We have developed and applied Dynamic Organellar Maps, a proteomic method that allows global mapping of protein translocation events. We initially used maps statically to generate a database with localization and absolute copy number information for over 8700 proteins from HeLa cells, approaching comprehensive coverage. All major organelles were resolved, with exceptional prediction accuracy (estimated at >92%). Combining spatial and abundance information yielded an unprecedented quantitative view of HeLa cell anatomy and organellar composition, at the protein level. We subsequently demonstrated the dynamic capabilities of the approach by capturing translocation events following EGF stimulation, which we integrated into a quantitative model. Dynamic Organellar Maps enable the proteome-wide analysis of physiological protein movements, without requiring any reagents specific to the investigated process, and will thus be widely applicable in cell biology. DOI: http://dx.doi.org/10.7554/eLife.16950.001 PMID:27278775
Isegawa, Yuji; Miyamoto, Yoichi; Yasuda, Yoshinari; Semi, Katsunori; Tsujimura, Kenji; Fukunaga, Rikiro; Ohshima, Atsushi; Horiguchi, Yasuhiro; Yoneda, Yoshihiro; Sugimoto, Nakaba
2008-01-01
To elucidate the function of the U69 protein kinase of human herpesvirus 6 (HHV-6) in vivo, we first analyzed its subcellular localization in HHV-6-infected Molt 3 cells by using polyclonal antibodies against the U69 protein. Immunofluorescence studies showed that the U69 signal localized to the nucleus in a mesh-like pattern in both HHV-6-infected and HHV6-transfected cells. A computer program predicted two overlapping classic nuclear localization signals (NLSs) in the N-terminal region of the protein; this NLS motif is highly conserved in the N-terminal region of most of the herpesvirus protein kinases examined to date. An N-terminal deletion mutant form of the protein failed to enter the nucleus, whereas a fusion protein of green fluorescent protein (GFP) and/or glutathione S-transferase (GST) and the U69 N-terminal region was transported into the nucleus, demonstrating that the predicted N-terminal NLSs of the protein actually function as NLSs. The nuclear transport of the GST-GFP fusion protein containing the N-terminal NLS of U69 was inhibited by wheat germ agglutinin and by the Q69L Ran-GTP mutant, indicating that the U69 protein is transported into the nucleus from the cytoplasm via classic nuclear transport machinery. A cell-free import assay showed that the nuclear transport of the U69 protein was mediated by importin α/β in conjunction with the small GTPase Ran. When the import assay was performed with a low concentration of each importin-α subtype, NPI2/importin-α7 elicited more efficient transport activity than did Rch1/importin-α1 or Qip1/importin-α3. These results suggest a relationship between the localization of NPI2/importin-α7 and the cell tropism of HHV-6. PMID:18003734
Guo, Kang-kang; Tang, Qing-hai; Zhang, Yan-ming; Kang, Kai; He, Lei
2011-05-18
The membrane topology and molecular mechanisms for endoplasmic reticulum (ER) localization of the classical swine fever virus (CSFV) non-structural 2 (NS2) protein are unclear. In this study, we attempted to elucidate the subcellular localization of this protein and the molecular mechanisms responsible for it. The NS2 gene was amplified by reverse transcription polymerase chain reaction, and the transmembrane region and hydrophilicity of the NS2 protein were predicted by bioinformatics analysis. Twelve cDNAs of the NS2 gene were amplified by the PCR deletion method and cloned into a eukaryotic expression vector, which was transfected into a swine umbilical vein endothelial cell line (SUVEC). Subcellular localization of the NS2 protein was characterized by confocal microscopy, and western blots were carried out to analyze protein expression. Our results showed that the -NH2 terminal of the CSFV NS2 protein was highly hydrophobic and the protein localized in the ER. At least four transmembrane regions and two internal signal peptide sequences (amino acids 103-138 and 220-262) were identified and thought to be critical for its translocation to the ER. This is the first study to identify the internal signal peptide sequences of the CSFV NS2 protein and its subcellular localization, providing the foundation for further exploration of this protein's function and its role in CSFV pathogenesis.
Lorenzo, J Ramiro; Alonso, Leonardo G; Sánchez, Ignacio E
2015-01-01
Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non-enzymatic deamidation of internal asparagine residues in proteins in the absence of structural data, using sequence-based predictions of secondary structure and intrinsic disorder. Compared to previous algorithms, NGOME does not require three-dimensional structures yet yields better predictions than available sequence-only methods. Four case studies of specific proteins show how NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. A fifth case study applies NGOME at a proteomic scale and unveils a correlation between asparagine deamidation and protein degradation in yeast. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage "Protein and nucleic acid structure and sequence analysis".
Prediction and Dissection of Protein-RNA Interactions by Molecular Descriptors.
Liu, Zhi-Ping; Chen, Luonan
2016-01-01
Protein-RNA interactions play crucial roles in numerous biological processes. However, detecting the interactions and binding sites between protein and RNA by traditional experiments is still time-consuming and labor-intensive. Thus, it is important to develop bioinformatics methods for predicting protein-RNA interactions and binding sites. Accurate prediction of protein-RNA interactions and recognition will greatly help to decipher the interaction mechanisms between protein and RNA, as well as to improve RNA-related protein engineering and drug design. In this work, we summarize the current bioinformatics strategies for predicting protein-RNA interactions and dissecting protein-RNA interaction mechanisms from local structure binding motifs. In particular, we focus on the feature-based machine learning methods, in which the molecular descriptors of protein and RNA are extracted and integrated as feature vectors representing the interaction events and recognition residues. In addition, the available methods are classified and compared comprehensively. The molecular descriptors are expected to elucidate the binding mechanisms of protein-RNA interaction and reveal the functional implications from the perspective of structural complementarity.
SIFTER search: a web server for accurate phylogeny-based protein function prediction
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sahraeian, Sayed M.; Luo, Kevin R.; Brenner, Steven E.
2015-05-15
We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access to precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. Lastly, the SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded.
Sirius PSB: a generic system for analysis of biological sequences.
Koh, Chuan Hock; Lin, Sharene; Jedd, Gregory; Wong, Limsoon
2009-12-01
Computational tools are essential components of modern biological research. For example, BLAST searches can be used to identify related proteins based on sequence homology, or when a new genome is sequenced, prediction models can be used to annotate functional sites such as transcription start sites, translation initiation sites and polyadenylation sites and to predict protein localization. Here we present Sirius Prediction Systems Builder (PSB), a new computational tool for sequence analysis, classification and searching. Sirius PSB has four main operations: (1) Building a classifier, (2) Deploying a classifier, (3) Search for proteins similar to query proteins, (4) Preliminary and post-prediction analysis. Sirius PSB supports all these operations via a simple and interactive graphical user interface. Besides being a convenient tool, Sirius PSB has also introduced two novelties in sequence analysis. Firstly, genetic algorithm is used to identify interesting features in the feature space. Secondly, instead of the conventional method of searching for similar proteins via sequence similarity, we introduced searching via features' similarity. To demonstrate the capabilities of Sirius PSB, we have built two prediction models - one for the recognition of Arabidopsis polyadenylation sites and another for the subcellular localization of proteins. Both systems are competitive against current state-of-the-art models based on evaluation of public datasets. More notably, the time and effort required to build each model is greatly reduced with the assistance of Sirius PSB. Furthermore, we show that under certain conditions when BLAST is unable to find related proteins, Sirius PSB can identify functionally related proteins based on their biophysical similarities. Sirius PSB and its related supplements are available at: http://compbio.ddns.comp.nus.edu.sg/~sirius.
Large-scale binding ligand prediction by improved patch-based method Patch-Surfer2.0.
Zhu, Xiaolei; Xiong, Yi; Kihara, Daisuke
2015-03-01
Ligand binding is a key aspect of the function of many proteins. Thus, binding ligand prediction provides important insight in understanding the biological function of proteins. Binding ligand prediction is also useful for drug design and examining potential drug side effects. We present a computational method named Patch-Surfer2.0, which predicts binding ligands for a protein pocket. By representing and comparing pockets at the level of small local surface patches that characterize physicochemical properties of the local regions, the method can identify binding pockets of the same ligand even if they do not share globally similar shapes. Properties of local patches are represented by an efficient mathematical representation, 3D Zernike Descriptor. Patch-Surfer2.0 has significant technical improvements over our previous prototype, which includes a new feature that captures approximate patch position with a geodesic distance histogram. Moreover, we constructed a large comprehensive database of ligand binding pockets that will be searched against by a query. The benchmark shows better performance of Patch-Surfer2.0 over existing methods. http://kiharalab.org/patchsurfer2.0/ Contact: dkihara@purdue.edu
Heffernan, Rhys; Yang, Yuedong; Paliwal, Kuldip; Zhou, Yaoqi
2017-09-15
The accuracy of predicting protein local and global structural properties such as secondary structure and solvent accessible surface area has been stagnant for many years because of the challenge of accounting for non-local interactions between amino acid residues that are close in three-dimensional structural space but far from each other in their sequence positions. All existing machine-learning techniques relied on a sliding window of 10-20 amino acid residues to capture some 'short to intermediate' non-local interactions. Here, we employed Long Short-Term Memory (LSTM) Bidirectional Recurrent Neural Networks (BRNNs) which are capable of capturing long range interactions without using a window. We showed that the application of LSTM-BRNN to the prediction of protein structural properties makes the most significant improvement for residues with the most long-range contacts (|i-j| > 19) over a previous window-based, deep-learning method SPIDER2. Capturing long-range interactions allows the accuracy of three-state secondary structure prediction to reach 84% and the correlation coefficient between predicted and actual solvent accessible surface areas to reach 0.80, plus a reduction of 5%, 10%, 5% and 10% in the mean absolute error for backbone ϕ, ψ, θ and τ angles, respectively, from SPIDER2. More significantly, 27% of 182724 40-residue models directly constructed from predicted Cα atom-based θ and τ have similar structures to their corresponding native structures (6 Å RMSD or less), which is 3% better than models built by ϕ and ψ angles. We expect the method to be useful for assisting protein structure and function prediction. The method is available as a SPIDER3 server and standalone package at http://sparks-lab.org. Contact: yaoqi.zhou@griffith.edu.au or yuedong.yang@griffith.edu.au.
Lee, Hasup; Baek, Minkyung; Lee, Gyu Rie; Park, Sangwoo; Seok, Chaok
2017-03-01
Many proteins function as homo- or hetero-oligomers; therefore, attempts to understand and regulate protein functions require knowledge of protein oligomer structures. The number of available experimental protein structures is increasing, and oligomer structures can be predicted using the experimental structures of related proteins as templates. However, template-based models may have errors due to sequence differences between the target and template proteins, which can lead to functional differences. Such structural differences may be predicted by loop modeling of local regions or refinement of the overall structure. In CAPRI (Critical Assessment of PRotein Interactions) round 30, we used recently developed features of the GALAXY protein modeling package, including template-based structure prediction, loop modeling, model refinement, and protein-protein docking to predict protein complex structures from amino acid sequences. Out of the 25 CAPRI targets, medium and acceptable quality models were obtained for 14 and 1 target(s), respectively, for which proper oligomer or monomer templates could be detected. Symmetric interface loop modeling on oligomer model structures successfully improved model quality, while loop modeling on monomer model structures failed. Overall refinement of the predicted oligomer structures consistently improved the model quality, in particular in interface contacts. Proteins 2017; 85:399-407. © 2016 Wiley Periodicals, Inc.
Mode localization in the cooperative dynamics of protein recognition
NASA Astrophysics Data System (ADS)
Copperman, J.; Guenza, M. G.
2016-07-01
The biological function of proteins is encoded in their structure and expressed through the mediation of their dynamics. This paper presents a study on the correlation between local fluctuations, binding, and biological function for two sample proteins, starting from the Langevin Equation for Protein Dynamics (LE4PD). The LE4PD is a microscopic and residue-specific coarse-grained approach to protein dynamics, which starts from the static structural ensemble of a protein and predicts the dynamics analytically. It has been shown to be accurate in its prediction of NMR relaxation experiments and Debye-Waller factors. The LE4PD is solved in a set of diffusive modes which span a vast range of time scales of the protein dynamics, and provides a detailed picture of the mode-dependent localization of the fluctuation as a function of the primary structure of the protein. To investigate the dynamics of protein complexes, the theory is implemented here to treat the coarse-grained dynamics of interacting macromolecules. As an example, calculations of the dynamics of monomeric and dimerized HIV protease and the free Insulin Growth Factor II Receptor (IGF2R) domain 11 and its IGF2R:IGF2 complex are presented. Either simulation-derived or experimentally measured NMR conformers are used as input structural ensembles to the theory. The picture that emerges suggests a dynamical heterogeneous protein where biologically active regions provide energetically comparable conformational states that are trapped by a reacting partner in agreement with the conformation-selection mechanism of binding.
Distinct Pathways Mediate the Sorting of Tail-anchored Mitochondrial Outer Membrane Proteins
USDA-ARS's Scientific Manuscript database
Little is known about the biogenesis of tail-anchored (TA) proteins localized to the mitochondrial outer membrane in plant cells. To address this issue, we screened all of the (>500) known and predicted TA proteins in Arabidopsis for those annotated, based on Gene Ontology, to possess mitochondrial...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cashman, Derek J.; Zhu, Tuo; Simmerman, Richard F.
2014-08-01
The stromal domain (PsaC, PsaD, and PsaE) of photosystem I (PSI) reduces transiently bound ferredoxin (Fd) or flavodoxin. Experimental structures exist for all of these protein partners individually, but no experimental structure of the PSI/Fd or PSI/flavodoxin complexes is presently available. Molecular models of Fd docked onto the stromal domain of the cyanobacterial PSI site are constructed here utilizing X-ray and NMR structures of PSI and Fd, respectively. Moreover, predictions of potential protein-protein interaction regions are based on experimental site-directed mutagenesis and cross-linking studies to guide rigid body docking calculations of Fd into PSI, complemented by energy landscape theory to bring together regions of high energetic frustration on each of the interacting proteins. Results identify two regions of high localized frustration on the surface of Fd that contain negatively charged Asp and Glu residues. Our study predicts that these regions interact predominantly with regions of high localized frustration on the PsaC, PsaD, and PsaE chains of PSI, which include several residues predicted by previous experimental studies.
A generative, probabilistic model of local protein structure.
Boomsma, Wouter; Mardia, Kanti V; Taylor, Charles C; Ferkinghoff-Borg, Jesper; Krogh, Anders; Hamelryck, Thomas
2008-07-01
Despite significant progress in recent years, protein structure prediction maintains its status as one of the prime unsolved problems in computational biology. One of the key remaining challenges is an efficient probabilistic exploration of the structural space that correctly reflects the relative conformational stabilities. Here, we present a fully probabilistic, continuous model of local protein structure in atomic detail. The generative model makes efficient conformational sampling possible and provides a framework for the rigorous analysis of local sequence-structure correlations in the native state. Our method represents a significant theoretical and practical improvement over the widely used fragment assembly technique by avoiding the drawbacks associated with a discrete and nonprobabilistic approach.
fRMSDPred: Predicting Local RMSD Between Structural Fragments Using Sequence Information
2007-04-04
machine learning approaches for estimating the RMSD value of a pair of protein fragments. These estimated fragment-level RMSD values can be used to construct the alignment, assess the quality of an alignment, and identify high-quality alignment segments. We present algorithms to solve this fragment-level RMSD prediction problem using a supervised learning framework based on support vector regression and classification that incorporates protein profiles, predicted secondary structure, effective information encoding schemes, and novel second-order pairwise exponential kernel
Babar, Prasad H; Dey, Vishakha; Jaiswar, Praveen; Patankar, Swati
Many Plasmodium falciparum proteins do not share homology with, and are generally longer than their respective orthologs. This, to some extent, can be attributed to insertions. Here, we studied a P. falciparum RNA hypermethylase, trimethylguanosine synthase (PfTGS1) that harbors a 76 amino acid insertion in its methyltransferase domain. Bioinformatics analysis revealed that this insertion was present in TGS1 orthologs from other Plasmodium species as well. Interestingly, a classical nuclear localization signal (NLS) was predicted in the insertions of primate parasite TGS1 proteins. To check whether these predicted NLS are functional, we developed an in vivo heterologous system using S. cerevisiae. The predicted NLS when fused to dimeric GFP were able to localize the fusion protein to the nucleus in yeast indicating that it is indeed recognized by the yeast nuclear import machinery. We further showed that the PfTGS1 NLS binds to P. falciparum importin-α in vitro, confirming that the NLS is also recognized by the P. falciparum classical nuclear import machinery. Thus, in this study we report a novel function of the insertion in PfTGS1. Copyright © 2016 Elsevier B.V. All rights reserved.
Understand protein functions by comparing the similarity of local structural environments.
Chen, Jiawen; Xie, Zhong-Ru; Wu, Yinghao
2017-02-01
The three-dimensional structures of proteins play an essential role in regulating binding between proteins and their partners, offering a direct relationship between structures and functions of proteins. It is widely accepted that the function of a protein can be determined if its structure is similar to other proteins whose functions are known. However, it is also observed that proteins with similar global structures do not necessarily correspond to the same function, while proteins with very different folds can share similar functions. This indicates that functional similarity originates from the local structural information of proteins rather than their global shapes. We assume that proteins with similar local environments prefer binding to similar types of molecular targets. To test this assumption, we designed a new structural indicator to define the similarity of local environment between residues in different proteins. This indicator was further used to calculate the probability that a given residue binds to a specific type of structural neighbor, including DNA, RNA, small molecules and proteins. After applying the method to a large-scale non-redundant database of proteins, we show that the positive signal of binding probability calculated from the local structural indicator is statistically meaningful. In summary, our studies suggest that the local environment of residues in a protein is a good indicator for recognizing specific binding partners of the protein. The new method could be a potential addition to a suite of existing template-based approaches for protein function prediction. Copyright © 2016 Elsevier B.V. All rights reserved.
Song, Jiangning; Yuan, Zheng; Tan, Hao; Huber, Thomas; Burrage, Kevin
2007-12-01
Disulfide bonds are primary covalent crosslinks between two cysteine residues in proteins that play critical roles in stabilizing the protein structures and are commonly found in extracytoplasmic or secreted proteins. In protein folding prediction, the localization of disulfide bonds can greatly reduce the search in conformational space. Therefore, there is a great need to develop computational methods capable of accurately predicting disulfide connectivity patterns in proteins that could have potentially important applications. We have developed a novel method to predict disulfide connectivity patterns from protein primary sequence, using a support vector regression (SVR) approach based on multiple sequence feature vectors and predicted secondary structure by the PSIPRED program. The results indicate that our method could achieve a prediction accuracy of 74.4% and 77.9%, respectively, when averaged on proteins with two to five disulfide bridges using 4-fold cross-validation, measured on the protein and cysteine pair on a well-defined non-homologous dataset. We assessed the effects of different sequence encoding schemes on the prediction performance of disulfide connectivity. It has been shown that the sequence encoding scheme based on multiple sequence feature vectors coupled with predicted secondary structure can significantly improve the prediction accuracy, thus enabling our method to outperform most other currently available predictors. Our work provides a complementary approach to the current algorithms that should be useful in computationally assigning disulfide connectivity patterns and helps in the annotation of protein sequences generated by large-scale whole-genome projects. The prediction web server and Supplementary Material are accessible at http://foo.maths.uq.edu.au/~huber/disulfide
Chira, Camelia; Horvath, Dragos; Dumitrescu, D
2011-07-30
Proteins are complex structures made of amino acids and have a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a minimum-energy protein structure starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic-polar (HP) model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optima. The main features of the resulting evolutionary algorithm (hill-climbing mechanism and diversification strategy) are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact on the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.
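The objective that a search in the HP model tries to minimize can be sketched in a few lines. The block below is only an illustration, assuming the usual HP convention (each pair of hydrophobic residues that are lattice neighbours but not chain neighbours contributes -1) on a 2D square lattice; the function name `hp_energy` and the coordinate encoding are hypothetical and are not taken from the abstract, and the hill-climbing operators and diversification stage are not shown.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def hp_energy(sequence: str, coords: List[Coord]) -> int:
    """Energy of a 2D HP-lattice conformation (assumed standard convention).

    sequence: string over {'H', 'P'}; coords: one lattice point per residue.
    Each H-H pair that is adjacent on the lattice but not consecutive in the
    chain contributes -1 to the energy.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    # Consecutive residues must occupy neighbouring lattice points.
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("chain is not connected on the lattice")

    position = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Looking only right and up counts every contact exactly once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = position.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy
```

A hill-climbing step would apply a local move to `coords`, recompute `hp_energy`, and keep the move only if the energy does not increase.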
Exploiting Amino Acid Composition for Predicting Protein-Protein Interactions
Roy, Sushmita; Martinez, Diego; Platero, Harriett; Lane, Terran; Werner-Washburne, Margaret
2009-01-01
Background Computational prediction of protein interactions typically uses protein domains as classifier features because they capture conserved information of interaction surfaces. However, approaches relying on domains as features cannot be applied to proteins without any domain information. In this paper, we explore the contribution of pure amino acid composition (AAC) for protein interaction prediction. This simple feature, which is based on normalized counts of single or pairs of amino acids, is applicable to proteins from any sequenced organism and can be used to compensate for the lack of domain information. Results AAC performed on par with protein interaction prediction based on domains on three yeast protein interaction datasets. Similar behavior was obtained using different classifiers, indicating that our results are a function of features and not of classifiers. In addition to yeast datasets, AAC performed comparably on worm and fly datasets. Prediction of interactions for the entire yeast proteome identified a large number of novel interactions, the majority of which co-localized or participated in the same processes. Our high confidence interaction network included both well-studied and uncharacterized proteins. Proteins with known function were involved in actin assembly and cell budding. Uncharacterized proteins interacted with proteins involved in reproduction and cell budding, thus providing putative biological roles for the uncharacterized proteins. Conclusion AAC is a simple, yet powerful feature for predicting protein interactions, and can be used alone or in conjunction with protein domains to predict new and validate existing interactions. More importantly, AAC alone performs on par with existing, but more complex, features, indicating the presence of sequence-level information that is predictive of interaction, but which is not necessarily restricted to domains. PMID:19936254
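The AAC feature described above (normalized counts of single amino acids or of amino acid pairs) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, pairs are assumed to be adjacent residues, and representing a candidate interaction by concatenating the two proteins' vectors is an assumption, since the abstract does not say how the two proteins are combined.

```python
from collections import Counter
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac_vector(sequence: str, k: int = 1) -> List[float]:
    """Normalized counts of k-mers (k=1: single residues, k=2: adjacent pairs).

    K-mers containing non-standard symbols are ignored, so the vector sums
    to 1 whenever at least one valid k-mer is present.
    """
    seq = sequence.upper()
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def pair_features(seq_a: str, seq_b: str, k: int = 1) -> List[float]:
    """Feature vector for a candidate interaction (assumed: concatenation)."""
    return aac_vector(seq_a, k) + aac_vector(seq_b, k)
```

The resulting vectors have 20 entries per protein for k=1 and 400 for k=2, and can be passed to any standard classifier.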
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ramalho, T.O.; Figueira, A.R.; Sotero, A.J.
2014-09-15
The emergence of viruses in Coffee (Coffea arabica and Coffea canephora), the most widely traded agricultural commodity in the world, is of critical concern. The RNA1 (6552 nt) of Coffee ringspot virus is organized into five open reading frames (ORFs) capable of encoding the viral nucleocapsid (ORF1p), phosphoprotein (ORF2p), putative cell-to-cell movement protein (ORF3p), matrix protein (ORF4p) and glycoprotein (ORF5p). Each ORF is separated by a conserved intergenic junction. RNA2 (5945 nt), which completes the bipartite genome, encodes a single protein (ORF6p) with homology to RNA-dependent RNA polymerases. Phylogenetic analysis of L protein sequences firmly establishes CoRSV as a member of the recently proposed Dichorhavirus genus. Predictive algorithms, in planta protein expression, and a yeast-based nuclear import assay were used to determine the nucleophilic character of five CoRSV proteins. Finally, the temperature-dependent ability of CoRSV to establish systemic infections in an initially local lesion host was quantified. Highlights: • We report genome sequence determination for Coffee ringspot virus (CoRSV). • CoRSV should be considered a member of the proposed Dichorhavirus genus. • We report temperature-dependent systemic infection of an initially local lesion host. • We report in planta protein and localization data for five CoRSV proteins. • In silico predictions of the CoRSV proteins were validated using in vivo assays.
A protein block based fold recognition method for the annotation of twilight zone sequences.
Suresh, V; Ganesan, K; Parthasarathy, S
2013-03-01
The description of the protein backbone was recently improved with groups of structural fragments called Structural Alphabets, which replace the regular three-state (Helix, Sheet and Coil) secondary structure description. Protein Blocks is one of the Structural Alphabets used to describe each and every region of the protein backbone, including the coil. According to de Brevern (2000), Protein Blocks comprises 16 structural fragments, each 5 residues in length. Protein Blocks fragments are highly informative among the available Structural Alphabets and have been used for many applications. Here, we present a protein fold recognition method based on Protein Blocks for the annotation of twilight zone sequences. In our method, we align the predicted Protein Blocks of a query amino acid sequence with a library of assigned Protein Blocks of 953 known folds using local pairwise alignment. Alignment results with z-value ≥ 2.5 and P-value ≤ 0.08 are predicted as possible folds. Our method is able to recognize the possible folds for nearly 35.5% of the twilight zone sequences with their predicted Protein Block sequence obtained by pb_prediction, which is available at the Protein Block Export server.
Computational Analysis of Uncharacterized Proteins of Environmental Bacterial Genome
NASA Astrophysics Data System (ADS)
Coxe, K. J.; Kumar, M.
2017-12-01
Betaproteobacteria strain CB is a gram-negative bacterium in the phylum Proteobacteria and is found naturally in soil and water. In this complex environment, bacteria play a key role in efficiently eliminating organic material and other pollutants from wastewater. To investigate the process of pollutant removal from wastewater using bacteria, it is important to characterize the proteins encoded by the bacterial genome. Our study combines a number of bioinformatics tools to predict the function of unassigned proteins in the bacterial genome. The genome of Betaproteobacteria strain CB contains 2,112 proteins, of which 508 have unknown function and are termed uncharacterized proteins (UPs). The localization of the UPs within the cell was determined and the structure of 38 UPs was accurately predicted. These UPs were predicted to belong to various classes of proteins such as enzymes, transporters, binding proteins, signal peptides, transmembrane proteins and other proteins. The outcome of this work will help to better understand the wastewater treatment mechanism.
Role of Electrostatics in Protein-RNA Binding: The Global vs the Local Energy Landscape.
Ghaemi, Zhaleh; Guzman, Irisbel; Gnutt, David; Luthey-Schulten, Zaida; Gruebele, Martin
2017-09-14
U1A protein-stem loop 2 RNA association is a basic step in the assembly of the spliceosomal U1 small nuclear ribonucleoprotein. Long-range electrostatic interactions due to the positive charge of U1A are thought to provide high binding affinity for the negatively charged RNA. Short range interactions, such as hydrogen bonds and contacts between RNA bases and protein side chains, favor a specific binding site. Here, we propose that electrostatic interactions are as important as local contacts in biasing the protein-RNA energy landscape toward a specific binding site. We show by using molecular dynamics simulations that deletion of two long-range electrostatic interactions (K22Q and K50Q) leads to mutant-specific alternative RNA bound states. One of these states preserves short-range interactions with aromatic residues in the original binding site, while the other one does not. We test the computational prediction with experimental temperature-jump kinetics using a tryptophan probe in the U1A-RNA binding site. The two mutants show the distinct predicted kinetic behaviors. Thus, the stem loop 2 RNA has multiple binding sites on a rough RNA-protein binding landscape. We speculate that the rough protein-RNA binding landscape, when biased to different local minima by electrostatics, could be one way that protein-RNA interactions evolve toward new binding sites and novel function.
Contreras-Torres, Ernesto
2018-06-02
In this study, I introduce novel global and local 0D-protein descriptors based on a statistical quantity named Total Sum of Squares (TSS). This quantity represents the sum of the squared differences of amino acid properties from the arithmetic mean property. As an extension, the amino acid-types and amino acid-groups formalisms are used for describing zones of interest in proteins. To assess the effectiveness of the proposed descriptors, a Nearest Neighbor model for predicting the four major protein structural classes was built. This model has a success rate of 98.53% on the jackknife cross-validation test, a performance superior to other reported methods despite the simplicity of the predictor. Additionally, this predictor has an average success rate of 98.35% across the different cross-validation tests performed. A value of 0.98 for the Kappa statistic clearly discriminates this model from a random predictor. The results obtained by the Nearest Neighbor model demonstrated the ability of the proposed descriptors not only to reflect relevant biochemical information related to the structural classes of proteins but also to allow appropriate interpretability. It can thus be expected that the current method may play a supplementary role to other existing approaches for predicting protein structural class and other protein attributes. Copyright © 2018 Elsevier Ltd. All rights reserved.
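The Total Sum of Squares described above is the sum of squared deviations of a per-residue property from its mean over the sequence. A minimal sketch follows, assuming a hydropathy scale as the property and a simple residue-type filter as the "local" variant; both choices are illustrative assumptions rather than the descriptors defined in the paper.

```python
# Kyte-Doolittle hydropathy values for a handful of residues. Using this
# scale (and only these residues) is an assumption made for illustration;
# any amino acid property scale could be substituted.
HYDROPATHY = {"A": 1.8, "G": -0.4, "L": 3.8, "K": -3.9, "V": 4.2}


def total_sum_of_squares(residues, scale=HYDROPATHY):
    """Sum of squared differences of the property from its arithmetic mean."""
    values = [scale[aa] for aa in residues]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def local_tss(sequence, residue_types, scale=HYDROPATHY):
    """TSS over only the residues of the selected types (hypothetical
    stand-in for an amino acid-type or amino acid-group descriptor)."""
    subset = [aa for aa in sequence if aa in residue_types]
    return total_sum_of_squares(subset, scale) if subset else 0.0


global_value = total_sum_of_squares("AGLKV")
local_value = local_tss("AGLKV", {"A", "G"})
```

For the two-residue case "AG" the values are 1.8 and -0.4, the mean is 0.7, and the TSS is 1.1² + 1.1² = 2.42.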
Zhang, Long; Jia, Lianyin; Ren, Yazhou
2017-01-01
Protein-protein interactions (PPIs) play crucial roles in almost all cellular processes. Although a large number of PPIs have been verified by high-throughput techniques in the past decades, the currently known PPI pairs are still far from complete. Furthermore, wet-lab techniques for detecting PPIs are time-consuming and expensive. Hence, it is urgent and essential to develop automatic computational methods to efficiently and accurately predict PPIs. In this paper, a sequence-based approach called DNN-LCTD is developed by combining deep neural networks (DNNs) and a novel local conjoint triad description (LCTD) feature representation. LCTD combines the advantages of local description and conjoint triad; thus, it is able to account for the interactions between residues in both continuous and discontinuous regions of amino acid sequences. DNNs can not only learn suitable features from the data by themselves, but also learn and discover hierarchical representations of data. When applied to the PPI data of Saccharomyces cerevisiae, DNN-LCTD achieves superior performance, with an accuracy of 93.12%, precision of 93.75%, sensitivity of 93.83%, and area under the receiver operating characteristic curve (AUC) of 97.92%, and it needs only 718 s. These results indicate that DNN-LCTD is very promising for predicting PPIs. DNN-LCTD can be a useful supplementary tool for future proteomics studies. PMID:29117139
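To make the idea of a region-wise conjoint triad feature concrete, here is a rough sketch. The seven residue classes follow a grouping commonly used for conjoint triad encodings, and the split into equal-length regions is a hypothetical simplification; neither detail is given in the abstract, so this should not be read as the published LCTD definition.

```python
from itertools import product

# Seven residue classes often used for conjoint triad encodings.
# This grouping is an assumption; the abstract does not list it.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(CLASSES) for aa in group}
TRIAD_INDEX = {t: i for i, t in enumerate(product(range(7), repeat=3))}


def conjoint_triad(seq):
    """Count each of the 7*7*7 = 343 class triads over consecutive residues."""
    counts = [0] * len(TRIAD_INDEX)
    # Non-standard residues are dropped, which joins their neighbours.
    classes = [AA_TO_CLASS[aa] for aa in seq if aa in AA_TO_CLASS]
    for i in range(len(classes) - 2):
        counts[TRIAD_INDEX[tuple(classes[i:i + 3])]] += 1
    return counts


def local_conjoint_triad(seq, n_regions=3):
    """Concatenate triad counts from n_regions roughly equal segments
    (a hypothetical region scheme, chosen for simplicity)."""
    bounds = [len(seq) * k // n_regions for k in range(n_regions + 1)]
    features = []
    for start, end in zip(bounds, bounds[1:]):
        features.extend(conjoint_triad(seq[start:end]))
    return features


features = local_conjoint_triad("ACDEFGHIK", n_regions=3)
```

The resulting vector has 343 entries per region and could serve as the input to any classifier; the network architecture itself is outside the scope of this sketch.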
Han, Mee-Jung; Yun, Hongseok; Lee, Jeong Wook; Lee, Yu Hyun; Lee, Sang Yup; Yoo, Jong-Shin; Kim, Jin Young; Kim, Jihyun F; Hur, Cheol-Goo
2011-04-01
Escherichia coli K-12 and B strains have most widely been employed for scientific studies as well as industrial applications. Recently, the complete genome sequences of two representative descendants of E. coli B strains, REL606 and BL21(DE3), have been determined. Here, we report the subproteome reference maps of E. coli B REL606 by analyzing cytoplasmic, periplasmic, inner and outer membrane, and extracellular proteomes based on the genome information using experimental and computational approaches. Among the total of 3487 spots, 651 proteins including 410 non-redundant proteins were identified and characterized by 2-DE and LC-MS/MS; they include 440 cytoplasmic, 45 periplasmic, 50 inner membrane, 61 outer membrane, and 55 extracellular proteins. In addition, subcellular localizations of all 4205 ORFs of E. coli B were predicted by combined computational prediction methods. The subcellular localizations of 1812 (43.09%) proteins of currently unknown function were newly assigned. The results of computational prediction were also compared with the experimental results, showing that overall precision and recall were 92.16 and 92.16%, respectively. This work represents the most comprehensive analyses of the subproteomes of E. coli B, and will be useful as a reference for proteome profiling studies under various conditions. The complete proteome data are available online (http://ecolib.kaist.ac.kr). Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
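The comparison of computational predictions with experimental localizations in terms of precision and recall can be sketched per compartment as below. The label lists are made-up examples, and the function is a generic definition of the two measures, not the evaluation script used in the study.

```python
def precision_recall(predicted, observed, label):
    """Precision and recall for one compartment label.

    predicted / observed are parallel lists holding one localization
    per protein (computational prediction vs. experimental assignment).
    """
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p == label and o == label)
    fp = sum(1 for p, o in pairs if p == label and o != label)
    fn = sum(1 for p, o in pairs if p != label and o == label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels for five proteins, for illustration only.
predicted = ["cytoplasm", "cytoplasm", "periplasm", "outer membrane", "cytoplasm"]
observed = ["cytoplasm", "periplasm", "periplasm", "outer membrane", "cytoplasm"]

cyto = precision_recall(predicted, observed, "cytoplasm")
peri = precision_recall(predicted, observed, "periplasm")
```

Here one periplasmic protein is mispredicted as cytoplasmic, which lowers the precision for "cytoplasm" and the recall for "periplasm".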
Wang, Jun; Zhang, Long; Jia, Lianyin; Ren, Yazhou; Yu, Guoxian
2017-11-08
Protein-protein interactions (PPIs) play crucial roles in almost all cellular processes. Although a large number of PPIs have been verified by high-throughput techniques in the past decades, the currently known PPI pairs are still far from complete. Furthermore, wet-lab techniques for detecting PPIs are time-consuming and expensive. Hence, it is urgent and essential to develop automatic computational methods to efficiently and accurately predict PPIs. In this paper, a sequence-based approach called DNN-LCTD is developed by combining deep neural networks (DNNs) and a novel local conjoint triad description (LCTD) feature representation. LCTD combines the advantages of local description and conjoint triad; thus, it is able to account for the interactions between residues in both continuous and discontinuous regions of amino acid sequences. DNNs can not only learn suitable features from the data by themselves, but also learn and discover hierarchical representations of data. When applied to the PPI data of Saccharomyces cerevisiae, DNN-LCTD achieves superior performance, with an accuracy of 93.12%, precision of 93.75%, sensitivity of 93.83%, and area under the receiver operating characteristic curve (AUC) of 97.92%, and it needs only 718 s. These results indicate that DNN-LCTD is very promising for predicting PPIs. DNN-LCTD can be a useful supplementary tool for future proteomics studies.
Identification and subcellular localization of porcine deltacoronavirus accessory protein NS6
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fang, Puxian; Fang, Liurong; Liu, Xiaorong
Porcine deltacoronavirus (PDCoV) is an emerging swine enteric coronavirus. Accessory proteins are genus-specific for coronaviruses, and two putative accessory proteins, NS6 and NS7, are predicted to be encoded by PDCoV; however, this remains to be confirmed experimentally. Here, we identified the leader-body junction sites of NS6 subgenomic RNA (sgRNA) and found that the actual transcription regulatory sequence (TRS) utilized by NS6 is non-canonical and is located upstream of the predicted TRS. Using the purified NS6 from an Escherichia coli expression system, we obtained two anti-NS6 monoclonal antibodies that could detect the predicted NS6 in cells infected with PDCoV or transfected with NS6-expressing plasmids. Further studies revealed that NS6 is always localized in the cytoplasm of PDCoV-infected cells, mainly co-localizing with the endoplasmic reticulum (ER) and ER-Golgi intermediate compartments, as well as partially with the Golgi apparatus. Together, our results identify the NS6 sgRNA and demonstrate its expression in PDCoV-infected cells. Highlights: • The leader-body fusion site of NS6 sgRNA is identified. • NS6 sgRNA uses a non-canonical transcription regulatory sequence (TRS). • NS6 can be expressed in PDCoV-infected cells. • NS6 predominantly localizes to the ER complex and ER-Golgi intermediate compartment.
Nucleolar Trafficking of Nucleostemin Family Proteins: Common versus Protein-Specific Mechanisms
Meng, Lingjun; Zhu, Qubo; Tsai, Robert Y. L.
2007-01-01
The nucleolus has begun to emerge as a subnuclear organelle capable of modulating the activities of nuclear proteins in a dynamic and cell type-dependent manner. It remains unclear whether one can extrapolate a rule that predicts the nucleolar localization of multiple proteins based on protein sequence. Here, we address this issue by determining the shared and unique mechanisms that regulate the static and dynamic distributions of a family of nucleolar GTP-binding proteins, consisting of nucleostemin (NS), guanine nucleotide binding protein-like 3 (GNL3L), and Ngp1. The nucleolar residence of GNL3L is short and primarily controlled by its basic-coiled-coil domain, whereas the nucleolar residence of NS and Ngp1 is long and requires the basic and the GTP-binding domains, the latter of which functions as a retention signal. All three proteins contain a nucleoplasmic localization signal (NpLS) that prevents their nucleolar accumulation. Unlike that of the basic domain, the activity of NpLS is dynamically controlled by the GTP-binding domain. The nucleolar retention and the NpLS-regulating functions of the G domain involve specific residues that cannot be predicted by overall protein homology. This work reveals common and protein-specific mechanisms underlying the nucleolar movement of NS family proteins. PMID:17923687
Villanueva, Josep; Villegas, Virtudes; Querol, Enrique; Avilés, Francesc X; Serrano, Luis
2002-09-01
In the post-genomic era, several projects focused on the massive experimental resolution of the three-dimensional structures of all the proteins of different organisms have been initiated. Simultaneously, significant progress has been made in the ab initio prediction of protein three-dimensional structure. One of the keys to the success of such a prediction is the use of local information (i.e. secondary structure). Here we describe a new limited proteolysis methodology, based on the use of unspecific exoproteases coupled with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS), to map quickly secondary structure elements of a protein from both ends, the N- and C-termini. We show that the proteolytic patterns (mass spectra series) obtained can be interpreted in the light of the conformation and local stability of the analyzed proteins, a direct correlation being observed between the predicted and the experimentally derived protein secondary structure. Further, this methodology can be easily applied to check rapidly the folding state of a protein and characterize mutational effects on protein conformation and stability. Moreover, given global stability information, this methodology allows one to locate the protein regions of increased or decreased conformational stability. All of this can be done with a small fraction of the amount of protein required by most of the other methods for conformational analysis. Thus limited exoproteolysis, together with MALDI-TOF MS, can be a useful tool to achieve quickly the elucidation of protein structure and stability. Copyright 2002 John Wiley & Sons, Ltd.
Distinct Pathways Mediate the Sorting of Tail-anchored Mitochondrial Outer Membrane Proteins
USDA-ARS's Scientific Manuscript database
Little is known about the biogenesis of tail-anchored (TA) proteins localized to the mitochondrial outer membrane in plant cells. To address this issue, we screened all of the (>600) known and predicted TA proteins in Arabidopsis thaliana for those annotated, based on Gene Ontology, to possess mitoc...
Deep convolutional neural networks for pan-specific peptide-MHC class I binding prediction.
Han, Youngmahn; Kim, Dongsup
2017-12-28
Computational scanning of peptide candidates that bind to a specific major histocompatibility complex (MHC) can speed up the peptide-based vaccine development process, and therefore various methods are being actively developed. Recently, machine-learning-based methods have generated successful results by training on large amounts of experimental data. However, many machine-learning-based methods are generally less sensitive in recognizing locally-clustered interactions, which can synergistically stabilize peptide binding. A deep convolutional neural network (DCNN) is a deep learning method inspired by the visual recognition process of the animal brain, and it is known to be able to capture meaningful local patterns from 2D images. Once the peptide-MHC interactions are encoded into image-like array (ILA) data, a DCNN can be employed to build a predictive model for peptide-MHC binding prediction. In this study, we demonstrated that a DCNN is able not only to reliably predict peptide-MHC binding, but also to sensitively detect locally-clustered interactions. Nonapeptide-HLA-A and -B binding data were encoded into ILA data. A DCNN, as a pan-specific prediction model, was trained on the ILA data. The DCNN showed higher performance than other prediction tools on the latest benchmark datasets, which consist of 43 datasets for 15 HLA-A alleles and 25 datasets for 10 HLA-B alleles. In particular, the DCNN outperformed other tools for alleles belonging to the HLA-A3 supertype. The F1 scores of the DCNN were 0.86, 0.94, and 0.67 for the HLA-A*31:01, HLA-A*03:01, and HLA-A*68:01 alleles, respectively, which were significantly higher than those of other tools. We found that the DCNN was able to recognize locally-clustered interactions that could synergistically stabilize peptide binding. We developed ConvMHC, a web server that provides user-friendly web interfaces for peptide-MHC class I binding predictions using the DCNN. The ConvMHC web server is accessible at http://jumong.kaist.ac.kr:8080/convmhc.
We developed a novel method for peptide-HLA-I binding predictions using DCNN trained on ILA data that encode peptide binding data and demonstrated the reliable performance of the DCNN in nonapeptide binding predictions through the independent evaluation on the latest IEDB benchmark datasets. Our approaches can be applied to characterize locally-clustered patterns in molecular interactions, such as protein/DNA, protein/RNA, and drug/protein interactions.
Cheng, Xiang; Xiao, Xuan; Chou, Kuo-Chen
2017-08-22
One of the fundamental goals in cellular biochemistry is to identify the functions of proteins in the context of compartments that organize them in the cellular environment. To realize this, it is indispensable to develop an automated method for fast and accurate identification of the subcellular locations of uncharacterized proteins. The current study is focused on plant protein subcellular location prediction based on the sequence information alone. Although considerable efforts have been made in this regard, the problem is far from being solved yet. Most of the existing methods can be used to deal with single-location proteins only. Actually, proteins with multi-locations may have some special biological functions. This kind of multiplex protein is particularly important for both basic research and drug design. Using the multi-label theory, we present a new predictor called "pLoc-mPlant" by extracting the optimal GO (Gene Ontology) information into the Chou's general PseAAC (Pseudo Amino Acid Composition). Rigorous cross-validation on the same stringent benchmark dataset indicated that the proposed pLoc-mPlant predictor is remarkably superior to iLoc-Plant, the state-of-the-art method for predicting plant protein subcellular localization. To maximize the convenience of most experimental scientists, a user-friendly web-server for the new predictor has been established at , by which users can easily get their desired results without the need to go through the complicated mathematics involved.
Molecular Dynamics Simulations of Hydrophobic Residues
NASA Astrophysics Data System (ADS)
Caballero, Diego; Zhou, Alice; Regan, Lynne; O'Hern, Corey
2013-03-01
Molecular recognition and protein-protein interactions are involved in important biological processes. However, despite recent improvements in computational methods for protein design, we still lack a predictive understanding of protein structure and interactions. To begin to address these shortcomings, we performed molecular dynamics simulations of hydrophobic residues modeled as hard spheres with stereo-chemical constraints initially at high temperature, and then quenched to low temperature to obtain local energy minima. We find that there is a range of quench rates over which the probabilities of side-chain dihedral angles for hydrophobic residues match the probabilities obtained for known protein structures. In addition, we predict the side-chain dihedral angle propensities in the core region of the proteins T4, ROP, and several mutants. These studies serve as a first step in developing the ability to quantitatively rank the energies of designed protein constructs. The success of these studies suggests that only hard-sphere dynamics with geometrical constraints are needed for accurate protein structure prediction in hydrophobic cavities and binding interfaces. NSF Grant PHY-1019147
Local functional descriptors for surface comparison based binding prediction
2012-01-01
Background: Molecular recognition in proteins occurs due to appropriate arrangements of physical, chemical, and geometric properties of an atomic surface. Similar surface regions should create similar binding interfaces. Effective methods for comparing surface regions can be used in identifying similar regions, and to predict interactions without regard to the underlying structural scaffold that creates the surface. Results: We present a new descriptor for protein functional surfaces and algorithms for using these descriptors to compare protein surface regions to identify ligand binding interfaces. Our approach uses descriptors of local regions of the surface, and assembles collections of matches to compare larger regions. Our approach uses a variety of physical, chemical, and geometric properties, adaptively weighting these properties as appropriate for different regions of the interface. Our approach builds a classifier based on a training corpus of examples of binding sites of the target ligand. The constructed classifiers can be applied to a query protein providing a probability for each position on the protein that the position is part of a binding interface. We demonstrate the effectiveness of the approach on a number of benchmarks, demonstrating performance that is comparable to the state-of-the-art, with an approach with more generality than these prior methods. Conclusions: Local functional descriptors offer a new method for protein surface comparison that is sufficiently flexible to serve in a variety of applications. PMID:23176080
Characterization and Prediction of Protein Phosphorylation Hotspots in Arabidopsis thaliana.
Christian, Jan-Ole; Braginets, Rostyslav; Schulze, Waltraud X; Walther, Dirk
2012-01-01
The regulation of protein function by modulating the surface charge status via sequence-locally enriched phosphorylation sites (P-sites) in so-called phosphorylation "hotspots" has gained increased attention in recent years. We set out to identify P-hotspots in the model plant Arabidopsis thaliana. We analyzed the spacing of experimentally detected P-sites within peptide-covered regions along Arabidopsis protein sequences as available from the PhosPhAt database. Confirming earlier reports (Schweiger and Linial, 2010), we found that, indeed, P-sites tend to cluster and that the distributions of distances from serine and threonine P-sites to their respective closest next P-site differ significantly from those for tyrosine P-sites. The ability to predict P-hotspots by applying available computational P-site prediction programs that focus on identifying single P-sites was observed to be severely compromised by the inevitable interference of nearby P-sites. We devised a new approach, named HotSPotter, for the prediction of phosphorylation hotspots. HotSPotter is based primarily on local amino acid compositional preferences rather than sequence position-specific motifs and uses support vector machines as the underlying classification engine. HotSPotter correctly identified experimentally determined phosphorylation hotspots in A. thaliana with high accuracy. Applied to the Arabidopsis proteome, HotSPotter predicted 13,677 candidate P-hotspots in 9,599 proteins corresponding to 7,847 unique genes. Hotspot-containing proteins are involved predominantly in signaling processes, confirming the surmised modulating role of hotspots in signaling and interaction events. Our study provides new bioinformatics means to identify phosphorylation hotspots and lays the basis for further investigating novel candidate P-hotspots. All phosphorylation hotspot annotations and predictions have been made available as part of the PhosPhAt database at http://phosphat.mpimp-golm.mpg.de.
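The spacing analysis mentioned in this abstract (distance from each P-site to its closest neighbouring P-site, and the resulting clusters) can be illustrated with a few lines of code. The gap and size thresholds below are hypothetical, and this simple grouping is not the SVM-based HotSPotter method itself.

```python
def nearest_psite_distances(positions):
    """Distance from each P-site to its closest neighbouring P-site."""
    pos = sorted(positions)
    distances = []
    for i, p in enumerate(pos):
        gaps = []
        if i > 0:
            gaps.append(p - pos[i - 1])
        if i < len(pos) - 1:
            gaps.append(pos[i + 1] - p)
        if gaps:  # a lone P-site has no neighbour and is skipped
            distances.append(min(gaps))
    return distances


def cluster_psites(positions, max_gap=4, min_sites=3):
    """Group P-sites whose consecutive spacing is at most max_gap and keep
    groups with at least min_sites members (both thresholds are assumed)."""
    clusters, current = [], []
    for p in sorted(positions):
        if current and p - current[-1] > max_gap:
            if len(current) >= min_sites:
                clusters.append(current)
            current = []
        current.append(p)
    if len(current) >= min_sites:
        clusters.append(current)
    return clusters


# Made-up P-site positions along one protein sequence.
sites = [10, 12, 15, 40, 80, 82]
distances = nearest_psite_distances(sites)
clusters = cluster_psites(sites)
```

Comparing such distance distributions between residue types (for example serine versus tyrosine sites) would require real annotations such as those in PhosPhAt.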
Parametric models to compute tryptophan fluorescence wavelengths from classical protein simulations.
Lopez, Alvaro J; Martínez, Leandro
2018-02-26
Fluorescence spectroscopy is an important method to study protein conformational dynamics and solvation structures. Tryptophan (Trp) residues are the most important and practical intrinsic probes for protein fluorescence due to the variability of their fluorescence wavelengths: Trp residues emit in wavelengths ranging from 308 to 360 nm depending on the local molecular environment. Fluorescence involves electronic transitions, thus its computational modeling is a challenging task. We show that it is possible to predict the wavelength of emission of a Trp residue from classical molecular dynamics simulations by computing the solvent-accessible surface area or the electrostatic interaction between the indole group and the rest of the system. Linear parametric models are obtained to predict the maximum emission wavelengths with standard errors of the order 5 nm. In a set of 19 proteins with emission wavelengths ranging from 308 to 352 nm, the best model predicts the maximum wavelength of emission with a standard error of 4.89 nm and a quadratic Pearson correlation coefficient of 0.81. These models can be used for the interpretation of fluorescence spectra of proteins with multiple Trp residues, or for which local Trp environmental variability exists and can be probed by classical molecular dynamics simulations. © 2018 Wiley Periodicals, Inc.
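A linear parametric model of the kind described here relates one simulated descriptor (for instance the solvent-accessible surface area of the indole group) to the emission maximum. The sketch below fits such a line by ordinary least squares; the data points are synthetic numbers invented for illustration and are not results from the paper.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y, slope, intercept):
    """Coefficient of determination of the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


# Synthetic, perfectly linear toy data (NOT measured or simulated values):
# descriptor in arbitrary area units, emission maximum in nm.
sasa = [0.0, 50.0, 100.0, 150.0]
wavelength = [310.0, 320.0, 330.0, 340.0]

slope, intercept = fit_line(sasa, wavelength)
r2 = r_squared(sasa, wavelength, slope, intercept)
predicted_nm = slope * 75.0 + intercept  # prediction for a new descriptor value
```

With real data the residual scatter would give the standard error of the model, which the abstract reports to be about 5 nm.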
Improved protein model quality assessments by changing the target function.
Uziela, Karolis; Menéndez Hurtado, David; Shu, Nanjiang; Wallner, Björn; Elofsson, Arne
2018-06-01
Protein modeling quality is an important part of protein structure prediction. We have for more than a decade developed a set of methods for this problem. We have used various types of description of the protein and different machine learning methodologies. However, common to all these methods has been the target function used for training. The target function in ProQ describes the local quality of a residue in a protein model. In all versions of ProQ the target function has been the S-score. However, other quality estimation functions also exist, which can be divided into superposition- and contact-based methods. The superposition-based methods, such as S-score, are based on a rigid body superposition of a protein model and the native structure, while the contact-based methods compare the local environment of each residue. Here, we examine the effects of retraining our latest predictor, ProQ3D, using identical inputs but different target functions. We find that the contact-based methods are easier to predict and that predictors trained on these measures provide some advantages when it comes to identifying the best model. One possible reason for this is that contact based methods are better at estimating the quality of multi-domain targets. However, training on the S-score gives the best correlation with the GDT_TS score, which is commonly used in CASP to score the global model quality. To take the advantage of both of these features we provide an updated version of ProQ3D that predicts local and global model quality estimates based on different quality estimates. © 2018 Wiley Periodicals, Inc.
Struct2Net: a web service to predict protein–protein interactions using a structure-based approach
Singh, Rohit; Park, Daniel; Xu, Jinbo; Hosur, Raghavendra; Berger, Bonnie
2010-01-01
Struct2Net is a web server for predicting interactions between arbitrary protein pairs using a structure-based approach. Prediction of protein–protein interactions (PPIs) is a central area of interest and successful prediction would provide leads for experiments and drug design; however, the experimental coverage of the PPI interactome remains inadequate. We believe that Struct2Net is the first community-wide resource to provide structure-based PPI predictions that go beyond homology modeling. Also, most web-resources for predicting PPIs currently rely on functional genomic data (e.g. GO annotation, gene expression, cellular localization, etc.). Our structure-based approach is independent of such methods and only requires the sequence information of the proteins being queried. The web service allows multiple querying options, aimed at maximizing flexibility. For the most commonly studied organisms (fly, human and yeast), predictions have been pre-computed and can be retrieved almost instantaneously. For proteins from other species, users have the option of getting a quick-but-approximate result (using orthology over pre-computed results) or having a full-blown computation performed. The web service is freely available at http://struct2net.csail.mit.edu. PMID:20513650
MUFOLD-SS: New deep inception-inside-inception networks for protein secondary structure prediction.
Fang, Chao; Shang, Yi; Xu, Dong
2018-05-01
Protein secondary structure prediction can provide important information for protein 3D structure prediction and protein functions. Deep learning offers a new opportunity to significantly improve prediction accuracy. In this article, a new deep neural network architecture, named the Deep inception-inside-inception (Deep3I) network, is proposed for protein secondary structure prediction and implemented as a software tool, MUFOLD-SS. The input to MUFOLD-SS is a carefully designed feature matrix corresponding to the primary amino acid sequence of a protein, which consists of a rich set of information derived from individual amino acids, as well as the context of the protein sequence. Specifically, the feature matrix is composed of physicochemical properties of amino acids, the PSI-BLAST profile, and the HHBlits profile. MUFOLD-SS is composed of a sequence of nested inception modules and maps the input matrix to either eight states or three states of secondary structure. The architecture of MUFOLD-SS enables effective processing of local and global interactions between amino acids in making accurate predictions. In extensive experiments on multiple datasets, MUFOLD-SS significantly outperformed the best existing methods and other deep neural networks. MUFOLD-SS can be downloaded from http://dslsrv8.cs.missouri.edu/~cf797/MUFoldSS/download.html. © 2018 Wiley Periodicals, Inc.
Wang, Xue; Zhao, Kun; Kirberger, Michael; Wong, Hing; Chen, Guantao; Yang, Jenny J
2010-01-01
Calcium binding in proteins exhibits a wide range of polygonal geometries that relate directly to an equally diverse set of biological functions. The binding process stabilizes protein structures and typically results in local conformational change and/or global restructuring of the backbone. Previously, we established the MUG program, which utilized multiple geometries in the Ca2+-binding pockets of holoproteins to identify such pockets, ignoring possible Ca2+-induced conformational change. In this article, we first report our progress in the analysis of Ca2+-induced conformational changes followed by improved prediction of Ca2+-binding sites in the large group of Ca2+-binding proteins that exhibit only localized conformational changes. The MUGSR algorithm was devised to incorporate side chain torsional rotation as a predictor. The output from MUGSR presents groups of residues where each group, typically containing two to five residues, is a potential binding pocket. MUGSR was applied to both X-ray apo structures and NMR holo structures, which did not use calcium distance constraints in structure calculations. Predicted pockets were validated by comparison with homologous holo structures. Defining a “correct hit” as a group of residues containing at least two true ligand residues, the sensitivity was at least 90%; whereas for a “correct hit” defined as a group of residues containing at least three true ligand residues, the sensitivity was at least 78%. These data suggest that Ca2+-binding pockets are at least partially prepositioned to chelate the ion in the apo form of the protein. PMID:20512971
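The two "correct hit" definitions used in this abstract (a predicted residue group containing at least two, or at least three, true ligand residues) translate directly into a set-overlap check. The residue labels and site data below are hypothetical and serve only to show how the sensitivity changes with the threshold.

```python
def is_correct_hit(predicted_group, true_ligands, min_true=2):
    """A predicted group is a correct hit if it shares at least
    min_true residues with the true ligand residues of the site."""
    return len(set(predicted_group) & set(true_ligands)) >= min_true


def sensitivity(sites, min_true=2):
    """Fraction of true binding sites recovered by any predicted group.

    sites: list of (predicted_groups, true_ligands) pairs, one per site.
    """
    found = sum(
        1
        for groups, true_ligands in sites
        if any(is_correct_hit(g, true_ligands, min_true) for g in groups)
    )
    return found / len(sites)


# Hypothetical residue labels for two binding sites.
sites = [
    ([{"D20", "D22", "E31"}], {"D20", "D22", "D24", "E31"}),  # 3 shared
    ([{"D56", "N60", "K70"}], {"D56", "N60", "E67"}),         # 2 shared
]
sens_two = sensitivity(sites, min_true=2)
sens_three = sensitivity(sites, min_true=3)
```

Raising the threshold from two to three shared residues can only keep or lower the sensitivity, consistent with the "at least 90%" versus "at least 78%" figures quoted above.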
KFC Server: interactive forecasting of protein interaction hot spots
Darnell, Steven J.; LeGault, Laura; Mitchell, Julie C.
2008-01-01
The KFC Server is a web-based implementation of the KFC (Knowledge-based FADE and Contacts) model, a machine learning approach for the prediction of binding hot spots, the subset of residues that account for most of a protein interface's binding free energy. The server facilitates the automated analysis of a user-submitted protein–protein or protein–DNA interface and the visualization of its hot spot predictions. For each residue in the interface, the KFC Server characterizes its local structural environment, compares that environment to the environments of experimentally determined hot spots and predicts whether the interface residue is a hot spot. After the computational analysis, the user can visualize the results using an interactive job viewer that quickly highlights predicted hot spots and surrounding structural features within the protein structure. The KFC Server is accessible at http://kfc.mitchell-lab.org. PMID:18539611
Intracellular Localization of Arabidopsis Sulfurtransferases
Bauer, Michael; Dietrich, Christof; Nowak, Katharina; Sierralta, Walter D.; Papenbrock, Jutta
2004-01-01
Sulfurtransferases (Str) comprise a group of enzymes widely distributed in archaea, eubacteria, and eukaryota which catalyze the transfer of a sulfur atom from suitable sulfur donors to nucleophilic sulfur acceptors. In all organisms analyzed to date, small gene families encoding Str proteins have been identified. The gene products were localized to different compartments of the cells. Our interest concerns the localization of Str proteins encoded in the nuclear genome of Arabidopsis. Computer-based prediction methods revealed localization in different compartments of the cell for six putative AtStrs. Several methods were used to determine the localization of the AtStr proteins experimentally. For AtStr1, a mitochondrial localization was demonstrated by immunodetection in the proteome of isolated mitochondria resolved by one- and two-dimensional gel electrophoresis and subsequent blotting. The respective mature AtStr1 protein was identified by mass spectrometry sequencing. The same result was obtained by transient expression of fusion constructs with the green fluorescent protein in Arabidopsis protoplasts, whereas AtStr2 was exclusively localized to the cytoplasm by this method. Three members of the single-domain AtStr were localized in the chloroplasts as demonstrated by transient expression of green fluorescent protein fusions in protoplasts and stomata, whereas the single-domain AtStr18 was shown to be cytoplasmic. The remarkable subcellular distribution of AtStr15 was additionally analyzed by transmission electron immunomicroscopy using a monospecific antibody against green fluorescent protein, indicating an attachment to the thylakoid membrane. The knowledge of the intracellular localization of the members of this multiprotein family will help elucidate their specific functions in the organism. PMID:15181206
Intracellular localization of Arabidopsis sulfurtransferases.
Bauer, Michael; Dietrich, Christof; Nowak, Katharina; Sierralta, Walter D; Papenbrock, Jutta
2004-06-01
Sulfurtransferases (Str) comprise a group of enzymes widely distributed in archaea, eubacteria, and eukaryota which catalyze the transfer of a sulfur atom from suitable sulfur donors to nucleophilic sulfur acceptors. In all organisms analyzed to date, small gene families encoding Str proteins have been identified. The gene products were localized to different compartments of the cells. Our interest concerns the localization of Str proteins encoded in the nuclear genome of Arabidopsis. Computer-based prediction methods revealed localization in different compartments of the cell for six putative AtStrs. Several methods were used to determine the localization of the AtStr proteins experimentally. For AtStr1, a mitochondrial localization was demonstrated by immunodetection in the proteome of isolated mitochondria resolved by one- and two-dimensional gel electrophoresis and subsequent blotting. The respective mature AtStr1 protein was identified by mass spectrometry sequencing. The same result was obtained by transient expression of fusion constructs with the green fluorescent protein in Arabidopsis protoplasts, whereas AtStr2 was exclusively localized to the cytoplasm by this method. Three members of the single-domain AtStr were localized in the chloroplasts as demonstrated by transient expression of green fluorescent protein fusions in protoplasts and stomata, whereas the single-domain AtStr18 was shown to be cytoplasmic. The remarkable subcellular distribution of AtStr15 was additionally analyzed by transmission electron immunomicroscopy using a monospecific antibody against green fluorescent protein, indicating an attachment to the thylakoid membrane. The knowledge of the intracellular localization of the members of this multiprotein family will help elucidate their specific functions in the organism.
Proteins Annexin A2 and PSA in Prostate Cancer Biopsies Do Not Predict Biochemical Failure.
Lamb, David S; Sondhauss, Sven; Dunne, Jonathan C; Woods, Lisa; Delahunt, Brett; Ferguson, Peter; Murray, Judith; Nacey, John N; Denham, James W; Jordan, T William
2017-12-01
We previously reported the use of mass spectrometry and western blotting to identify proteins from tumour regions of formalin-fixed paraffin-embedded biopsies from 16 men who presented with apparently localized prostate cancer, and found that annexin A2 (ANXA2) appeared to be a better predictor of subsequent biochemical failure than prostate-specific antigen (PSA). In this follow-up study, ANXA2 and PSA were measured using western blotting of proteins extracted from biopsies from 37 men from a subsequent prostate cancer trial. No significant differences in ANXA2 and PSA levels were observed between men with and without biochemical failure. The statistical effect sizes were small, d=0.116 for ANXA2, and 0.266 for PSA. ANXA2 and PSA proteins measured from biopsy tumour regions are unlikely to be good biomarkers for prediction of the clinical outcome of prostate cancer presenting with apparently localized disease. Copyright© 2017, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved.
NIAS-Server: Neighbors Influence of Amino acids and Secondary Structures in Proteins.
Borguesan, Bruno; Inostroza-Ponta, Mario; Dorn, Márcio
2017-03-01
The exponential growth in the number of experimentally determined three-dimensional protein structures provides new and relevant knowledge about the conformation of amino acids in proteins. Only a few probability densities of amino acids are publicly available for use in structure validation and prediction methods. NIAS (Neighbors Influence of Amino acids and Secondary structures) is a web-based tool used to extract information about conformational preferences of amino acid residues and secondary structures in experimentally determined protein templates. This information is useful, for example, to characterize folds and local motifs in proteins and molecular folding, and can help solve complex problems such as protein structure prediction and protein design. The NIAS-Server and supplementary data are available at http://sbcb.inf.ufrgs.br/nias.
Lee, Sunghoon; Lee, Byungwook; Jang, Insoo; Kim, Sangsoo; Bhak, Jong
2006-01-01
The Localizome server predicts the transmembrane (TM) helix number and TM topology of a user-supplied eukaryotic protein and presents the result as an intuitive graphic representation. It utilizes hmmpfam to detect the presence of Pfam domains and a prediction algorithm, Phobius, to predict the TM helices. The results are combined and checked against the TM topology rules stored in a protein domain database called LocaloDom. LocaloDom is a curated database that contains TM topologies and TM helix numbers of known protein domains. It was constructed from Pfam domains combined with Swiss-Prot annotations and Phobius predictions. The Localizome server corrects the combined results of the user sequence to conform to the rules stored in LocaloDom. Compared with other programs, this server showed the highest accuracy for TM topology prediction: for soluble proteins, the accuracy and coverage were 99 and 75%, respectively, while for TM protein domain regions, they were 96 and 68%, respectively. With a graphical representation of TM topology and TM helix positions with the domain units, the Localizome server is a highly accurate and comprehensive information source for subcellular localization for soluble proteins as well as membrane proteins. The Localizome server can be found at . PMID:16845118
An, Ji-Yong; Meng, Fan-Rong; You, Zhu-Hong; Fang, Yu-Hong; Zhao, Yu-Jun; Zhang, Ming
2016-01-01
We propose a novel computational method known as RVM-LPQ that combines the Relevance Vector Machine (RVM) model and Local Phase Quantization (LPQ) to predict PPIs from protein sequences. The main improvements are the results of representing protein sequences using the LPQ feature representation on a Position Specific Scoring Matrix (PSSM), reducing the influence of noise using a Principal Component Analysis (PCA), and using a Relevance Vector Machine (RVM) based classifier. We perform 5-fold cross-validation experiments on Yeast and Human datasets, and we achieve very high accuracies of 92.65% and 97.62%, respectively, which is significantly better than previous works. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the Yeast dataset. The experimental results demonstrate that our RVM-LPQ method is obviously better than the SVM-based method. The promising experimental results show the efficiency and simplicity of the proposed method, which can be an automatic decision support tool for future proteomics research.
Li, Liqi; Luo, Qifa; Xiao, Weidong; Li, Jinhui; Zhou, Shiwen; Li, Yongsheng; Zheng, Xiaoqi; Yang, Hua
2017-02-01
Palmitoylation is the covalent attachment of lipids to amino acid residues in proteins. As an important form of protein posttranslational modification, it increases the hydrophobicity of proteins, which contributes to protein transport, organelle localization, and function, and therefore plays an important role in a variety of cell biological processes. Identification of palmitoylation sites is necessary for understanding protein-protein interaction, protein stability, and activity. Since conventional experimental techniques to determine palmitoylation sites in proteins are both labor intensive and costly, a fast and accurate computational approach to predict palmitoylation sites from protein sequences is urgently needed. In this study, a support vector machine (SVM)-based method was proposed that integrates the PSI-BLAST profile, physicochemical properties, [Formula: see text]-mer amino acid compositions (AACs), and [Formula: see text]-mer pseudo AACs into the principal feature vector. A recursive feature selection scheme was subsequently implemented to single out the most discriminative features. Finally, an SVM method was implemented to predict palmitoylation sites in proteins based on the optimal features. The proposed method achieved an accuracy of 99.41% and a Matthews Correlation Coefficient of 0.9773 on a benchmark dataset. The result indicates the efficiency and accuracy of our method in predicting palmitoylation sites from protein sequences.
Candat, Adrien; Paszkiewicz, Gaël; Neveu, Martine; Gautier, Romain; Logan, David C.; Avelange-Macherel, Marie-Hélène; Macherel, David
2014-01-01
Late embryogenesis abundant (LEA) proteins are hydrophilic, mostly intrinsically disordered proteins, which play major roles in desiccation tolerance. In Arabidopsis thaliana, 51 genes encoding LEA proteins clustered into nine families have been inventoried. To increase our understanding of the yet enigmatic functions of these gene families, we report the subcellular location of each protein. Experimental data highlight the limits of in silico predictions for analysis of subcellular localization. Thirty-six LEA proteins localized to the cytosol, with most being able to diffuse into the nucleus. Three proteins were exclusively localized in plastids or mitochondria, while two others were found dually targeted to these organelles. Targeting cleavage sites could be determined for five of these proteins. Three proteins were found to be endoplasmic reticulum (ER) residents, two were vacuolar, and two were secreted. A single protein was identified in pexophagosomes. While most LEA protein families have a unique subcellular localization, members of the LEA_4 family are widely distributed (cytosol, mitochondria, plastid, ER, and pexophagosome) but share the presence of the class A α-helix motif. They are thus expected to establish interactions with various cellular membranes under stress conditions. The broad subcellular distribution of LEA proteins highlights the requirement for each cellular compartment to be provided with protective mechanisms to cope with desiccation or cold stress. PMID:25005920
Bedbrook, Claire N; Yang, Kevin K; Rice, Austin J; Gradinaru, Viviana; Arnold, Frances H
2017-10-01
There is growing interest in studying and engineering integral membrane proteins (MPs) that play key roles in sensing and regulating cellular response to diverse external signals. A MP must be expressed, correctly inserted and folded in a lipid bilayer, and trafficked to the proper cellular location in order to function. The sequence and structural determinants of these processes are complex and highly constrained. Here we describe a predictive, machine-learning approach that captures this complexity to facilitate successful MP engineering and design. Machine learning on carefully-chosen training sequences made by structure-guided SCHEMA recombination has enabled us to accurately predict the rare sequences in a diverse library of channelrhodopsins (ChRs) that express and localize to the plasma membrane of mammalian cells. These light-gated channel proteins of microbial origin are of interest for neuroscience applications, where expression and localization to the plasma membrane is a prerequisite for function. We trained Gaussian process (GP) classification and regression models with expression and localization data from 218 ChR chimeras chosen from a 118,098-variant library designed by SCHEMA recombination of three parent ChRs. We use these GP models to identify ChRs that express and localize well and show that our models can elucidate sequence and structure elements important for these processes. We also used the predictive models to convert a naturally occurring ChR incapable of mammalian localization into one that localizes well.
Rice, Austin J.; Gradinaru, Viviana; Arnold, Frances H.
2017-01-01
There is growing interest in studying and engineering integral membrane proteins (MPs) that play key roles in sensing and regulating cellular response to diverse external signals. A MP must be expressed, correctly inserted and folded in a lipid bilayer, and trafficked to the proper cellular location in order to function. The sequence and structural determinants of these processes are complex and highly constrained. Here we describe a predictive, machine-learning approach that captures this complexity to facilitate successful MP engineering and design. Machine learning on carefully-chosen training sequences made by structure-guided SCHEMA recombination has enabled us to accurately predict the rare sequences in a diverse library of channelrhodopsins (ChRs) that express and localize to the plasma membrane of mammalian cells. These light-gated channel proteins of microbial origin are of interest for neuroscience applications, where expression and localization to the plasma membrane is a prerequisite for function. We trained Gaussian process (GP) classification and regression models with expression and localization data from 218 ChR chimeras chosen from a 118,098-variant library designed by SCHEMA recombination of three parent ChRs. We use these GP models to identify ChRs that express and localize well and show that our models can elucidate sequence and structure elements important for these processes. We also used the predictive models to convert a naturally occurring ChR incapable of mammalian localization into one that localizes well. PMID:29059183
Antonenkov, Vasily D; Ohlmeier, Steffen; Sormunen, Raija T; Hiltunen, J Kalervo
2007-05-25
Mammalian UK114 belongs to a highly conserved family of proteins with unknown functions. Although it is believed that UK114 is a cytosolic or mitochondrial protein there is no detailed study of its intracellular localization. Using analytical subcellular fractionation, electron microscopic colloidal gold technique, and two-dimensional gel electrophoresis of peroxisomal matrix proteins combined with mass spectrometric analysis we show here that a large portion of UK114 is present in rat liver peroxisomes. The peroxisomal UK114 is a soluble matrix protein and it is not inducible by the peroxisomal proliferator clofibrate. The data predict involvement of UK114 in peroxisomal metabolism.
Seki, N; Muramatsu, M; Sugano, S; Suzuki, Y; Nakagawara, A; Ohhira, M; Hayashi, A; Hori, T; Saito, T
1998-01-01
Huntington disease (HD) is an inherited neurodegenerative disorder which is associated with CAG expansion in the coding region of the gene for huntingtin protein. Recently, a huntingtin interacting protein, HIP1, was isolated by the yeast two-hybrid system. Here we report the isolation of a cDNA clone for HIP1R (huntingtin interacting protein-1 related), which encodes a predicted protein product sharing a striking homology with HIP1. RT-PCR analysis showed that the messenger RNA was ubiquitously expressed in various human tissues. Based on PCR-assisted analysis of a radiation hybrid panel and fluorescence in situ hybridization, HIP1R was localized to the q24 region of chromosome 12.
Nucleolar localization of cirhin, the protein mutated in North American Indian childhood cirrhosis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yu, Bin; Mitchell, Grant A.; Richter, Andrea
2005-12-10
Cirhin (NP_116219), the product of the CIRH1A gene, is mutated in North American Indian childhood cirrhosis (NAIC/CIRH1A, OMIM 604901), a severe autosomal recessive intrahepatic cholestasis. It is a 686-amino-acid WD40-repeat-containing protein of unknown function that is predicted to contain multiple targeting signals, including an N-terminal mitochondrial targeting signal, a C-terminal monopartite nuclear localization signal (NLS) and a bipartite nuclear localization signal (BNLS). We performed the direct determination of subcellular localization of cirhin as a crucial first step in unraveling its biological function. Using EGFP and His-tagged cirhin fusion proteins expressed in HeLa and HepG2 cells, we show that cirhin is a nucleolar protein and that the R565W mutation, for which all NAIC patients are homozygous, has no effect on subcellular localization. Cirhin has an active C-terminal monopartite nuclear localization signal (NLS) and a unique nucleolar localization signal (NrLS) between residues 315 and 432. The nucleolus is not known to be important specifically for intrahepatic cholestasis. These observations provide a new dimension in the study of hereditary cholestasis.
Biological and functional relevance of CASP predictions.
Liu, Tianyun; Ish-Shalom, Shirbi; Torng, Wen; Lafita, Aleix; Bock, Christian; Mort, Matthew; Cooper, David N; Bliven, Spencer; Capitani, Guido; Mooney, Sean D; Altman, Russ B
2018-03-01
Our goal is to answer the question: compared with experimental structures, how useful are predicted models for functional annotation? We assessed the functional utility of predicted models by comparing the performances of a suite of methods for functional characterization on the predictions and the experimental structures. We identified 28 sites in 25 protein targets to perform functional assessment. These 28 sites included nine sites with known ligand binding (holo-sites), nine sites that are expected or suggested by experimental authors for small molecule binding (apo-sites), and ten sites containing important motifs, loops, or key residues with important disease-associated mutations. We evaluated the utility of the predictions by comparing their microenvironments to the experimental structures. Overall structural quality correlates with functional utility. However, the best-ranked predictions (global) may not have the best functional quality (local). Our assessment provides an ability to discriminate between predictions with high structural quality. When assessing ligand-binding sites, most prediction methods have higher performance on apo-sites than holo-sites. Some servers show consistently high performance for certain types of functional sites. Finally, many functional sites are associated with protein-protein interaction. We also analyzed biologically relevant features from the protein assemblies of two targets where the active site spanned the protein-protein interface. For the assembly targets, we find that the features in the models are mainly determined by the choice of template. © 2017 The Authors Proteins: Structure, Function and Bioinformatics Published by Wiley Periodicals, Inc.
Protein asparagine deamidation prediction based on structures with machine learning methods.
Jia, Lei; Sun, Yaxiong
2017-01-01
Chemical stability is a major concern in the development of protein therapeutics due to its impact on both efficacy and safety. Protein "hotspots" are amino acid residues that are subject to various chemical modifications, including deamidation, isomerization, glycosylation, and oxidation. A more accurate prediction method for potential hotspot residues would allow their elimination or reduction as early as possible in the drug discovery process. In this work, we focus on prediction models for asparagine (Asn) deamidation. The sequence-based prediction method simply identifies the NG motif (asparagine followed by a glycine) as liable to deamidation. It still dominates the deamidation evaluation process in most pharmaceutical settings due to its convenience. However, the simple sequence-based method is less accurate and often leads to over-engineering a protein. We introduce structure-based prediction models by mining available experimental and structural data of deamidated proteins. Our training set contains 194 Asn residues from 25 proteins that all have available high-resolution crystal structures. Experimentally measured deamidation half-lives of Asn in penta-peptides as well as 3D structure-based properties, such as solvent exposure, crystallographic B-factors, local secondary structure and dihedral angles, were used to train prediction models with several machine learning algorithms. The prediction tools were cross-validated as well as tested with an external test data set. The random forest model had high enrichment in ranking deamidated residues higher than non-deamidated residues while effectively eliminating false positive predictions. It is possible that such quantitative protein structure-function relationship tools can also be applied to other protein hotspot predictions. In addition, we extensively discussed metrics used to evaluate prediction performance on unbalanced data sets such as the deamidation case.
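For illustration only, the sequence-based baseline mentioned in this abstract (flagging every asparagine that is immediately followed by a glycine) can be sketched in a few lines of Python. The function name and the example sequence are hypothetical, and this is not the authors' structure-based model, which relies on additional structural features.

```python
import re


def find_ng_motifs(sequence: str) -> list[int]:
    """Return 1-based positions of Asn (N) residues directly followed by Gly (G)."""
    seq = sequence.upper()
    # The lookahead matches only the N, so the reported position is that of the Asn.
    return [match.start() + 1 for match in re.finditer(r"N(?=G)", seq)]


# Hypothetical one-letter-code sequence used purely as an example.
example = "MKNGTLNGSANQG"
print(find_ng_motifs(example))  # → [3, 7]
```

The Asn at position 11 is not reported because it is followed by Gln rather than Gly, which shows how little context the motif rule takes into account.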
Multi-level machine learning prediction of protein-protein interactions in Saccharomyces cerevisiae.
Zubek, Julian; Tatjewski, Marcin; Boniecki, Adam; Mnich, Maciej; Basu, Subhadip; Plewczynski, Dariusz
2015-01-01
Accurate identification of protein-protein interactions (PPI) is the key step in understanding proteins' biological functions, which are typically context-dependent. Many existing PPI predictors rely on aggregated features from protein sequences; however, only a few methods exploit local information about specific residue contacts. In this work we present a two-stage machine learning approach for prediction of protein-protein interactions. We start with the carefully filtered data on protein complexes available for Saccharomyces cerevisiae in the Protein Data Bank (PDB) database. First, we build linear descriptions of interacting and non-interacting sequence segment pairs based on their inter-residue distances. Secondly, we train machine learning classifiers to predict binary segment interactions for any two short sequence fragments. The final prediction of the protein-protein interaction is done using the 2D matrix representation of all-against-all possible interacting sequence segments of both analysed proteins. The level-I predictor achieves 0.88 AUC for micro-scale, i.e., residue-level prediction. The level-II predictor improves the results further by a more complex learning paradigm. We perform a 30-fold macro-scale, i.e., protein-level, cross-validation experiment. The level-II predictor using PSIPRED-predicted secondary structure reaches 0.70 precision, 0.68 recall, and 0.70 AUC, whereas other popular methods provide results below the 0.6 threshold (recall, precision, AUC). Our results demonstrate that the multi-scale sequence feature aggregation procedure is able to improve the machine learning results by more than 10% as compared to other sequence representations. Prepared datasets and source code for our experimental pipeline are freely available for download from: http://zubekj.github.io/mlppi/ (open source Python implementation, OS independent).
Structure and non-structure of centrosomal proteins.
Dos Santos, Helena G; Abia, David; Janowski, Robert; Mortuza, Gulnahar; Bertero, Michela G; Boutin, Maïlys; Guarín, Nayibe; Méndez-Giraldez, Raúl; Nuñez, Alfonso; Pedrero, Juan G; Redondo, Pilar; Sanz, María; Speroni, Silvia; Teichert, Florian; Bruix, Marta; Carazo, José M; Gonzalez, Cayetano; Reina, José; Valpuesta, José M; Vernos, Isabelle; Zabala, Juan C; Montoya, Guillermo; Coll, Miquel; Bastolla, Ugo; Serrano, Luis
2013-01-01
Here we perform a large-scale study of the structural properties and the expression of proteins that constitute the human centrosome. Centrosomal proteins tend to be larger than generic human proteins (control set), since their genes contain on average more exons (20.3 versus 14.6). They are rich in predicted disordered regions, which cover 57% of their length, compared to 39% in the general human proteome. They also contain several regions that are dually predicted to be disordered and coiled-coil at the same time: 55 proteins (15%) contain disordered and coiled-coil fragments that cover more than 20% of their length. Helices prevail over strands in regions homologous to known structures (47% predicted helical residues against 17% predicted as strands), and even more in the whole centrosomal proteome (52% against 7%), while for control human proteins 34.5% of the residues are predicted as helical and 12.8% are predicted as strands. This difference is mainly due to residues predicted as disordered and helical (30% in centrosomal and 9.4% in control proteins), which may correspond to alpha-helix forming molecular recognition features (α-MoRFs). We performed expression assays for 120 full-length centrosomal proteins and 72 domain constructs that we have predicted to be globular. These full-length proteins are often insoluble: only 39 out of 120 expressed proteins (32%) and 19 out of 72 domains (26%) were soluble. We built or retrieved structural models for 277 out of 361 human proteins whose centrosomal localization has been experimentally verified. We could not find any suitable structural template with more than 20% sequence identity for 84 centrosomal proteins (23%), for which around 74% of the residues are predicted to be disordered or coiled-coils. The three-dimensional models that we built are available at http://ub.cbm.uam.es/centrosome/models/index.php.
Kulp, John L.; Cloudsdale, Ian S.; Kulp, John L.
2017-01-01
Chemically diverse fragments tend to collectively bind at localized sites on proteins, which is a cornerstone of fragment-based techniques. A central question is how general are these strategies for predicting a wide variety of molecular interactions such as small molecule-protein, protein-protein and protein-nucleic acid for both experimental and computational methods. To address this issue, we recently proposed three governing principles, (1) accurate prediction of fragment-macromolecule binding free energy, (2) accurate prediction of water-macromolecule binding free energy, and (3) locating sites on a macromolecule that have high affinity for a diversity of fragments and low affinity for water. To test the generality of these concepts we used the computational technique of Simulated Annealing of Chemical Potential to design one small fragment to break the RecA-RecA protein-protein interaction and three fragments that inhibit peptide-deformylase via water-mediated multi-body interactions. Experiments confirm the predictions that 6-hydroxydopamine potently inhibits RecA and that PDF inhibition quantitatively tracks the water-mediated binding predictions. Additionally, the principles correctly predict the essential bound waters in HIV Protease, the surprisingly extensive binding site of elastase, the pinpoint location of electron transfer in dihydrofolate reductase, the HIV TAT-TAR protein-RNA interactions, and the MDM2-MDM4 differential binding to p53. The experimental confirmations of highly non-obvious predictions combined with the precise characterization of a broad range of known phenomena lend strong support to the generality of fragment-based methods for characterizing molecular recognition. PMID:28837642
Kulp, John L; Cloudsdale, Ian S; Kulp, John L; Guarnieri, Frank
2017-01-01
Chemically diverse fragments tend to collectively bind at localized sites on proteins, which is a cornerstone of fragment-based techniques. A central question is how general are these strategies for predicting a wide variety of molecular interactions such as small molecule-protein, protein-protein and protein-nucleic acid for both experimental and computational methods. To address this issue, we recently proposed three governing principles, (1) accurate prediction of fragment-macromolecule binding free energy, (2) accurate prediction of water-macromolecule binding free energy, and (3) locating sites on a macromolecule that have high affinity for a diversity of fragments and low affinity for water. To test the generality of these concepts we used the computational technique of Simulated Annealing of Chemical Potential to design one small fragment to break the RecA-RecA protein-protein interaction and three fragments that inhibit peptide-deformylase via water-mediated multi-body interactions. Experiments confirm the predictions that 6-hydroxydopamine potently inhibits RecA and that PDF inhibition quantitatively tracks the water-mediated binding predictions. Additionally, the principles correctly predict the essential bound waters in HIV Protease, the surprisingly extensive binding site of elastase, the pinpoint location of electron transfer in dihydrofolate reductase, the HIV TAT-TAR protein-RNA interactions, and the MDM2-MDM4 differential binding to p53. The experimental confirmations of highly non-obvious predictions combined with the precise characterization of a broad range of known phenomena lend strong support to the generality of fragment-based methods for characterizing molecular recognition.
Identification of Conserved Water Sites in Protein Structures for Drug Design.
Jukič, Marko; Konc, Janez; Gobec, Stanislav; Janežič, Dušanka
2017-12-26
Identification of conserved waters in protein structures is a challenging task with applications in molecular docking and protein stability prediction. As an alternative to computationally demanding simulations of proteins in water, experimental cocrystallized waters in the Protein Data Bank (PDB) in combination with a local structure alignment algorithm can be used for reliable prediction of conserved water sites. We developed the ProBiS H2O approach based on the previously developed ProBiS algorithm, which enables identification of conserved water sites in proteins using experimental protein structures from the PDB or a set of custom protein structures available to the user. With a protein structure, a binding site, or an individual water molecule as a query, ProBiS H2O collects similar proteins from the PDB and performs local or binding site-specific superimpositions of the query structure with similar proteins using the ProBiS algorithm. It collects the experimental water molecules from the similar proteins and transposes them to the query protein. Transposed waters are clustered by their mutual proximity, which enables identification of discrete sites in the query protein with high water conservation. ProBiS H2O is a robust and fast new approach that uses existing experimental structural data to identify conserved water sites on the interfaces of protein complexes, for example protein-small molecule interfaces, and elsewhere on the protein structures. It has been successfully validated in several reported proteins in which conserved water molecules were found to play an important role in ligand binding with applications in drug design.
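As a rough illustration of the clustering step described in this abstract (grouping transposed waters by mutual proximity), the sketch below performs a simple single-linkage grouping of 3D points in Python. The function name, the 1.0 Å cutoff, and the choice of single linkage are assumptions made for this example; the abstract does not specify the clustering procedure ProBiS H2O actually uses.

```python
from math import dist


def cluster_waters(coords, cutoff=1.0):
    """Group points linked by chains of pairwise distances <= cutoff (single linkage).

    `coords` is a list of (x, y, z) tuples; the cutoff value is an illustrative assumption.
    Returns clusters as lists of indices into `coords`, largest cluster first.
    """
    parent = list(range(len(coords)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if dist(coords[i], coords[j]) <= cutoff:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(coords)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=len, reverse=True)


# Hypothetical water oxygen positions after superimposition onto a query structure.
waters = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.9, 0.2, 0.0), (10.0, 10.0, 10.0)]
print(cluster_waters(waters))  # → [[0, 1, 2], [3]]
```

In this toy input, the three nearby points form one densely populated site, which is the kind of cluster that would be read as a conserved water position, while the isolated point forms a singleton.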
Protein structure prediction with local adjust tabu search algorithm
2014-01-01
Background: Protein folding structure prediction is one of the most challenging problems in the bioinformatics domain. Because of the complexity of realistic protein structures, a simplified structure model and a computational method should be adopted in the research. The AB off-lattice model is one such simplified model, which considers only two classes of amino acids, hydrophobic (A) residues and hydrophilic (B) residues. Results: The main work of this paper is to discuss how to find the lowest-energy configurations in the 2D and 3D off-lattice models by using Fibonacci sequences and real protein sequences. In order to avoid falling into local minima and to converge faster to the global minimum, we introduce a novel method (SATS) to the protein structure problem, which combines the simulated annealing algorithm and the tabu search algorithm. Various strategies, such as the new encoding strategy, the adaptive neighborhood generation strategy and the local adjustment strategy, are adopted successfully for high-speed searching of the optimal conformation corresponding to the lowest energy of the protein sequences. Experimental results show that some of the results obtained by the improved SATS are better than those reported in previous literature, and we can be sure that the lowest-energy folding state for short Fibonacci sequences has been found. Conclusions: Although the off-lattice models are not very realistic, they can reflect some important characteristics of realistic proteins. It can be found that the 3D off-lattice model is more like the native folding structure of a realistic protein than the 2D off-lattice model. In addition, compared with some previous research, the proposed hybrid algorithm can search the spatial folding structure of a protein chain more effectively and more quickly. PMID:25474708
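The abstract does not define the Fibonacci sequences it uses as test cases. The Python sketch below assumes the recursive construction commonly used for AB-model benchmarks, namely S0 = "A", S1 = "B", and S(i+1) = S(i-1) followed by S(i); the function name is hypothetical.

```python
def fibonacci_ab(n: int) -> str:
    """Return the n-th Fibonacci AB string under the assumed convention.

    Assumption: S0 = "A", S1 = "B", S(i+1) = S(i-1) + S(i) (string concatenation).
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    prev, cur = "A", "B"
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, prev + cur
    return cur


for i in range(2, 7):
    seq = fibonacci_ab(i)
    print(i, len(seq), seq)
# The last line printed is: 6 13 ABBABBABABBAB
```

Under this convention the string lengths follow the Fibonacci numbers (2, 3, 5, 8, 13, ...), so the benchmark chain length grows quickly with the index.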
Wang, Jun; Gui, Lang; Chen, Zong-Yan; Zhang, Qi-Ya
2016-08-01
G protein-coupled receptors (GPCRs) are known as seven transmembrane domain receptors and consequently can mediate diverse biological functions via regulation of their subcellular localization. Crucian carp herpesvirus (CaHV) was recently isolated from infected fish with acute gill hemorrhage. A CaHV GPCR of 349 amino acids (aa) was identified based on amino acid identity. A series of variants with truncation, deletion or substitution mutations in the C-terminus (aa 315-349) were constructed and expressed in fathead minnow (FHM) cells. The roles of three key C-terminal regions in the subcellular localization of CaHV GPCR were determined. Lysine-315 (K-315) directed the aggregation of the protein preferentially at the nuclear side. A predicted N-myristoylation site (GGGWTR, aa 335-340) was responsible for punctate distribution in the periplasm or throughout the cytoplasm. A predicted phosphorylation site (SSR, aa 327-329) and GGGWTR together determined the punctate distribution in the cytoplasm. Detection of organelle localization with specific markers showed that the protein retaining K-315 colocalized with the Golgi apparatus. These experiments provided the first evidence that different mutations of the CaHV GPCR C-terminus have different effects on the subcellular localization of fish herpesvirus-encoded GPCRs. The study provided valuable information and new insights into the precise interactions between herpesvirus and fish cells, and could also provide useful targets for antiviral agents in aquaculture.
Czaplewski, Cezary; Karczynska, Agnieszka; Sieradzan, Adam K; Liwo, Adam
2018-04-30
A server implementation of the UNRES package (http://www.unres.pl) for coarse-grained simulations of protein structures with the physics-based UNRES model, termed the UNRES server, is presented. In contrast to most coarse-grained protein models, owing to its physics-based origin, the UNRES force field can be used in simulations, including those aimed at protein-structure prediction, without ancillary information from structural databases; however, the implementation includes the possibility of using restraints. Local energy minimization, canonical molecular dynamics simulations, replica exchange and multiplexed replica exchange molecular dynamics simulations can be run with the current UNRES server; the latter are suitable for protein-structure prediction. The user-supplied input includes the protein sequence and, optionally, restraints from secondary-structure prediction or small-angle X-ray scattering data, together with the simulation type and parameters, which are selected or typed in. Oligomeric proteins, as well as those containing D-amino-acid residues and disulfide links, can be treated. The output is displayed graphically (minimized structures, trajectories, final models, analysis of trajectory/ensembles); however, all output files can be downloaded by the user. The UNRES server can be freely accessed at http://unres-server.chem.ug.edu.pl.
The dual role of fragments in fragment-assembly methods for de novo protein structure prediction
Handl, Julia; Knowles, Joshua; Vernon, Robert; Baker, David; Lovell, Simon C.
2013-01-01
In fragment-assembly techniques for protein structure prediction, models of protein structure are assembled from fragments of known protein structures. This process is typically guided by a knowledge-based energy function and uses a heuristic optimization method. The fragments play two important roles in this process: they define the set of structural parameters available, and they also assume the role of the main variation operators that are used by the optimiser. Previous analysis has typically focused on the first of these roles. In particular, the relationship between local amino acid sequence and local protein structure has been studied by a range of authors. The correlation between the two has been shown to vary with the window length considered, and the results of these analyses have informed directly the choice of fragment length in state-of-the-art prediction techniques. Here, we focus on the second role of fragments and aim to determine the effect of fragment length from an optimization perspective. We use theoretical analyses to reveal how the size and structure of the search space changes as a function of insertion length. Furthermore, empirical analyses are used to explore additional ways in which the size of the fragment insertion influences the search both in a simulation model and for the fragment-assembly technique, Rosetta. PMID:22095594
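The second role of fragments, as variation operators, can be illustrated with a minimal sketch of a fragment-insertion move. It assumes a conformation stored as a list of backbone (φ, ψ) pairs and a fragment library keyed by start position; this layout and the function names are hypothetical and are not taken from Rosetta.

```python
import random


def insert_fragment(torsions, fragment, position):
    """Return a copy of `torsions` with a window overwritten by `fragment`."""
    end = position + len(fragment)
    if position < 0 or end > len(torsions):
        raise ValueError("fragment does not fit at this position")
    return torsions[:position] + list(fragment) + torsions[end:]


def random_insertion(torsions, library, rng=random):
    """Apply one random move; `library` maps start position -> candidate fragments."""
    position = rng.choice(sorted(library))
    fragment = rng.choice(library[position])
    return insert_fragment(torsions, fragment, position)
```

With fragments of length L, each move changes L consecutive residues at once, so the fragment length fixes both which torsion combinations are reachable and how large a step the optimiser takes.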
NASA Astrophysics Data System (ADS)
Vitarelli, Michael J.; Talaga, David S.
2013-09-01
Single solid-state nanopores find increasing use for electrical detection and/or manipulation of macromolecules. These applications exploit the changes in signals due to the geometry and electrical properties of the molecular species found within the nanopore. The sensitivity and resolution of such measurements are also influenced by the geometric and electrical properties of the nanopore. This paper continues the development of an analytical theory to predict the electrochemical impedance spectra of nanopores by including the influence of the presence of an unfolded protein using the variable topology finite Warburg impedance model previously published by the authors. The local excluded volume of, and charges present on, the segment of protein sampled by the nanopore are shown to influence the shape and peak frequency of the electrochemical impedance spectrum. An analytical theory is used to relate the capacitive response of the electrical double layer at the surface of the protein to both the charge density at the protein surface and the more commonly measured zeta potential. Illustrative examples show how the theory predicts that the varying sequential regions of surface charge density and excluded volume dictated by the protein primary structure may allow for an impedance-based approach to identifying unfolded proteins.
Traverso, José A.; Micalella, Chiara; Martinez, Aude; Brown, Spencer C.; Satiat-Jeunemaître, Béatrice; Meinnel, Thierry; Giglione, Carmela
2013-01-01
N-terminal fatty acylations (N-myristoylation [MYR] and S-palmitoylation [PAL]) are crucial modifications affecting 2 to 4% of eukaryotic proteins. The role of these modifications is to target proteins to membranes. Predictive tools have revealed unexpected targets of these acylations in Arabidopsis thaliana and other plants. However, little is known about how N-terminal lipidation governs membrane compartmentalization of proteins in plants. We show here that h-type thioredoxins (h-TRXs) cluster in four evolutionary subgroups displaying strictly conserved N-terminal modifications. It was predicted that one subgroup undergoes only MYR and another undergoes both MYR and PAL. We used plant TRXs as a model protein family to explore the effect of MYR alone or MYR and PAL in the same family of proteins. We used a high-throughput biochemical strategy to assess MYR of specific TRXs. Moreover, various TRX–green fluorescent protein fusions revealed that MYR localized protein to the endomembrane system and that partitioning between this membrane compartment and the cytosol correlated with the catalytic efficiency of the N-myristoyltransferase acting at the N terminus of the TRXs. Generalization of these results was obtained using several randomly selected Arabidopsis proteins displaying a MYR site only. Finally, we demonstrated that a palmitoylatable Cys residue flanking the MYR site is crucial to localize proteins to micropatching zones of the plasma membrane. PMID:23543785
GenoMycDB: a database for comparative analysis of mycobacterial genes and genomes.
Catanho, Marcos; Mascarenhas, Daniel; Degrave, Wim; Miranda, Antonio Basílio de
2006-03-31
Several databases and computational tools have been created with the aim of organizing, integrating and analyzing the wealth of information generated by large-scale sequencing projects of mycobacterial genomes and those of other organisms. However, with very few exceptions, these databases and tools do not allow for massive and/or dynamic comparison of these data. GenoMycDB (http://www.dbbm.fiocruz.br/GenoMycDB) is a relational database built for large-scale comparative analyses of completely sequenced mycobacterial genomes, based on their predicted protein content. Its central structure is composed of the results obtained after pair-wise sequence alignments among all the predicted proteins coded by the genomes of six mycobacteria: Mycobacterium tuberculosis (strains H37Rv and CDC1551), M. bovis AF2122/97, M. avium subsp. paratuberculosis K10, M. leprae TN, and M. smegmatis MC2 155. The database stores the computed similarity parameters of every aligned pair, providing for each protein sequence the predicted subcellular localization, the assigned cluster of orthologous groups, the features of the corresponding gene, and links to several important databases. Tables containing pairs or groups of potential homologs between selected species/strains can be produced dynamically by user-defined criteria, based on one or multiple sequence similarity parameters. In addition, searches can be restricted according to the predicted subcellular localization of the protein, the DNA strand of the corresponding gene and/or the description of the protein. Massive data search and/or retrieval are available, and different ways of exporting the result are offered. GenoMycDB provides an on-line resource for the functional classification of mycobacterial proteins as well as for the analysis of genome structure, organization, and evolution.
Dursun, Erdinç; Gezen-Ak, Duygu
2017-01-01
Our recent study indicated that vitamin D and its receptors are important parts of the amyloid processing pathway in neurons. Yet the role of the vitamin D receptor (VDR) in amyloid pathogenesis is complex, and the regulation of amyloid beta production cannot be explained solely by the transcriptional regulatory properties of VDR. Given that, we hypothesized that VDR might exist on the neuronal plasma membrane in close proximity to amyloid precursor protein (APP) and secretase complexes. The present study primarily focused on the localization of VDR in neurons and its interaction with amyloid pathology-related proteins. The localization of VDR on neuronal membranes and its co-localization with target proteins were investigated with cell surface staining followed by immunofluorescence labelling. FpClass was used for protein-protein interaction prediction. Our results demonstrated the localization of VDR on the neuronal plasma membrane, the co-localization of VDR with APP, ADAM10 or Nicastrin, and limited co-localization of VDR with PS1. E-cadherin interaction with APP or the γ-secretase complex may involve NOTCH1, NUMB, or FHL2, according to FpClass. This suggested complex might also include VDR, which, with its ligand vitamin D, greatly contributes to Ca2+ homeostasis. Consequently, we suggest that VDR might be a member of this complex through its own non-genomic action and that it can thereby regulate the APP processing pathway in neurons.
HPEPDOCK: a web server for blind peptide-protein docking based on a hierarchical algorithm.
Zhou, Pei; Jin, Bowen; Li, Hao; Huang, Sheng-You
2018-05-09
Protein-peptide interactions are crucial in many cellular functions. Therefore, determining the structure of protein-peptide complexes is important for understanding the molecular mechanism of related biological processes and developing peptide drugs. HPEPDOCK is a novel web server for blind protein-peptide docking through a hierarchical algorithm. Instead of running lengthy simulations to refine peptide conformations, HPEPDOCK accounts for peptide flexibility through an ensemble of peptide conformations generated by our MODPEP program. For blind global peptide docking, HPEPDOCK obtained a success rate of 33.3% in binding mode prediction on a benchmark of 57 unbound cases when the top 10 models were considered, compared to 21.1% for the pepATTRACT server. HPEPDOCK also performed well in docking against homology models and obtained a success rate of 29.8% within the top 10 predictions. For local peptide docking, HPEPDOCK achieved a high success rate of 72.6% on a benchmark of 62 unbound cases within the top 10 predictions, compared to 45.2% for the HADDOCK peptide protocol. Our HPEPDOCK server is computationally efficient, taking an average of 29.8 min for a global peptide docking job and 14.2 min for a local peptide docking job. The HPEPDOCK web server is available at http://huanglab.phys.hust.edu.cn/hpepdock/.
Defining the consequences of genetic variation on a proteome–wide scale
Chick, Joel M.; Munger, Steven C.; Simecek, Petr; Huttlin, Edward L.; Choi, Kwangbom; Gatti, Daniel M.; Raghupathy, Narayanan; Svenson, Karen L.; Churchill, Gary A.; Gygi, Steven P.
2016-01-01
Genetic variation modulates protein expression through both transcriptional and post-transcriptional mechanisms. To characterize the consequences of natural genetic diversity on the proteome, here we combine a multiplexed, mass spectrometry-based method for protein quantification with an emerging outbred mouse model containing extensive genetic variation from eight inbred founder strains. By measuring genome-wide transcript and protein expression in livers from 192 Diversity outbred mice, we identify 2,866 protein quantitative trait loci (pQTL) with twice as many local as distant genetic variants. These data support distinct transcriptional and post-transcriptional models underlying the observed pQTL effects. Using a sensitive approach to mediation analysis, we often identified a second protein or transcript as the causal mediator of distant pQTL. Our analysis reveals an extensive network of direct protein–protein interactions. Finally, we show that local genotype can provide accurate predictions of protein abundance in an independent cohort of collaborative cross mice. PMID:27309819
Determining the Localization of Carbohydrate Active Enzymes Within Gram-Negative Bacteria.
McLean, Richard; Inglis, G Douglas; Mosimann, Steven C; Uwiera, Richard R E; Abbott, D Wade
2017-01-01
Investigating the subcellular location of secreted proteins is valuable for illuminating their biological function. Although several bioinformatics programs currently exist to predict the destination of a trafficked protein using its signal peptide sequence, these programs have limited accuracy and often require experimental validation. Here, we present a systematic method to fractionate gram-negative cells and characterize the subcellular localization of secreted carbohydrate active enzymes (CAZymes). This method involves four parallel approaches that reveal the relative abundance of protein within the cytoplasm, periplasm, outer membrane, and extracellular environment. Cytoplasmic and periplasmic proteins are fractionated by lysis and osmotic shock, respectively. Outer membrane bound proteins are determined by comparing cells before and after exoproteolytic digestion. Extracellularly secreted proteins are collected from the media and concentrated. These four different fractionations can then be probed for the presence and quantity of target proteins using immunochemical methods such as Western blots and ELISAs, or enzyme activity assays.
Wan, Shixiang; Duan, Yucong; Zou, Quan
2017-09-01
Predicting the subcellular localization of proteins is an important and challenging problem. Traditional experimental approaches are often expensive and time-consuming. Consequently, a growing number of research efforts employ a series of machine learning approaches to predict the subcellular location of proteins. There are two main challenges among the state-of-the-art prediction methods. First, most of the existing techniques are designed to deal with multi-class rather than multi-label classification, which ignores connections between multiple labels. In reality, multiple locations of particular proteins imply that there are vital and unique biological significances that deserve special focus and cannot be ignored. Second, techniques for handling imbalanced data in multi-label classification problems are necessary, but never employed. For solving these two issues, we have developed an ensemble multi-label classifier called HPSLPred, which can be applied for multi-label classification with an imbalanced protein source. For convenience, a user-friendly webserver has been established at http://server.malab.cn/HPSLPred. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
3dRPC: a web server for 3D RNA-protein structure prediction.
Huang, Yangyu; Li, Haotian; Xiao, Yi
2018-04-01
RNA-protein interactions occur in many biological processes. To understand the mechanism of these interactions one needs to know three-dimensional (3D) structures of RNA-protein complexes. 3dRPC is an algorithm for prediction of 3D RNA-protein complex structures and consists of a docking algorithm RPDOCK and a scoring function 3dRPC-Score. RPDOCK is used to sample possible complex conformations of an RNA and a protein by calculating the geometric and electrostatic complementarities and stacking interactions at the RNA-protein interface according to the features of atom packing of the interface. 3dRPC-Score is a knowledge-based potential that uses the conformations of nucleotide-amino-acid pairs as statistical variables and that is used to choose the near-native complex-conformations obtained from the docking method above. Recently, we built a web server for 3dRPC. The users can easily use 3dRPC without installing it locally. RNA and protein structures in PDB (Protein Data Bank) format are the only needed input files. It can also incorporate the information of interface residues or residue-pairs obtained from experiments or theoretical predictions to improve the prediction. The address of 3dRPC web server is http://biophy.hust.edu.cn/3dRPC.
Cannistraci, Carlo Vittorio; Alanis-Lobato, Gregorio; Ravasi, Timothy
2013-01-01
Growth and remodelling impact the network topology of complex systems, yet a general theory explaining how new links arise between existing nodes has been lacking, and little is known about the topological properties that facilitate link-prediction. Here we investigate the extent to which the connectivity evolution of a network might be predicted by mere topological features. We show how a link/community-based strategy triggers substantial prediction improvements because it accounts for the singular topology of several real networks organised in multiple local communities - a tendency here named local-community-paradigm (LCP). We observe that LCP networks are mainly formed by weak interactions and characterise heterogeneous and dynamic systems that use self-organisation as a major adaptation strategy. These systems seem designed for global delivery of information and processing via multiple local modules. Conversely, non-LCP networks have steady architectures formed by strong interactions, and seem designed for systems in which information/energy storage is crucial. PMID:23563395
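A local-community style link-prediction score can be sketched as follows. The abstract gives no formula, so this assumes a score of the form CN × LCL, where CN is the number of common neighbours of two nodes and LCL is the number of links among those common neighbours; the function names and the adjacency-dict layout are illustrative.

```python
from itertools import combinations


def common_neighbours(adj, x, y):
    """Nodes adjacent to both x and y; `adj` maps node -> set of neighbours."""
    return adj[x] & adj[y]


def local_community_links(adj, x, y):
    """Count links among the common neighbours of x and y."""
    cn = common_neighbours(adj, x, y)
    return sum(1 for u, v in combinations(cn, 2) if v in adj[u])


def lcp_score(adj, x, y):
    """Assumed score: common neighbours weighted by how interlinked they are."""
    return len(common_neighbours(adj, x, y)) * local_community_links(adj, x, y)
```

Under this assumption, candidate links whose common neighbours form a tightly connected local community are ranked above candidates that merely share many unconnected neighbours.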
Graph pyramids for protein function prediction
2015-01-01
Background Uncovering the hidden organizational characteristics and regularities among biological sequences is the key issue for detailed understanding of an underlying biological phenomenon. Thus pattern recognition from nucleic acid sequences is important for protein function prediction. As proteins from the same family exhibit similar characteristics, homology-based approaches predict protein functions via protein classification. But conventional classification approaches mostly rely on global features, considering only strong protein similarity matches. This leads to a significant loss of prediction accuracy. Methods Here we construct the Protein-Protein Similarity (PPS) network, which captures the subtle properties of protein families. The proposed method considers local as well as global features, by examining the interactions among 'weakly interacting proteins' in the PPS network and by using hierarchical graph analysis via the graph pyramid. Different underlying properties of the protein families are uncovered by applying the proposed graph-based features at various pyramid levels. Results Experimental results on benchmark data sets show that the proposed hierarchical voting algorithm using the graph pyramid helps to improve computational efficiency as well as protein classification accuracy. Quantitatively, among 14,086 test sequences, the proposed method misclassified only 21.1 sequences on average, whereas the baseline BLAST score-based global feature matching method misclassified 362.9 sequences. With each correctly classified test sequence, the fast incremental learning ability of the proposed method further enhances the training model. Thus it has achieved more than 96% protein classification accuracy using only 20% per class training data. PMID:26044522
Graph pyramids for protein function prediction.
Sandhan, Tushar; Yoo, Youngjun; Choi, Jin; Kim, Sun
2015-01-01
Uncovering the hidden organizational characteristics and regularities among biological sequences is the key issue for detailed understanding of an underlying biological phenomenon. Thus pattern recognition from nucleic acid sequences is important for protein function prediction. As proteins from the same family exhibit similar characteristics, homology-based approaches predict protein functions via protein classification. But conventional classification approaches mostly rely on global features, considering only strong protein similarity matches. This leads to a significant loss of prediction accuracy. Here we construct the Protein-Protein Similarity (PPS) network, which captures the subtle properties of protein families. The proposed method considers local as well as global features, by examining the interactions among 'weakly interacting proteins' in the PPS network and by using hierarchical graph analysis via the graph pyramid. Different underlying properties of the protein families are uncovered by applying the proposed graph-based features at various pyramid levels. Experimental results on benchmark data sets show that the proposed hierarchical voting algorithm using the graph pyramid helps to improve computational efficiency as well as protein classification accuracy. Quantitatively, among 14,086 test sequences, the proposed method misclassified only 21.1 sequences on average, whereas the baseline BLAST score-based global feature matching method misclassified 362.9 sequences. With each correctly classified test sequence, the fast incremental learning ability of the proposed method further enhances the training model. Thus it has achieved more than 96% protein classification accuracy using only 20% per class training data.
HARMONY: a server for the assessment of protein structures
Pugalenthi, G.; Shameer, K.; Srinivasan, N.; Sowdhamini, R.
2006-01-01
Protein structure validation is an important step in computational modeling and structure determination. Stereochemical assessment of protein structures examines internal parameters such as bond lengths and Ramachandran (φ,ψ) angles. Gross structure prediction methods such as the inverse folding procedure, and structure determination, especially at low resolution, can sometimes give rise to models that are incorrect due to assignment of misfolds or mistracing of electron density maps. Such errors are not reflected as strain in internal parameters. HARMONY is a procedure that examines the compatibility between the sequence and the structure of a protein by assigning scores to individual residues and their amino acid exchange patterns after considering their local environments. Local environments are described by the backbone conformation, solvent accessibility and hydrogen bonding patterns. We are now providing HARMONY through a web server so that users can submit their protein structure files and, if required, the alignment of homologous sequences. Scores are mapped onto the structure for subsequent examination, which is also useful for recognizing regions of possible local errors in protein structures. HARMONY server is located at PMID:16844999
Zhan, Yiling; Guo, Shuyuan
2015-01-01
Bacillus thuringiensis (Bt) is capable of producing a chitin-binding protein believed to be functionally important to the bacterium during the stationary phase of its growth cycle. In this paper, the chitin-binding domain 3 protein HD73_3189 from B. thuringiensis was analyzed computationally. Primary and secondary structural analyses demonstrated that HD73_3189 is negatively charged and contains several α-helices, aperiodic coils and β-strands. Domain and motif analyses revealed that HD73_3189 contains a signal peptide, an N-terminal chitin-binding 3 domain, two copies of a fibronectin-like domain 3 and a C-terminal carbohydrate-binding domain classified as CBM_5_12. Moreover, the analysis predicted the protein's localization site to be the cell wall. Ligand site prediction determined that amino acid residues GLU-312, TRP-334, ILE-341 and VAL-382, exposed on the surface of the target protein, exhibit polar interactions with the substrate.
Prediction of allosteric sites and mediating interactions through bond-to-bond propensities
Amor, B. R. C.; Schaub, M. T.; Yaliraki, S. N.; Barahona, M.
2016-01-01
Allostery is a fundamental mechanism of biological regulation, in which binding of a molecule at a distant location affects the active site of a protein. Allosteric sites provide targets to fine-tune protein activity, yet we lack computational methodologies to predict them. Here we present an efficient graph-theoretical framework to reveal allosteric interactions (atoms and communication pathways strongly coupled to the active site) without a priori information of their location. Using an atomistic graph with energy-weighted covalent and weak bonds, we define a bond-to-bond propensity quantifying the non-local effect of instantaneous bond fluctuations propagating through the protein. Significant interactions are then identified using quantile regression. We exemplify our method with three biologically important proteins: caspase-1, CheY, and h-Ras, correctly predicting key allosteric interactions, whose significance is additionally confirmed against a reference set of 100 proteins. The almost-linear scaling of our method renders it suitable for high-throughput searches for candidate allosteric sites. PMID:27561351
Zhang, Yi; Nikolovski, Nino; Sorieul, Mathias; Vellosillo, Tamara; McFarlane, Heather E; Dupree, Ray; Kesten, Christopher; Schneider, René; Driemeier, Carlos; Lathe, Rahul; Lampugnani, Edwin; Yu, Xiaolan; Ivakov, Alexander; Doblin, Monika S; Mortimer, Jenny C; Brown, Steven P; Persson, Staffan; Dupree, Paul
2016-06-09
As the most abundant biopolymer on Earth, cellulose is a key structural component of the plant cell wall. Cellulose is produced at the plasma membrane by cellulose synthase (CesA) complexes (CSCs), which are assembled in the endomembrane system and trafficked to the plasma membrane. While several proteins that affect CesA activity have been identified, components that regulate CSC assembly and trafficking remain unknown. Here we show that STELLO1 and 2 are Golgi-localized proteins that can interact with CesAs and control cellulose quantity. In the absence of STELLO function, the spatial distribution within the Golgi, secretion and activity of the CSCs are impaired indicating a central role of the STELLO proteins in CSC assembly. Point mutations in the predicted catalytic domains of the STELLO proteins indicate that they are glycosyltransferases facing the Golgi lumen. Hence, we have uncovered proteins that regulate CSC assembly in the plant Golgi apparatus.
Proteome-wide Subcellular Topologies of E. coli Polypeptides Database (STEPdb)
Orfanoudaki, Georgia; Economou, Anastassios
2014-01-01
Cell compartmentalization serves both the isolation and the specialization of cell functions. After synthesis in the cytoplasm, over a third of all proteins are targeted to other subcellular compartments. Knowing how proteins are distributed within the cell and how they interact is a prerequisite for understanding it as a whole. Surface and secreted proteins are important pathogenicity determinants. Here we present the STEP database (STEPdb) that contains a comprehensive characterization of subcellular localization and topology of the complete proteome of Escherichia coli. Two widely used E. coli proteomes (K-12 and BL21) are presented organized into thirteen subcellular classes. STEPdb exploits the wealth of genetic, proteomic, biochemical, and functional information on protein localization, secretion, and targeting in E. coli, one of the best understood model organisms. Subcellular annotations were derived from a combination of bioinformatics prediction, proteomic, biochemical, functional, topological data and extensive literature re-examination that were refined through manual curation. Strong experimental support for the location of 1553 out of 4303 proteins was based on 426 articles and some experimental indications for another 526. Annotations were provided for another 320 proteins based on firm bioinformatic predictions. STEPdb is the first database that contains an extensive set of peripheral IM proteins (PIM proteins) and includes their graphical visualization into complexes, cellular functions, and interactions. It also summarizes all currently known protein export machineries of E. coli K-12 and pairs them, where available, with the secretory proteins that use them. It catalogs the Sec- and TAT-utilizing secretomes and summarizes their topological features such as signal peptides and transmembrane regions, transmembrane topologies and orientations. It also catalogs physicochemical and structural features that influence topology such as abundance, solubility, disorder, heat resistance, and structural domain families. Finally, STEPdb incorporates prediction tools for topology (TMHMM, SignalP, and Phobius) and disorder (IUPred) and implements the BLAST2STEP that performs protein homology searches against the STEPdb. PMID:25210196
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stekhoven, Daniel J.; Omasits, Ulrich; Quebatte, Maxime
2014-03-01
Proteomics data provide unique insights into biological systems, including the predominant subcellular localization (SCL) of proteins, which can reveal important clues about their functions. Here we analyzed data of a complete prokaryotic proteome expressed under two conditions mimicking interaction of the emerging pathogen Bartonella henselae with its mammalian host. Normalized spectral count data from cytoplasmic, total membrane, inner and outer membrane fractions allowed us to identify the predominant SCL for 82% of the identified proteins. The spectral count proportion of total membrane versus cytoplasmic fractions indicated the propensity of cytoplasmic proteins to co-fractionate with the inner membrane, and enabled us to distinguish cytoplasmic, peripheral inner membrane and bona fide inner membrane proteins. Principal component analysis and k-nearest neighbor classification training on selected marker proteins or predominantly localized proteins, allowed us to determine an extensive catalog of at least 74 expressed outer membrane proteins, and to extend the SCL assignment to 94% of the identified proteins, including 18% where in silico methods gave no prediction. Suitable experimental proteomics data combined with straightforward computational approaches can thus identify the predominant SCL on a proteome-wide scale. Finally, we present a conceptual approach to identify proteins potentially changing their SCL in a condition-dependent fashion.
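The k-nearest neighbor step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the marker profiles, the order of the fractions, and the function names (`fraction_profile`, `knn_predict`) are hypothetical, and the principal component analysis step is left out for brevity.

```python
import math
from collections import Counter


def fraction_profile(counts):
    """Convert raw spectral counts per fraction into proportions that sum to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("protein has no spectral counts")
    return [c / total for c in counts]


def knn_predict(query, markers, k=3):
    """Majority vote among the k marker proteins closest to the query profile."""
    ranked = sorted(markers, key=lambda m: math.dist(query, m[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical marker proteins; counts are listed as
# (cytoplasmic, inner membrane, outer membrane) fractions.
MARKERS = [
    (fraction_profile([90, 8, 2]), "cytoplasm"),
    (fraction_profile([80, 15, 5]), "cytoplasm"),
    (fraction_profile([10, 85, 5]), "inner membrane"),
    (fraction_profile([5, 80, 15]), "inner membrane"),
    (fraction_profile([3, 12, 85]), "outer membrane"),
    (fraction_profile([2, 20, 78]), "outer membrane"),
]

# An invented protein seen mostly in the outer membrane fraction.
prediction = knn_predict(fraction_profile([4, 10, 60]), MARKERS, k=3)
print(prediction)
```

Normalizing counts to proportions before computing distances keeps abundant and rare proteins comparable, which is one plausible reason to work with spectral count proportions rather than raw counts.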
Bardhan, Jaydeep P
2011-09-14
We study the energetics of burying charges, ion pairs, and ionizable groups in a simple protein model using nonlocal continuum electrostatics. Our primary finding is that the nonlocal response leads to markedly reduced solvent screening, comparable to the use of application-specific protein dielectric constants. Employing the same parameters as used in other nonlocal studies, we find that for a sphere of radius 13.4 Å containing a single +1e charge, the nonlocal solvation free energy varies less than 18 kcal/mol as the charge moves from the surface to the center, whereas the difference in the local Poisson model is ∼35 kcal/mol. Because an ion pair (salt bridge) generates a comparatively more rapidly varying Coulomb potential, energetics for salt bridges are even more significantly reduced in the nonlocal model. By varying the central parameter in nonlocal theory, which is an effective length scale associated with correlations between solvent molecules, nonlocal-model energetics can be varied from the standard local results to essentially zero; however, the existence of the reduction in charge-burial penalties is quite robust to variations in the protein dielectric constant and the correlation length. Finally, as a simple exploratory test of the implications of nonlocal response, we calculate glutamate pK(a) shifts and find that using standard protein parameters (ε(protein) = 2-4), nonlocal results match local-model predictions with much higher dielectric constants. Nonlocality may, therefore, be one factor in resolving discrepancies between measured protein dielectric constants and the model parameters often used to match titration experiments. Nonlocal models may hold significant promise to deepen our understanding of macromolecular electrostatics without substantially increasing computational complexity. © 2011 American Institute of Physics
Kemege, Kyle E.; Hickey, John M.; Barta, Michael L.; ...
2014-11-10
Cell division in Chlamydiae is poorly understood as apparent homologs to most conserved bacterial cell division proteins are lacking and presence of elongation (rod shape) associated proteins indicate non-canonical mechanisms may be employed. The rod-shape determining protein MreB has been proposed as playing a unique role in chlamydial cell division. In other organisms, MreB is part of an elongation complex that requires RodZ for proper function. A recent study reported that the protein encoded by ORF CT009 interacts with MreB despite low sequence similarity to RodZ. The studies in this paper expand on those observations through protein structure, mutagenesis and cellular localization analyses. Structural analysis indicated that CT009 shares high level of structural similarity to RodZ, revealing the conserved orientation of two residues critical for MreB interaction. Substitutions eliminated MreB protein interaction and partial complementation provided by CT009 in RodZ deficient Escherichia coli. Cellular localization analysis of CT009 showed uniform membrane staining in Chlamydia. This was in contrast to the localization of MreB, which was restricted to predicted septal planes. Finally, MreB localization to septal planes provides direct experimental observation for the role of MreB in cell division and supports the hypothesis that it serves as a functional replacement for FtsZ in Chlamydia.
Kemege, Kyle E.; Hickey, John M.; Barta, Michael L.; Wickstrum, Jason; Balwalli, Namita; Lovell, Scott; Battaile, Kevin P.; Hefty, P. Scott
2015-01-01
Cell division in Chlamydiae is poorly understood as apparent homologs to most conserved bacterial cell division proteins are lacking and presence of elongation (rod shape) associated proteins indicate non-canonical mechanisms may be employed. The rod-shape determining protein MreB has been proposed as playing a unique role in chlamydial cell division. In other organisms, MreB is part of an elongation complex that requires RodZ for proper function. A recent study reported that the protein encoded by ORF CT009 interacts with MreB despite low sequence similarity to RodZ. The studies herein expand on those observations through protein structure, mutagenesis, and cellular localization analyses. Structural analysis indicated that CT009 shares high level of structural similarity to RodZ, revealing the conserved orientation of two residues critical for MreB interaction. Substitutions eliminated MreB protein interaction and partial complementation provided by CT009 in RodZ deficient E. coli. Cellular localization analysis of CT009 showed uniform membrane staining in Chlamydia. This was in contrast to the localization of MreB, which was restricted to predicted septal planes. MreB localization to septal planes provides direct experimental observation for the role of MreB in cell division and supports the hypothesis that it serves as a functional replacement for FtsZ in Chlamydia. PMID:25382739
Kemege, Kyle E; Hickey, John M; Barta, Michael L; Wickstrum, Jason; Balwalli, Namita; Lovell, Scott; Battaile, Kevin P; Hefty, P Scott
2015-02-01
Cell division in Chlamydiae is poorly understood as apparent homologs to most conserved bacterial cell division proteins are lacking and presence of elongation (rod shape) associated proteins indicate non-canonical mechanisms may be employed. The rod-shape determining protein MreB has been proposed as playing a unique role in chlamydial cell division. In other organisms, MreB is part of an elongation complex that requires RodZ for proper function. A recent study reported that the protein encoded by ORF CT009 interacts with MreB despite low sequence similarity to RodZ. The studies herein expand on those observations through protein structure, mutagenesis and cellular localization analyses. Structural analysis indicated that CT009 shares high level of structural similarity to RodZ, revealing the conserved orientation of two residues critical for MreB interaction. Substitutions eliminated MreB protein interaction and partial complementation provided by CT009 in RodZ deficient Escherichia coli. Cellular localization analysis of CT009 showed uniform membrane staining in Chlamydia. This was in contrast to the localization of MreB, which was restricted to predicted septal planes. MreB localization to septal planes provides direct experimental observation for the role of MreB in cell division and supports the hypothesis that it serves as a functional replacement for FtsZ in Chlamydia. © 2014 John Wiley & Sons Ltd.
2011-01-01
The genomic DNA sequence of a novel enteric uncultured microphage, ΦCA82 from a turkey gastrointestinal system was determined utilizing metagenomics techniques. The entire circular, single-stranded nucleotide sequence of the genome was 5,514 nucleotides. The ΦCA82 genome is quite different from other microviruses as indicated by comparisons of nucleotide similarity, predicted protein similarity, and functional classifications. Only three genes showed significant similarity to microviral proteins as determined by local alignments using BLAST analysis. ORF1 encoded a predicted phage F capsid protein that was phylogenetically most similar to the Microviridae ΦMH2K member's major coat protein. The ΦCA82 genome also encoded a predicted minor capsid protein (ORF2) and putative replication initiation protein (ORF3) most similar to the microviral bacteriophage SpV4. The distant evolutionary relationship of ΦCA82 suggests that the divergence of this novel turkey microvirus from other microviruses may reflect unique evolutionary pressures encountered within the turkey gastrointestinal system. PMID:21714899
A TALE-inspired computational screen for proteins that contain approximate tandem repeats.
Perycz, Malgorzata; Krwawicz, Joanna; Bochtler, Matthias
2017-01-01
TAL (transcription activator-like) effectors (TALEs) are bacterial proteins that are secreted from bacteria to plant cells to act as transcriptional activators. TALEs and related proteins (RipTALs, BurrH, MOrTL1 and MOrTL2) contain approximate tandem repeats that differ in conserved positions that define specificity. Using PERL, we screened ~47 million protein sequences for TALE-like architecture characterized by approximate tandem repeats (between 30 and 43 amino acids in length) and sequence variability in conserved positions, without requiring sequence similarity to TALEs. Candidate proteins were scored according to their propensity for nuclear localization, secondary structure, repeat sequence complexity, as well as covariation and predicted structural proximity of variable residues. Biological context was tentatively inferred from co-occurrence of other domains and interactome predictions. Approximate repeats with TALE-like features that merit experimental characterization were found in a protein of chestnut blight fungus, a eukaryotic plant pathogen.
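A screen for approximate tandem repeats of 30 to 43 residues can be sketched as follows. This is a simplified illustration in Python, not the authors' PERL implementation: the scoring function, the names `shifted_identity` and `best_period`, and the toy sequence are all assumptions, and the sketch ignores insertions and deletions between repeats.

```python
def shifted_identity(seq, period):
    """Fraction of positions whose residue equals the residue one period downstream."""
    pairs = len(seq) - period
    if pairs <= 0:
        return 0.0
    matches = sum(seq[i] == seq[i + period] for i in range(pairs))
    return matches / pairs


def best_period(seq, min_len=30, max_len=43):
    """Return the repeat length in [min_len, max_len] with the highest self-identity."""
    scores = {p: shifted_identity(seq, p) for p in range(min_len, max_len + 1)}
    period = max(scores, key=scores.get)
    return period, scores[period]


# Toy 34-residue unit (invented for illustration) repeated four times,
# with two variable positions (0-based indices 11 and 12) in each copy.
UNIT = "LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG"
VARIANTS = ["HD", "NI", "NG", "NN"]
TOY_PROTEIN = "".join(UNIT[:11] + v + UNIT[13:] for v in VARIANTS)

period, score = best_period(TOY_PROTEIN)
print(period, round(score, 3))
```

A score close to, but below, 1.0 at a single period is the signature of interest here: the repeats are nearly identical, and the remaining mismatches mark candidate variable positions.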
A TALE-inspired computational screen for proteins that contain approximate tandem repeats
Krwawicz, Joanna
2017-01-01
TAL (transcription activator-like) effectors (TALEs) are bacterial proteins that are secreted from bacteria to plant cells to act as transcriptional activators. TALEs and related proteins (RipTALs, BurrH, MOrTL1 and MOrTL2) contain approximate tandem repeats that differ in conserved positions that define specificity. Using PERL, we screened ~47 million protein sequences for TALE-like architecture characterized by approximate tandem repeats (between 30 and 43 amino acids in length) and sequence variability in conserved positions, without requiring sequence similarity to TALEs. Candidate proteins were scored according to their propensity for nuclear localization, secondary structure, repeat sequence complexity, as well as covariation and predicted structural proximity of variable residues. Biological context was tentatively inferred from co-occurrence of other domains and interactome predictions. Approximate repeats with TALE-like features that merit experimental characterization were found in a protein of chestnut blight fungus, a eukaryotic plant pathogen. PMID:28617832
Jin, Liang; Zhang, Kai; Sternglanz, Rolf; Neiman, Aaron M
2017-05-01
In response to starvation, diploid cells of Saccharomyces cerevisiae undergo meiosis and form haploid spores, a process collectively referred to as sporulation. The differentiation into spores requires extensive changes in gene expression. The transcriptional activator Ndt80 is a central regulator of this process, which controls many genes essential for sporulation. Ndt80 induces ∼300 genes coordinately during meiotic prophase, but different mRNAs within the NDT80 regulon are translated at different times during sporulation. The protein kinase Ime2 and RNA binding protein Rim4 are general regulators of meiotic translational delay, but how differential timing of individual transcripts is achieved was not known. This report describes the characterization of two related NDT80-induced genes, PES4 and MIP6, encoding predicted RNA binding proteins. These genes are necessary to regulate the steady-state expression, translational timing, and localization of a set of mRNAs that are transcribed by NDT80 but not translated until the end of meiosis II. Mutations in the predicted RNA binding domains within PES4 alter the stability of target mRNAs. PES4 and MIP6 affect only a small portion of the NDT80 regulon, indicating that they act as modulators of the general Ime2/Rim4 pathway for specific transcripts. Copyright © 2017 American Society for Microbiology.
Duffy, Ellen B.; Barquera, Blanca
2006-01-01
The membrane topologies of the six subunits of Na+-translocating NADH:quinone oxidoreductase (Na+-NQR) from Vibrio cholerae were determined by a combination of topology prediction algorithms and the construction of C-terminal fusions. Fusion expression vectors contained either bacterial alkaline phosphatase (phoA) or green fluorescent protein (gfp) genes as reporters of periplasmic and cytoplasmic localization, respectively. A majority of the topology prediction algorithms did not predict any transmembrane helices for NqrA. A lack of PhoA activity when fused to the C terminus of NqrA and the observed fluorescence of the green fluorescent protein C-terminal fusion confirm that this subunit is localized to the cytoplasmic side of the membrane. Analysis of four PhoA fusions for NqrB indicates that this subunit has nine transmembrane helices and that residue T236, the binding site for flavin mononucleotide (FMN), resides in the cytoplasm. Three fusions confirm that the topology of NqrC consists of two transmembrane helices with the FMN binding site at residue T225 on the cytoplasmic side. Fusion analysis of NqrD and NqrE showed almost mirror image topologies, each consisting of six transmembrane helices; the results for NqrD and NqrE are consistent with the topologies of Escherichia coli homologs YdgQ and YdgL, respectively. The NADH, flavin adenine dinucleotide, and Fe-S center binding sites of NqrF were localized to the cytoplasm. The determination of the topologies of the subunits of Na+-NQR provides valuable insights into the location of cofactors and identifies targets for mutagenesis to characterize this enzyme in more detail. The finding that all the redox cofactors are localized to the cytoplasmic side of the membrane is discussed. PMID:17041063
Defining Aggressive Prostate Cancer Using a 12-Gene Model
Riva, Alberto; Kim, Robert; Varambally, Sooryanarayana; He, Le; Kutok, Jeff; Aster, Jonathan C; Tang, Jeffery; Kuefer, Rainer; Hofer, Matthias D; Febbo, Phillip G; Chinnaiyan, Arul M; Rubin, Mark A
2006-01-01
The critical clinical question in prostate cancer research is: How do we develop means of distinguishing aggressive disease from indolent disease? Using a combination of proteomic and expression array data, we identified a set of 36 genes with concordant dysregulation of protein products that could be evaluated in situ by quantitative immunohistochemistry. Another five prostate cancer biomarkers were included. Using linear discriminant analysis, we determined that the optimal model used to predict prostate cancer progression consisted of 12 proteins. Using a separate patient population, transcriptional levels of the 12 genes encoding for these proteins predicted prostate-specific antigen failure in 79 men following surgery for clinically localized prostate cancer (P = .0015). This study demonstrates that cross-platform models can lead to predictive models with the possible advantage of being more robust through this selection process. PMID:16533427
NASA Astrophysics Data System (ADS)
Guest, Will; Cashman, Neil; Plotkin, Steven
2009-03-01
Protein misfolding is a necessary step in the pathogenesis of many diseases, including Creutzfeldt-Jakob disease (CJD) and familial amyotrophic lateral sclerosis (fALS). Identifying unstable structural elements in their causative proteins elucidates the early events of misfolding and presents targets for inhibition of the disease process. An algorithm was developed to calculate the Gibbs free energy of unfolding for all sequence-contiguous regions of a protein using three methods to parameterize energy changes: a modified Gō model, changes in solvent-accessible surface area, and solution of the Poisson-Boltzmann equation. The entropic effects of disulfide bonds and post-translational modifications are treated analytically. It incorporates a novel method for finding local dielectric constants inside a protein to accurately handle charge effects. We have predicted the unstable parts of prion protein and superoxide dismutase 1, the proteins involved in CJD and fALS respectively, and have used these regions as epitopes to prepare antibodies that are specific to the misfolded conformation and show promise as therapeutic agents.
Marsh, K L; Dixon, J; Dixon, M J
1998-10-01
Treacher Collins syndrome (TCS) is an autosomal dominant disorder of craniofacial development, the features of which include conductive hearing loss and cleft palate. The TCS gene (TCOF1), which is localized to chromosome 5q32-q33.1, recently has been identified by positional cloning. Analysis of TCOF1 revealed that the majority of TCS mutations result in the creation of a premature termination codon. The function of the predicted protein, treacle, is unknown, although indirect evidence from database analyses suggests that it may function as a shuttling nucleolar phosphoprotein. In the current study, we provide the first direct evidence that treacle is a nucleolar protein. An antibody generated against treacle shows that it localizes to the nucleolus. Fusion proteins tagged to a green fluorescent protein reporter were shown to localize to different compartments of the cell when putative nuclear localization signals were deleted. Parallel experiments using conserved regions of the murine homologue of TCOF1 confirmed these results. Site-directed mutagenesis has been used to recreate mutations observed in individuals with TCS. The resulting truncated proteins are mislocalized within the cell, which further supports the hypothesis that an integral part of treacle's function involves shuttling between the nucleolus and the cytoplasm. TCS is, therefore, the first Mendelian disorder resulting from mutations which lead to aberrant expression of a nucleolar protein.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kobayashi, Yuko; Katanosaka, Yuki; Iwata, Yuko
2006-10-01
Spectrin repeat (SR)-containing proteins are important for regulation of integrity of biomembranes, not only the plasma membrane but also those of intracellular organelles, such as the Golgi, nucleus, endo/lysosomes, and synaptic vesicles. We identified a novel SR-containing protein, named GSRP-56 (Golgi-localized SR-containing protein-56), by a yeast two-hybrid method, using a member of the transient receptor potential channel family, TRPV2, as bait. GSRP-56 is an isoform derived from a giant SR-containing protein, Syne-1 (synaptic nuclear envelope protein-1, also referred to as Nesprin-1 or Enaptin), predicted to be produced by alternative splicing. Immunological analysis demonstrated that this isoform is a 56-kDa protein, which is localized predominantly in the Golgi apparatus in cardiomyocytes and C2C12 myoblasts/myotubes, and we found that two SR domains were required both for Golgi targeting and for interaction with TRPV2. Interestingly, overexpression of GSRP-56 resulted in a morphological change in the Golgi structure, characterized by enlargement of the cis-Golgi marker antibody-staining area, which would result partly from fragmentation of Golgi membranes. Our findings indicate that GSRP-56 is a novel, particularly small Golgi-localized member of the spectrin family, which possibly plays a role in maintenance of the Golgi structure.
PatchSurfers: Two methods for local molecular property-based binding ligand prediction.
Shin, Woong-Hee; Bures, Mark Gregory; Kihara, Daisuke
2016-01-15
Protein function prediction is an active area of research in computational biology. Function prediction can help biologists make hypotheses for characterization of genes and help interpret biological assays, and thus is a productive area for collaboration between experimental and computational biologists. Among various function prediction methods, predicting binding ligand molecules for a target protein is an important class because ligand binding events for a protein are usually closely intertwined with the proteins' biological function, and also because predicted binding ligands can often be directly tested by biochemical assays. Binding ligand prediction methods can be classified into two types: those which are based on protein-protein (or pocket-pocket) comparison, and those that compare a target pocket directly to ligands. Recently, our group proposed two computational binding ligand prediction methods, Patch-Surfer, which is a pocket-pocket comparison method, and PL-PatchSurfer, which compares a pocket to ligand molecules. The two programs apply surface patch-based descriptions to calculate similarity or complementarity between molecules. A surface patch is characterized by physicochemical properties such as shape, hydrophobicity, and electrostatic potentials. These properties on the surface are represented using three-dimensional Zernike descriptors (3DZD), which are based on a series expansion of a 3 dimensional function. Utilizing 3DZD for describing the physicochemical properties has two main advantages: (1) rotational invariance and (2) fast comparison. Here, we introduce Patch-Surfer and PL-PatchSurfer with an emphasis on PL-PatchSurfer, which is more recently developed. Illustrative examples of PL-PatchSurfer performance on binding ligand prediction as well as virtual drug screening are also provided. Copyright © 2015 Elsevier Inc. All rights reserved.
Performance of protein-structure predictions with the physics-based UNRES force field in CASP11.
Krupa, Paweł; Mozolewska, Magdalena A; Wiśniewska, Marta; Yin, Yanping; He, Yi; Sieradzan, Adam K; Ganzynkowicz, Robert; Lipska, Agnieszka G; Karczyńska, Agnieszka; Ślusarz, Magdalena; Ślusarz, Rafał; Giełdoń, Artur; Czaplewski, Cezary; Jagieła, Dawid; Zaborowski, Bartłomiej; Scheraga, Harold A; Liwo, Adam
2016-11-01
Participating as the Cornell-Gdansk group, we have used our physics-based coarse-grained UNited RESidue (UNRES) force field to predict protein structure in the 11th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP11). Our methodology involved extensive multiplexed replica exchange simulations of the target proteins with a recently improved UNRES force field to provide better reproductions of the local structures of polypeptide chains. All simulations were started from fully extended polypeptide chains, and no external information was included in the simulation process except for weak restraints on secondary structure to enable us to finish each prediction within the allowed 3-week time window. Because of simplified UNRES representation of polypeptide chains, use of enhanced sampling methods, code optimization and parallelization and sufficient computational resources, we were able to treat, for the first time, all 55 human prediction targets with sizes from 44 to 595 amino acid residues, the average size being 251 residues. Complete structures of six single-domain proteins were predicted accurately, with the highest accuracy being attained for the T0769, for which the CαRMSD was 3.8 Å for 97 residues of the experimental structure. Correct structures were also predicted for 13 domains of multi-domain proteins with accuracy comparable to that of the best template-based modeling methods. With further improvements of the UNRES force field that are now underway, our physics-based coarse-grained approach to protein-structure prediction will eventually reach global prediction capacity and, consequently, reliability in simulating protein structure and dynamics that are important in biochemical processes. Freely available on the web at http://www.unres.pl/ CONTACT: has5@cornell.edu. © The Author 2016. Published by Oxford University Press. All rights reserved. 
Generation of Rab-based transgenic lines for in vivo studies of endosome biology in zebrafish
Clark, Brian S.; Winter, Mark; Cohen, Andrew R.; Link, Brian A.
2011-01-01
The Rab family of small GTPases function as molecular switches regulating membrane and protein trafficking. Individual Rab isoforms define and are required for specific endosomal compartments. To facilitate in vivo investigation of specific Rab proteins, and endosome biology in general, we have generated transgenic zebrafish lines to mark and manipulate Rab proteins. We also developed software to track and quantify endosome dynamics within time-lapse movies. The established transgenic lines ubiquitously express EGFP fusions of Rab5c (early endosomes), Rab11a (recycling endosomes), and Rab7 (late endosomes) to study localization and dynamics during development. Additionally, we generated UAS-based transgenic lines expressing constitutive active (CA) and dominant negative (DN) versions for each of these Rab proteins. Predicted localization and functional consequences for each line were verified through a variety of assays, including lipophilic dye uptake and Crumbs2a localization. In summary, we have established a toolset for in vivo analyses of endosome dynamics and functions. PMID:21976318
Mapping the nuclear localization signal in the matrix protein of potato yellow dwarf virus.
Anderson, Gavin; Jang, Chanyong; Wang, Renyuan; Goodin, Michael
2018-05-01
The ability of the matrix (M) protein of potato yellow dwarf virus (PYDV) to remodel nuclear membranes is controlled by a di-leucine motif located at residues 223 and 224 of its primary structure. This function can be uncoupled from that of its nuclear localization signal (NLS), which is controlled primarily by lysine and arginine residues immediately downstream of the LL motif. In planta localization of green fluorescent protein fusions, bimolecular fluorescence complementation assays with nuclear import receptor importin-α1 and yeast-based nuclear import assays provided three independent experimental approaches to validate the authenticity of the M-NLS. The carboxy terminus of M is predicted to contain a nuclear export signal, which is believed to be functional, given the ability of M to bind the Arabidopsis nuclear export receptor 1 (XPO1). The nuclear shuttle activity of M has implications for the cell-to-cell movement of PYDV nucleocapsids, based upon its interaction with the N and Y proteins.
NMRDSP: an accurate prediction of protein shape strings from NMR chemical shifts and sequence data.
Mao, Wusong; Cong, Peisheng; Wang, Zhiheng; Lu, Longjian; Zhu, Zhongliang; Li, Tonghua
2013-01-01
A shape string is a structural sequence and an extremely important representation of protein backbone conformations. Nuclear magnetic resonance chemical shifts correlate strongly with the local protein structure, and are exploited to predict protein structures in conjunction with computational approaches. Here we demonstrate a novel approach, NMRDSP, which can accurately predict the protein shape string based on nuclear magnetic resonance chemical shifts and structural profiles obtained from sequence data. The NMRDSP uses six chemical shifts (HA, H, N, CA, CB and C) and eight elements of structure profiles as features, a non-redundant set (1,003 entries) as the training set, and a conditional random field as a classification algorithm. For an independent testing set (203 entries), we achieved an accuracy of 75.8% for S8 (the eight states accuracy) and 87.8% for S3 (the three states accuracy). This is higher than only using chemical shifts or sequence data, and confirms that the chemical shift and the structure profile are significant features for shape string prediction and their combination prominently improves the accuracy of the predictor. We have constructed the NMRDSP web server and believe it could be employed to provide a solid platform to predict other protein structures and functions. The NMRDSP web server is freely available at http://cal.tongji.edu.cn/NMRDSP/index.jsp.
Cheng, Xiang; Xiao, Xuan; Chou, Kuo-Chen
2017-09-10
Knowledge of subcellular locations of proteins is crucially important for an in-depth understanding of their functions in a cell. With the explosive growth of protein sequences generated in the postgenomic age, there is high demand for computational tools that can annotate their subcellular locations in a timely manner based on the sequence information alone. The current study is focused on virus proteins. Although considerable efforts have been made in this regard, the problem is far from being solved yet. Most existing methods can be used to deal with single-location proteins only. Actually, proteins with multi-locations may have some special biological functions. This kind of multiplex proteins is particularly important for both basic research and drug design. Using the multi-label theory, we present a new predictor called "pLoc-mVirus" by extracting the optimal GO (Gene Ontology) information into the general PseAAC (Pseudo Amino Acid Composition). Rigorous cross-validation on the same stringent benchmark dataset indicated that the proposed pLoc-mVirus predictor is remarkably superior to iLoc-Virus, the state-of-the-art method in predicting virus protein subcellular localization. To maximize the convenience of most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc-mVirus/, by which users can easily get their desired results without the need to go through the complicated mathematics involved. Copyright © 2017 Elsevier B.V. All rights reserved.
Testing whether Metazoan Tyrosine Loss Was Driven by Selection against Promiscuous Phosphorylation
Pandya, Siddharth; Struck, Travis J.; Mannakee, Brian K.; Paniscus, Mary; Gutenkunst, Ryan N.
2015-01-01
Protein tyrosine phosphorylation is a key regulatory modification in metazoans, and the corresponding kinase enzymes have diversified dramatically. This diversification is correlated with a genome-wide reduction in protein tyrosine content, and it was recently suggested that this reduction was driven by selection to avoid promiscuous phosphorylation that might be deleterious. We tested three predictions of this intriguing hypothesis. 1) Selection should be stronger on residues that are more likely to be phosphorylated due to local solvent accessibility or structural disorder. 2) Selection should be stronger on proteins that are more likely to be promiscuously phosphorylated because they are abundant. We tested these predictions by comparing distributions of tyrosine within and among human and yeast orthologous proteins. 3) Selection should be stronger against mutations that create tyrosine versus remove tyrosine. We tested this prediction using human population genomic variation data. We found that all three predicted effects are modest for tyrosine when compared with the other amino acids, suggesting that selection against deleterious phosphorylation was not dominant in driving metazoan tyrosine loss. PMID:25312910
Wang, Shunfang; Liu, Shuhui
2015-12-19
An effective representation of a protein sequence plays a crucial role in protein sub-nuclear localization. The existing representations, such as dipeptide composition (DipC), pseudo-amino acid composition (PseAAC) and position specific scoring matrix (PSSM), are insufficient to represent protein sequence due to their single perspectives. Thus, this paper proposes two fusion feature representations of DipPSSM and PseAAPSSM to integrate PSSM with DipC and PseAAC, respectively. When constructing each fusion representation, we introduce the balance factors to value the importance of its components. The optimal values of the balance factors are sought by genetic algorithm. Due to the high dimensionality of the proposed representations, linear discriminant analysis (LDA) is used to find its important low dimensional structure, which is essential for classification and location prediction. The numerical experiments on two public datasets with KNN classifier and cross-validation tests showed that in terms of the common indexes of sensitivity, specificity, accuracy and MCC, the proposed fusing representations outperform the traditional representations in protein sub-nuclear localization, and the representation treated by LDA outperforms the untreated one.
Wang, Shunfang; Liu, Shuhui
2015-01-01
An effective representation of a protein sequence plays a crucial role in protein sub-nuclear localization. The existing representations, such as dipeptide composition (DipC), pseudo-amino acid composition (PseAAC) and position specific scoring matrix (PSSM), are insufficient to represent protein sequence due to their single perspectives. Thus, this paper proposes two fusion feature representations of DipPSSM and PseAAPSSM to integrate PSSM with DipC and PseAAC, respectively. When constructing each fusion representation, we introduce the balance factors to value the importance of its components. The optimal values of the balance factors are sought by genetic algorithm. Due to the high dimensionality of the proposed representations, linear discriminant analysis (LDA) is used to find its important low dimensional structure, which is essential for classification and location prediction. The numerical experiments on two public datasets with KNN classifier and cross-validation tests showed that in terms of the common indexes of sensitivity, specificity, accuracy and MCC, the proposed fusing representations outperform the traditional representations in protein sub-nuclear localization, and the representation treated by LDA outperforms the untreated one. PMID:26703574
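The fusion idea in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that a fused representation is the concatenation of two feature vectors scaled by a single balance factor and its complement, and that classification uses a 1-nearest-neighbour rule; the function names `fuse` and `nearest_neighbour_label` are hypothetical, and the abstract does not specify the exact fusion formula, the genetic-algorithm search, or the LDA step, none of which are reproduced here.

```python
import math


def fuse(features_a, features_b, balance):
    """Concatenate two feature vectors, scaled by balance and (1 - balance).

    The single-factor weighting is an assumption made for illustration.
    """
    if not 0.0 <= balance <= 1.0:
        raise ValueError("balance must lie in [0, 1]")
    scaled_a = [balance * x for x in features_a]
    scaled_b = [(1.0 - balance) * x for x in features_b]
    return scaled_a + scaled_b


def nearest_neighbour_label(query, training):
    """Return the label of the training vector closest to query (Euclidean 1-NN)."""
    best_label, best_dist = None, math.inf
    for vector, label in training:
        dist = math.dist(query, vector)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

In such a setup the balance factor would be tuned on held-out data (the abstract reports using a genetic algorithm for this), and the fused vectors would normally be reduced in dimension before the nearest-neighbour step.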
Baculovirus LEF-11 nuclear localization signal is important for viral DNA replication.
Chen, Tingting; Dong, Zhanqi; Hu, Nan; Hu, Zhigang; Dong, Feifan; Jiang, Yaming; Li, Jun; Chen, Peng; Lu, Cheng; Pan, Minhui
2017-06-15
Baculovirus LEF-11 is a small nuclear protein that is involved in viral late gene transcription and DNA replication. However, the characteristics of its nuclear localization signal and its impact on viral DNA replication are unknown. In the present study, systemic bioinformatics analysis showed that the baculovirus LEF-11 contains monopartite and bipartite classical nuclear localization signal sequences (cNLSs), which were also detected in a few alphabaculovirus species. Localization of representative LEF-11 proteins of four baculovirus genera indicated that the nuclear localization characteristics of baculovirus LEF-11 coincided with the predicted results. Moreover, Bombyx mori nucleopolyhedrovirus (BmNPV) LEF-11 could be transported into the nucleus during viral infection in the absence of a cNLS. Further investigations demonstrated that the NLS of BmNPV LEF-11 is important for viral DNA replication. The findings of the present study indicate that the characteristics of the baculovirus LEF-11 protein and its NLS are essential to viral DNA replication and nuclear transport mechanisms.
Bandyopadhyay, Deepak; Huan, Jun; Prins, Jan; Snoeyink, Jack; Wang, Wei; Tropsha, Alexander
2009-11-01
Protein function prediction is one of the central problems in computational biology. We present a novel automated protein structure-based function prediction method using libraries of local residue packing patterns that are common to most proteins in a known functional family. Critical to this approach is the representation of a protein structure as a graph where residue vertices (residue name used as a vertex label) are connected by geometrical proximity edges. The approach employs two steps. First, it uses a fast subgraph mining algorithm to find all occurrences of family-specific labeled subgraphs for all well characterized protein structural and functional families. Second, it queries a new structure for occurrences of a set of motifs characteristic of a known family, using a graph index to speed up Ullman's subgraph isomorphism algorithm. The confidence of function inference from structure depends on the number of family-specific motifs found in the query structure compared with their distribution in a large non-redundant database of proteins. This method can assign a new structure to a specific functional family in cases where sequence alignments, sequence patterns, structural superposition and active site templates fail to provide accurate annotation.
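The graph representation described in this abstract can be sketched as follows. This is a minimal illustration only: it assumes one representative coordinate per residue and a simple distance cutoff (8.0 Å here, an assumed value) as the proximity criterion, whereas the abstract does not state how its geometrical proximity edges are defined. The function name `build_residue_graph` is hypothetical.

```python
import math
from itertools import combinations


def build_residue_graph(residues, cutoff=8.0):
    """Build a labeled residue graph from (name, (x, y, z)) tuples.

    Returns (labels, edges): labels[i] is the residue name used as the vertex
    label, and edges is a set of index pairs (i, j) with i < j whose
    representative atoms lie within `cutoff` of each other.
    The cutoff criterion is an assumption made for this sketch.
    """
    labels = [name for name, _ in residues]
    edges = set()
    for (i, (_, a)), (j, (_, b)) in combinations(enumerate(residues), 2):
        if math.dist(a, b) <= cutoff:
            edges.add((i, j))
    return labels, edges
```

A family-specific motif would then be a small labeled subgraph, and querying a new structure amounts to searching its graph for occurrences of that subgraph.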
Improved model quality assessment using ProQ2.
Ray, Arjun; Lindahl, Erik; Wallner, Björn
2012-09-10
Employing methods to assess the quality of modeled protein structures is now standard practice in bioinformatics. In a broad sense, the techniques can be divided into methods relying on consensus prediction on the one hand, and single-model methods on the other. Consensus methods frequently perform very well when there is a clear consensus, but this is not always the case. In particular, they frequently fail in selecting the best possible model in the hard cases (lacking consensus) or in the easy cases where models are very similar. In contrast, single-model methods do not suffer from these drawbacks and could potentially be applied on any protein of interest to assess quality or as a scoring function for sampling-based refinement. Here, we present a new single-model method, ProQ2, based on ideas from its predecessor, ProQ. ProQ2 is a model quality assessment algorithm that uses support vector machines to predict local as well as global quality of protein models. Improved performance is obtained by combining previously used features with updated structural and predicted features. The most important contribution can be attributed to the use of profile weighting of the residue-specific features and the use of features averaged over the whole model even though the prediction is still local. ProQ2 is significantly better than its predecessors at detecting high quality models, improving the sum of Z-scores for the selected first-ranked models by 20% and 32% compared to the second-best single-model method in CASP8 and CASP9, respectively. The absolute quality assessment of the models at both local and global level is also improved. The Pearson's correlation between the correct and local predicted score is improved from 0.59 to 0.70 on CASP8 and from 0.62 to 0.68 on CASP9; for global score to the correct GDT_TS from 0.75 to 0.80 and from 0.77 to 0.80, again compared to the second-best single methods in CASP8 and CASP9, respectively.
ProQ2 is available at http://proq2.wallnerlab.org.
Biological and functional relevance of CASP predictions
Liu, Tianyun; Ish‐Shalom, Shirbi; Torng, Wen; Lafita, Aleix; Bock, Christian; Mort, Matthew; Cooper, David N; Bliven, Spencer; Capitani, Guido; Mooney, Sean D.
2017-01-01
Our goal is to answer the question: compared with experimental structures, how useful are predicted models for functional annotation? We assessed the functional utility of predicted models by comparing the performances of a suite of methods for functional characterization on the predictions and the experimental structures. We identified 28 sites in 25 protein targets to perform functional assessment. These 28 sites included nine sites with known ligand binding (holo‐sites), nine sites that are expected or suggested by experimental authors for small molecule binding (apo‐sites), and ten sites containing important motifs, loops, or key residues with important disease‐associated mutations. We evaluated the utility of the predictions by comparing their microenvironments to the experimental structures. Overall structural quality correlates with functional utility. However, the best‐ranked predictions (global) may not have the best functional quality (local). Our assessment provides an ability to discriminate between predictions with high structural quality. When assessing ligand‐binding sites, most prediction methods have higher performance on apo‐sites than holo‐sites. Some servers show consistently high performance for certain types of functional sites. Finally, many functional sites are associated with protein‐protein interaction. We also analyzed biologically relevant features from the protein assemblies of two targets where the active site spanned the protein‐protein interface. For the assembly targets, we find that the features in the models are mainly determined by the choice of template. PMID:28975675
Zhang, Hua; Kurgan, Lukasz
2014-12-01
Knowledge of protein flexibility is vital for deciphering the corresponding functional mechanisms. This knowledge would help, for instance, in improving computational drug design and refinement in homology-based modeling. We propose a new predictor of the residue flexibility, which is expressed by B-factors, from protein chains that uses local (in the chain) predicted (or native) relative solvent accessibility (RSA) and custom-derived amino acid (AA) alphabets. Our predictor is implemented as a two-stage linear regression model that uses RSA-based space in a local sequence window in the first stage and a reduced AA pair-based space in the second stage as the inputs. The method has an easy-to-comprehend, explicit linear form in both stages. Particle swarm optimization was used to find an optimal reduced AA alphabet to simplify the input space and improve the prediction performance. The average correlation coefficients between the native and predicted B-factors measured on a large benchmark dataset are improved from 0.65 to 0.67 when using the native RSA values and from 0.55 to 0.57 when using the predicted RSA values. Blind tests that were performed on two independent datasets show consistent improvements in the average correlation coefficients by a modest value of 0.02 for both native and predicted RSA-based predictions.
Quantifying the Molecular Origins of Opposite Solvent Effects on Protein-Protein Interactions
Vagenende, Vincent; Han, Alvin X.; Pek, Han B.; Loo, Bernard L. W.
2013-01-01
Although the nature of solvent-protein interactions is generally weak and non-specific, addition of cosolvents such as denaturants and osmolytes strengthens protein-protein interactions for some proteins, whereas it weakens protein-protein interactions for others. This is exemplified by the puzzling observation that addition of glycerol oppositely affects the association constants of two antibodies, D1.3 and D44.1, with lysozyme. To resolve this conundrum, we develop a methodology based on the thermodynamic principles of preferential interaction theory and the quantitative characterization of local protein solvation from molecular dynamics simulations. We find that changes of preferential solvent interactions at the protein-protein interface quantitatively account for the opposite effects of glycerol on the antibody-antigen association constants. Detailed characterization of local protein solvation in the free and associated protein states reveals how opposite solvent effects on protein-protein interactions depend on the extent of dewetting of the protein-protein contact region and on structural changes that alter cooperative solvent-protein interactions at the periphery of the protein-protein interface. These results demonstrate the direct relationship between macroscopic solvent effects on protein-protein interactions and atom-scale solvent-protein interactions, and establish a general methodology for predicting and understanding solvent effects on protein-protein interactions in diverse biological environments. PMID:23696727
Protein hydrogen exchange: Testing current models
Skinner, John J; Lim, Woon K; Bédard, Sabrina; Black, Ben E; Englander, S Walter
2012-01-01
To investigate the determinants of protein hydrogen exchange (HX), HX rates of most of the backbone amide hydrogens of Staphylococcal nuclease were measured by NMR methods. A modified analysis was used to improve accuracy for the faster hydrogens. HX rates of both near surface and well buried hydrogens are spread over more than 7 orders of magnitude. These results were compared with previous hypotheses for HX rate determination. Contrary to a common assumption, proximity to the surface of the native protein does not usually produce fast exchange. The slow HX rates for unprotected surface hydrogens are not well explained by local electrostatic field. The ability of buried hydrogens to exchange is not explained by a solvent penetration mechanism. The exchange rates of structurally protected hydrogens are not well predicted by algorithms that depend only on local interactions or only on transient unfolding reactions. These observations identify some of the present difficulties of HX rate prediction and suggest the need for returning to a detailed hydrogen by hydrogen analysis to examine the bases of structure-rate relationships, as described in the companion paper (Skinner et al., Protein Sci 2012;21:996–1005). PMID:22544567
Chou, Kuo-Chen; Shen, Hong-Bin
2007-05-01
One of the critical challenges in predicting protein subcellular localization is how to deal with the case of multiple location sites. Unfortunately, so far, no efforts have been made in this regard except for one focused on the proteins in budding yeast only. For most existing predictors, the multiple-site proteins are either excluded from consideration or assumed not to exist. In fact, proteins may simultaneously exist at, or move between, two or more different subcellular locations. For instance, according to the Swiss-Prot database (version 50.7, released 19-Sept-2006), among the 33,925 eukaryotic protein entries that have experimentally observed subcellular location annotations, 2715 have multiple location sites, meaning about 8% bear the multiplex feature. Proteins with multiple locations or dynamic features of this kind are particularly interesting because they may have some very special biological functions intriguing to investigators in both basic research and drug discovery. Meanwhile, according to the same Swiss-Prot database, the total number of eukaryotic protein entries (except those annotated with "fragment" or those with less than 50 amino acids) is 90,909, meaning a gap of (90,909-33,925) = 56,984 entries for which no knowledge is available about their subcellular locations. Although one can use the computational approach to predict the desired information for the blank, so far, all the existing methods for predicting eukaryotic protein subcellular localization are limited to the case of a single location site only. To overcome such a barrier, a new ensemble classifier, named Euk-mPLoc, was developed that can be used to deal with the case of multiple location sites as well. Euk-mPLoc is freely accessible to the public as a Web server at http://202.120.37.186/bioinf/euk-multi.
Meanwhile, to support the people working in the relevant areas, Euk-mPLoc has been used to identify all eukaryotic protein entries in the Swiss-Prot database that do not have subcellular location annotations or are annotated as being uncertain. The large-scale results thus obtained have been deposited at the same Web site via a downloadable file prepared with Microsoft Excel and named "Tab_Euk-mPLoc.xls". Furthermore, to include new entries of eukaryotic proteins and reflect the continuous development of Euk-mPLoc in both the coverage scope and prediction accuracy, we will timely update the downloadable file as well as the predictor, and keep users informed by publishing a short note in the Journal and making an announcement in the Web Page.
Frietze, Kathryn M.; Campos, Samuel K.; Kajon, Adriana E.
2010-01-01
Subspecies B1 human adenoviruses (HAdV-B1s) are important causative agents of acute respiratory disease, but the molecular bases of their distinct pathobiology are still poorly understood. Marked differences in genetic content between HAdV-B1s and the well-characterized HAdV-Cs that may contribute to distinct pathogenic properties map to the E3 region. Between the highly conserved E3-19K and E3-10.4K/RIDα open reading frames (ORFs), and in the same location as the HAdV-C ADP/E3-11.6K ORF, HAdV-B1s carry ORFs E3-20.1K and E3-20.5K and a polymorphic third ORF, designated E3-10.9K, that varies in the size of its predicted product among HAdV-B1 serotypes and genomic variants. As an initial effort to define the function of the E3-10.9K ORF, we carried out a biochemical characterization of E3-10.9K-encoded orthologous proteins and investigated their expression in infected cells. Sequence-based predictions suggested that E3-10.9K orthologs with a hydrophobic domain are integral membrane proteins. Ectopically expressed, C-terminally tagged (with enhanced green fluorescent protein [EGFP]) E3-10.9K and E3-9K localized primarily to the plasma membrane, while E3-7.7K localized primarily to a juxtanuclear compartment that could not be identified. EGFP fusion proteins with a hydrophobic domain were N and O glycosylated. EGFP-tagged E3-4.8K, which lacked the hydrophobic domain, displayed diffuse cellular localization similar to that of the EGFP control. E3-10.9K transcripts from the major late promoter were detected at late time points postinfection. A C-terminally hemagglutinin-tagged version of E3-9K was detected by immunoprecipitation at late times postinfection in the membrane fraction of mutant virus-infected cells. These data suggest a role for ORF E3-10.9K-encoded proteins at late stages of HAdV-B1 replication, with potentially important functional implications for the documented ORF polymorphism. PMID:20739542
1991-01-23
lidocaine, within the lumen of the pore. Specific predictions for possible experimental mutations are made which can serve to test both the proposed...to the protein from the bilayer interior. 2. Synthesis of tetrameric synthetic channel proteins and demonstration of channel blockade by a local...Congress, Vancouver, Canada. S9.4, p. 2 4 (1990). Grove, A., J. M. Tomich and M. Montal. Design, synthesis and single channel characterization of d
Plastid proteomics for elucidating iron limited remodeling of plastid physiology in diatoms
NASA Astrophysics Data System (ADS)
Gomes, K. M.; Nunn, B. L.; Jenkins, B. D.
2016-02-01
Diatoms are important primary producers in the world's oceans and their growth is constrained in large regions by low iron availability. This low iron-induced limitation of primary production is due to the requirement for iron in components of essential metabolic pathways including key chloroplast functions such as photosynthesis and nitrate assimilation. Diatoms can bloom and accumulate high biomass during introduction of iron into low iron waters, indicating adaptations allowing for their survival in iron-limited waters and rapid growth when iron becomes more abundant. Prior studies have shown that under iron-limited stress, diatoms alter plastid-specific processes including components of electron transport, size of light harvesting capacity and chlorophyll content, suggesting plastid-specific protein regulation. Due to their complex evolutionary history, resulting from a secondary endosymbiosis, knowledge regarding the complement of plastid-localized proteins remains limited in comparison to other model photosynthetic organisms. While in-silico prediction of diatom protein localization provides putative candidates for plastid localization, these analyses can be limited as most plastid prediction models were developed using plants, primary endosymbionts. In order to characterize proteins enriched in diatom chloroplasts and to understand how the plastid proteome is remodeled in response to iron limitation, we used mass spectrometry based proteomics to compare plastid-enriched protein fractions from Thalassiosira pseudonana, grown in iron-replete and iron-limited conditions. These analyses show that iron stress alters regulation of major metabolic pathways in the plastid including the Calvin cycle and fatty acid synthesis. These components provide promising targets to further characterize the plastid-specific response to iron limitation.
Xu, Dong; Zhang, Jian; Roy, Ambrish; Zhang, Yang
2011-01-01
I-TASSER is an automated pipeline for protein tertiary structure prediction using multiple threading alignments and iterative structure assembly simulations. In CASP9 experiments, two new algorithms, QUARK and FG-MD, were added to the I-TASSER pipeline for improving the structural modeling accuracy. QUARK is a de novo structure prediction algorithm used for structure modeling of proteins that lack detectable template structures. For distantly homologous targets, QUARK models are found useful as a reference structure for selecting good threading alignments and guiding the I-TASSER structure assembly simulations. FG-MD is an atomic-level structural refinement program that uses structural fragments collected from the PDB structures to guide molecular dynamics simulation and improve the local structure of predicted model, including hydrogen-bonding networks, torsion angles and steric clashes. Despite considerable progress in both the template-based and template-free structure modeling, significant improvements on protein target classification, domain parsing, model selection, and ab initio folding of beta-proteins are still needed to further improve the I-TASSER pipeline. PMID:22069036
SLLE for predicting membrane protein types.
Wang, Meng; Yang, Jie; Xu, Zhi-Jie; Chou, Kuo-Chen
2005-01-07
Introduction of the concept of pseudo amino acid composition (PROTEINS: Structure, Function, and Genetics 43 (2001) 246; Erratum: ibid. 44 (2001) 60) has made it possible to incorporate a considerable amount of sequence-order effects by representing a protein sample in terms of a set of discrete numbers, and hence can significantly enhance the prediction quality of membrane protein type. As a continuous effort along such a line, the Supervised Locally Linear Embedding (SLLE) technique for nonlinear dimensionality reduction is introduced (Science 22 (2000) 2323). The advantage of using SLLE is that it can reduce the operational space by extracting the essential features from the high-dimensional pseudo amino acid composition space, and that the cluster-tolerant capacity can be increased accordingly. As a consequence by combining these two approaches, high success rates have been observed during the tests of self-consistency, jackknife and independent data set, respectively, by using the simplest nearest neighbour classifier. The current approach represents a new strategy to deal with the problems of protein attribute prediction, and hence may become a useful vehicle in the area of bioinformatics and proteomics.
Vlahovicek, K; Munteanu, M G; Pongor, S
1999-01-01
Bending is a local conformational micropolymorphism of DNA in which the original B-DNA structure is only distorted but not extensively modified. Bending can be predicted by simple static geometry models as well as by a recently developed elastic model that incorporates sequence-dependent anisotropic bendability (SDAB). The SDAB model qualitatively explains phenomena including affinity of protein binding, kinking, as well as sequence-dependent vibrational properties of DNA. The vibrational properties of DNA segments can be studied by finite element analysis of a model subjected to an initial bending moment. The frequency spectrum is obtained by applying Fourier analysis to the displacement values in the time domain. This analysis shows that the spectrum of the bending vibrations depends quite sensitively on the sequence; for example, the spectrum of a curved sequence is characteristically different from the spectrum of straight sequence motifs of identical basepair composition. Curvature distributions are genome-specific, and pronounced differences are found between protein-coding and regulatory regions; that is, sites of extreme curvature and/or bendability are less frequent in protein-coding regions. A WWW server is set up for the prediction of curvature and generation of 3D models from DNA sequences (http://www.icgeb.trieste.it/dna).
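The step "applying Fourier analysis to the displacement values in the time domain" can be sketched as below. This is only an illustration under stated assumptions: the displacements are taken to be uniformly sampled at a hypothetical time step `dt`, and a direct discrete Fourier transform is used for clarity rather than speed. The function name `amplitude_spectrum` is hypothetical and is not taken from the cited server or its finite element model.

```python
import cmath
import math


def amplitude_spectrum(displacements, dt):
    """Return (frequency, amplitude) pairs for a uniformly sampled real signal.

    Uses a direct O(n^2) discrete Fourier transform and keeps only the
    non-negative frequencies k / (n * dt) for k = 0 .. n // 2.
    """
    n = len(displacements)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(displacements)
        )
        spectrum.append((k / (n * dt), abs(coeff) / n))
    return spectrum


# Usage: a pure 5 Hz oscillation sampled every 0.01 s for 1 s.
signal = [math.sin(2 * math.pi * 5.0 * t * 0.01) for t in range(100)]
peak_frequency = max(amplitude_spectrum(signal, 0.01), key=lambda p: p[1])[0]
```

Comparing such spectra for a curved and a straight segment of identical base-pair composition is the kind of analysis the abstract describes.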
Effective Potentials for Folding Proteins
NASA Astrophysics Data System (ADS)
Chen, Nan-Yow; Su, Zheng-Yao; Mou, Chung-Yu
2006-02-01
A coarse-grained off-lattice model that is not biased in any way to the native state is proposed to fold proteins. To predict the native structure in a reasonable time, the model has included the essential effects of water in an effective potential. Two new ingredients, the dipole-dipole interaction and the local hydrophobic interaction, are introduced and are shown to be as crucial as the hydrogen bonding. The model allows successful folding of the wild-type sequence of protein G and may have provided important hints to the study of protein folding.
Fast and anisotropic flexibility-rigidity index for protein flexibility and fluctuation analysis
NASA Astrophysics Data System (ADS)
Opron, Kristopher; Xia, Kelin; Wei, Guo-Wei
2014-06-01
Protein structural fluctuation, typically measured by Debye-Waller factors, or B-factors, is a manifestation of protein flexibility, which strongly correlates to protein function. The flexibility-rigidity index (FRI) is a newly proposed method for the construction of atomic rigidity functions required in the theory of continuum elasticity with atomic rigidity, which is a new multiscale formalism for describing excessively large biomolecular systems. The FRI method analyzes protein rigidity and flexibility and is capable of predicting protein B-factors without resorting to matrix diagonalization. A fundamental assumption used in the FRI is that protein structures are uniquely determined by various internal and external interactions, while the protein functions, such as stability and flexibility, are solely determined by the structure. As such, one can predict protein flexibility without resorting to the protein interaction Hamiltonian. Consequently, bypassing the matrix diagonalization, the original FRI has a computational complexity of O(N^2). This work introduces a fast FRI (fFRI) algorithm for the flexibility analysis of large macromolecules. The proposed fFRI further reduces the computational complexity to O(N). Additionally, we propose anisotropic FRI (aFRI) algorithms for the analysis of protein collective dynamics. The aFRI algorithms permit adaptive Hessian matrices, from a completely global 3N × 3N matrix to completely local 3 × 3 matrices. These 3 × 3 matrices, despite being calculated locally, also contain non-local correlation information. Eigenvectors obtained from the proposed aFRI algorithms are able to demonstrate collective motions. Moreover, we investigate the performance of FRI by employing four families of radial basis correlation functions. Both parameter optimized and parameter-free FRI methods are explored. 
Furthermore, we compare the accuracy and efficiency of FRI with some established approaches to flexibility analysis, namely, normal mode analysis and Gaussian network model (GNM). The accuracy of the FRI method is tested using four sets of proteins, three sets of relatively small-, medium-, and large-sized structures and an extended set of 365 proteins. A fifth set of proteins is used to compare the efficiency of the FRI, fFRI, aFRI, and GNM methods. Intensive validation and comparison indicate that the FRI, particularly the fFRI, is orders of magnitude more efficient and about 10% more accurate overall than some of the most popular methods in the field. The proposed fFRI is able to predict B-factors for α-carbons of the HIV virus capsid (313 236 residues) in less than 30 seconds on a single processor using only one core. Finally, we demonstrate the application of FRI and aFRI to protein domain analysis.
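The diagonalization-free idea behind FRI can be sketched in a few lines. This is a minimal illustration under stated assumptions: each atom's rigidity index is taken as a sum of a radial correlation function over all other atoms, the kernel is a generalized exponential with hypothetical parameters `eta` and `kappa`, and the flexibility index is the reciprocal of the rigidity index. The abstract itself mentions four families of radial basis functions without giving their forms, so the kernel and parameter values here are assumptions, and the function name `flexibility_index` is hypothetical.

```python
import math


def flexibility_index(coords, eta=3.0, kappa=1.0):
    """Return one flexibility value per atom from pairwise distances only.

    rigidity_i = sum over j != i of exp(-(r_ij / eta) ** kappa)
    flexibility_i = 1 / rigidity_i
    The double loop over atoms is O(N^2) and needs no matrix diagonalization.
    """
    n = len(coords)
    flexibility = []
    for i in range(n):
        rigidity = sum(
            math.exp(-((math.dist(coords[i], coords[j]) / eta) ** kappa))
            for j in range(n)
            if j != i
        )
        flexibility.append(1.0 / rigidity if rigidity > 0.0 else math.inf)
    return flexibility
```

Predicted B-factors would then be obtained by fitting these flexibility values to experimental B-factors, for example with a linear regression; restricting the inner sum to nearby atoms via a cell list is one way a linear-scaling variant could be built.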
Fast and anisotropic flexibility-rigidity index for protein flexibility and fluctuation analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Opron, Kristopher; Xia, Kelin; Wei, Guo-Wei, E-mail: wei@math.msu.edu
Protein structural fluctuation, typically measured by Debye-Waller factors, or B-factors, is a manifestation of protein flexibility, which strongly correlates to protein function. The flexibility-rigidity index (FRI) is a newly proposed method for the construction of atomic rigidity functions required in the theory of continuum elasticity with atomic rigidity, which is a new multiscale formalism for describing excessively large biomolecular systems. The FRI method analyzes protein rigidity and flexibility and is capable of predicting protein B-factors without resorting to matrix diagonalization. A fundamental assumption used in the FRI is that protein structures are uniquely determined by various internal and external interactions, while the protein functions, such as stability and flexibility, are solely determined by the structure. As such, one can predict protein flexibility without resorting to the protein interaction Hamiltonian. Consequently, bypassing the matrix diagonalization, the original FRI has a computational complexity of O(N^2). This work introduces a fast FRI (fFRI) algorithm for the flexibility analysis of large macromolecules. The proposed fFRI further reduces the computational complexity to O(N). Additionally, we propose anisotropic FRI (aFRI) algorithms for the analysis of protein collective dynamics. The aFRI algorithms permit adaptive Hessian matrices, from a completely global 3N × 3N matrix to completely local 3 × 3 matrices. These 3 × 3 matrices, despite being calculated locally, also contain non-local correlation information. Eigenvectors obtained from the proposed aFRI algorithms are able to demonstrate collective motions. Moreover, we investigate the performance of FRI by employing four families of radial basis correlation functions. Both parameter optimized and parameter-free FRI methods are explored. 
Furthermore, we compare the accuracy and efficiency of FRI with some established approaches to flexibility analysis, namely, normal mode analysis and Gaussian network model (GNM). The accuracy of the FRI method is tested using four sets of proteins, three sets of relatively small-, medium-, and large-sized structures and an extended set of 365 proteins. A fifth set of proteins is used to compare the efficiency of the FRI, fFRI, aFRI, and GNM methods. Intensive validation and comparison indicate that the FRI, particularly the fFRI, is orders of magnitude more efficient and about 10% more accurate overall than some of the most popular methods in the field. The proposed fFRI is able to predict B-factors for α-carbons of the HIV virus capsid (313 236 residues) in less than 30 seconds on a single processor using only one core. Finally, we demonstrate the application of FRI and aFRI to protein domain analysis.
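The idea of predicting flexibility from pairwise distances alone, without diagonalizing a matrix, can be illustrated with a minimal sketch. The abstract does not name its four kernel families, so the generalized exponential kernel below, the function names `fri_flexibility` and `fit_bfactors`, and the values of `eta` and `kappa` are all assumptions made for illustration. This all-pairs version has the O(N^2) cost attributed to the original FRI; it is not the fFRI algorithm.

```python
import numpy as np

def fri_flexibility(coords, eta=3.0, kappa=1.0, weights=None):
    """Per-atom flexibility index from a distance-based correlation kernel.

    coords: (N, 3) array of alpha-carbon positions in angstroms.
    eta, kappa: kernel scale and exponent (illustrative, not fitted values).
    """
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    # All pairwise distances r_ij: this is the O(N^2) step.
    diff = coords[:, None, :] - coords[None, :, :]
    r = np.linalg.norm(diff, axis=-1)
    # Assumed kernel: generalized exponential, decaying with distance.
    phi = np.exp(-(r / eta) ** kappa)
    np.fill_diagonal(phi, 0.0)  # an atom does not correlate with itself
    mu = phi @ w                # rigidity index: mu_i = sum_j w_j * phi(r_ij)
    return 1.0 / mu             # flexibility index: f_i = 1 / mu_i

def fit_bfactors(flex, b_exp):
    """Map flexibility indices to B-factors by least-squares linear fit."""
    slope, intercept = np.polyfit(flex, b_exp, 1)
    return slope * np.asarray(flex) + intercept

# Toy straight chain of five alpha-carbons spaced 3.8 angstroms apart.
chain = np.array([[3.8 * i, 0.0, 0.0] for i in range(5)])
flex = fri_flexibility(chain)
```

In this toy chain the terminal atoms have fewer close neighbours, so they receive a lower rigidity index and a higher flexibility index than the central atom, which is the qualitative behaviour expected of chain termini.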
Tropini, Carolina; Huang, Kerwyn Casey
2012-01-01
Bacterial cells maintain sophisticated levels of intracellular organization that allow for signal amplification, response to stimuli, cell division, and many other critical processes. The mechanisms underlying localization and their contribution to fitness have been difficult to uncover, due to the often challenging task of creating mutants with systematically perturbed localization but normal enzymatic activity, and the lack of quantitative models through which to interpret subtle phenotypic changes. Focusing on the model bacterium Caulobacter crescentus, which generates two different types of daughter cells from an underlying asymmetric distribution of protein phosphorylation, we use mathematical modeling to investigate the contribution of the localization of histidine kinases to the establishment of cellular asymmetry and subsequent developmental outcomes. We use existing mutant phenotypes and fluorescence data to parameterize a reaction-diffusion model of the kinases PleC and DivJ and their cognate response regulator DivK. We then present a systematic computational analysis of the effects of changes in protein localization and abundance to determine whether PleC localization is required for correct developmental timing in Caulobacter. Our model predicts the developmental phenotypes of several localization mutants, and suggests that a novel strain with co-localization of PleC and DivJ could provide quantitative insight into the signaling threshold required for flagellar pole development. Our analysis indicates that normal development can be maintained through a wide range of localization phenotypes, and that developmental defects due to changes in PleC localization can be rescued by increased PleC expression. 
We also show that the system is remarkably robust to perturbation of the kinetic parameters, and while the localization of either PleC or DivJ is required for asymmetric development, the delocalization of one of these two components does not prevent flagellar pole development. We further find that allosteric regulation of PleC observed in vitro does not affect the predicted in vivo developmental phenotypes. Taken together, our model suggests that cells can tolerate perturbations to localization phenotypes, whose evolutionary origins may be connected with reducing protein expression or with decoupling pre- and post-division phenotypes. PMID:22876167
Shen, Hong-Bin; Chou, Kuo-Chen
2007-02-15
Viruses can reproduce their progenies only within a host cell, and their actions depend both on their destructive tendencies toward a specific host cell and on environmental conditions. Therefore, knowledge of the subcellular localization of viral proteins in a host cell or virus-infected cell is very useful for in-depth studying of their functions and mechanisms as well as designing antiviral drugs. An analysis on the Swiss-Prot database (version 50.0, released on May 30, 2006) indicates that only 23.5% of viral protein entries are annotated for their subcellular locations in this regard. As for the gene ontology database, the corresponding percentage is 23.8%. Such a gap calls for the development of high throughput tools for timely annotating the localization of viral proteins within host and virus-infected cells. In this article, a predictor called "Virus-PLoc" has been developed that is featured by fusing many basic classifiers with each engineered according to the K-nearest neighbor rule. The overall jackknife success rate obtained by Virus-PLoc in identifying the subcellular compartments of viral proteins was 80% for a benchmark dataset in which none of the proteins has more than 25% sequence identity to any other in the same location site. Virus-PLoc will be freely available as a web-server at http://202.120.37.186/bioinf/virus for the public usage. Furthermore, Virus-PLoc has been used to provide large-scale predictions of all viral protein entries in Swiss-Prot database that do not have subcellular location annotations or are annotated as being uncertain. The results thus obtained have been deposited in a downloadable file prepared with Microsoft Excel and named "Tab_Virus-PLoc.xls." This file is available at the same website and will be updated twice a year to include the new entries of viral proteins and reflect the continuous development of Virus-PLoc. 2006 Wiley Periodicals, Inc.
A Bayesian network approach for modeling local failure in lung cancer
NASA Astrophysics Data System (ADS)
Oh, Jung Hun; Craft, Jeffrey; Lozi, Rawan Al; Vaidya, Manushka; Meng, Yifan; Deasy, Joseph O.; Bradley, Jeffrey D.; El Naqa, Issam
2011-03-01
Locally advanced non-small cell lung cancer (NSCLC) patients suffer from a high local failure rate following radiotherapy. Despite many efforts to develop new dose-volume models for early detection of tumor local failure, there was no reported significant improvement in their application prospectively. Based on recent studies of biomarker proteins' role in hypoxia and inflammation in predicting tumor response to radiotherapy, we hypothesize that combining physical and biological factors with a suitable framework could improve the overall prediction. To test this hypothesis, we propose a graphical Bayesian network framework for predicting local failure in lung cancer. The proposed approach was tested using two different datasets of locally advanced NSCLC patients treated with radiotherapy. The first dataset was collected retrospectively, which comprises clinical and dosimetric variables only. The second dataset was collected prospectively in which in addition to clinical and dosimetric information, blood was drawn from the patients at various time points to extract candidate biomarkers as well. Our preliminary results show that the proposed method can be used as an efficient method to develop predictive models of local failure in these patients and to interpret relationships among the different variables in the models. We also demonstrate the potential use of heterogeneous physical and biological variables to improve the model prediction. With the first dataset, we achieved better performance compared with competing Bayesian-based classifiers. With the second dataset, the combined model had a slightly higher performance compared to individual physical and biological models, with the biological variables making the largest contribution. Our preliminary results highlight the potential of the proposed integrated approach for predicting post-radiotherapy local failure in NSCLC patients.
Learning cellular sorting pathways using protein interactions and sequence motifs.
Lin, Tien-Ho; Bar-Joseph, Ziv; Murphy, Robert F
2011-11-01
Proper subcellular localization is critical for proteins to perform their roles in cellular functions. Proteins are transported by different cellular sorting pathways, some of which take a protein through several intermediate locations until reaching its final destination. The pathway a protein is transported through is determined by carrier proteins that bind to specific sequence motifs. In this article, we present a new method that integrates protein interaction and sequence motif data to model how proteins are sorted through these sorting pathways. We use a hidden Markov model (HMM) to represent protein sorting pathways. The model is able to determine intermediate sorting states and to assign carrier proteins and motifs to the sorting pathways. In simulation studies, we show that the method can accurately recover an underlying sorting model. Using data for yeast, we show that our model leads to accurate prediction of subcellular localization. We also show that the pathways learned by our model recover many known sorting pathways and correctly assign proteins to the path they utilize. The learned model identified new pathways and their putative carriers and motifs and these may represent novel protein sorting mechanisms. Supplementary results and software implementation are available from http://murphylab.web.cmu.edu/software/2010_RECOMB_pathways/.
Zhang, Yi; Nikolovski, Nino; Sorieul, Mathias; Vellosillo, Tamara; McFarlane, Heather E.; Dupree, Ray; Kesten, Christopher; Schneider, René; Driemeier, Carlos; Lathe, Rahul; Lampugnani, Edwin; Yu, Xiaolan; Ivakov, Alexander; Doblin, Monika S.; Mortimer, Jenny C.; Brown, Steven P.; Persson, Staffan; Dupree, Paul
2016-01-01
As the most abundant biopolymer on Earth, cellulose is a key structural component of the plant cell wall. Cellulose is produced at the plasma membrane by cellulose synthase (CesA) complexes (CSCs), which are assembled in the endomembrane system and trafficked to the plasma membrane. While several proteins that affect CesA activity have been identified, components that regulate CSC assembly and trafficking remain unknown. Here we show that STELLO1 and 2 are Golgi-localized proteins that can interact with CesAs and control cellulose quantity. In the absence of STELLO function, the spatial distribution within the Golgi, secretion and activity of the CSCs are impaired indicating a central role of the STELLO proteins in CSC assembly. Point mutations in the predicted catalytic domains of the STELLO proteins indicate that they are glycosyltransferases facing the Golgi lumen. Hence, we have uncovered proteins that regulate CSC assembly in the plant Golgi apparatus. PMID:27277162
Accounting for Protein Subcellular Localization: A Compartmental Map of the Rat Liver Proteome
Jadot, Michel; Boonen, Marielle; Thirion, Jaqueline; Wang, Nan; Xing, Jinchuan; Zhao, Caifeng; Tannous, Abla; Qian, Meiqian; Zheng, Haiyan; Everett, John K.; Moore, Dirk F.; Sleat, David E.; Lobel, Peter
2017-01-01
Accurate knowledge of the intracellular location of proteins is important for numerous areas of biomedical research including assessing fidelity of putative protein-protein interactions, modeling cellular processes at a system-wide level and investigating metabolic and disease pathways. Many proteins have not been localized, or have been incompletely localized, partly because most studies do not account for entire subcellular distribution. Thus, proteins are frequently assigned to one organelle whereas a significant fraction may reside elsewhere. As a step toward a comprehensive cellular map, we used subcellular fractionation with classic balance sheet analysis and isobaric labeling/quantitative mass spectrometry to assign locations to >6000 rat liver proteins. We provide quantitative data and error estimates describing the distribution of each protein among the eight major cellular compartments: nucleus, mitochondria, lysosomes, peroxisomes, endoplasmic reticulum, Golgi, plasma membrane and cytosol. Accounting for total intracellular distribution improves quality of organelle assignments and assigns proteins with multiple locations. Protein assignments and supporting data are available online through the Prolocate website (http://prolocate.cabm.rutgers.edu). As an example of the utility of this data set, we have used organelle assignments to help analyze whole exome sequencing data from an infant dying at 6 months of age from a suspected neurodegenerative lysosomal storage disorder of unknown etiology. Sequencing data was prioritized using lists of lysosomal proteins comprising well-established residents of this organelle as well as novel candidates identified in this study. The latter included copper transporter 1, encoded by SLC31A1, which we localized to both the plasma membrane and lysosome. 
The patient harbors two predicted loss of function mutations in SLC31A1, suggesting that this may represent a heretofore undescribed recessive lysosomal storage disease gene. PMID:27923875
Accounting for Protein Subcellular Localization: A Compartmental Map of the Rat Liver Proteome.
Jadot, Michel; Boonen, Marielle; Thirion, Jaqueline; Wang, Nan; Xing, Jinchuan; Zhao, Caifeng; Tannous, Abla; Qian, Meiqian; Zheng, Haiyan; Everett, John K; Moore, Dirk F; Sleat, David E; Lobel, Peter
2017-02-01
Accurate knowledge of the intracellular location of proteins is important for numerous areas of biomedical research including assessing fidelity of putative protein-protein interactions, modeling cellular processes at a system-wide level and investigating metabolic and disease pathways. Many proteins have not been localized, or have been incompletely localized, partly because most studies do not account for entire subcellular distribution. Thus, proteins are frequently assigned to one organelle whereas a significant fraction may reside elsewhere. As a step toward a comprehensive cellular map, we used subcellular fractionation with classic balance sheet analysis and isobaric labeling/quantitative mass spectrometry to assign locations to >6000 rat liver proteins. We provide quantitative data and error estimates describing the distribution of each protein among the eight major cellular compartments: nucleus, mitochondria, lysosomes, peroxisomes, endoplasmic reticulum, Golgi, plasma membrane and cytosol. Accounting for total intracellular distribution improves quality of organelle assignments and assigns proteins with multiple locations. Protein assignments and supporting data are available online through the Prolocate website (http://prolocate.cabm.rutgers.edu). As an example of the utility of this data set, we have used organelle assignments to help analyze whole exome sequencing data from an infant dying at 6 months of age from a suspected neurodegenerative lysosomal storage disorder of unknown etiology. Sequencing data was prioritized using lists of lysosomal proteins comprising well-established residents of this organelle as well as novel candidates identified in this study. The latter included copper transporter 1, encoded by SLC31A1, which we localized to both the plasma membrane and lysosome. 
The patient harbors two predicted loss of function mutations in SLC31A1, suggesting that this may represent a heretofore undescribed recessive lysosomal storage disease gene. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
Lin, Wei-Zhong; Fang, Jian-An; Xiao, Xuan; Chou, Kuo-Chen
2013-04-05
Predicting protein subcellular localization is a challenging problem, particularly when query proteins have multi-label features meaning that they may simultaneously exist at, or move between, two or more different subcellular location sites. Most of the existing methods can only be used to deal with the single-label proteins. Actually, multi-label proteins should not be ignored because they usually bear some special function worthy of in-depth studies. By introducing the "multi-label learning" approach, a new predictor, called iLoc-Animal, has been developed that can be used to deal with the systems containing both single- and multi-label animal (metazoan except human) proteins. Meanwhile, to measure the prediction quality of a multi-label system in a rigorous way, five indices were introduced; they are "Absolute-True", "Absolute-False" (or "Hamming-Loss"), "Accuracy", "Precision", and "Recall". As a demonstration, the jackknife cross-validation was performed with iLoc-Animal on a benchmark dataset of animal proteins classified into the following 20 location sites: (1) acrosome, (2) cell membrane, (3) centriole, (4) centrosome, (5) cell cortex, (6) cytoplasm, (7) cytoskeleton, (8) endoplasmic reticulum, (9) endosome, (10) extracellular, (11) Golgi apparatus, (12) lysosome, (13) mitochondrion, (14) melanosome, (15) microsome, (16) nucleus, (17) peroxisome, (18) plasma membrane, (19) spindle, and (20) synapse, where many proteins belong to two or more locations. For such a complicated system, the outcomes achieved by iLoc-Animal for all the aforementioned five indices were quite encouraging, indicating that the predictor may become a useful tool in this area. It has not escaped our notice that the multi-label approach and the rigorous measurement metrics can also be used to investigate many other multi-label problems in molecular biology. As a user-friendly web-server, iLoc-Animal is freely accessible to the public at the web-site .
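The five indices named above can be made concrete with a short sketch. The abstract does not give the formulas, so the set-based definitions below follow common multi-label usage and are an assumption; the function name `multilabel_metrics` and the toy proteins are hypothetical.

```python
def multilabel_metrics(true_sets, pred_sets, n_labels):
    """Average five multi-label indices over all samples.

    true_sets, pred_sets: per-protein collections of location labels.
    n_labels: total number of possible location sites (20 in the abstract).
    The formulas are assumed from common multi-label practice.
    """
    n = len(true_sets)
    totals = {"absolute_true": 0.0, "absolute_false": 0.0,
              "accuracy": 0.0, "precision": 0.0, "recall": 0.0}
    for t, p in zip(true_sets, pred_sets):
        t, p = set(t), set(p)
        inter, union = len(t & p), len(t | p)
        # Absolute-True: prediction counts only if it matches exactly.
        totals["absolute_true"] += 1.0 if t == p else 0.0
        # Absolute-False (Hamming loss): wrong labels over all possible labels.
        totals["absolute_false"] += (union - inter) / n_labels
        # Accuracy: overlap relative to the union of true and predicted labels.
        totals["accuracy"] += inter / union if union else 1.0
        # Precision: overlap relative to the predicted labels.
        totals["precision"] += inter / len(p) if p else 0.0
        # Recall: overlap relative to the true labels.
        totals["recall"] += inter / len(t) if t else 0.0
    return {name: value / n for name, value in totals.items()}

# Two toy proteins: the first has two true sites but only one is predicted.
truth = [{"nucleus", "cytoplasm"}, {"mitochondrion"}]
predicted = [{"nucleus"}, {"mitochondrion"}]
scores = multilabel_metrics(truth, predicted, n_labels=20)
```

For these two toy proteins the sketch gives Absolute-True 0.5, Absolute-False 0.025, Accuracy 0.75, Precision 1.0 and Recall 0.75, which shows how a partially correct multi-label prediction is penalized by some indices but not others.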
Tran, Hoang T.; Pappu, Rohit V.
2006-01-01
Our focus is on an appropriate theoretical framework for describing highly denatured proteins. In high concentrations of denaturants, proteins behave like polymers in a good solvent and ensembles for denatured proteins can be modeled by ignoring all interactions except excluded volume (EV) effects. To assay conformational preferences of highly denatured proteins, we quantify a variety of properties for EV-limit ensembles of 23 two-state proteins. We find that modeled denatured proteins can be best described as follows. Average shapes are consistent with prolate ellipsoids. Ensembles are characterized by large correlated fluctuations. Sequence-specific conformational preferences are restricted to local length scales that span five to nine residues. Beyond local length scales, chain properties follow well-defined power laws that are expected for generic polymers in the EV limit. The average available volume is filled inefficiently, and cavities of all sizes are found within the interiors of denatured proteins. All properties characterized from simulated ensembles match predictions from rigorous field theories. We use our results to resolve between conflicting proposals for structure in ensembles for highly denatured states. PMID:16766618
Street, Timothy O; Barrick, Doug
2009-01-01
The Notch ankyrin domain is a repeat protein whose folding has been characterized through equilibrium and kinetic measurements. In previous work, equilibrium folding free energies of truncated constructs were used to generate an experimentally determined folding energy landscape (Mello and Barrick, Proc Natl Acad Sci USA 2004;101:14102–14107). Here, this folding energy landscape is used to parameterize a kinetic model in which local transition probabilities between partly folded states are based on energy values from the landscape. The landscape-based model correctly predicts highly diverse experimentally determined folding kinetics of the Notch ankyrin domain and sequence variants. These predictions include monophasic folding and biphasic unfolding, curvature in the unfolding limb of the chevron plot, population of a transient unfolding intermediate, relative folding rates of 19 variants spanning three orders of magnitude, and a change in the folding pathway that results from C-terminal stabilization. These findings indicate that the folding pathway(s) of the Notch ankyrin domain are thermodynamically selected: the primary determinants of kinetic behavior can be simply deduced from the local stability of individual repeats. PMID:19177351
Predicting Protein-Protein Interaction Sites with a Novel Membership Based Fuzzy SVM Classifier.
Sriwastava, Brijesh K; Basu, Subhadip; Maulik, Ujjwal
2015-01-01
Predicting residues that participate in protein-protein interactions (PPI) helps to identify which amino acids are located at the interface. In this paper, we show that the performance of the classical support vector machine (SVM) algorithm can further be improved with the use of a custom-designed fuzzy membership function, for the partner-specific PPI interface prediction problem. We evaluated the performances of both classical SVM and fuzzy SVM (F-SVM) on the PPI databases of three different model proteomes of Homo sapiens, Escherichia coli and Saccharomyces cerevisiae and calculated the statistical significance of the developed F-SVM over classical SVM algorithm. We also compared our performance with the available state-of-the-art fuzzy methods in this domain and observed significant performance improvements. To predict interaction sites in protein complexes, local composition of amino acids together with their physico-chemical characteristics are used, where the F-SVM based prediction method exploits the membership function for each pair of sequence fragments. The average F-SVM performance (area under ROC curve) on the test samples in 10-fold cross validation experiment is measured as 77.07, 78.39, and 74.91 percent for the aforementioned organisms respectively. Performances on independent test sets are obtained as 72.09, 73.24 and 82.74 percent respectively. The software is available for free download from http://code.google.com/p/cmater-bioinfo.
Prediction and analysis of structure, stability and unfolding of thermolysin-like proteases
NASA Astrophysics Data System (ADS)
Vriend, Gert; Eijsink, Vincent
1993-08-01
Bacillus neutral proteases (NPs) form a group of well-characterized homologous enzymes that exhibit large differences in thermostability. The three-dimensional (3D) structures of several of these enzymes have been modelled on the basis of the crystal structures of the NPs of B. thermoproteolyticus (thermolysin) and B. cereus. Several new techniques have been developed to improve the model-building procedures. Also, a 'model-building by mutagenesis' strategy was used, in which mutants were designed just to shed light on parts of the structures that were particularly hard to model. The NP models have been used for the prediction of site-directed mutations aimed at improving the thermostability of the enzymes. Predictions were made using several novel computational techniques, such as position-specific rotamer searching, packing quality analysis and property-profile database searches. Many stabilizing mutations were predicted and produced: improvement of hydrogen bonding, exclusion of buried water molecules, capping helices, improvement of hydrophobic interactions and entropic stabilization have been applied successfully. At elevated temperatures NPs are irreversibly inactivated as a result of autolysis. It has been shown that this denaturation process is independent of the protease activity and concentration and that the inactivation follows first-order kinetics. From this it has been conjectured that local unfolding of (surface) loops, which renders the protein susceptible to autolysis, is the rate-limiting step. Despite the particular nature of the thermal denaturation process, normal rules for protein stability can be applied to NPs. However, rather than stabilizing the whole protein against global unfolding, only a small region has to be protected against local unfolding. In contrast to proteins in general, mutational effects in proteases are not additive and their magnitude is strongly dependent on the location of the mutation. 
Mutations that alter the stability of the NP by a large amount are located in a relatively weak region (or more precisely, they affect a local unfolding pathway with a relatively low free energy of activation). One weak region, that is supposedly important in the early steps of NP unfolding, has been determined in the NP of B. stearothermophilus. After eliminating this weakest link a drastic increase in thermostability was observed and the search for the second-weakest link, or the second-lowest energy local unfolding pathway is now in progress. Hopefully, this approach can be used to unravel the entire early phase of unfolding.
Li, Zheng-Wei; You, Zhu-Hong; Chen, Xing; Li, Li-Ping; Huang, De-Shuang; Yan, Gui-Ying; Nie, Ru; Huang, Yu-An
2017-04-04
Identification of protein-protein interactions (PPIs) is of critical importance for deciphering the underlying mechanisms of almost all biological processes of the cell and providing great insight into the study of human disease. Although much effort has been devoted to identifying PPIs from various organisms, existing high-throughput biological techniques are time-consuming, expensive, and have high false positive and negative rates. Thus it is highly urgent to develop in silico methods to predict PPIs efficiently and accurately in this post genomic era. In this article, we report a novel computational model combining our newly developed discriminative vector machine classifier (DVM) and an improved Weber local descriptor (IWLD) for the prediction of PPIs. Two components, differential excitation and orientation, are exploited to build evolutionary features for each protein sequence. The main characteristic of the proposed method lies in introducing an effective feature descriptor, IWLD, which can capture highly discriminative evolutionary information from position-specific scoring matrices (PSSM) of protein data, and employing the powerful and robust DVM classifier. When applying the proposed method to Yeast and H. pylori data sets, we obtained excellent prediction accuracies as high as 96.52% and 91.80%, respectively, which are significantly better than the previous methods. Extensive experiments were then performed for predicting cross-species PPIs and the predictive results were also promising. To further validate the performance of the proposed method, we compared it with the state-of-the-art support vector machine (SVM) classifier on the Human data set. The experimental results obtained indicate that our method is highly effective for PPI prediction and can be taken as a supplementary tool for future proteomics research.
Simulating protein folding initiation sites using an alpha-carbon-only knowledge-based force field
Buck, Patrick M.; Bystroff, Christopher
2015-01-01
Protein folding is a hierarchical process where structure forms locally first, then globally. Some short sequence segments initiate folding through strong structural preferences that are independent of their three-dimensional context in proteins. We have constructed a knowledge-based force field in which the energy functions are conditional on local sequence patterns, as expressed in the hidden Markov model for local structure (HMMSTR). Carbon-alpha force field (CALF) builds sequence specific statistical potentials based on database frequencies for α-carbon virtual bond opening and dihedral angles, pairwise contacts and hydrogen bond donor-acceptor pairs, and simulates folding via Brownian dynamics. We introduce hydrogen bond donor and acceptor potentials as α-carbon probability fields that are conditional on the predicted local sequence. Constant temperature simulations were carried out using 27 peptides selected as putative folding initiation sites, each 12 residues in length, representing several different local structure motifs. Each 0.6 μs trajectory was clustered based on structure. Simulation convergence or representativeness was assessed by subdividing trajectories and comparing clusters. For 21 of the 27 sequences, the largest cluster made up more than half of the total trajectory. Of these 21 sequences, 14 had cluster centers that were at most 2.6 Å root mean square deviation (RMSD) from their native structure in the corresponding full-length protein. To assess the adequacy of the energy function on nonlocal interactions, 11 full length native structures were relaxed using Brownian dynamics simulations. Equilibrated structures deviated from their native states but retained their overall topology and compactness. A simple potential that folds proteins locally and stabilizes proteins globally may enable a more realistic understanding of hierarchical folding pathways. PMID:19137613
Maurer-Stroh, Sebastian; Gao, He; Han, Hao; Baeten, Lies; Schymkowitz, Joost; Rousseau, Frederic; Zhang, Louxin; Eisenhaber, Frank
2013-02-01
Data mining in protein databases, derivatives from more fundamental protein 3D structure and sequence databases, has considerable unearthed potential for the discovery of sequence motif--structural motif--function relationships as the finding of the U-shape (Huf-Zinc) motif, originally a small student's project, exemplifies. The metal ion zinc is critically involved in universal biological processes, ranging from protein-DNA complexes and transcription regulation to enzymatic catalysis and metabolic pathways. Proteins have evolved a series of motifs to specifically recognize and bind zinc ions. Many of these, so called zinc fingers, are structurally independent globular domains with discontinuous binding motifs made up of residues mostly far apart in sequence. Through a systematic approach starting from the BRIX structure fragment database, we discovered that there exists another predictable subset of zinc-binding motifs that not only have a conserved continuous sequence pattern but also share a characteristic local conformation, despite being included in totally different overall folds. While this does not allow general prediction of all Zn binding motifs, a HMM-based web server, Huf-Zinc, is available for prediction of these novel, as well as conventional, zinc finger motifs in protein sequences. The Huf-Zinc webserver can be freely accessed through this URL (http://mendel.bii.a-star.edu.sg/METHODS/hufzinc/).
Protein functional features are reflected in the patterns of mRNA translation speed.
López, Daniel; Pazos, Florencio
2015-07-09
The degeneracy of the genetic code makes it possible for the same amino acid string to be coded by different messenger RNA (mRNA) sequences. These "synonymous mRNAs" may differ largely in a number of aspects related to their overall translational efficiency, such as secondary structure content and availability of the encoded transfer RNAs (tRNAs). Consequently, they may render different yields of the translated polypeptides. These mRNA features related to translation efficiency also play a role locally, resulting in a non-uniform translation speed along the mRNA, which has been previously related to some protein structural features and also used to explain some dramatic effects of "silent" single-nucleotide-polymorphisms (SNPs). In this work we perform the first large scale analysis of the relationship between three experimental proxies of mRNA local translation efficiency and the local features of the corresponding encoded proteins. We found that a number of protein functional and structural features are reflected in the patterns of ribosome occupancy, secondary structure and tRNA availability along the mRNA. One or more of these proxies of translation speed have distinctive patterns around the mRNA regions coding for certain protein local features. In some cases the three patterns follow a similar trend. We also show specific examples where these patterns of translation speed point to the protein's important structural and functional features. This supports the idea that the genome not only codes the protein functional features as sequences of amino acids, but also as subtle patterns of mRNA properties which, probably through local effects on the translation speed, have some consequence on the final polypeptide. These results open the possibility of predicting a protein's functional regions based on a single genomic sequence, and have implications for heterologous protein expression and fine-tuning protein function.
He, Xiaocui; Zhang, Yang; Yu, Ziniu
2010-10-01
The Rieske protein gene in the Pacific oyster Crassostrea gigas was obtained by in silico cloning for the first time, and its expression profiles and subcellular localization were determined. The full-length cDNA of Cgisp is 985 bp in length and contains 5'- and 3'-untranslated regions of 35 and 161 bp, respectively, with an open reading frame of 786 bp encoding a protein of 262 amino acids. The predicted molecular weight of 30 kDa of the Cgisp protein was verified by prokaryotic expression. Conserved Rieske [2Fe-2S] cluster binding sites and a tertiary structure closely matching 3CWB_E (Gallus gallus) were revealed by homology analysis and molecular modeling. Eleven putative SNP sites and two conserved hexapeptide sequences, box I (THLGC) and II (PCHGS), were detected by multiple alignments. Real-time PCR analysis showed that Cgisp is expressed in a wide range of tissues, with adductor muscle exhibiting the highest expression level, suggesting a biological function in energy transduction. GFP tagging of Cgisp indicated a mitochondrial localization, further confirming its physiological function.
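The lengths quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes (the abstract does not say so) that the 786 bp open reading frame is counted without its stop codon; under that assumption the parts add up to the full cDNA length and the ORF translates to 262 residues. All names are illustrative.

```python
# Consistency check of the Cgisp cDNA figures quoted in the abstract.
# Assumption (not stated in the abstract): the 786 bp ORF excludes the stop codon.
UTR5_BP = 35          # 5'-untranslated region
UTR3_BP = 161         # 3'-untranslated region
ORF_BP = 786          # open reading frame
STOP_CODON_BP = 3     # one stop codon
CDNA_BP = 985         # reported full-length cDNA


def protein_length(orf_bp: int) -> int:
    """Number of amino acids encoded by an ORF that excludes the stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3


total_bp = UTR5_BP + ORF_BP + STOP_CODON_BP + UTR3_BP
print(total_bp == CDNA_BP, protein_length(ORF_BP))
```

If the ORF length instead included the stop codon, the same figures would give 261 residues and a 982 bp total, so the reported numbers are most consistent with the assumption above.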
CURVATURE-DRIVEN MOLECULAR FLOW ON MEMBRANE SURFACE
MIKUCKI, MICHAEL; ZHOU, Y. C.
2017-01-01
This work presents a mathematical model for the localization of multiple species of diffusion molecules on membrane surfaces. Morphological change of bilayer membrane in vivo is generally modulated by proteins. Most of these modulations are associated with the localization of related proteins in the crowded lipid environments. We start with the energetic description of the distributions of molecules on curved membrane surface, and define the spontaneous curvature of bilayer membrane as a function of the molecule concentrations on membrane surfaces. A drift-diffusion equation governs the gradient flow of the surface molecule concentrations. We recast the energetic formulation and the related governing equations by using an Eulerian phase field description to define membrane morphology. Computational simulations with the proposed mathematical model and related numerical techniques predict (i) the molecular localization on static membrane surfaces at locations with preferred mean curvatures, and (ii) the generation of preferred mean curvature which in turn drives the molecular localization. PMID:29056778
2005-01-01
proteomic gel analyses. The research group has explored the use of chemodescriptors calculated using high-level ab initio quantum chemical basis sets...descriptors that characterize the entire proteomics map, local descriptors that characterize a subset of the proteins present in the gel, and spectrum...techniques for analyzing the full set of proteins present in a proteomics map. Subject terms: Topological indices
Zhang, Chengxin; Zheng, Wei; Freddolino, Peter L; Zhang, Yang
2018-03-10
Homology-based transferal remains the major approach to computational protein function annotations, but it becomes increasingly unreliable when the sequence identity between query and template decreases below 30%. We propose a novel pipeline, MetaGO, to deduce Gene Ontology attributes of proteins by combining sequence homology-based annotation with low-resolution structure prediction and comparison, and partner's homology-based protein-protein network mapping. The pipeline was tested on a large-scale set of 1000 non-redundant proteins from the CAFA3 experiment. Under the stringent benchmark conditions where templates with >30% sequence identity to the query are excluded, MetaGO achieves average F-measures of 0.487, 0.408, and 0.598 for Molecular Function, Biological Process, and Cellular Component, respectively, which are significantly higher than those achieved by other state-of-the-art function annotation methods. Detailed data analysis shows that the major advantage of MetaGO lies in the new functional homolog detections from partner's homology-based network mapping and structure-based local and global structure alignments, the confidence scores of which can be optimally combined through logistic regression. These data demonstrate the power of using a hybrid model incorporating protein structure and interaction networks to deduce new functional insights beyond traditional sequence homology-based referrals, especially for proteins that lack homologous function templates. The MetaGO pipeline is available at http://zhanglab.ccmb.med.umich.edu/MetaGO/. Copyright © 2018. Published by Elsevier Ltd.
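Two ideas in this abstract are easy to make concrete: combining several confidence scores through a logistic function, and the F-measure as the harmonic mean of precision and recall. The sketch below is generic and is not the MetaGO implementation; the weights, bias and example scores are hypothetical placeholders, and the abstract does not say how its F-measures were averaged.

```python
import math


def combine_scores(scores, weights, bias):
    """Logistic combination of per-source confidence scores into one probability.

    The weights and bias would normally be fitted by logistic regression;
    here they are hypothetical inputs supplied by the caller.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical scores from three evidence sources (sequence, structure, network).
example = combine_scores([0.8, 0.4, 0.6], weights=[1.0, 1.0, 1.0], bias=-1.0)
print(round(example, 3), f_measure(0.5, 0.5))
```

A fitted model would replace the placeholder weights with coefficients learned from training annotations.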
Wei, Tong; Chen, Tsung-Chi; Ho, Yuen Ting; Ronald, Pamela C
2016-01-01
The rice receptor kinase XA21 confers robust resistance to the bacterial pathogen Xanthomonas oryzae pv. oryzae (Xoo). We previously reported that XA21 is cleaved in transgenic plants overexpressing XA21 with a GFP tag (Ubi-XA21-GFP) and that the released C-terminal domain is localized to the nucleus. XA21 carries a predicted nuclear localization sequence (NLS) that directs the C-terminal domain to the nucleus in transient assays, whereas alanine substitutions in the NLS disrupt the nuclear localization. To determine if the predicted NLS is required for XA21-mediated immunity in planta, we generated transgenic plants overexpressing an XA21 variant carrying the NLS with the same alanine substitutions (Ubi-XA21nls-GFP). Ubi-XA21nls-GFP plants displayed slightly longer lesion lengths, higher Xoo bacterial populations after inoculation and lower levels of reactive oxygen species production compared with the Ubi-XA21-GFP control plants. However, the Ubi-XA21nls-GFP plants express lower levels of protein than that observed in Ubi-XA21-GFP. These results demonstrate that the predicted NLS is not required for XA21-mediated immunity.
Acylation-dependent protein export in Leishmania.
Denny, P W; Gokool, S; Russell, D G; Field, M C; Smith, D F
2000-04-14
The surface of the protozoan parasite Leishmania is unusual in that it consists predominantly of glycosylphosphatidylinositol-anchored glycoconjugates and proteins. Additionally, a family of hydrophilic acylated surface proteins (HASPs) has been localized to the extracellular face of the plasma membrane in infective parasite stages. These surface polypeptides lack a recognizable endoplasmic reticulum secretory signal sequence, transmembrane spanning domain, or glycosylphosphatidylinositol-anchor consensus sequence, indicating that novel mechanisms are involved in their transport and localization. Here, we show that the N-terminal domain of HASPB contains primary structural information that directs both N-myristoylation and palmitoylation and is essential for correct localization of the protein to the plasma membrane. Furthermore, the N-terminal 18 amino acids of HASPB, encoding the dual acylation site, are sufficient to target the heterologous Aequorea victoria green fluorescent protein to the cell surface of Leishmania. Mutagenesis of the predicted acylated residues confirms that modification by both myristate and palmitate is required for correct trafficking. These data suggest that HASPB is a representative of a novel class of proteins whose translocation onto the surface of eukaryotic cells is dependent upon a "non-classical" pathway involving N-myristoylation/palmitoylation. Significantly, HASPB is also translocated on to the extracellular face of the plasma membrane of transfected mammalian cells, indicating that the export signal for HASPB is recognized by a higher eukaryotic export mechanism.
Baker, Steven Andrew; Lombardi, Laura Marie; Zoghbi, Huda Yahya
2015-09-11
Methyl-CpG binding protein 2 (MeCP2) is a nuclear protein with important roles in regulating chromatin structure and gene expression, and mutations in MECP2 cause Rett syndrome (RTT). Within the MeCP2 protein sequence, the nuclear localization signal (NLS) is reported to reside between amino acids 255-271, and certain RTT-causing mutations overlap with the MeCP2 NLS, suggesting that they may alter nuclear localization. One such mutation, R270X, is predicted to interfere with the localization of MeCP2, but recent in vivo studies have demonstrated that this mutant remains entirely nuclear. To clarify the mechanism of MeCP2 nuclear import, we isolated proteins that interact with the NLS and identified karyopherin α 3 (KPNA3 or Kap-α3) and karyopherin α 4 (KPNA4 or Kap-α4) as key binding partners of MeCP2. MeCP2-R270X did not interact with KPNA4, consistent with a requirement for an intact NLS in this interaction. However, this mutant retains binding to KPNA3, accounting for the normal localization of MeCP2-R270X to the nucleus. These data provide a mechanism for MeCP2 nuclear import and have implications for the design of therapeutics aimed at modulating the function of MeCP2 in RTT patients. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
2009-01-01
Background: The identification of essential genes is important for the understanding of the minimal requirements for cellular life and for practical purposes, such as drug design. However, the experimental techniques for essential genes discovery are labor-intensive and time-consuming. Considering these experimental constraints, a computational approach capable of accurately predicting essential genes would be of great value. We therefore present here a machine learning-based computational approach relying on network topological features, cellular localization and biological process information for prediction of essential genes. Results: We constructed a decision tree-based meta-classifier and trained it on datasets with individual and grouped attributes (network topological features, cellular compartments and biological processes) to generate various predictors of essential genes. We showed that the predictors with better performances are those generated by datasets with integrated attributes. Using the predictor with all attributes, i.e., network topological features, cellular compartments and biological processes, we obtained the best predictor of essential genes that was then used to classify yeast genes with unknown essentiality status. Finally, we generated decision trees by training the J48 algorithm on datasets with all network topological features, cellular localization and biological process information to discover cellular rules for essentiality. We found that the number of protein physical interactions, the nuclear localization of proteins and the number of regulating transcription factors are the most important factors determining gene essentiality. Conclusion: We were able to demonstrate that network topological features, cellular localization and biological process information are reliable predictors of essential genes. Moreover, by constructing decision trees based on these data, we could discover cellular rules governing essentiality. PMID:19758426
Rehman, Zia Ur; Idris, Adnan; Khan, Asifullah
2018-06-01
Protein-Protein Interactions (PPI) play a vital role in cellular processes and are formed because of thousands of interactions among proteins. Advancements in proteomics technologies have resulted in huge PPI datasets that need to be systematically analyzed. Protein complexes are the locally dense regions in PPI networks, which play an important role in metabolic pathways and gene regulation. In this work, a novel two-phase protein complex detection and grouping mechanism is proposed. In the first phase, topological and biological features are extracted for each complex, and prediction performance is investigated using a Bagging-based Ensemble classifier (PCD-BEns). Performance evaluation through cross validation shows improvement in comparison to the CDIP, MCode, CFinder and PLSMC methods. The second phase employs Multi-Dimensional Scaling (MDS) for the grouping of known complexes by exploring inter-complex relations. It is experimentally observed that the combination of topological and biological features in the proposed approach has greatly enhanced prediction performance for protein complex detection, which may help to understand various biological processes, whereas application of MDS-based exploration may assist in grouping potentially similar complexes. Copyright © 2018 Elsevier Ltd. All rights reserved.
TSKS concentrates in spermatid centrioles during flagellogenesis.
Xu, Bingfang; Hao, Zhonglin; Jha, Kula N; Zhang, Zhibing; Urekar, Craig; Digilio, Laura; Pulido, Silvia; Strauss, Jerome F; Flickinger, Charles J; Herr, John C
2008-07-15
Centrosomal coiled-coil proteins paired with kinases play critical roles in centrosomal functions within somatic cells; however, knowledge regarding gamete centriolar proteins is limited. In this study, the substrate of TSSK1 and 2, TSKS, was localized during spermiogenesis to the centrioles of post-meiotic spermatids, where it reached its greatest concentration during the period of flagellogenesis. This centriolar localization persisted in ejaculated human spermatozoa, while centriolar TSKS diminished in mouse sperm, where centrioles are known to undergo complete degeneration. In addition to the centriolar localization during flagellogenesis, mouse TSKS and the TSSK2 kinase localized in the tail and acrosomal regions of mouse epididymal sperm, while TSSK2 was found in the equatorial segment, neck and the midpiece of human spermatozoa. TSSK2/TSKS is the first kinase/substrate pair localized to the centrioles of spermatids and spermatozoa. Coupled with the infertility due to haploinsufficiency noted in chimeric mice with deletion of Tssk1 and 2 (companion paper), this centriolar kinase/substrate pair is predicted to play an indispensable role during spermiogenesis.
Päiväniemi, Outi E.; Maasilta, Paula K.; Vainikka, Tiina L. S.; Alho, Hanni S.; Karhunen, Pekka J.; Salminen, Ulla-Stina
2009-01-01
The local immunoreactivity of C-reactive protein (CRP) was studied in a heterotopic porcine model of posttransplant obliterative bronchiolitis (OB). Bronchial allografts and control autografts were examined serially 2–28 days after subcutaneous transplantation. The autografts stayed patent. In the allografts, proliferation of inflammatory cells (P < .0001) and fibroblasts (P = .02) resulted in occlusion of the bronchial lumens (P < .01). Influx of CD4+ (P < .001) and CD8+ (P < .0001) cells demonstrated allograft immune response. CRP positivity simultaneously increased in the bronchial walls (P < .01), in macrophages, myofibroblasts, and endothelial cells. Local CRP was predictive of features characteristic of OB (R = 0.456–0.879, P < .05−P < .0001). Early obliterative lesions also showed CRP positivity, but not mature, collagen-rich obliterative plugs (P < .05). During OB development, CRP is localized in inflammatory cells, myofibroblasts and endothelial cells probably as a part of the local inflammatory response. PMID:19503785
Elastic network model of learned maintained contacts to predict protein motion
Putz, Ines
2017-01-01
We present a novel elastic network model, lmcENM, to determine protein motion even for localized functional motions that involve substantial changes in the protein’s contact topology. Existing elastic network models assume that the contact topology remains unchanged throughout the motion and are thus most appropriate to simulate highly collective function-related movements. lmcENM uses machine learning to differentiate breaking from maintained contacts. We show that lmcENM accurately captures functional transitions unexplained by the classical ENM and three reference ENM variants, while preserving the simplicity of classical ENM. We demonstrate the effectiveness of our approach on a large set of proteins covering different motion types. Our results suggest that accurately predicting a “deformation-invariant” contact topology offers a promising route to increase the general applicability of ENMs. We also find that to correctly predict this contact topology a combination of several features seems to be relevant which may vary slightly depending on the protein. Additionally, we present case studies of two biologically interesting systems, Ferric Citrate membrane transporter FecA and Arachidonate 15-Lipoxygenase. PMID:28854238
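To make the notion of a fixed contact topology concrete, the sketch below builds the connectivity (Kirchhoff) matrix of a classical cutoff-based elastic network from Cα coordinates. It illustrates only the classical assumption the abstract contrasts itself with; it does not reproduce lmcENM or its learned contact selection. The 7.0 Å cutoff and the toy coordinates are illustrative assumptions, not values from the abstract.

```python
import math


def kirchhoff_matrix(coords, cutoff=7.0):
    """Connectivity matrix of a cutoff-based elastic network.

    Off-diagonal entries are -1 for residue pairs within `cutoff` (in Å)
    and 0 otherwise; each diagonal entry is the residue's contact count.
    """
    n = len(coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma


# Toy chain: four pseudo-residues on a line, 3.8 Å apart (hypothetical data).
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
toy_gamma = kirchhoff_matrix(toy_coords)
for row in toy_gamma:
    print(row)
```

In a classical model this matrix is computed once from the starting structure and never updated, which is the limitation for motions that break or form contacts.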
Local Structural Differences in Homologous Proteins: Specificities in Different SCOP Classes
Joseph, Agnel Praveen; Valadié, Hélène; Srinivasan, Narayanaswamy; de Brevern, Alexandre G.
2012-01-01
The constant increase in the number of solved protein structures is of great help in understanding the basic principles behind protein folding and evolution. 3-D structural knowledge is valuable in designing and developing methods for comparison, modelling and prediction of protein structures. These approaches for structure analysis can be directly implicated in studying protein function and for drug design. The backbone of a protein structure favours certain local conformations which include α-helices, β-strands and turns. Libraries of a limited number of local conformations (Structural Alphabets) were developed in the past to obtain a useful categorization of backbone conformation. Protein Block (PB) is one such Structural Alphabet that gave a reasonable structure approximation of 0.42 Å. In this study, we use the PB description of local structures to analyse conformations that are preferred sites for structural variations and insertions among groups of related folds. This knowledge can be utilized in improving tools for structure comparison that work by analysing local structure similarities. Conformational differences between homologous proteins are known to occur often in the regions comprising turns and loops. Interestingly, these differences are found to have specific preferences depending upon the structural classes of proteins. Such class-specific preferences are mainly seen in the all-β class with changes involving short helical conformations and hairpin turns. A test carried out on a benchmark dataset also indicates that the use of knowledge on the class-specific variations can improve the performance of a PB-based structure comparison approach. The preference for the indel sites also seems to be confined to a few backbone conformations involving β-turns and helix C-caps. These are mainly associated with short loops joining the regular secondary structures that mediate a reversal in the chain direction. Rare β-turns of type I’ and II’ are also identified as preferred sites for insertions. PMID:22745680
Wilson, Robert L.; Frisz, Jessica F.; Hanafin, William P.; Carpenter, Kevin J.; Hutcheon, Ian D.; Weber, Peter K.; Kraft, Mary L.
2014-01-01
The local abundance of specific lipid species near a membrane protein is hypothesized to influence the protein’s activity. The ability to simultaneously image the distributions of specific protein and lipid species in the cell membrane would facilitate testing these hypotheses. Recent advances in imaging the distribution of cell membrane lipids with mass spectrometry have created the desire for membrane protein probes that can be simultaneously imaged with isotope labeled lipids. Such probes would enable conclusive tests of whether specific proteins co-localize with particular lipid species. Here, we describe the development of fluorine-functionalized colloidal gold immunolabels that facilitate the detection and imaging of specific proteins in parallel with lipids in the plasma membrane using high-resolution SIMS performed with a NanoSIMS. First, we developed a method to functionalize colloidal gold nanoparticles with a partially fluorinated mixed monolayer that permitted NanoSIMS detection and rendered the functionalized nanoparticles dispersible in aqueous buffer. Then, to allow for selective protein labeling, we attached the fluorinated colloidal gold nanoparticles to the nonbinding portion of antibodies. By combining these functionalized immunolabels with metabolic incorporation of stable isotopes, we demonstrate that influenza hemagglutinin and cellular lipids can be imaged in parallel using NanoSIMS. These labels enable a general approach to simultaneously imaging specific proteins and lipids with high sensitivity and lateral resolution, which may be used to evaluate predictions of protein co-localization with specific lipid species. PMID:22284327
Chauvin, Anaïs; Wang, Chang-Shu; Geha, Sameh; Garde-Granger, Perrine; Mathieu, Alex-Ane; Lacasse, Vincent; Boisvert, François-Michel
2018-01-01
Colorectal cancer is the third most common and the fourth most lethal cancer in the world. In the majority of cases, patients are diagnosed at an advanced or even metastatic stage, which explains the high mortality. The standard treatment for patients with locally advanced non-metastatic rectal cancer is neoadjuvant radio-chemotherapy (NRCT) with 5-fluorouracil (5-FU) followed by surgery, but the resistance rate to this treatment remains high, with approximately 30% of non-responders. The lack of evidence available in clinical practice to predict NRCT resistance to 5-FU and to guide clinical practice therefore encourages the search for biomarkers of this resistance. From twenty-three formalin-fixed paraffin-embedded (FFPE) biopsies performed before NRCT with 5-FU in patients with locally advanced non-metastatic rectal cancer, we extracted and analysed the tumor proteome. From clinical data, we were able to classify the twenty-three patients in our cohort into three treatment response groups: non-responders (NR), partial responders (PR) and total responders (TR), and to compare the proteomes of these different groups. We have highlighted 384 differentially abundant proteins between NR and PR, 248 between NR and TR and 417 between PR and TR. Among these proteins, we have identified many with a reported role in cancer (IFIT1, FASTKD2, PIP4K2B, ARID1B, SLC25A33: overexpressed in TR; CALD1, CPA3, B3GALT5, CD177, RIPK1: overexpressed in NR). We have also identified that DPYD, the main degradation enzyme of 5-FU, was overexpressed in NR, as were several ribosomal and mitochondrial proteins. Data are available via ProteomeXchange with identifier PXD008440. From this retrospective study, we implemented a protein extraction protocol from FFPE biopsy to highlight protein differences between different response groups to NRCT with 5-FU in patients with locally advanced non-metastatic rectal cancer. These results will pave the way for a larger cohort for better sensitivity and specificity of the signature to guide decisions in the choice of treatment.
Ko, Junsu; Park, Hahnbeom; Seok, Chaok
2012-08-10
Protein structures can be reliably predicted by template-based modeling (TBM) when experimental structures of homologous proteins are available. However, it is challenging to obtain structures more accurate than the single best templates by either combining information from multiple templates or by modeling regions that vary among templates or are not covered by any templates. We introduce GalaxyTBM, a new TBM method in which the more reliable core region is modeled first from multiple templates and less reliable, variable local regions, such as loops or termini, are then detected and re-modeled by an ab initio method. This TBM method is based on "Seok-server," which was tested in CASP9 and assessed to be amongst the top TBM servers. The accuracy of the initial core modeling is enhanced by focusing on more conserved regions in the multiple-template selection and multiple sequence alignment stages. Additional improvement is achieved by ab initio modeling of up to 3 unreliable local regions in the fixed framework of the core structure. Overall, GalaxyTBM reproduced the performance of Seok-server, with GalaxyTBM and Seok-server resulting in average GDT-TS of 68.1 and 68.4, respectively, when tested on 68 single-domain CASP9 TBM targets. For application to multi-domain proteins, GalaxyTBM must be combined with domain-splitting methods. Application of GalaxyTBM to CASP9 targets demonstrates that accurate protein structure prediction is possible by use of a multiple-template-based approach, and ab initio modeling of variable regions can further enhance the model quality.
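For readers unfamiliar with the GDT-TS scores quoted above, the sketch below computes a simplified version: the average, over cutoffs of 1, 2, 4 and 8 Å, of the percentage of residues whose Cα atom lies within the cutoff of its position in the reference structure. It assumes a single superposition has already been applied, whereas the full measure searches over superpositions for each cutoff, so this is an approximation for illustration only. The example distances are hypothetical.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-TS (0 to 100) from per-residue Cα distances in Å.

    Assumes the model is already superposed on the reference structure;
    the full measure maximizes each term over many superpositions.
    """
    if not distances:
        raise ValueError("at least one distance is required")
    n = len(distances)
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical per-residue deviations for a four-residue toy model.
toy_score = gdt_ts([0.5, 1.5, 3.0, 9.0])
print(toy_score)
```

Because each cutoff contributes equally, a model can score well with a correct core even when a few loop residues deviate by more than 8 Å.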
Learning Cellular Sorting Pathways Using Protein Interactions and Sequence Motifs
Lin, Tien-Ho; Bar-Joseph, Ziv
2011-01-01
Proper subcellular localization is critical for proteins to perform their roles in cellular functions. Proteins are transported by different cellular sorting pathways, some of which take a protein through several intermediate locations until reaching its final destination. The pathway a protein is transported through is determined by carrier proteins that bind to specific sequence motifs. In this article, we present a new method that integrates protein interaction and sequence motif data to model how proteins are sorted through these sorting pathways. We use a hidden Markov model (HMM) to represent protein sorting pathways. The model is able to determine intermediate sorting states and to assign carrier proteins and motifs to the sorting pathways. In simulation studies, we show that the method can accurately recover an underlying sorting model. Using data for yeast, we show that our model leads to accurate prediction of subcellular localization. We also show that the pathways learned by our model recover many known sorting pathways and correctly assign proteins to the path they utilize. The learned model identified new pathways and their putative carriers and motifs and these may represent novel protein sorting mechanisms. Supplementary results and software implementation are available from http://murphylab.web.cmu.edu/software/2010_RECOMB_pathways/. PMID:21999284
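How an HMM recovers a sequence of intermediate states can be illustrated with generic Viterbi decoding on a toy model. This is not the authors' model, which integrates interaction and motif data; the two states, the observation symbols and every probability below are hypothetical values chosen only to make the example small.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden-state sequence."""
    # best[s] holds the probability and path of the best sequence ending in s.
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Pick the predecessor that maximizes the probability of reaching s.
            prob, path = max(
                ((best[p][0] * trans_p[p][s], best[p][1]) for p in states),
                key=lambda item: item[0],
            )
            new_best[s] = (prob * emit_p[s][obs], path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])


# Hypothetical two-compartment sorting model.
states = ("ER", "Golgi")
start_p = {"ER": 0.9, "Golgi": 0.1}
trans_p = {"ER": {"ER": 0.3, "Golgi": 0.7}, "Golgi": {"ER": 0.1, "Golgi": 0.9}}
emit_p = {
    "ER": {"signal": 0.8, "other": 0.2},
    "Golgi": {"signal": 0.1, "other": 0.9},
}

prob, path = viterbi(["signal", "other"], states, start_p, trans_p, emit_p)
print(path, round(prob, 4))
```

For longer observation sequences the products should be computed in log space to avoid numerical underflow.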
An, Ji-Yong; Zhang, Lei; Zhou, Yong; Zhao, Yu-Jun; Wang, Da-Fu
2017-08-18
Self-interacting proteins (SIPs) are important for biological activity owing to the inherent interaction amongst their secondary structures or domains. However, due to the limitations of experimental self-interaction detection, one major challenge in SIP prediction is how to exploit computational approaches for SIP detection based on the evolutionary information contained in protein sequences. In this work, we presented a novel computational approach named WELM-LAG, which combined the Weighed-Extreme Learning Machine (WELM) classifier with Local Average Group (LAG) to predict SIPs based on protein sequence. The major improvement of our method lies in presenting an effective feature extraction method used to represent candidate self-interacting proteins by exploring the evolutionary information embedded in the PSI-BLAST-constructed position specific scoring matrix (PSSM), and then employing a reliable and robust WELM classifier to carry out classification. In addition, the Principal Component Analysis (PCA) approach is used to reduce the impact of noise. The WELM-LAG method gave very high average accuracies of 92.94% and 96.74% on yeast and human datasets, respectively. Meanwhile, we compared it with the state-of-the-art support vector machine (SVM) classifier and other existing methods on human and yeast datasets, respectively. Comparative results indicated that our approach is very promising and may provide a cost-effective alternative for predicting SIPs. In addition, we developed a freely available web server called WELM-LAG-SIPs to predict SIPs. The web server is available at http://219.219.62.123:8888/WELMLAG/.
Cao, Renzhi; Bhattacharya, Debswapna; Adhikari, Badri; Li, Jilong; Cheng, Jianlin
2016-09-01
Model evaluation and selection is an important step and a big challenge in template-based protein structure prediction. Individual model quality assessment methods designed for recognizing some specific properties of protein structures often fail to consistently select good models from a model pool because of their limitations. Therefore, combining multiple complementary quality assessment methods is useful for improving model ranking and consequently tertiary structure prediction. Here, we report the performance and analysis of our human tertiary structure predictor (MULTICOM) based on the massive integration of 14 diverse complementary quality assessment methods that was successfully benchmarked in the 11th Critical Assessment of Techniques of Protein Structure prediction (CASP11). The predictions of MULTICOM for 39 template-based domains were rigorously assessed by six scoring metrics covering global topology of Cα trace, local all-atom fitness, side chain quality, and physical reasonableness of the model. The results show that the massive integration of complementary, diverse single-model and multi-model quality assessment methods can effectively leverage the strength of single-model methods in distinguishing quality variation among similar good models and the advantage of multi-model quality assessment methods of identifying reasonable average-quality models. The overall excellent performance of the MULTICOM predictor demonstrates that integrating a large number of model quality assessment methods in conjunction with model clustering is a useful approach to improve the accuracy, diversity, and consequently robustness of template-based protein structure prediction. Proteins 2016; 84(Suppl 1):247-259. © 2015 Wiley Periodicals, Inc.
Wise, C A; Chiang, L C; Paznekas, W A; Sharma, M; Musy, M M; Ashley, J A; Lovett, M; Jabs, E W
1997-04-01
Treacher Collins Syndrome (TCS) is the most common of the human mandibulofacial dysostosis disorders. Recently, a partial TCOF1 cDNA was identified and shown to contain mutations in TCS families. Here we present the entire exon/intron genomic structure and the complete coding sequence of TCOF1. TCOF1 encodes a low complexity protein of 1,411 amino acids, whose predicted protein structure reveals repeated motifs that mirror the organization of its exons. These motifs are shared with nucleolar trafficking proteins in other species and are predicted to be highly phosphorylated by casein kinase. Consistent with this, the full-length TCOF1 protein sequence also contains putative nuclear and nucleolar localization signals. Throughout the open reading frame, we detected an additional eight mutations in TCS families and several polymorphisms. We postulate that TCS results from defects in a nucleolar trafficking protein that is critically required during human craniofacial development.
Wise, Carol A.; Chiang, Lydia C.; Paznekas, William A.; Sharma, Mridula; Musy, Maurice M.; Ashley, Jennifer A.; Lovett, Michael; Jabs, Ethylin W.
1997-01-01
Treacher Collins Syndrome (TCS) is the most common of the human mandibulofacial dysostosis disorders. Recently, a partial TCOF1 cDNA was identified and shown to contain mutations in TCS families. Here we present the entire exon/intron genomic structure and the complete coding sequence of TCOF1. TCOF1 encodes a low complexity protein of 1,411 amino acids, whose predicted protein structure reveals repeated motifs that mirror the organization of its exons. These motifs are shared with nucleolar trafficking proteins in other species and are predicted to be highly phosphorylated by casein kinase. Consistent with this, the full-length TCOF1 protein sequence also contains putative nuclear and nucleolar localization signals. Throughout the open reading frame, we detected an additional eight mutations in TCS families and several polymorphisms. We postulate that TCS results from defects in a nucleolar trafficking protein that is critically required during human craniofacial development. PMID:9096354
Identification of a new protein in the centrosome-like "atractophore" of Trichomonas vaginalis.
Bricheux, Geneviève; Coffe, Gérard; Brugerolle, Guy
2007-06-01
The human parasite Trichomonas vaginalis has specific structural bodies, atractophores, associated at one end to the kinetosomes and at the other to the spindle during division. A monoclonal antibody specific for a component of this structure was obtained. It recognizes a protein with a predicted molecular mass of 477 kDa. Sequence analysis of this protein shows that P477 belongs to the family of large coiled-coil proteins, sharing a highly versatile protein folding motif adaptable to many biological functions. P477 might act as an anchor to localize cellular activities and components to the Golgi centrosomal region. It may represent a new class of structural proteins, since similar proteins were found in many protozoans.
Jiménez, Diego Javier; Dini-Andreote, Francisco; Ottoni, Júlia Ronzella; de Oliveira, Valéria Maia; van Elsas, Jan Dirk; Andreote, Fernando Dini
2015-01-01
The occurrence of genes encoding biotechnologically relevant α/β-hydrolases in mangrove soil microbial communities was assessed using data obtained by whole-metagenome sequencing of four mangrove areas, denoted BrMgv01 to BrMgv04, in São Paulo, Brazil. The sequences (215 Mb in total) were filtered based on local amino acid alignments against the Lipase Engineering Database. In total, 5923 unassembled sequences were affiliated with 30 different α/β-hydrolase fold superfamilies. The most abundant predicted proteins encompassed cytosolic hydrolases (abH08; ∼ 23%), microsomal hydrolases (abH09; ∼ 12%) and Moraxella lipase-like proteins (abH04 and abH01; < 5%). Detailed analysis of the genes predicted to encode proteins of the abH08 superfamily revealed a high proportion related to epoxide hydrolases and haloalkane dehalogenases in polluted mangroves BrMgv01-02-03. This suggested selection and putative involvement in local degradation/detoxification of the pollutants. Seven sequences that were annotated as genes for putative epoxide hydrolases and five for putative haloalkane dehalogenases were found in a fosmid library generated from BrMgv02 DNA. The latter enzymes were predicted to belong to Actinobacteria, Deinococcus-Thermus, Planctomycetes and Proteobacteria. Our integrated approach thus identified 12 genes (complete and/or partial) that may encode hitherto undescribed enzymes. The low amino acid identity (< 60%) with already-described genes opens perspectives for both production in an expression host and genetic screening of metagenomes. PMID:25171437
Cao, Renzhi; Bhattacharya, Debswapna; Adhikari, Badri; Li, Jilong; Cheng, Jianlin
2015-01-01
Model evaluation and selection is an important step and a big challenge in template-based protein structure prediction. Individual model quality assessment methods designed for recognizing some specific properties of protein structures often fail to consistently select good models from a model pool because of their limitations. Therefore, combining multiple complementary quality assessment methods is useful for improving model ranking and consequently tertiary structure prediction. Here, we report the performance and analysis of our human tertiary structure predictor (MULTICOM) based on the massive integration of 14 diverse complementary quality assessment methods that was successfully benchmarked in the 11th Critical Assessment of Techniques of Protein Structure prediction (CASP11). The predictions of MULTICOM for 39 template-based domains were rigorously assessed by six scoring metrics covering global topology of Cα trace, local all-atom fitness, side chain quality, and physical reasonableness of the model. The results show that the massive integration of complementary, diverse single-model and multi-model quality assessment methods can effectively leverage the strength of single-model methods in distinguishing quality variation among similar good models and the advantage of multi-model quality assessment methods of identifying reasonable average-quality models. The overall excellent performance of the MULTICOM predictor demonstrates that integrating a large number of model quality assessment methods in conjunction with model clustering is a useful approach to improve the accuracy, diversity, and consequently robustness of template-based protein structure prediction. PMID:26369671
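The idea of combining several quality assessment (QA) scores into one model ranking can be illustrated with a simple average-rank consensus. This is only a minimal sketch: the function name `consensus_rank`, the method labels, and the toy scores are hypothetical, and the abstract does not state that MULTICOM combines its 14 methods in this way.

```python
from statistics import mean


def consensus_rank(scores_by_method):
    """Order models by their average rank across several QA methods.

    scores_by_method maps a method name to a dict of model name -> score,
    where a higher score means better predicted quality (an assumption).
    """
    models = sorted(next(iter(scores_by_method.values())))
    ranks = {model: [] for model in models}
    for scores in scores_by_method.values():
        # Rank 1 is the best-scoring model for this method.
        ordered = sorted(models, key=lambda m: scores[m], reverse=True)
        for position, model in enumerate(ordered, start=1):
            ranks[model].append(position)
    # Lower average rank means stronger agreement that the model is good.
    return sorted(models, key=lambda m: mean(ranks[m]))


# Hypothetical scores from three QA methods for three candidate models.
toy_scores = {
    "qa_a": {"model1": 0.8, "model2": 0.6, "model3": 0.4},
    "qa_b": {"model1": 0.5, "model2": 0.7, "model3": 0.2},
    "qa_c": {"model1": 0.9, "model2": 0.3, "model3": 0.5},
}
ranking = consensus_rank(toy_scores)
```

Averaging ranks rather than raw scores avoids having to put scores from different methods on a common scale.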
Scrg1, a novel protein of the CNS is targeted to the large dense-core vesicles in neuronal cells.
Dandoy-Dron, Françoise; Griffond, Bernadette; Mishal, Zohar; Tovey, Michael G; Dron, Michel
2003-11-01
Scrapie responsive gene one (Scrg1) is a novel transcript discovered through identification of the genes associated with or responsible for the neurodegenerative changes observed in transmissible spongiform encephalopathies. Scrg1 mRNA is distributed principally in the central nervous system and the cDNA sequence predicts a small cysteine-rich protein 98 amino acids in length, with an N-terminal signal peptide. In this study, we have generated antibodies against the predicted protein and revealed expression of a predominant immunoreactive protein of 10 kDa in mouse brain by Western blot analysis. We have established CAD neuronal cell lines stably expressing Scrg1 to determine its subcellular localization. Several lines of evidence show that the protein is targeted to dense-core vesicles in these cells. (i) Scrg1 is detected by immunocytochemistry as very punctate signals especially in the Golgi apparatus and tips of neurites, suggesting a vesicular localization for the protein. Moreover, Scrg1 exhibits a high degree of colocalization with secretogranin II, a dense-core vesicle marker and a very limited colocalization with markers for small synaptic vesicles. (ii) Scrg1 immunoreactivity is associated with large secretory granules/dense-core vesicles, as indicated by immuno-electron microscopy. (iii) Scrg1 is enriched in fractions of sucrose density gradient where synaptotagmin V, a dense-core vesicle-associated protein, is also enriched. The characteristic punctate immunostaining of Scrg1 is observed in N2A cells transfected with Scrg1 and for the endogenous protein in cultured primary neurons, attesting to the generality of the observations. Our findings strongly suggest that Scrg1 is associated with the secretory pathway of neuronal cells.
Eggink, Laura L; LoBrutto, Russell; Brune, Daniel C; Brusslan, Judy; Yamasato, Akihiro; Tanaka, Ayumi; Hoober, J Kenneth
2004-01-01
Background Assembly of stable light-harvesting complexes (LHCs) in the chloroplast of green algae and plants requires synthesis of chlorophyll (Chl) b, a reaction that involves oxygenation of the 7-methyl group of Chl a to a formyl group. This reaction uses molecular oxygen and is catalyzed by chlorophyllide a oxygenase (CAO). The amino acid sequence of CAO predicts mononuclear iron and Rieske iron-sulfur centers in the protein. The mechanism of synthesis of Chl b and localization of this reaction in the chloroplast are essential steps toward understanding LHC assembly. Results Fluorescence of a CAO-GFP fusion protein, transiently expressed in young pea leaves, was found at the periphery of mature chloroplasts and on thylakoid membranes by confocal fluorescence microscopy. However, when membranes from partially degreened cells of Chlamydomonas reinhardtii cw15 were resolved on sucrose gradients, full-length CAO was detected by immunoblot analysis only on the chloroplast envelope inner membrane. The electron paramagnetic resonance spectrum of CAO included a resonance at g = 4.3, assigned to the predicted mononuclear iron center. Instead of a spectrum of the predicted Rieske iron-sulfur center, a nearly symmetrical, approximately 100 Gauss peak-to-trough signal was observed at g = 2.057, with a sensitivity to temperature characteristic of an iron-sulfur center. A remarkably stable radical in the protein was revealed by an isotropic, 9 Gauss peak-to-trough signal at g = 2.0042. Fragmentation of the protein after incorporation of 125I- identified a conserved tyrosine residue (Tyr-422 in Chlamydomonas and Tyr-518 in Arabidopsis) as the radical species. The radical was quenched by chlorophyll a, an indication that it may be involved in the enzymatic reaction. Conclusion CAO was found on the chloroplast envelope and thylakoid membranes in mature chloroplasts but only on the envelope inner membrane in dark-grown C. reinhardtii cells. 
Such localization provides further support for the envelope membranes as the initial site of Chl b synthesis and assembly of LHCs during chloroplast development. Identification of a tyrosine radical in the protein provides insight into the mechanism of Chl b synthesis. PMID:15086960
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, Roslyn N.; Sanford, James A.; Park, Jea H.
Towards developing a systems-level pathobiological understanding of Salmonella enterica, we performed a subcellular proteomic analysis of this pathogen grown under standard laboratory and infection-mimicking conditions in vitro. Analysis of proteins from cytoplasmic, inner membrane, periplasmic, and outer membrane fractions yielded coverage of over 30% of the theoretical proteome. Confident subcellular location could be assigned to over 1000 proteins, with good agreement between experimentally observed location and predicted/known protein properties. Comparison of protein location under the different environmental conditions provided insight into dynamic protein localization and possible moonlighting (multiple function) activities. Notable examples of dynamic localization were the response regulators of two-component regulatory systems (e.g., ArcB, PhoQ). The DNA-binding protein Dps that is generally regarded as cytoplasmic was significantly enriched in the outer membrane for all growth conditions examined, suggestive of moonlighting activities. These observations imply the existence of unknown transport mechanisms and novel functions for a subset of Salmonella proteins. Overall, this work provides a catalog of experimentally verified subcellular protein location for Salmonella and a framework for further investigations using computational modeling.
Brohi, Rahim Dad; Wang, Li; Hassine, Najla Ben; Cao, Jing; Talpur, Hira Sajjad; Wu, Di; Huang, Chun-Jie; Rehman, Zia-Ur; Bhattarai, Dinesh; Huo, Li-Jun
2017-01-01
Mature spermatozoa have highly condensed DNA that is essentially silent both transcriptionally and translationally. Therefore, post-translational modifications are very important for regulating sperm motility, morphology, and for male fertility in general. Protein sumoylation was recently demonstrated in human and rodent spermatozoa, with potential consequences for sperm motility and DNA integrity. We examined the expression and localization of small ubiquitin-related modifier-1 (SUMO-1) in the sperm of water buffalo (Bubalus bubalis) using immunofluorescence analysis. We confirmed the expression of SUMO-1 in the acrosome. We further found that SUMO-1 was lost if the acrosome reaction was induced by calcium ionophore A23187. Proteins modified or conjugated by SUMO-1 in water buffalo sperm were pulled down and analyzed by mass spectrometry. Sixty proteins were identified, including proteins important for sperm morphology and motility, such as relaxin receptors and cytoskeletal proteins, including tubulin chains, actins, and dyneins. Forty-six proteins were predicted as potential sumoylation targets. The expression of SUMO-1 in the acrosome region of water buffalo sperm and the identification of potentially SUMOylated proteins important for sperm function implicate sumoylation as a crucial PTM related to sperm function. PMID:28659810
Zlopasa, Livija; Brachner, Andreas; Foisner, Roland
2016-06-01
Ankyrin repeats and LEM domain containing protein 1 (Ankle1) belongs to the LEM protein family, whose members share a chromatin-interacting LEM motif. Unlike most other LEM proteins, Ankle1 is not an integral protein of the inner nuclear membrane but shuttles between the nucleus and the cytoplasm. It contains a GIY-YIG-type nuclease domain, but its function is unknown. The mammalian genome encodes only one other GIY-YIG domain protein, termed Slx1. Slx1 has been described as a resolvase that processes Holliday junctions during homologous recombination-mediated DNA double strand break repair. Resolvase activity is regulated in a spatial and temporal manner during the cell cycle. We hypothesized that Ankle1 may have a similar function and its nucleo-cytoplasmic shuttling may contribute to the regulation of Ankle1 activity. Hence, we aimed at identifying the domains mediating Ankle1 shuttling and investigating whether cellular localization is affected during DNA damage response. Sequence analysis predicts the presence of two canonical nuclear import and export signals in Ankle1. Immunofluorescence microscopy of cells expressing wild-type and various mutated Ankle1-fusion proteins revealed a C-terminally located classical monopartite nuclear localization signal and a centrally located CRM1-dependent nuclear export signal that mediate nucleo-cytoplasmic shuttling of Ankle1. These sequences are also functional in heterologous proteins. The predominant localization of Ankle1 in the cytoplasm, however, does not change upon induction of several DNA damage response pathways throughout the cell cycle. We identified the domains mediating nuclear import and export of Ankle1. Ankle1's cellular localization was not affected following DNA damage.
Rose, Annkatrin; Manikantan, Sankaraganesh; Schraegle, Shannon J.; Maloy, Michael A.; Stahlberg, Eric A.; Meier, Iris
2004-01-01
Increasing evidence demonstrates the importance of long coiled-coil proteins for the spatial organization of cellular processes. Although several protein classes with long coiled-coil domains have been studied in animals and yeast, our knowledge about plant long coiled-coil proteins is very limited. The repeat nature of the coiled-coil sequence motif often prevents the simple identification of homologs of animal coiled-coil proteins by generic sequence similarity searches. As a consequence, counterparts of many animal proteins with long coiled-coil domains, like lamins, golgins, or microtubule organization center components, have not been identified yet in plants. Here, all Arabidopsis proteins predicted to contain long stretches of coiled-coil domains were identified by applying the algorithm MultiCoil to a genome-wide screen. A searchable protein database, ARABI-COIL (http://www.coiled-coil.org/arabidopsis), was established that integrates information on number, size, and position of predicted coiled-coil domains with subcellular localization signals, transmembrane domains, and available functional annotations. ARABI-COIL serves as a tool to sort and browse Arabidopsis long coiled-coil proteins to facilitate the identification and selection of candidate proteins of potential interest for specific research areas. Using the database, candidate proteins were identified for Arabidopsis membrane-bound, nuclear, and organellar long coiled-coil proteins. PMID:15020757
GPS-ARM: Computational Analysis of the APC/C Recognition Motif by Predicting D-Boxes and KEN-Boxes
Ren, Jian; Cao, Jun; Zhou, Yanhong; Yang, Qing; Xue, Yu
2012-01-01
Anaphase-promoting complex/cyclosome (APC/C), an E3 ubiquitin ligase incorporated with Cdh1 and/or Cdc20, recognizes and interacts with specific substrates, and faithfully orchestrates the proper cell cycle events by targeting proteins for proteasomal degradation. Experimental identification of APC/C substrates is largely dependent on the discovery of APC/C recognition motifs, e.g., the D-box and KEN-box. Although a number of either stringent or loosely defined motifs have been proposed, these motif patterns are only of limited use due to their insufficient predictive power. We report the development of a novel GPS-ARM software package which is useful for the prediction of D-boxes and KEN-boxes in proteins. Using experimentally identified D-boxes and KEN-boxes as the training data sets, a previously developed GPS (Group-based Prediction System) algorithm was adopted. By extensive evaluation and comparison, the GPS-ARM performance was found to be much better than that obtained using simple motifs. With this powerful tool, we predicted 4,841 potential D-boxes in 3,832 proteins and 1,632 potential KEN-boxes in 1,403 proteins from H. sapiens, while further statistical analysis suggested that both the D-box and KEN-box proteins are involved in a broad spectrum of biological processes beyond the cell cycle. In addition, with the co-localization information, we predicted hundreds of mitosis-specific APC/C substrates with high confidence. As the first computational tool for the prediction of APC/C-mediated degradation, GPS-ARM is a useful source of information for further experimental investigations. The GPS-ARM is freely accessible for academic researchers at: http://arm.biocuckoo.org. PMID:22479614
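The "simple motifs" baseline that the abstract contrasts with GPS-ARM can be pictured as a plain pattern scan. The sketch below is illustrative only: it assumes the commonly cited minimal consensus patterns R-x-x-L for the D-box and K-E-N for the KEN-box, the function name `scan_motifs` and the example sequence are hypothetical, and it does not reproduce the GPS algorithm.

```python
import re

# Simplified consensus patterns (assumed here for illustration only).
# Lookaheads are used so that overlapping matches are also reported.
MOTIFS = {
    "D-box": re.compile(r"(?=(R..L))"),
    "KEN-box": re.compile(r"(?=(KEN))"),
}


def scan_motifs(sequence):
    """Return (motif name, 1-based start position, matched residues) tuples."""
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(sequence.upper()):
            hits.append((name, match.start() + 1, match.group(1)))
    # Report hits in sequence order.
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical toy sequence containing one KEN-box and one D-box core.
hits = scan_motifs("MKENRAALGD")
```

A scan like this flags every R-x-x-L occurrence, which is why such loose patterns tend to over-predict and why a trained predictor is attractive.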
ZO proteins redundantly regulate the transcription factor DbpA/ZONAB.
Spadaro, Domenica; Tapia, Rocio; Jond, Lionel; Sudol, Marius; Fanning, Alan S; Citi, Sandra
2014-08-08
The localization and activities of DbpA/ZONAB and YAP transcription factors are in part regulated by the density-dependent assembly of epithelial junctions. DbpA activity and cell proliferation are inhibited by exogenous overexpression of the tight junction (TJ) protein ZO-1, leading to a model whereby ZO-1 acts by sequestering DbpA at the TJ. However, mammary epithelial cells and mouse tissues knock-out for ZO-1 do not show increased proliferation, as predicted by this model. To address this discrepancy, we examined the localization and activity of DbpA and YAP in Madin-Darby canine kidney cells depleted either of ZO-1, or one of the related proteins ZO-2 and ZO-3 (ZO proteins), or all three together. Depletion of only one ZO protein had no effect on DbpA localization and activity, whereas depletion of ZO-1 and ZO-2, which is associated with reduced ZO-3 expression, resulted in increased DbpA localization in the cytoplasm. Only depletion of ZO-2 reduced the nuclear import of YAP. Mammary epithelial (Eph4) cells KO for ZO-1 showed junctional DbpA, demonstrating that ZO-1 is not required to sequester DbpA at junctions. However, further depletion of ZO-2 in Eph4 ZO-1KO cells, which do not express ZO-3, caused decreased junctional localization and expression of DbpA, which were rescued by the proteasome inhibitor MG132. In vitro binding assays showed that full-length ZO-1 does not interact with DbpA. These results show that ZO-2 is implicated in regulating the nuclear shuttling of YAP, whereas ZO proteins redundantly control the junctional retention and stability of DbpA, without affecting its shuttling to the nucleus. © 2014 by The American Society for Biochemistry and Molecular Biology, Inc.
Improved hybrid optimization algorithm for 3D protein structure prediction.
Zhou, Changjun; Hou, Caixia; Wei, Xiaopeng; Zhang, Qiang
2014-07-01
A new improved hybrid optimization algorithm, the PGATS algorithm, which is based on the toy off-lattice model, is presented for three-dimensional protein structure prediction problems. The algorithm combines the particle swarm optimization (PSO), genetic algorithm (GA), and tabu search (TS) algorithms. In addition, we adopt several improvement strategies: a stochastic disturbance factor is added to the particle swarm optimization to improve its search ability; the crossover and mutation operations of the genetic algorithm are changed to a kind of random linear method; and the tabu search algorithm is improved by appending a mutation operator. Through the combination of a variety of strategies and algorithms, protein structure prediction (PSP) in a 3D off-lattice model is achieved. The PSP problem is NP-hard, but it can be treated as a global optimization problem with multiple extrema and multiple parameters. This is the theoretical principle of the hybrid optimization algorithm proposed in this paper. The algorithm combines local search and global search, which overcomes the shortcomings of a single algorithm and gives full play to the advantages of each. The algorithm is verified on commonly used benchmark sequences, namely Fibonacci sequences and real protein sequences. Experiments show that the proposed method outperforms single algorithms in the accuracy of the calculated protein sequence energy value, which shows it to be an effective way to predict the structure of proteins.
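To make the "stochastic disturbance" idea concrete, the following sketch adds a small Gaussian noise term to the standard PSO velocity update. It is only an assumption-laden illustration: the function name, all parameter values, and the noise form are hypothetical, the GA and TS stages are omitted, and a simple sphere function stands in for the off-lattice energy.

```python
import random


def pso_with_disturbance(objective, dim, n_particles=20, iterations=200,
                         inertia=0.7, c1=1.5, c2=1.5, noise=0.01, seed=0):
    """Minimize `objective` with PSO plus a random disturbance on velocity."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal best positions
    best_val = [objective(p) for p in pos]      # personal best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best so far
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * rng.random() * (best_pos[i][d] - pos[i][d])
                             + c2 * rng.random() * (g_pos[d] - pos[i][d])
                             + noise * rng.gauss(0.0, 1.0))  # disturbance term
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def sphere(x):
    """Stand-in objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


solution, energy = pso_with_disturbance(sphere, dim=2)
```

The disturbance keeps particles from collapsing onto a single point too early, while the stored global best guarantees that the returned value never gets worse over iterations.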
Robust prediction of protein subcellular localization combining PCA and WSVMs.
Tian, Jiang; Gu, Hong; Liu, Wenqi; Gao, Chiyang
2011-08-01
Automated prediction of protein subcellular localization is an important tool for genome annotation and drug discovery, and Support Vector Machines (SVMs) can effectively solve this problem in a supervised manner. However, the datasets obtained from real experiments are likely to contain outliers or noise, which can lead to poor generalization ability and classification accuracy. To address this problem, we adopt strategies to lower the effect of outliers. First, we design a method based on Weighted SVMs (WSVMs) in which different weights are assigned to different data points, so that the training algorithm learns the decision boundary according to the relative importance of the data points. Second, we analyse the influence of Principal Component Analysis (PCA) on WSVM classification and propose a hybrid classifier combining the merits of both PCA and WSVM. After performing dimension reduction operations on the datasets, the kernel-based possibilistic c-means algorithm can generate more suitable weights for the training, as PCA transforms the data into a new coordinate system with largest variances affected greatly by the outliers. Experiments on benchmark datasets show promising results, confirming the effectiveness of the proposed method in terms of prediction accuracy. Copyright © 2011 Elsevier Ltd. All rights reserved.
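The effect of per-point weights can be seen directly in a weighted soft-margin objective, where each hinge-loss term is scaled by the weight of its data point. The sketch below assumes a linear model and the usual form 0.5·||w||² + C·Σ sᵢ·max(0, 1 − yᵢ(w·xᵢ + b)); the function name and the toy data are hypothetical, and the paper's kernel-based weighting scheme is not reproduced.

```python
def weighted_hinge_objective(w, b, X, y, weights, C=1.0):
    """Weighted soft-margin SVM objective for a linear decision function."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    loss = 0.0
    for xi, yi, si in zip(X, y, weights):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        # A low weight si shrinks the penalty for violating the margin,
        # so suspected outliers pull less on the decision boundary.
        loss += si * max(0.0, 1.0 - margin)
    return regularizer + C * loss


# Toy data: the third point is a suspected outlier on the wrong side.
X = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
y = [1, -1, -1]
down_weighted = weighted_hinge_objective([1.0, 0.0], 0.0, X, y, [1.0, 1.0, 0.1])
unweighted = weighted_hinge_objective([1.0, 0.0], 0.0, X, y, [1.0, 1.0, 1.0])
```

With the outlier down-weighted, the same boundary is penalized far less, which is the mechanism by which weighting reduces the influence of noisy points during training.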
Localization of MRP-1 to the outer mitochondrial membrane by the chaperone protein HSP90β.
Roundhill, Elizabeth; Turnbull, Doug; Burchill, Susan
2016-05-01
Overexpression of plasma membrane multidrug resistance-associated protein 1 (MRP-1) in Ewing's sarcoma (ES) predicts poor outcome. MRP-1 is also expressed in mitochondria, and we have examined the submitochondrial localization of MRP-1 and investigated the mechanism of MRP-1 transport and role of this organelle in the response to doxorubicin. The mitochondrial localization of MRP-1 was examined in ES cell lines by differential centrifugation and membrane solubilization by digitonin. Whether MRP-1 is chaperoned by heat shock proteins (HSPs) was investigated by immunoprecipitation, immunofluorescence microscopy, and HSP knockout using small hairpin RNA and inhibitors (apoptozole, 17-AAG, and NVPAUY). The effect of disrupting mitochondrial MRP-1-dependent efflux activity on the cytotoxic effect of doxorubicin was investigated by counting viable cell number. Mitochondrial MRP-1 is glycosylated and localized to the outer mitochondrial membrane, where it is coexpressed with HSP90. MRP-1 binds to both HSP90 and HSP70, although only inhibition of HSP90β decreases expression of MRP-1 in the mitochondria. Disruption of mitochondrial MRP-1-dependent efflux significantly increases the cytotoxic effect of doxorubicin (combination index, <0.9). For the first time, we have demonstrated that mitochondrial MRP-1 is expressed in the outer mitochondrial membrane and is a client protein of HSP90β, where it may play a role in the doxorubicin-induced resistance of ES. © FASEB.
Kang, WonKyung; Imai, Noriko; Kawasaki, Yu; Nagamine, Toshihiro; Matsumoto, Shogo
2005-11-01
The Bombyx mori nucleopolyhedrovirus (BmNPV) ORF8 protein has previously been reported to colocalize with IE1 to specific nuclear sites during infection. Transient expression of green fluorescent protein (GFP)-fused ORF8 showed the protein to have cytoplasmic localization, but following BmNPV infection the protein formed foci, suggesting that ORF8 requires some other viral factor(s) for this. Therefore, interacting factors were looked for using the yeast two-hybrid system and IE1 was identified. We mapped the interacting region of ORF8 using a yeast two-hybrid assay. An N-terminal region (residues 1-110) containing a predicted coiled-coil domain interacted with IE1, while a truncated N-terminal region (residues 1-78) that lacks this domain did not. In addition, a protein with a complete deletion of the N-terminal region failed to interact with IE1. These results suggest that the ORF8 N-terminal region containing the coiled-coil domain is required for the interaction with IE1. Next, whether IE1 plays a role in ORF8 localization was investigated. In the presence of IE1, GFP-ORF8 localized to the nucleus. In addition, cotransfection with a plasmid expressing IE1 and a plasmid containing the hr3 element resulted in nuclear foci formation. A GFP-fused ORF8 mutant protein containing the coiled-coil domain, previously shown to interact with IE1, also formed nuclear foci in the presence of IE1 and hr3. However, ORF8 mutant proteins that did not interact with IE1 failed to form nuclear foci. In contrast to wild-type IE1, focus formation was not observed for an IE1 mutant protein that was deficient in hr binding. These results suggest that IE1 and hr facilitate the localization of BmNPV ORF8 to specific nuclear sites.
Li, Zhan-Chao; Zhou, Xi-Bin; Dai, Zong; Zou, Xiao-Yong
2009-07-01
Prior knowledge of a protein's structural class can provide useful information about its overall structure, so quick and accurate computational determination of protein structural class is very important in protein science. One of the keys to a computational method is accurate representation of protein samples. Here, based on the concept of Chou's pseudo-amino acid composition (AAC, Chou, Proteins: structure, function, and genetics, 43:246-255, 2001), a novel method of feature extraction that combines continuous wavelet transform (CWT) with principal component analysis (PCA) was introduced for the prediction of protein structural classes. Firstly, the digital signal was obtained by mapping each amino acid according to various physicochemical properties. Secondly, CWT was utilized to extract a new feature vector based on the wavelet power spectrum (WPS), which contains more abundant information on sequence order in the frequency domain and time domain, and PCA was then used to reorganize the feature vector to decrease information redundancy and computational complexity. Finally, a pseudo-amino acid composition feature vector was formed to represent the primary sequence by coupling the AAC vector with a set of new WPS feature vectors in an orthogonal space obtained by PCA. As a showcase, the rigorous jackknife cross-validation test was performed on the working datasets. The results indicated that prediction quality has been improved, and the current approach of protein representation may serve as a useful complementary vehicle in classifying other attributes of proteins, such as enzyme family class, subcellular localization, membrane protein types and protein secondary structure, etc.
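The amino acid composition (AAC) vector that the method couples with wavelet features is simply the relative frequency of each of the 20 standard residues. A minimal sketch is given below; the function name and the example sequence are hypothetical, and the CWT and PCA steps described in the abstract are not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(sequence):
    """Return the 20-dimensional AAC vector (fractions summing to 1)."""
    # Ignore characters that are not standard residues (e.g. X or gaps).
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    total = len(residues)
    return [residues.count(aa) / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence: two alanines, one cysteine, one aspartate.
aac = amino_acid_composition("AACD")
```

Because AAC discards residue order entirely, sequence-order features such as the wavelet power spectrum are added on top of it in pseudo-amino acid composition schemes.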
ClusPro: an automated docking and discrimination method for the prediction of protein complexes.
Comeau, Stephen R; Gatchell, David W; Vajda, Sandor; Camacho, Carlos J
2004-01-01
Predicting protein interactions is one of the most challenging problems in functional genomics. Given two proteins known to interact, current docking methods evaluate billions of docked conformations by simple scoring functions, and in addition to near-native structures yield many false positives, i.e. structures with good surface complementarity but far from the native. We have developed a fast algorithm for filtering docked conformations with good surface complementarity, and ranking them based on their clustering properties. The free energy filters select complexes with lowest desolvation and electrostatic energies. Clustering is then used to smooth the local minima and to select the ones with the broadest energy wells, a property associated with the free energy at the binding site. The robustness of the method was tested on sets of 2000 docked conformations generated for 48 pairs of interacting proteins. In 31 of these cases, the top 10 predictions include at least one near-native complex, with an average RMSD of 5 Å from the native structure. The docking and discrimination method also provides good results for a number of complexes that were used as targets in the Critical Assessment of PRedictions of Interactions experiment. The fully automated docking and discrimination server ClusPro can be found at http://structure.bu.edu
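Ranking docked conformations by their clustering properties can be sketched as a greedy procedure: pick the conformation with the most neighbors within a distance cutoff, make it a cluster center, remove its members, and repeat. The code below is a hedged illustration under that assumption; the function name, the cutoff, and the toy distance matrix are hypothetical, and the abstract does not specify the exact clustering rule.

```python
def cluster_by_neighbors(dist, radius):
    """Greedy clustering of conformations from a pairwise distance matrix.

    Returns (center index, sorted member indices) tuples, largest
    neighborhood first; ties are broken by the lowest index.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        def neighbors(i):
            # Each conformation counts as its own neighbor (distance 0).
            return {j for j in remaining if dist[i][j] <= radius}

        center = max(sorted(remaining), key=lambda i: len(neighbors(i)))
        members = neighbors(center)
        clusters.append((center, sorted(members)))
        remaining -= members
    return clusters


# Toy example: five conformations placed on a line, distances as |xi - xj|.
coords = [0.0, 1.0, 2.0, 10.0, 11.0]
dist = [[abs(a - b) for b in coords] for a in coords]
clusters = cluster_by_neighbors(dist, radius=1.5)
```

Large clusters correspond to broad energy wells, so the centers of the most populated clusters are natural candidates for the top-ranked predictions.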
GPS-PAIL: prediction of lysine acetyltransferase-specific modification sites from protein sequences
Deng, Wankun; Wang, Chenwei; Zhang, Ying; Xu, Yang; Zhang, Shuang; Liu, Zexian; Xue, Yu
2016-01-01
Protein acetylation catalyzed by specific histone acetyltransferases (HATs) is an essential post-translational modification (PTM) and is involved in the regulation of a broad spectrum of biological processes in eukaryotes. Although tens of thousands of acetylation sites have been experimentally identified, the upstream HATs for most of the sites are unclear. Thus, the identification of HAT-specific acetylation sites is fundamental for understanding the regulatory mechanisms of protein acetylation. In this work, we first collected 702 known HAT-specific acetylation sites of 205 proteins from the literature and public data resources, and a motif-based analysis demonstrated that different types of HATs exhibit similar but considerably distinct sequence preferences for substrate recognition. Using 544 human HAT-specific sites for training, we constructed a highly useful tool, GPS-PAIL, for the prediction of HAT-specific sites for up to seven HATs, including CREBBP, EP300, HAT1, KAT2A, KAT2B, KAT5 and KAT8. The prediction accuracy of GPS-PAIL was critically evaluated, with a satisfying performance. Using GPS-PAIL, we also performed a large-scale prediction of potential HATs for known acetylation sites identified from high-throughput experiments in nine eukaryotes. Both online service and local packages were implemented, and GPS-PAIL is freely available at: http://pail.biocuckoo.org. PMID:28004786
Characterization of the Expression of the Petunia Glycine-Rich Protein-1 Gene Product
Condit, Carol M.; McLean, B. Gail; Meagher, Richard B.
1990-01-01
We have examined the expression of the petunia (Petunia hybrida) glycine-rich protein-1 (ptGRP1) gene product using an antibody raised against a synthetic peptide comprising amino acids 22 through 36 of the mature ptGRP1 protein. This antibody recognizes a single protein of 23 kilodaltons. Cell fractionation studies showed that, as predicted (CM Condit, RB Meagher [1986] Nature 323: 178-181), ptGRP1 is most likely localized in the cell wall. In addition, it was found that (extractable) ptGRP1 is present in much higher abundance in unexpanded than in fully expanded tissue, with highest levels of accumulation in the bud. This same developmentally regulated pattern of protein expression was found in all varieties of petunia tested. In addition, tissue blots of petunia stem sections showed that ptGRP1 is localized to within the vascular tissue (to at least the phloem or cambium) and to either the epidermal cells or to a layer of collenchyma cells directly below the epidermis. Localization of ptGRP1 antigen in these cell types is shown to occur at different times in the overall development of the plant and at different quantitative levels. PMID:16667509
Improving compound-protein interaction prediction by building up highly credible negative samples.
Liu, Hui; Sun, Jianjiang; Guan, Jihong; Zheng, Jie; Zhou, Shuigeng
2015-06-15
Computational prediction of compound-protein interactions (CPIs) is of great importance for drug design and development, as genome-scale experimental validation of CPIs is not only time-consuming but also prohibitively expensive. With the availability of an increasing number of validated interactions, the performance of computational prediction approaches is severely impeded by the lack of reliable negative CPI samples. A systematic method of screening reliable negative samples becomes critical to improving the performance of in silico prediction methods. This article aims at building up a set of highly credible negative samples of CPIs via an in silico screening method. As most existing computational models assume that similar compounds are likely to interact with similar target proteins and achieve remarkable performance, it is rational to identify potential negative samples based on the converse proposition that proteins dissimilar to every known/predicted target of a compound are unlikely to be targeted by the compound, and vice versa. We integrated various resources, including chemical structures, chemical expression profiles and side effects of compounds, amino acid sequences, protein-protein interaction network and functional annotations of proteins, into a systematic screening framework. We first tested the screened negative samples on six classical classifiers, and all these classifiers achieved remarkably higher performance on our negative samples than on randomly generated negative samples for both human and Caenorhabditis elegans. We then verified the negative samples on three existing prediction models, including bipartite local model, Gaussian kernel profile and Bayesian matrix factorization, and found that the performances of these models are also significantly improved on the screened negative samples. Moreover, we validated the screened negative samples on a drug bioactivity dataset.
Finally, we derived two sets of new interactions by training a support vector machine classifier on the positive interactions annotated in DrugBank and our screened negative interactions. The screened negative samples and the predicted interactions provide the research community with a useful resource for identifying new drug targets and a helpful supplement to the current curated compound-protein databases. Supplementary files are available at: http://admis.fudan.edu.cn/negative-cpi/. © The Author 2015. Published by Oxford University Press.
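The screening idea in the abstract above (a protein dissimilar to every known target of a compound is a plausible negative sample for that compound) can be illustrated with a minimal Python sketch. This is not the authors' framework, which integrates several data resources; the function name `screen_negatives`, the similarity callback, the threshold of 0.2 and the toy data are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple


def screen_negatives(
    known_targets: Dict[str, Set[str]],
    proteins: List[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.2,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[str, str]]:
    """Return (compound, protein) pairs proposed as negative samples.

    A protein is proposed as a negative for a compound only if its
    similarity to EVERY known target of that compound is below `threshold`.
    """
    negatives = []
    for compound, targets in known_targets.items():
        if not targets:
            # Without known targets there is nothing to be dissimilar to.
            continue
        for protein in proteins:
            if protein in targets:
                continue
            if all(similarity(protein, t) < threshold for t in targets):
                negatives.append((compound, protein))
    return negatives


# Toy, made-up similarity values for three hypothetical proteins.
TOY_SIM = {
    frozenset(("P1", "P2")): 0.90,
    frozenset(("P1", "P3")): 0.10,
    frozenset(("P2", "P3")): 0.15,
}


def toy_similarity(a: str, b: str) -> float:
    return 1.0 if a == b else TOY_SIM.get(frozenset((a, b)), 0.0)


toy_negatives = screen_negatives({"C1": {"P1"}}, ["P1", "P2", "P3"], toy_similarity)
```

In the toy data, P2 resembles the known target P1 and is therefore not proposed as a negative for compound C1, whereas P3 is.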
Wang, Shunfang; Nie, Bing; Yue, Kun; Fei, Yu; Li, Wenjia; Xu, Dongshu
2017-12-15
Kernel discriminant analysis (KDA) is a dimension reduction and classification algorithm based on the nonlinear kernel trick, which can be used to treat high-dimensional and complex biological data before classification tasks such as protein subcellular localization. Kernel parameters have a great impact on the performance of the KDA model. Specifically, for KDA with the popular Gaussian kernel, selecting the scale parameter is still a challenging problem. Thus, this paper introduces the KDA method and proposes a new method for Gaussian kernel parameter selection, based on the fact that the differences between reconstruction errors of edge normal samples and those of interior normal samples should be maximized for certain suitable kernel parameters. Experiments with various standard data sets of protein subcellular localization show that the overall accuracy of protein classification prediction with KDA is much higher than that without KDA. Meanwhile, the kernel parameter of KDA has a great impact on the efficiency, and the proposed method can produce an optimum parameter, which makes the new algorithm not only perform as effectively as the traditional ones, but also reduce the computational time and thus improve efficiency.
Identification and correction of abnormal, incomplete and mispredicted proteins in public databases.
Nagy, Alinda; Hegyi, Hédi; Farkas, Krisztina; Tordai, Hedvig; Kozma, Evelin; Bányai, László; Patthy, László
2008-08-27
Despite significant improvements in computational annotation of genomes, sequences of abnormal, incomplete or incorrectly predicted genes and proteins remain abundant in public databases. Since the majority of incomplete, abnormal or mispredicted entries are not annotated as such, these errors seriously affect the reliability of these databases. Here we describe the MisPred approach that may provide an efficient means for the quality control of databases. The current version of the MisPred approach uses five distinct routines for identifying abnormal, incomplete or mispredicted entries based on the principle that a sequence is likely to be incorrect if some of its features conflict with our current knowledge about protein-coding genes and proteins: (i) conflict between the predicted subcellular localization of proteins and the absence of the corresponding sequence signals; (ii) presence of extracellular and cytoplasmic domains and the absence of transmembrane segments; (iii) co-occurrence of extracellular and nuclear domains; (iv) violation of domain integrity; (v) chimeras encoded by two or more genes located on different chromosomes. Analyses of predicted EnsEMBL protein sequences of nine deuterostome (Homo sapiens, Mus musculus, Rattus norvegicus, Monodelphis domestica, Gallus gallus, Xenopus tropicalis, Fugu rubripes, Danio rerio and Ciona intestinalis) and two protostome species (Caenorhabditis elegans and Drosophila melanogaster) have revealed that the absence of expected signal peptides and violation of domain integrity account for the majority of mispredictions. Analyses of sequences predicted by NCBI's GNOMON annotation pipeline show that the rates of mispredictions are comparable to those of EnsEMBL. Interestingly, even the manually curated UniProtKB/Swiss-Prot dataset is contaminated with mispredicted or abnormal proteins, although to a much lesser extent than UniProtKB/TrEMBL or the EnsEMBL or GNOMON-predicted entries. 
MisPred works efficiently in identifying errors in predictions generated by the most reliable gene prediction tools such as the EnsEMBL and NCBI's GNOMON pipelines and also guides the correction of errors. We suggest that application of the MisPred approach will significantly improve the quality of gene predictions and the associated databases.
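Two of the five MisPred routines listed above, (ii) and (iii), are simple consistency rules over domain annotations, so they can be sketched in a few lines of Python. This is only an illustration of the stated principle, not the MisPred implementation; the `ProteinAnnotation` record and its boolean fields are hypothetical stand-ins for whatever domain and transmembrane predictions a real pipeline would supply.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProteinAnnotation:
    """Hypothetical summary of predicted features for one protein entry."""
    has_extracellular_domain: bool
    has_cytoplasmic_domain: bool
    has_nuclear_domain: bool
    has_transmembrane_segment: bool


def find_conflicts(p: ProteinAnnotation) -> List[str]:
    """Return descriptions of feature conflicts suggesting a misprediction."""
    conflicts = []
    # Rule (ii): extracellular and cytoplasmic domains in one protein
    # imply a membrane-spanning segment; its absence is suspicious.
    if (p.has_extracellular_domain and p.has_cytoplasmic_domain
            and not p.has_transmembrane_segment):
        conflicts.append("extracellular and cytoplasmic domains without transmembrane segment")
    # Rule (iii): extracellular and nuclear domains should not co-occur.
    if p.has_extracellular_domain and p.has_nuclear_domain:
        conflicts.append("co-occurrence of extracellular and nuclear domains")
    return conflicts


suspect = ProteinAnnotation(True, True, False, False)
plausible = ProteinAnnotation(True, True, False, True)
```

Here `find_conflicts(suspect)` reports one conflict under rule (ii), while `find_conflicts(plausible)` reports none because a transmembrane segment is present.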
Paluh, Janet L.; Nogales, Eva; Oakley, Berl R.; McDonald, Kent; Pidoux, Alison L.; Cande, W. Z.
2000-01-01
Mitotic segregation of chromosomes requires spindle pole functions for microtubule nucleation, minus end organization, and regulation of dynamics. γ-Tubulin is essential for nucleation, and we now extend its role to these latter processes. We have characterized a mutation in γ-tubulin that results in cold-sensitive mitotic arrest with an elongated bipolar spindle but impaired anaphase A. At 30°C cytoplasmic microtubule arrays are abnormal and bundle into single larger arrays. Three-dimensional time-lapse video microscopy reveals that microtubule dynamics are altered. Localization of the mutant γ-tubulin is like the wild-type protein. Prediction of γ-tubulin structure indicates that non-α/β-tubulin protein–protein interactions could be affected. The kinesin-like protein (klp) Pkl1p localizes to the spindle poles and spindle and is essential for viability of the γ-tubulin mutant and in multicopy for normal cell morphology at 30°C. Localization and function of Pkl1p in the mutant appear unaltered, consistent with a redundant function for this protein in wild type. Our data indicate a broader role for γ-tubulin at spindle poles in regulating aspects of microtubule dynamics and organization. We propose that Pkl1p rescues an impaired function of γ-tubulin that involves non-tubulin protein–protein interactions, presumably with a second motor, MAP, or MTOC component. PMID:10749926
Improta, Roberto; Vitagliano, Luigi; Esposito, Luciana
2015-11-01
The elucidation of the mutual influence between peptide bond geometry and local conformation has important implications for protein structure refinement, validation, and prediction. To gain insights into the structural determinants and the energetic contributions associated with protein/peptide backbone plasticity, we here report an extensive analysis of the variability of the peptide bond angles by combining statistical analyses of protein structures and quantum mechanics calculations on small model peptide systems. Our analyses demonstrate that all the backbone bond angles strongly depend on the peptide conformation and unveil the existence of regular trends as a function of ψ and/or φ. The excellent agreement of the quantum mechanics calculations with the statistical surveys of protein structures validates the computational scheme employed here and demonstrates that the valence geometry of the protein/peptide backbone is primarily dictated by local interactions. Notably, for the first time we show that the position of the H(α) hydrogen atom, which is an important parameter in NMR structural studies, is also dependent on the local conformation. Most of the trends observed may be satisfactorily explained by invoking steric repulsive interactions; in some specific cases the valence bond variability is also influenced by hydrogen-bond-like interactions. Moreover, we can provide a reliable estimate of the energies involved in the interplay between geometry and conformations. © 2015 Wiley Periodicals, Inc.
Dong, Yadong; Sun, Yongqi; Qin, Chao
2018-01-01
The existing protein complex detection methods can be broadly divided into two categories: unsupervised and supervised learning methods. Most of the unsupervised learning methods assume that protein complexes are in dense regions of protein-protein interaction (PPI) networks even though many true complexes are not dense subgraphs. Supervised learning methods utilize the informative properties of known complexes; they often extract features from existing complexes and then use the features to train a classification model. The trained model is used to guide the search process for new complexes. However, insufficient extracted features, noise in the PPI data and the incompleteness of complex data make the classification model imprecise. Consequently, the classification model is not sufficient for guiding the detection of complexes. Therefore, we propose a new robust score function that combines the classification model with local structural information. Based on the score function, we provide a search method that works both forwards and backwards. The results from experiments on six benchmark PPI datasets and three protein complex datasets show that our approach can achieve better performance compared with the state-of-the-art supervised, semi-supervised and unsupervised methods for protein complex detection, occasionally significantly outperforming such methods.
Wei, Tong; Chen, Tsung-Chi; Ho, Yuen Ting; ...
2016-10-05
Background: The rice receptor kinase XA21 confers robust resistance to the bacterial pathogen Xanthomonas oryzae pv. oryzae (Xoo). We previously reported that XA21 is cleaved in transgenic plants overexpressing XA21 with a GFP tag (Ubi-XA21-GFP) and that the released C-terminal domain is localized to the nucleus. XA21 carries a predicted nuclear localization sequence (NLS) that directs the C-terminal domain to the nucleus in transient assays, whereas alanine substitutions in the NLS disrupt the nuclear localization. Methods: To determine if the predicted NLS is required for XA21-mediated immunity in planta, we generated transgenic plants overexpressing an XA21 variant carrying the NLS with the same alanine substitutions (Ubi-XA21nls-GFP). Results: Ubi-XA21nls-GFP plants displayed slightly longer lesion lengths, higher Xoo bacterial populations after inoculation and lower levels of reactive oxygen species production compared with the Ubi-XA21-GFP control plants. However, the Ubi-XA21nls-GFP plants express lower levels of protein than that observed in Ubi-XA21-GFP. Discussion: These results demonstrate that the predicted NLS is not required for XA21-mediated immunity.
Guise, Amanda J.; Cristea, Ileana M.
2017-01-01
As a member of the class IIa family of histone deacetylases, the histone deacetylase 5 (HDAC5) is known to undergo nuclear–cytoplasmic shuttling and to be a critical transcriptional regulator. Its misregulation has been linked to prominent human diseases, including cardiac diseases and tumorigenesis. In this chapter, we describe several experimental methods that have proven effective for studying the functions and regulatory features of HDAC5. We present methods for assessing the subcellular localization, protein interactions, posttranslational modifications (PTMs), and activity of HDAC5 from the standpoint of investigating either the endogenous protein or tagged protein forms in human cells. Specifically, given that at the heart of HDAC5 regulation lie its dynamic localization, interactions, and PTMs, we present methods for assessing HDAC5 localization in fixed and live cells, for isolating HDAC5-containing protein complexes to identify its interactions and modifications, and for determining how these PTMs map to predicted HDAC5 structural motifs. Lastly, we provide examples of approaches for studying HDAC5 functions with a focus on its regulation during cell-cycle progression. These methods can readily be adapted for the study of other HDACs or non-HDAC-proteins of interest. Individually, these techniques capture temporal and spatial snapshots of HDAC5 functions; yet together, these approaches provide powerful tools for investigating both the regulation and regulatory roles of HDAC5 in different cell contexts relevant to health and disease. PMID:27246208
Miller, Edward B.; Murrett, Colleen S.; Zhu, Kai; Zhao, Suwen; Goldfeld, Dahlia A.; Bylund, Joseph H.; Friesner, Richard A.
2013-01-01
Robust homology modeling to atomic-level accuracy requires in the general case successful prediction of protein loops containing small segments of secondary structure. Further, as loop prediction advances to success with larger loops, the exclusion of loops containing secondary structure becomes awkward. Here, we extend the applicability of the Protein Local Optimization Program (PLOP) to loops up to 17 residues in length that contain either helical or hairpin segments. In general, PLOP hierarchically samples conformational space and ranks candidate loops with a high-quality molecular mechanics force field. For loops identified to possess α-helical segments, we employ an alternative dihedral library composed of (ϕ,ψ) angles commonly found in helices. The alternative library is searched over a user-specified range of residues that define the helical bounds. The source of these helical bounds can be from popular secondary structure prediction software or from analysis of past loop predictions where a propensity to form a helix is observed. Due to the maturity of our energy model, the lowest energy loop across all experiments can be selected with an accuracy of sub-Ångström RMSD in 80% of cases, 1.0 to 1.5 Å RMSD in 14% of cases, and poorer than 1.5 Å RMSD in 6% of cases. The effectiveness of our current methods in predicting hairpin-containing loops is explored with hairpins up to 13 residues in length and again reaching an accuracy of sub-Ångström RMSD in 83% of cases, 1.0 to 1.5 Å RMSD in 10% of cases, and poorer than 1.5 Å RMSD in 7% of cases. Finally, we explore the effect of an imprecise surrounding environment, in which side chains, but not the backbone, are initially in perturbed geometries. In these cases, loops perturbed to 3Å RMSD from the native environment were restored to their native conformation with sub-Ångström RMSD. PMID:23814507
Hu, Bingjie; Zhu, Xiaolei; Monroe, Lyman; Bures, Mark G; Kihara, Daisuke
2014-08-27
Structure-based computational methods have been widely used in exploring protein-ligand interactions, including predicting the binding ligands of a given protein based on their structural complementarity. Compared to other protein and ligand representations, the advantages of a surface representation include reduced sensitivity to subtle changes in the pocket and ligand conformation and fast search speed. Here we developed a novel method named PL-PatchSurfer (Protein-Ligand PatchSurfer). PL-PatchSurfer represents the protein binding pocket and the ligand molecular surface as a combination of segmented surface patches. Each patch is characterized by its geometrical shape and the electrostatic potential, which are represented using the 3D Zernike descriptor (3DZD). We first tested PL-PatchSurfer on binding ligand prediction and found it outperformed the pocket-similarity based ligand prediction program. We then optimized the search algorithm of PL-PatchSurfer using the PDBbind dataset. Finally, we explored the utility of applying PL-PatchSurfer to a larger and more diverse dataset and showed that PL-PatchSurfer was able to provide a high early enrichment for most of the targets. To the best of our knowledge, PL-PatchSurfer is the first surface patch-based method that treats ligand complementarity at protein binding sites. We believe that using a surface patch approach to better understand protein-ligand interactions has the potential to significantly enhance the design of new ligands for a wide array of drug-targets. PMID:25167137
Paciorkowski, Alex R; Weisenberg, Judy; Kelley, Joshua B; Spencer, Adam; Tuttle, Emily; Ghoneim, Dalia; Thio, Liu Lin; Christian, Susan L; Dobyns, William B; Paschal, Bryce M
2014-05-01
Nuclear import receptors of the KPNA family recognize the nuclear localization signal in proteins and together with importin-β mediate translocation into the nucleus. Accordingly, KPNA family members have a highly conserved architecture with domains that contact the nuclear localization signal and bind to importin-β. Here, we describe autosomal recessive mutations in KPNA7 found by whole exome sequencing in a sibling pair with severe developmental disability, infantile spasms, subsequent intractable epilepsy consistent with Lennox–Gastaut syndrome, partial agenesis of the corpus callosum, and cerebellar vermis hypoplasia. The mutations mapped to exon 7 in KPNA7 result in two amino-acid substitutions, Pro339Ala and Glu344Gln. On the basis of the crystal structure of the paralog KPNA2 bound to a bipartite nuclear localization signal from the retinoblastoma protein, the amino-acid substitutions in the affected subjects were predicted to occur within the seventh armadillo repeat that forms one of the two nuclear localization signal-binding sites in KPNA family members. Glu344 is conserved in all seven KPNA proteins, and we found that the Glu354Gln mutation in KPNA2 is sufficient to reduce binding to the retinoblastoma nuclear localization signal to approximately one-half that of wild-type protein. Our data show that compound heterozygous mutations in KPNA7 are associated with a human neurodevelopmental disease, and provide the first example of a human disease associated with mutation of a nuclear transport receptor. PMID:24045845
Park, Hahnbeom; Lee, Gyu Rie; Heo, Lim; Seok, Chaok
2014-01-01
Protein loop modeling is a tool for predicting protein local structures of particular interest, providing opportunities for applications involving protein structure prediction and de novo protein design. Until recently, the majority of loop modeling methods have been developed and tested by reconstructing loops in frameworks of experimentally resolved structures. In many practical applications, however, the protein loops to be modeled are located in inaccurate structural environments. These include loops in model structures, low-resolution experimental structures, or experimental structures of different functional forms. Accordingly, discrepancies in the accuracy of the structural environment assumed in development of the method and that in practical applications present additional challenges to modern loop modeling methods. This study demonstrates a new strategy for employing a hybrid energy function combining physics-based and knowledge-based components to help tackle this challenge. The hybrid energy function is designed to combine the strengths of each energy component, simultaneously maintaining accurate loop structure prediction in a high-resolution framework structure and tolerating minor environmental errors in low-resolution structures. A loop modeling method based on global optimization of this new energy function is tested on loop targets situated in different levels of environmental errors, ranging from experimental structures to structures perturbed in backbone as well as side chains and template-based model structures. The new method performs comparably to force field-based approaches in loop reconstruction in crystal structures and better in loop prediction in inaccurate framework structures. This result suggests that higher-accuracy predictions would be possible for a broader range of applications. The web server for this method is available at http://galaxy.seoklab.org/loop with the PS2 option for the scoring function.
Meng, Wei; Hsiao, An-Shan; Gao, Caiji; Jiang, Liwen; Chye, Mee-Len
2014-07-01
Acyl-CoA-binding proteins (ACBPs) show conservation at the acyl-CoA-binding (ACB) domain which facilitates binding to acyl-CoA esters. In Arabidopsis thaliana, six ACBPs participate in development and stress responses. Rice (Oryza sativa) also contains six genes encoding ACBPs. We investigated differences in subcellular localization between monocot rice and eudicot A. thaliana ACBPs. The subcellular localization of the six OsACBPs was achieved via transient expression of green fluorescence protein (GFP) fusions in tobacco (Nicotiana tabacum) epidermal cells, and stable transformation of A. thaliana. As plant ACBPs had not been reported in the peroxisomes, OsACBP6::GFP localization was confirmed by transient expression in rice sheath cells. The function of OsACBP6 was investigated by overexpressing 35S::OsACBP6 in the peroxisomal abc transporter1 (pxa1) mutant defective in peroxisomal fatty acid β-oxidation. As predicted, OsACBP1::GFP and OsACBP2::GFP were localized to the cytosol, and OsACBP4::GFP and OsACBP5::GFP to the endoplasmic reticulum (ER). However, OsACBP3::GFP displayed subcellular multi-localization while OsACBP6::GFP was localized to the peroxisomes. 35S::OsACBP6-OE/pxa1 lines showed recovery in indole-3-butyric acid (IBA) peroxisomal β-oxidation, wound-induced VEGETATIVE STORAGE PROTEIN1 (VSP1) expression and jasmonic acid (JA) accumulation. These findings indicate a role for OsACBP6 in peroxisomal β-oxidation, and suggest that rice ACBPs are involved in lipid degradation in addition to lipid biosynthesis. © 2014 The Authors. New Phytologist © 2014 New Phytologist Trust.
Sugimoto, Yu; Kitazumi, Yuki; Shirai, Osamu; Nishikawa, Koji; Higuchi, Yoshiki; Yamamoto, Masahiro; Kano, Kenji
2017-05-01
Electrostatic interactions between proteins are key factors that govern the association and reaction rate. We spectroscopically determine the second-order reaction rate constant (k) of electron transfer from [NiFe] hydrogenase (H2ase) to cytochrome (cyt) c3 at various ionic strengths (I). The k value decreases with I. To analyze the results, we develop a semi-analytical formula for I dependence of k based on the assumptions that molecules are spherical and the reaction proceeds via a transition state. Fitting of the formula to the experimental data reveals that the interaction occurs in limited regions with opposite charges and with radii much smaller than those estimated from crystal structures. This suggests that local charges in H2ase and cyt c3 play important roles in the reaction. Although the crystallographic data indicate a positive electrostatic potential over almost the entire surface of the proteins, there exists a small region with negative potential on H2ase at which the electron transfer from H2ase to cyt c3 may occur. This local negative potential region is identical to the hypothetical interaction sphere predicted by the analysis. Furthermore, I dependence of k is predicted by the Adaptive Poisson-Boltzmann Solver considering all charges of the amino acids in the proteins and the configuration of the H2ase/cyt c3 complex. The calculation reproduces the experimental results except at extremely low I. These results indicate that the stabilization derived from the local electrostatic interaction in the H2ase/cyt c3 complex overcomes the destabilization derived from the electrostatic repulsion of the overall positive charge of both proteins. Copyright © 2017 Elsevier B.V. All rights reserved.
Jiang, Zhong; Lohse, Christine M.; Chu, Peigou G.; Wu, Chin-Lee; Woda, Bruce A.; Rock, Kenneth L.; Kwon, Eugene D.
2009-01-01
BACKGROUND: Whether an oncofetal protein, IMP3, can serve as a prognostic biomarker to predict metastasis for patients with localized papillary and chromophobe subtypes of renal cell carcinomas (RCCs) was investigated. METHODS: The expression of IMP3 in 334 patients with primary papillary and chromophobe RCC from multiple medical centers was evaluated by immunohistochemistry. The 317 patients with localized papillary and chromophobe RCCs were further evaluated for outcome analyses. RESULTS: IMP3 was significantly increased in a subset of localized papillary and chromophobe RCCs that subsequently metastasized. Patients with localized IMP3-positive tumors (n = 33; 10%) were over 10 times more likely to metastasize (risk ratio [RR], 11.38; 95% confidence interval [CI], 5.40–23.96; P < .001) and were nearly twice as likely to die (RR, 1.91; 95% CI, 1.13–3.22; P = .016) compared with patients with localized IMP3-negative tumors. The 5-year metastasis-free and overall survival rates were 64% and 58% for patients with IMP3-positive localized papillary and chromophobe RCCs compared with 98% and 85% for patients with IMP3-negative tumors, respectively. In multivariable analysis adjusting for the TNM stage and nuclear grade, patients with IMP3-positive tumors were still over 10 times more likely to progress to distant metastasis (RR, 13.45; 95% CI, 6.00–30.14; P < .001) and were still nearly twice as likely to die (RR, 1.95; 95% CI, 1.15–3.31; P = .013) compared with patients with IMP3-negative tumors. CONCLUSIONS: IMP3 is an independent prognostic biomarker that can be used to identify a subgroup of patients with localized papillary and chromophobe RCC who are at high risk for developing distant metastasis. PMID:18412154
Predicting Nonspecific Ion Binding Using DelPhi
Petukh, Marharyta; Zhenirovskyy, Maxim; Li, Chuan; Li, Lin; Wang, Lin; Alexov, Emil
2012-01-01
Ions are an important component of the cell and affect the corresponding biological macromolecules either via direct binding or as a screening ion cloud. Although some ion binding is highly specific and frequently associated with the function of the macromolecule, other ions bind to the protein surface nonspecifically, presumably because the electrostatic attraction is strong enough to immobilize them. Here, we test such a scenario and demonstrate that experimentally identified surface-bound ions are located at a potential that facilitates binding, which indicates that the major driving force is the electrostatics. Without taking into consideration geometrical factors and structural fluctuations, we show that ions tend to be bound onto the protein surface at positions with strong potential but with polarity opposite to that of the ion. This observation is used to develop a method that uses a DelPhi-calculated potential map in conjunction with an in-house-developed clustering algorithm to predict nonspecific ion-binding sites. Although this approach distinguishes only the polarity of the ions, and not their chemical nature, it can predict nonspecific binding of positively or negatively charged ions with acceptable accuracy. One can use the predictions in the Poisson-Boltzmann approach by placing explicit ions in the predicted positions, which in turn will reduce the magnitude of the local potential and extend the limits of the Poisson-Boltzmann equation. In addition, one can use this approach to place the desired number of ions before conducting molecular-dynamics simulations to neutralize the net charge of the protein, because it was shown to perform better than standard screened Coulomb canned routines, or to predict ion-binding sites in proteins. This latter is especially true for proteins that are involved in ion transport, because such ions are loosely bound and very difficult to detect experimentally. PMID:22735539
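The prediction principle described above (ions sit where the potential is strong and of opposite sign to the ion, and nearby candidate points are merged into one site) can be sketched in Python as follows. This is an assumption-laden illustration only: it does not call DelPhi, the greedy distance-based merging merely stands in for the in-house clustering algorithm, which the abstract does not describe, and the function name, parameters, thresholds and grid values are all hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]
GridValue = Tuple[Point, float]  # (coordinates, electrostatic potential)


def predict_ion_sites(
    grid: List[GridValue],
    ion_charge: int,
    min_abs_potential: float,
    cluster_radius: float,
) -> List[GridValue]:
    """Pick candidate nonspecific ion-binding sites from a potential map."""
    # Keep points with a strong potential whose sign is opposite to the ion's.
    candidates = [
        (xyz, phi) for xyz, phi in grid
        if abs(phi) >= min_abs_potential and phi * ion_charge < 0
    ]
    # Visit the strongest potentials first so each cluster is represented
    # by its most favourable point.
    candidates.sort(key=lambda item: abs(item[1]), reverse=True)
    sites: List[GridValue] = []
    for xyz, phi in candidates:
        # Greedy merging: skip points within cluster_radius of a chosen site.
        if all(math.dist(xyz, site[0]) > cluster_radius for site in sites):
            sites.append((xyz, phi))
    return sites


# Made-up potential values on five surface points (arbitrary units).
toy_grid = [
    ((0.0, 0.0, 0.0), -5.0),
    ((0.5, 0.0, 0.0), -4.0),
    ((10.0, 0.0, 0.0), -3.5),
    ((5.0, 5.0, 5.0), 2.0),
    ((1.0, 1.0, 1.0), -0.5),
]
cation_sites = predict_ion_sites(toy_grid, ion_charge=1,
                                 min_abs_potential=3.0, cluster_radius=2.0)
```

With these toy values, two cation sites are returned: the two neighbouring strongly negative points collapse into one site, and the distant negative point forms a second.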
The polyomavirus BK agnoprotein co-localizes with lipid droplets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Unterstab, Gunhild; Gosert, Rainer; Leuenberger, David
Agnoprotein encoded by human polyomavirus BK (BKV) is a late cytoplasmic protein of 66 amino acids (aa) of unknown function. Immunofluorescence microscopy revealed a fine granular and a vesicular distribution in donut-like structures. Using BKV(Dunlop)-infected or agnoprotein-transfected cells, we investigated agnoprotein co-localization with subcellular structures. We found that agnoprotein co-localizes with lipid droplets (LD) in primary human renal tubular epithelial cells as well as in other cells supporting BKV replication in vitro (UTA, Vero cells). Using agnoprotein-enhanced green fluorescent protein (EGFP) fusion constructs, we demonstrate that agnoprotein aa 20-42 are required for targeting LD, whereas aa 1-20 or aa 42-66 were not. Agnoprotein aa 22-40 are predicted to form an amphipathic helix, and mutations A25D and F39E, disrupting its hydrophobic domain, prevented LD targeting. However, changing the phosphorylation site serine-11 to alanine or aspartic acid did not alter LD co-localization. Our findings provide new clues to unravel agnoprotein function.
Walker, Michael J.; Zhou, Cong; Backen, Alison; Pernemalm, Maria; Williamson, Andrew J.K.; Priest, Lynsey J.C.; Koh, Pek; Faivre-Finn, Corinne; Blackhall, Fiona H.; Dive, Caroline; Whetton, Anthony D.
2015-01-01
Lung cancer is the most frequent cause of cancer-related death world-wide. Radiotherapy alone or in conjunction with chemotherapy is the standard treatment for locally advanced non-small cell lung cancer (NSCLC). Currently there is no predictive marker with clinical utility to guide treatment decisions in NSCLC patients undergoing radiotherapy. Identification of such markers would allow treatment options to be considered for more effective therapy. To enable the identification of appropriate protein biomarkers, plasma samples were collected from patients with non-small cell lung cancer before and during radiotherapy for longitudinal comparison following a protocol that carries sufficient power for effective discovery proteomics. Plasma samples from patients pre- and during radiotherapy who had survived > 18 mo were compared to the same time points from patients who survived < 14 mo using an 8 channel isobaric tagging tandem mass spectrometry discovery proteomics platform. Over 650 proteins were detected and relatively quantified. Proteins which showed a change during radiotherapy were selected for validation using an orthogonal antibody-based approach. Two of these proteins were verified in a separate patient cohort: values of CRP and LRG1 combined gave a highly significant indication of extended survival post one week of radiotherapy treatment. PMID:26425690
Characterization of the Avian Trojan Gene Family Reveals Contrasting Evolutionary Constraints
Petrov, Petar; Syrjänen, Riikka; Smith, Jacqueline; Gutowska, Maria Weronika; Uchida, Tatsuya; Vainio, Olli; Burt, David W
2015-01-01
“Trojan” is a leukocyte-specific, cell surface protein originally identified in the chicken. Its molecular function has been hypothesized to be related to anti-apoptosis and the proliferation of immune cells. The Trojan gene has been localized onto the Z sex chromosome. The adjacent two genes also show significant homology to Trojan, suggesting the existence of a novel gene/protein family. Here, we characterize this Trojan family, identify homologues in other species and predict evolutionary constraints on these genes. The two Trojan-related proteins in chicken were predicted as a receptor-type tyrosine phosphatase and a transmembrane protein, bearing a cytoplasmic immuno-receptor tyrosine-based activation motif. We identified the Trojan gene family in ten other bird species and found related genes in three reptiles and a fish species. The phylogenetic analysis of the homologues revealed a gradual diversification among the family members. Evolutionary analyses of the avian genes predicted that the extracellular regions of the proteins have been subjected to positive selection. Such selection was possibly a response to evolving interacting partners or to pathogen challenges. We also observed an almost complete lack of intracellular positively selected sites, suggesting a conserved signaling mechanism of the molecules. Therefore, the contrasting patterns of selection likely correlate with the interaction and signaling potential of the molecules. PMID:25803627
Bernardes, Juliana; Zaverucha, Gerson; Vaquero, Catherine; Carbone, Alessandra
2016-01-01
Traditional protein annotation methods describe known domains with probabilistic models representing consensus among homologous domain sequences. However, when relevant signals become too weak to be identified by a global consensus, attempts for annotation fail. Here we address the fundamental question of domain identification for highly divergent proteins. By using high performance computing, we demonstrate that the limits of state-of-the-art annotation methods can be bypassed. We design a new strategy based on the observation that many structural and functional protein constraints are not globally conserved through all species but might be locally conserved in separate clades. We propose a novel exploitation of the large amount of data available: 1. for each known protein domain, several probabilistic clade-centered models are constructed from a large and differentiated panel of homologous sequences, 2. a decision-making protocol combines outcomes obtained from multiple models, 3. a multi-criteria optimization algorithm finds the most likely protein architecture. The method is evaluated for domain and architecture prediction over several datasets and statistical testing hypotheses. Its performance is compared against HMMScan and HHblits, two widely used search methods based on sequence-profile and profile-profile comparison. Due to their closeness to actual protein sequences, clade-centered models are shown to be more specific and functionally predictive than the broadly used consensus models. Based on them, we improved annotation of Plasmodium falciparum protein sequences on a scale not previously possible. We successfully predict at least one domain for 72% of P. falciparum proteins against 63% achieved previously, corresponding to a 30% improvement over the total number of Pfam domain predictions on the whole genome.
The method is applicable to any genome and opens new avenues to tackle evolutionary questions such as the reconstruction of ancient domain duplications, the reconstruction of the history of protein architectures, and the estimation of protein domain age. Website and software: http://www.lcqb.upmc.fr/CLADE. PMID:27472895
Current Understanding of Usher Syndrome Type II
Yang, Jun; Wang, Le; Song, Hongman; Sokolov, Maxim
2012-01-01
Usher syndrome is the most common form of combined deafness-blindness caused by genetic mutations. To date, three genes have been identified underlying the most prevalent form of Usher syndrome, the type II form (USH2). The proteins encoded by these genes have been demonstrated to form a complex in vivo. This complex is localized mainly at the periciliary membrane complex in photoreceptors and the ankle-link of the stereocilia in hair cells. Many proteins have been found to interact with USH2 proteins in vitro, suggesting that they are potential additional components of this USH2 complex and that the genes encoding these proteins may be candidate USH2 genes. However, further investigations are critical to establish their existence in the USH2 complex in vivo. Based on the predicted functional domains in USH2 proteins, their cellular localizations in photoreceptors and hair cells, the observed phenotypes in USH2 mutant mice, and current knowledge of diseases similar to USH2, putative biological functions of the USH2 complex have been proposed. Finally, therapeutic approaches for this group of diseases are now being actively explored. PMID:22201796
Xiao, Xuan; Cheng, Xiang; Chen, Genqiang; Mao, Qi; Chou, Kuo-Chen
2018-05-26
Knowledge of protein subcellular localization is vitally important for both basic research and drug development. With the avalanche of protein sequences emerging in the post-genomic age, it is highly desired to develop computational tools for timely and effectively identifying their subcellular localization purely based on the sequence information alone. Recently, a predictor called "pLoc-mGpos" was developed for identifying the subcellular localization of Gram-positive bacterial proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with multi-label systems in which some proteins, called "multiplex proteins", may simultaneously occur in two or more subcellular locations. Although it is indeed a very powerful predictor, more efforts are definitely needed to further improve it. This is because pLoc-mGpos was trained by an extremely skewed dataset in which some subset (subcellular location) was over 11 times the size of the other subsets. Accordingly, it cannot avoid the bias consequence caused by such an uneven training dataset. To alleviate such bias consequence, we have developed a new and bias-reducing predictor called pLoc_bal-mGpos by quasi-balancing the training dataset. Rigorous target jackknife tests on exactly the same experiment-confirmed dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mGpos, the existing state-of-the-art predictor in identifying the subcellular localization of Gram-positive bacterial proteins. To maximize the convenience for most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mGpos/, by which users can easily get their desired results without the need to go through the detailed mathematics. Copyright © 2018 Elsevier Inc. All rights reserved.
Bauman, Tyler M; Ewald, Jonathan A; Huang, Wei; Ricke, William A
2015-07-25
CD147 is an MMP-inducing protein often implicated in cancer progression. The purpose of this study was to investigate the expression of CD147 in prostate cancer (PCa) progression and the prognostic ability of CD147 in predicting biochemical recurrence after prostatectomy. Plasma membrane-localized CD147 protein expression was quantified in patient samples using immunohistochemistry and multispectral imaging, and expression was compared to clinico-pathological features (pathologic stage, Gleason score, tumor volume, preoperative PSA, lymph node status, surgical margins, biochemical recurrence status). CD147 specificity and expression were confirmed with immunoblotting of prostate cell lines, and CD147 mRNA expression was evaluated in public expression microarray datasets of patient prostate tumors. Expression of CD147 protein was significantly decreased in localized tumors (pT2; p = 0.02) and aggressive PCa (≥pT3; p = 0.004), and metastases (p = 0.001) compared to benign prostatic tissue. Decreased CD147 was associated with advanced pathologic stage (p = 0.009) and high Gleason score (p = 0.02), and low CD147 expression predicted biochemical recurrence (HR 0.55; 95 % CI 0.31-0.97; p = 0.04) independent of clinico-pathologic features. Immunoblot bands were detected at 44 kDa and 66 kDa, representing non-glycosylated and glycosylated forms of CD147 protein, and CD147 expression was lower in tumorigenic T10 cells than non-tumorigenic BPH-1 cells (p = 0.02). Decreased CD147 mRNA expression was associated with increased Gleason score and pathologic stage in patient tumors but is not associated with recurrence status. Membrane-associated CD147 expression is significantly decreased in PCa compared to non-malignant prostate tissue and is associated with tumor progression, and low CD147 expression predicts biochemical recurrence after prostatectomy independent of pathologic stage, Gleason score, lymph node status, surgical margins, and tumor volume in multivariable analysis.
Cazelles, R; Lalaoui, N; Hartmann, T; Leimkühler, S; Wollenberger, U; Antonietti, M; Cosnier, S
2016-11-15
Direct electron transfer (DET) to proteins is of considerable interest for the development of biosensors and bioelectrocatalysts. While protein structure is mainly used as a method of attaching the protein to the electrode surface, we employed bioinformatics analysis to predict the suitable orientation of the enzymes to promote DET. Structure similarity and secondary structure prediction were combined underlying localized amino-acids able to direct one of the enzyme's electron relays toward the electrode surface by creating a suitable bioelectrocatalytic nanostructure. The electro-polymerization of pyrene pyrrole onto a fluorine-doped tin oxide (FTO) electrode allowed the targeted orientation of the formate dehydrogenase enzyme from Rhodobacter capsulatus (RcFDH) by means of hydrophobic interactions. Its electron relays were directed to the FTO surface, thus promoting DET. The reduction of nicotinamide adenine dinucleotide (NAD⁺) generating a maximum current density of 1 μA cm⁻² with 10 mM NAD⁺ leads to a turnover number of 0.09 electron/s/mol RcFDH. This work represents a practical approach to evaluate electrode surface modification strategies in order to create valuable bioelectrocatalysts. Copyright © 2016 Elsevier B.V. All rights reserved.
Jiménez, Diego Javier; Dini-Andreote, Francisco; Ottoni, Júlia Ronzella; de Oliveira, Valéria Maia; van Elsas, Jan Dirk; Andreote, Fernando Dini
2015-05-01
The occurrence of genes encoding biotechnologically relevant α/β-hydrolases in mangrove soil microbial communities was assessed using data obtained by whole-metagenome sequencing of four mangrove areas, denoted BrMgv01 to BrMgv04, in São Paulo, Brazil. The sequences (215 Mb in total) were filtered based on local amino acid alignments against the Lipase Engineering Database. In total, 5923 unassembled sequences were affiliated with 30 different α/β-hydrolase fold superfamilies. The most abundant predicted proteins encompassed cytosolic hydrolases (abH08; ∼ 23%), microsomal hydrolases (abH09; ∼ 12%) and Moraxella lipase-like proteins (abH04 and abH01; < 5%). Detailed analysis of the genes predicted to encode proteins of the abH08 superfamily revealed a high proportion related to epoxide hydrolases and haloalkane dehalogenases in polluted mangroves BrMgv01-02-03. This suggested selection and putative involvement in local degradation/detoxification of the pollutants. Seven sequences that were annotated as genes for putative epoxide hydrolases and five for putative haloalkane dehalogenases were found in a fosmid library generated from BrMgv02 DNA. The latter enzymes were predicted to belong to Actinobacteria, Deinococcus-Thermus, Planctomycetes and Proteobacteria. Our integrated approach thus identified 12 genes (complete and/or partial) that may encode hitherto undescribed enzymes. The low amino acid identity (< 60%) with already-described genes opens perspectives for both production in an expression host and genetic screening of metagenomes. © 2014 The Authors. Microbial Biotechnology published by John Wiley & Sons Ltd and Society for Applied Microbiology.
Ramos-León, Félix; Mariscal, Vicente; Frías, José E; Flores, Enrique; Herrero, Antonia
2015-05-01
Heterocyst-forming cyanobacteria are multicellular organisms that grow as filaments that can be hundreds of cells long. Septal junction complexes, of which SepJ is a possible component, appear to join the cells in the filament. SepJ is a cytoplasmic membrane protein that contains a long predicted periplasmic section and localizes not only to the cell poles in the intercellular septa but also to a position similar to a Z ring when cell division starts, suggesting a relation with the divisome. Here, we created a mutant of Anabaena sp. strain PCC 7120 in which the essential divisome gene ftsZ is expressed from a synthetic NtcA-dependent promoter, whose activity depends on the nitrogen source. In the presence of ammonium, low levels of FtsZ were produced, and the subcellular localization of SepJ, which was investigated by immunofluorescence, was impaired. Possible interactions of SepJ with itself and with divisome proteins FtsZ, FtsQ and FtsW were investigated using the bacterial two-hybrid system. We found SepJ self-interaction and a specific interaction with FtsQ, confirmed by co-purification and involving parts of the SepJ and FtsQ periplasmic sections. Therefore, SepJ can form multimers, and in Anabaena, the divisome has a role beyond cell division, localizing a septal protein essential for multicellularity. © 2015 John Wiley & Sons Ltd.
2013-01-01
Background: Many problems in protein modeling require obtaining a discrete representation of the protein conformational space as an ensemble of conformations. In ab-initio structure prediction, in particular, where the goal is to predict the native structure of a protein chain given its amino-acid sequence, the ensemble needs to satisfy energetic constraints. Given the thermodynamic hypothesis, an effective ensemble contains low-energy conformations which are similar to the native structure. The high-dimensionality of the conformational space and the ruggedness of the underlying energy surface currently make it very difficult to obtain such an ensemble. Recent studies have proposed that Basin Hopping is a promising probabilistic search framework to obtain a discrete representation of the protein energy surface in terms of local minima. Basin Hopping performs a series of structural perturbations followed by energy minimizations with the goal of hopping between nearby energy minima. This approach has been shown to be effective in obtaining conformations near the native structure for small systems. Recent work by us has extended this framework to larger systems through employment of the molecular fragment replacement technique, resulting in rapid sampling of large ensembles. Methods: This paper investigates the algorithmic components in Basin Hopping to both understand and control their effect on the sampling of near-native minima. Realizing that such an ensemble is reduced before further refinement in full ab-initio protocols, we take an additional step and analyze the quality of the ensemble retained by ensemble reduction techniques. We propose a novel multi-objective technique based on the Pareto front to filter the ensemble of sampled local minima.
Results and conclusions: We show that controlling the magnitude of the perturbation allows directly controlling the distance between consecutively-sampled local minima and, in turn, steering the exploration towards conformations near the native structure. For the minimization step, we show that the addition of Metropolis Monte Carlo-based minimization is no more effective than a simple greedy search. Finally, we show that the size of the ensemble of sampled local minima can be effectively and efficiently reduced by a multi-objective filter to obtain a simpler representation of the probed energy surface. PMID:24564970
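The perturb-then-minimize loop that this abstract describes can be sketched on a toy problem. The example below is only an illustration under stated assumptions, not the authors' protocol: it uses a made-up one-dimensional energy function instead of a protein energy surface, a uniform random displacement instead of molecular fragment replacement, a simple greedy descent as the minimizer, and a Metropolis test on the minimized energies to decide whether to hop. All names and parameter values (`energy`, `greedy_minimize`, `basin_hopping`, `perturb`, `temperature`) are hypothetical.

```python
import math
import random

def energy(x):
    # Toy rugged 1-D "energy surface" (assumption, for illustration only).
    return 0.1 * x * x + math.sin(5.0 * x)

def greedy_minimize(x, step=0.01, max_iters=10000):
    # Greedy descent: move to a neighbouring point only if it lowers the energy.
    e = energy(x)
    for _ in range(max_iters):
        moved = False
        for cand in (x - step, x + step):
            ec = energy(cand)
            if ec < e:
                x, e, moved = cand, ec, True
                break
        if not moved:
            break  # neither neighbour is lower: local minimum on this grid
    return x, e

def basin_hopping(x0, n_hops=200, perturb=1.0, temperature=0.5, seed=0):
    rng = random.Random(seed)
    x, e = greedy_minimize(x0)
    minima = [(x, e)]  # ensemble of accepted local minima
    for _ in range(n_hops):
        # Perturbation: `perturb` bounds how far the trial point can jump.
        trial = x + rng.uniform(-perturb, perturb)
        xt, et = greedy_minimize(trial)
        # Metropolis criterion applied to the minimized energies.
        if et <= e or rng.random() < math.exp(-(et - e) / temperature):
            x, e = xt, et
            minima.append((x, e))
    return minima

minima = basin_hopping(4.0)
best_x, best_e = min(minima, key=lambda m: m[1])
```

Rerunning with a smaller or larger `perturb` is a simple way to explore the abstract's point that the perturbation magnitude governs how far apart consecutively sampled minima lie.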
2010-01-01
Background: Comparative genomics methods such as phylogenetic profiling can mine powerful inferences from inherently noisy biological data sets. We introduce Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL), a method that applies the Partial Phylogenetic Profiling (PPP) approach locally within a protein sequence to discover short sequence signatures associated with functional sites. The approach is based on the basic scoring mechanism employed by PPP, namely the use of binomial distribution statistics to optimize sequence similarity cutoffs during searches of partitioned training sets. Results: Here we illustrate and validate the ability of the SIMBAL method to find functionally relevant short sequence signatures by application to two well-characterized protein families. In the first example, we partitioned a family of ABC permeases using a metabolic background property (urea utilization). Thus, the TRUE set for this family comprised members whose genome of origin encoded a urea utilization system. By moving a sliding window across the sequence of a permease, and searching each subsequence in turn against the full set of partitioned proteins, the method found which local sequence signatures best correlated with the urea utilization trait. Mapping of SIMBAL "hot spots" onto crystal structures of homologous permeases reveals that the significant sites are gating determinants on the cytosolic face rather than, say, docking sites for the substrate-binding protein on the extracellular face. In the second example, we partitioned a protein methyltransferase family using gene proximity as a criterion. In this case, the TRUE set comprised those methyltransferases encoded near the gene for the substrate RF-1. SIMBAL identifies sequence regions that map onto the substrate-binding interface while ignoring regions involved in the methyltransferase reaction mechanism in general.
Neither method for training set construction requires any prior experimental characterization. Conclusions: SIMBAL shows that, in functionally divergent protein families, selected short sequences often significantly outperform their full-length parent sequence for making functional predictions by sequence similarity, suggesting avenues for improved functional classifiers. When combined with structural data, SIMBAL affords the ability to localize and model functional sites. PMID:20102603
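The sliding-window scoring idea in this abstract can be sketched as follows. This is a simplified illustration and not the SIMBAL software: it assumes a crude ungapped identity count in place of a real sequence similarity search, and it scores each window by the smallest binomial tail probability found over all similarity cutoffs, which is one plausible reading of "binomial distribution statistics to optimize sequence similarity cutoffs". The function names and the toy sequences are hypothetical.

```python
from math import comb

def binom_tail(t, k, p):
    # P(X >= t) for X ~ Binomial(k, p).
    return sum(comb(k, i) * p**i * (1 - p)**(k - i) for i in range(t, k + 1))

def best_identity(window, seq):
    # Stand-in similarity measure (assumption): the best ungapped identity
    # count of `window` against any same-length stretch of `seq`.
    w = len(window)
    if len(seq) < w:
        return 0
    return max(sum(a == b for a, b in zip(window, seq[i:i + w]))
               for i in range(len(seq) - w + 1))

def window_score(window, proteins, labels):
    # Rank proteins by similarity to the window, then try every similarity
    # cutoff and keep the smallest binomial tail probability of seeing that
    # many TRUE-set members above the cutoff by chance.
    p_true = sum(labels) / len(labels)
    scored = sorted(((best_identity(window, s), lab)
                     for s, lab in zip(proteins, labels)),
                    key=lambda t: t[0], reverse=True)
    best = 1.0
    true_seen = 0
    for k, (score, is_true) in enumerate(scored, start=1):
        true_seen += int(is_true)
        # Evaluate only where the similarity changes, so tied hits stay together.
        at_boundary = k == len(scored) or scored[k][0] != score
        if at_boundary:
            best = min(best, binom_tail(true_seen, k, p_true))
    return best

def scan(query, proteins, labels, width):
    # Slide a window along the query; a lower score means a stronger association.
    return [(i, window_score(query[i:i + width], proteins, labels))
            for i in range(len(query) - width + 1)]

# Toy family: TRUE members carry "WHDK", FALSE members carry "PQRE".
proteins = ["MKLVWHDKGGST", "MKLVPQREGGST"] * 3
labels = [True, False] * 3
scores = dict(scan("MKLVWHDKGGST", proteins, labels, 4))
```

In the toy family, windows overlapping the distinguishing motif get a lower probability than windows covering the shared flanks, which is the kind of "hot spot" signal the abstract describes.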
Prediction of protein secondary structure content for the twilight zone sequences.
Homaeian, Leila; Kurgan, Lukasz A; Ruan, Jishou; Cios, Krzysztof J; Chen, Ke
2007-11-15
Secondary protein structure carries information about local structural arrangements, which include three major conformations: alpha-helices, beta-strands, and coils. A significant majority of successful methods for prediction of the secondary structure are based on multiple sequence alignment. However, multiple alignment fails to provide accurate results when a sequence comes from the twilight zone, that is, it is characterized by low (<30%) homology. To this end, we propose a novel method for prediction of secondary structure content through comprehensive sequence representation, called PSSC-core. The method uses a multiple linear regression model and introduces a comprehensive feature-based sequence representation to predict the amount of helices and strands for sequences from the twilight zone. The PSSC-core method was tested and compared with two other state-of-the-art prediction methods on a set of 2187 twilight zone sequences. The results indicate that our method provides better predictions for both helix and strand content. The PSSC-core is shown to provide statistically significantly better results when compared with the competing methods, reducing the prediction error by 5-7% for helix and 7-9% for strand content predictions. The proposed feature-based sequence representation uses a comprehensive set of physicochemical properties that are custom-designed for each of the helix and strand content predictions. It includes composition and composition moment vectors, frequency of tetra-peptides associated with helical and strand conformations, various property-based groups like exchange groups, chemical groups of the side chains and hydrophobic group, auto-correlations based on hydrophobicity, side-chain masses, hydropathy, and conformational patterns for beta-sheets. The PSSC-core method provides an alternative for predicting the secondary structure content that can be used to validate and constrain results of other structure prediction methods.
At the same time, it also provides useful insight into design of successful protein sequence representations that can be used in developing new methods related to prediction of different aspects of the secondary protein structure. (c) 2007 Wiley-Liss, Inc.
Breitenbach, Heiko H.; Wenig, Marion; Wittek, Finni; Jordá, Lucia; Maldonado-Alconada, Ana M.; Sarioglu, Hakan; Colby, Thomas; Knappe, Claudia; Bichlmeier, Marlies; Pabst, Elisabeth; Mackey, David; Parker, Jane E.; Vlot, A. Corina
2014-01-01
Systemic acquired resistance (SAR) is an inducible immune response that depends on ENHANCED DISEASE SUSCEPTIBILITY1 (EDS1). Here, we show that Arabidopsis (Arabidopsis thaliana) EDS1 is required for both SAR signal generation in primary infected leaves and SAR signal perception in systemic uninfected tissues. In contrast to SAR signal generation, local resistance remains intact in eds1 mutant plants in response to Pseudomonas syringae delivering the effector protein AvrRpm1. We utilized the SAR-specific phenotype of the eds1 mutant to identify new SAR regulatory proteins in plants conditionally expressing AvrRpm1. Comparative proteomic analysis of apoplast-enriched extracts from AvrRpm1-expressing wild-type and eds1 mutant plants led to the identification of 12 APOPLASTIC, EDS1-DEPENDENT (AED) proteins. The genes encoding AED1, a predicted aspartyl protease, and another AED, LEGUME LECTIN-LIKE PROTEIN1 (LLP1), were induced locally and systemically during SAR signaling and locally by salicylic acid (SA) or its functional analog, benzo 1,2,3-thiadiazole-7-carbothioic acid S-methyl ester. Because conditional overaccumulation of AED1-hemagglutinin inhibited SA-induced resistance and SAR but not local resistance, the data suggest that AED1 is part of a homeostatic feedback mechanism regulating systemic immunity. In llp1 mutant plants, SAR was compromised, whereas the local resistance that is normally associated with EDS1 and SA as well as responses to exogenous SA appeared largely unaffected. Together, these data indicate that LLP1 promotes systemic rather than local immunity, possibly in parallel with SA. Our analysis reveals new positive and negative components of SAR and reinforces the notion that SAR represents a distinct phase of plant immunity beyond local resistance. PMID:24755512
Tertiary structural propensities reveal fundamental sequence/structure relationships.
Zheng, Fan; Zhang, Jian; Grigoryan, Gevorg
2015-05-05
Extracting useful generalizations from the continually growing Protein Data Bank (PDB) is of central importance. We hypothesize that the PDB contains valuable quantitative information on the level of local tertiary structural motifs (TERMs). We show that by breaking a protein structure into its constituent TERMs, and querying the PDB to characterize the natural ensemble matching each, we can estimate the compatibility of the structure with a given amino acid sequence through a metric we term "structure score." Considering submissions from recent Critical Assessment of Structure Prediction (CASP) experiments, we found a strong correlation (R = 0.69) between structure score and model accuracy, with poorly predicted regions readily identifiable. This performance exceeds that of leading atomistic statistical energy functions. Furthermore, TERM-based analysis of two prototypical multi-state proteins rapidly produced structural insights fully consistent with prior extensive experimental studies. We thus find that TERM-based analysis should have considerable utility for protein structural biology. Copyright © 2015 Elsevier Ltd. All rights reserved.
Ashworth, Justin; Plaisier, Christopher L.; Lo, Fang Yin; Reiss, David J.; Baliga, Nitin S.
2014-01-01
Widespread microbial genome sequencing presents an opportunity to understand the gene regulatory networks of non-model organisms. This requires knowledge of the binding sites for transcription factors whose DNA-binding properties are unknown or difficult to infer. We adapted a protein structure-based method to predict the specificities and putative regulons of homologous transcription factors across diverse species. As a proof-of-concept we predicted the specificities and transcriptional target genes of divergent archaeal feast/famine regulatory proteins, several of which are encoded in the genome of Halobacterium salinarum. This was validated by comparison to experimentally determined specificities for transcription factors in distantly related extremophiles, chromatin immunoprecipitation experiments, and cis-regulatory sequence conservation across eighteen related species of halobacteria. Through this analysis we were able to infer that Halobacterium salinarum employs a divergent local trans-regulatory strategy to regulate genes (carA and carB) involved in arginine and pyrimidine metabolism, whereas Escherichia coli employs an operon. The prediction of gene regulatory binding sites using structure-based methods is useful for the inference of gene regulatory relationships in new species that are otherwise difficult to infer. PMID:25255272
Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae
Reguly, Teresa; Breitkreutz, Ashton; Boucher, Lorrie; Breitkreutz, Bobby-Joe; Hon, Gary C; Myers, Chad L; Parsons, Ainslie; Friesen, Helena; Oughtred, Rose; Tong, Amy; Stark, Chris; Ho, Yuen; Botstein, David; Andrews, Brenda; Boone, Charles; Troyanskaya, Olga G; Ideker, Trey; Dolinski, Kara; Batada, Nizar N; Tyers, Mike
2006-01-01
Background: The study of complex biological networks and prediction of gene function has been enabled by high-throughput (HTP) methods for detection of genetic and protein interactions. Sparse coverage in HTP datasets may, however, distort network properties and confound predictions. Although a vast number of well substantiated interactions are recorded in the scientific literature, these data have not yet been distilled into networks that enable system-level inference. Results: We describe here a comprehensive database of genetic and protein interactions, and associated experimental evidence, for the budding yeast Saccharomyces cerevisiae, as manually curated from over 31,793 abstracts and online publications. This literature-curated (LC) dataset contains 33,311 interactions, on the order of all extant HTP datasets combined. Surprisingly, HTP protein-interaction datasets currently achieve only around 14% coverage of the interactions in the literature. The LC network nevertheless shares attributes with HTP networks, including scale-free connectivity and correlations between interactions, abundance, localization, and expression. We find that essential genes or proteins are enriched for interactions with other essential genes or proteins, suggesting that the global network may be functionally unified. This interconnectivity is supported by a substantial overlap of protein and genetic interactions in the LC dataset. We show that the LC dataset considerably improves the predictive power of network-analysis approaches. The full LC dataset is available at the BioGRID and SGD databases. Conclusion: Comprehensive datasets of biological interactions derived from the primary literature provide critical benchmarks for HTP methods, augment functional prediction, and reveal system-level attributes of biological networks. PMID:16762047
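The coverage figure quoted in this abstract can be read as a set overlap between two interaction lists. The following is a minimal Python sketch of that calculation, assuming interactions are undirected pairs of gene names; the function names and the toy pairs are hypothetical and are not taken from the BioGRID or SGD data.

```python
def normalize(pairs):
    """Treat interactions as undirected: (A, B) and (B, A) are the same edge."""
    return {frozenset(pair) for pair in pairs}


def coverage(reference_pairs, test_pairs):
    """Fraction of reference interactions that also appear in the test set."""
    reference = normalize(reference_pairs)
    test = normalize(test_pairs)
    if not reference:
        raise ValueError("reference set is empty")
    return len(reference & test) / len(reference)


# Invented toy data: four literature-curated edges, two high-throughput edges.
lc = [("geneA", "geneB"), ("geneA", "geneC"), ("geneD", "geneE"), ("geneF", "geneG")]
htp = [("geneB", "geneA"), ("geneH", "geneI")]

print(coverage(lc, htp))  # 0.25
```

Storing each edge as a `frozenset` makes the comparison independent of the order in which the two partners are listed.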
Optimizing physical energy functions for protein folding.
Fujitsuka, Yoshimi; Takada, Shoji; Luthey-Schulten, Zaida A; Wolynes, Peter G
2004-01-01
We optimize a physical energy function for proteins with the use of the available structural database and perform three benchmark tests of the performance: (1) recognition of native structures in the background of predefined decoy sets of Levitt, (2) de novo structure prediction using fragment assembly sampling, and (3) molecular dynamics simulations. The energy parameter optimization is based on the energy landscape theory and uses a Monte Carlo search to find a set of parameters that seeks the largest ratio δE_s/ΔE for all proteins in a training set simultaneously. Here, δE_s is the stability gap between the native and the average in the denatured states and ΔE is the energy fluctuation among these states. Some of the energy parameters optimized are found to show significant correlation with experimentally observed quantities: (1) In the recognition test, the optimized function assigns the lowest energy to either the native or a near-native structure among many decoy structures for all the proteins studied. (2) Structure prediction with the fragment assembly sampling gives structure models with root mean square deviation less than 6 Å in one of the top five cluster centers for five of six proteins studied. (3) Structure prediction using molecular dynamics simulation gives poorer performance, implying the importance of having a more precise description of local structures. The physical energy function solely inferred from a structural database neither utilizes sequence information from the family of the target nor the outcome of the secondary structure prediction but can produce the correct native fold for many small proteins. Copyright 2003 Wiley-Liss, Inc.
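The optimization criterion described in this abstract, the ratio of the stability gap to the energy fluctuation among non-native states, can be illustrated with a small Python sketch. Everything below is an assumption made for illustration: the energy is taken to be linear in a weight vector, the search is a greedy (accept-only-if-better) random walk rather than the authors' Monte Carlo scheme, "for all proteins simultaneously" is approximated by maximizing the worst-case ratio, and the feature vectors are invented.

```python
import random
import statistics


def gap_ratio(native_energy, decoy_energies):
    """Stability gap (mean decoy energy minus native energy) over decoy spread."""
    gap = statistics.fmean(decoy_energies) - native_energy
    spread = statistics.pstdev(decoy_energies)
    return gap / spread


def energy(weights, features):
    """Hypothetical energy model: a weighted sum of structural features."""
    return sum(w * f for w, f in zip(weights, features))


def score(weights, training_set):
    """Worst-case gap ratio over all proteins in the training set."""
    ratios = []
    for native_features, decoy_features in training_set:
        e_native = energy(weights, native_features)
        e_decoys = [energy(weights, f) for f in decoy_features]
        ratios.append(gap_ratio(e_native, e_decoys))
    return min(ratios)


def greedy_search(training_set, n_weights, steps=2000, step_size=0.1, seed=0):
    """Randomly perturb the weights and keep a trial only if the score improves."""
    rng = random.Random(seed)
    best = [1.0] * n_weights
    best_score = score(best, training_set)
    for _ in range(steps):
        trial = [w + rng.gauss(0.0, step_size) for w in best]
        try:
            trial_score = score(trial, training_set)
        except ZeroDivisionError:
            continue  # all decoys had identical energy; skip this trial
        if trial_score > best_score:
            best, best_score = trial, trial_score
    return best, best_score


# Invented toy training set: (native features, list of decoy features) per protein.
training_set = [
    ((1.0, 0.2), [(0.5, 0.9), (0.2, 0.4), (0.8, 0.1)]),
    ((0.9, 0.1), [(0.3, 0.7), (0.6, 0.5), (0.1, 0.2)]),
]

initial_score = score([1.0, 1.0], training_set)
weights, final_score = greedy_search(training_set, n_weights=2)
print(initial_score, final_score)
```

Because the ratio is unchanged when all weights are multiplied by a positive constant, only the direction of the weight vector matters in this toy model.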
Intracellular Electric Field and pH Optimize Protein Localization and Movement
Cunningham, Jessica; Estrella, Veronica; Lloyd, Mark; Gillies, Robert; Frieden, B. Roy; Gatenby, Robert
2012-01-01
Mammalian cell function requires timely and accurate transmission of information from the cell membrane (CM) to the nucleus (N). These pathways have been intensively investigated and many critical components and interactions have been identified. However, the physical forces that control movement of these proteins have received scant attention. Thus, transduction pathways are typically presented schematically with little regard to spatial constraints that might affect the underlying dynamics necessary for protein-protein interactions and molecular movement from the CM to the N. We propose that messenger protein localization and movements are highly regulated and governed by Coulomb interactions between (1) a recently discovered, radially directed E-field from the nuclear membrane (NM) into the CM and (2) net protein charge determined by its isoelectric point, phosphorylation state, and the cytosolic pH. These interactions, which are widely applied in electrophoresis, provide a previously unknown mechanism for localization of messenger proteins within the cytoplasm as well as rapid shuttling between the CM and N. Here we show that these dynamics optimize the speed, accuracy and efficiency of transduction pathways, even allowing measurement of the location and timing of ligand binding at the CM, previously unknown components of intracellular information flow that are, nevertheless, likely necessary for detecting spatial gradients and temporal fluctuations in ligand concentrations within the environment. The model has been applied to the RAF-MEK-ERK pathway and scaffolding protein KSR1 using computer simulations and in-vitro experiments. The computer simulations predicted distinct distributions of phosphorylated and unphosphorylated components of this transduction pathway which were experimentally confirmed in normal breast epithelial cells (HMEC). PMID:22623963
Buried and accessible surface area control intrinsic protein flexibility.
Marsh, Joseph A
2013-09-09
Proteins experience a wide variety of conformational dynamics that can be crucial for facilitating their diverse functions. How is the intrinsic flexibility required for these motions encoded in their three-dimensional structures? Here, the overall flexibility of a protein is demonstrated to be tightly coupled to the total amount of surface area buried within its fold. A simple proxy for this, the relative solvent-accessible surface area (Arel), therefore shows excellent agreement with independent measures of global protein flexibility derived from various experimental and computational methods. Application of Arel on a large scale demonstrates its utility by revealing unique sequence and structural properties associated with intrinsic flexibility. In particular, flexibility as measured by Arel shows little correspondence with intrinsic disorder, but instead tends to be associated with multiple domains and increased α-helical structure. Furthermore, the apparent flexibility of monomeric proteins is found to be useful for identifying quaternary-structure errors in published crystal structures. There is also a strong tendency for the crystal structures of more flexible proteins to be solved to lower resolutions. Finally, local solvent accessibility is shown to be a primary determinant of local residue flexibility. Overall, this work provides both fundamental mechanistic insight into the origin of protein flexibility and a simple, practical method for predicting flexibility from protein structures. © 2013 Elsevier Ltd. All rights reserved.
Dehzangi, Abdollah; Paliwal, Kuldip; Sharma, Alok; Dehzangi, Omid; Sattar, Abdul
2013-01-01
A better understanding of the structural class of a given protein reveals important information about its overall folding type and its domain. It can also be directly used to provide critical information on the general tertiary structure of a protein, which has a profound impact on protein function determination and drug design. Despite tremendous enhancements made by pattern recognition-based approaches to solve this problem, it remains an unsolved problem in bioinformatics that demands more attention and exploration. In this study, we propose a novel feature extraction model that incorporates physicochemical and evolutionary-based information simultaneously. We also propose overlapped segmented distribution and autocorrelation-based feature extraction methods to provide more local and global discriminatory information. The proposed feature extraction methods are explored for the 15 most promising attributes that are selected from a wide range of physicochemical-based attributes. Finally, by applying an ensemble of different classifiers, namely AdaBoost.M1, LogitBoost, naive Bayes, multilayer perceptron (MLP), and support vector machine (SVM), we show enhancement of the protein structural class prediction accuracy for four popular benchmarks.
PharmDock: a pharmacophore-based docking program
2014-01-01
Background: Protein-based pharmacophore models are enriched with the information of potential interactions between ligands and the protein target. We have shown in a previous study that protein-based pharmacophore models can be applied for ligand pose prediction and pose ranking. In this publication, we present a new pharmacophore-based docking program, PharmDock, that combines pose sampling and ranking based on optimized protein-based pharmacophore models with local optimization using an empirical scoring function. Results: Tests of PharmDock on ligand pose prediction, binding affinity estimation, compound ranking and virtual screening yielded performance comparable to or better than that of existing and widely used docking programs. The docking program comes with an easy-to-use GUI within PyMOL. Two features have been incorporated in the program suite that allow for user-defined guidance of the docking process based on previous experimental data. Docking with those features demonstrated superior performance compared to unbiased docking. Conclusion: A protein pharmacophore-based docking program, PharmDock, has been made available with a PyMOL plugin. PharmDock and the PyMOL plugin are freely available from http://people.pharmacy.purdue.edu/~mlill/software/pharmdock. PMID:24739488
Prediction of protein subcellular locations by GO-FunD-PseAA predictor.
Chou, Kuo-Chen; Cai, Yu-Dong
2004-08-06
The localization of a protein in a cell is closely correlated with its biological function. With the explosion of protein sequences entering databanks, it is highly desirable to develop an automated method that can rapidly identify their subcellular location. This will expedite the annotation process, providing timely and useful information for both basic research and industrial application. In view of this, a powerful predictor has been developed by hybridizing the gene ontology approach [Nat. Genet. 25 (2000) 25], the functional domain composition approach [J. Biol. Chem. 277 (2002) 45765], and the pseudo-amino acid composition approach [Proteins Struct. Funct. Genet. 43 (2001) 246; Erratum: ibid. 44 (2001) 60]. As a showcase, the recently constructed dataset [Bioinformatics 19 (2003) 1656] was used for demonstration. The dataset contains 7589 proteins classified into 12 subcellular locations: chloroplast, cytoplasmic, cytoskeleton, endoplasmic reticulum, extracellular, Golgi apparatus, lysosomal, mitochondrial, nuclear, peroxisomal, plasma membrane, and vacuolar. The overall success rate of prediction obtained by the jackknife cross-validation was 92%. This is so far the highest success rate achieved on this dataset by following an objective and rigorous cross-validation procedure.
Evaluation of protein docking predictions using Hex 3.1 in CAPRI rounds 1 and 2.
Ritchie, David W
2003-07-01
This article describes and reviews our efforts using Hex 3.1 to predict the docking modes of the seven target protein-protein complexes presented in the CAPRI (Critical Assessment of Predicted Interactions) blind docking trial. For each target, the structure of at least one of the docking partners was given in its unbound form, and several of the targets involved large multimeric structures (e.g., Lactobacillus HPr kinase, hemagglutinin, bovine rotavirus VP6). Here we describe several enhancements to our original spherical polar Fourier docking correlation algorithm. For example, a novel surface sphere smothering algorithm is introduced to generate multiple local coordinate systems around the surface of a large receptor molecule, which may be used to define a small number of initial ligand-docking orientations distributed over the receptor surface. High-resolution spherical polar docking correlations are performed over the resulting receptor surface patches, and candidate docking solutions are refined by using a novel soft molecular mechanics energy minimization procedure. Overall, this approach identified two good solutions at rank 5 or less for two of the seven CAPRI complexes. Subsequent analysis of our results shows that Hex 3.1 is able to place good solutions within a list of
Diehl, Roger C.; Guinn, Emily J.; Capp, Michael W.; Tsodikov, Oleg V.; Record, M. Thomas
2013-01-01
To quantify interactions of the osmolyte L-proline with protein functional groups and predict its effects on protein processes, we use vapor pressure osmometry to determine chemical potential derivatives dµ2/dm3 = µ23 quantifying preferential interactions of proline (component 3) with 21 solutes (component 2) selected to display different combinations of aliphatic or aromatic C, amide, carboxylate, phosphate or hydroxyl O, and/or amide or cationic N surface. Solubility data yield µ23 values for 4 less-soluble solutes. Values of µ23 are dissected using an ASA-based analysis to test the hypothesis of additivity and obtain α-values (proline interaction potentials) for these eight surface types and three inorganic ions. Values of µ23 predicted from these α-values agree with experiment, demonstrating additivity. Molecular interpretation of α-values using the solute partitioning model yields partition coefficients (Kp) quantifying the local accumulation or exclusion of proline in the hydration water of each functional group. Interactions of proline with native protein surface and effects of proline on protein unfolding are predicted from α-values and ASA information and compared with experimental data, with results for glycine betaine and urea, and with predictions from transfer free energy analysis. We conclude that proline stabilizes proteins because of its unfavorable interactions with (exclusion from) amide oxygens and aliphatic hydrocarbon surface exposed in unfolding, and that proline is an effective in vivo osmolyte because of the osmolality increase resulting from its unfavorable interactions with anionic (carboxylate and phosphate) and amide oxygens and aliphatic hydrocarbon groups on the surface of cytoplasmic proteins and nucleic acids. PMID:23909383
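The additivity hypothesis tested in this abstract amounts to predicting µ23 for a solute as a sum over surface types of an interaction potential (α-value) times the accessible surface area (ASA) of that type. The sketch below shows only that forward calculation in Python. The surface-type names, α-values and areas are invented placeholders, not values from the study, and units are left unspecified.

```python
def predict_mu23(alpha, asa):
    """Additive prediction: sum of alpha_i * ASA_i over the surface types present."""
    missing = set(asa) - set(alpha)
    if missing:
        raise KeyError(f"no alpha-value for surface type(s): {sorted(missing)}")
    return sum(alpha[surface] * area for surface, area in asa.items())


# Hypothetical alpha-values per unit area for three surface types.
alpha = {"aliphatic_C": 0.5, "amide_O": 1.0, "cationic_N": -2.0}

# Hypothetical ASA breakdown of one model solute.
asa = {"aliphatic_C": 100.0, "amide_O": 30.0}

print(predict_mu23(alpha, asa))  # 80.0
```

Comparing such predictions with measured µ23 values for solutes that were not used to obtain the α-values is one way to check whether the contributions are in fact additive.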
Functional analysis of the Arabidopsis PHT4 family of intracellular phosphate transporters.
Guo, B; Jin, Y; Wussler, C; Blancaflor, E B; Motes, C M; Versaw, W K
2008-01-01
The transport of phosphate (Pi) between subcellular compartments is central to metabolic regulation. Although some of the transporters involved in controlling the intracellular distribution of Pi have been identified in plants, others are predicted from genetic, biochemical and bioinformatics studies. Heterologous expression in yeast, and gene expression and localization in plants were used to characterize all six members of an Arabidopsis thaliana membrane transporter family designated here as PHT4. PHT4 proteins share similarity with SLC17/type I Pi transporters, a diverse group of animal proteins involved in the transport of Pi, organic anions and chloride. All of the PHT4 proteins mediate Pi transport in yeast with high specificity. Bioinformatic analysis and localization of PHT4-GFP fusion proteins indicate that five of the proteins are targeted to the plastid envelope, and the sixth resides in the Golgi apparatus. PHT4 genes are expressed in both roots and leaves, although two of the genes are expressed predominantly in leaves and one mostly in roots. These expression patterns, together with Pi transport activities and subcellular locations, suggest roles for PHT4 proteins in the transport of Pi between the cytosol and chloroplasts, heterotrophic plastids and the Golgi apparatus.
Howell, Brett A; Chauhan, Anuj
2010-08-01
Physiologically based pharmacokinetic (PBPK) models were developed for design and optimization of liposome therapy for treatment of overdoses of tricyclic antidepressants and local anesthetics. In vitro drug-binding data for pegylated, anionic liposomes and published mechanistic equations for partition coefficients were used to develop the models. The models were proven reliable through comparisons to intravenous data. The liposomes were predicted to be highly effective at treating amitriptyline overdoses, with reductions in the area under the concentration versus time curves (AUC) of 64% for the heart and brain. Peak heart and brain drug concentrations were predicted to drop by 20%. Bupivacaine AUC and peak concentration reductions were lower at 15.4% and 17.3%, respectively, for the heart and brain. The predicted pharmacokinetic profiles following liposome administration agreed well with data from clinical studies where protein fragments were administered to patients for overdose treatment. Published data on local cardiac function were used to relate the predicted concentrations in the body to local pharmacodynamic effects in the heart. While the results offer encouragement for future liposome therapies geared toward overdose, it is imperative to point out that animal experiments and phase I clinical trials are the next steps to ensuring the efficacy of the treatment. (c) 2010 Wiley-Liss, Inc. and the American Pharmacists Association
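The AUC reductions reported in this abstract compare the area under a concentration versus time curve with and without treatment. A minimal Python sketch of that comparison follows, using the trapezoidal rule on sampled time points. The concentration profiles are invented for illustration and do not come from the PBPK models described here.

```python
def auc_trapezoid(times, concentrations):
    """Area under a sampled concentration-time curve by the trapezoidal rule."""
    if len(times) != len(concentrations):
        raise ValueError("times and concentrations must have the same length")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += 0.5 * (concentrations[i] + concentrations[i - 1]) * width
    return area


def percent_reduction(baseline, treated):
    """Percentage by which the treated value is lower than the baseline."""
    return 100.0 * (baseline - treated) / baseline


# Invented tissue concentration profiles (arbitrary units).
times = [0.0, 1.0, 2.0, 4.0]
untreated = [0.0, 10.0, 6.0, 2.0]
treated = [0.0, 5.0, 3.0, 1.0]

auc_untreated = auc_trapezoid(times, untreated)  # 21.0
auc_treated = auc_trapezoid(times, treated)      # 10.5
print(percent_reduction(auc_untreated, auc_treated))  # 50.0
```

The same `percent_reduction` helper applies to peak concentrations by passing `max(untreated)` and `max(treated)`.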
Hanson, Jack; Yang, Yuedong; Paliwal, Kuldip; Zhou, Yaoqi
2017-03-01
Capturing long-range interactions between structural but not sequence neighbors of proteins is a long-standing challenging problem in bioinformatics. Recently, long short-term memory (LSTM) networks have significantly improved the accuracy of speech and image classification problems by remembering useful past information in long sequential events. Here, we have implemented deep bidirectional LSTM recurrent neural networks in the problem of protein intrinsic disorder prediction. The new method, named SPOT-Disorder, has steadily improved over a similar method using a traditional, window-based neural network (SPINE-D) in all datasets tested without separate training on short and long disordered regions. Independent tests on four other datasets, including the datasets from critical assessment of structure prediction (CASP) techniques and >10 000 annotated proteins from MobiDB, confirmed SPOT-Disorder as one of the best methods in disorder prediction. Moreover, initial studies indicate that the method is more accurate in predicting functional sites in disordered regions. These results highlight the usefulness of combining LSTM with deep bidirectional recurrent neural networks in capturing non-local, long-range interactions for bioinformatics applications. SPOT-Disorder is available as a web server and as a standalone program at: http://sparks-lab.org/server/SPOT-disorder/index.php. © The Author 2016. Published by Oxford University Press. All rights reserved.
Lin, Jhih-Rong; Liu, Zhonghao; Hu, Jianjun
2014-10-01
The binding affinity between a nuclear localization signal (NLS) and its import receptor is closely related to the corresponding nuclear import activity. PTM-based modulation of the NLS binding affinity to the import receptor is one of the best understood mechanisms for regulating nuclear import of proteins. However, identification of such regulation mechanisms is challenging due to the difficulty of assessing the impact of PTM on the corresponding nuclear import activities. In this study we proposed NIpredict, an effective algorithm to predict the nuclear import activity of a protein given its NLS, in which molecular interaction energy components (MIECs) were used to characterize the NLS-import receptor interaction, and the support vector regression machine (SVR) was used to learn the relationship between the characterized NLS-import receptor interaction and the corresponding nuclear import activity. Our experiments showed that nuclear import activity change due to NLS change could be accurately predicted by the NIpredict algorithm. Based on NIpredict, we developed a systematic framework to identify potential PTM-based nuclear import regulations for human and yeast nuclear proteins. Application of this approach has identified the potential nuclear import regulation mechanisms by phosphorylation of two nuclear proteins, SF1 and ORC6. © 2014 Wiley Periodicals, Inc.
Genome-scale model reveals metabolic basis of biomass partitioning in a model diatom
Levering, Jennifer; Broddrick, Jared; Dupont, Christopher L.; ...
2016-05-06
Diatoms are eukaryotic microalgae that contain genes from various sources, including bacteria and the secondary endosymbiotic host. Due to this unique combination of genes, diatoms are taxonomically and functionally distinct from other algae and vascular plants and confer novel metabolic capabilities. Based on the genome annotation, we performed a genome-scale metabolic network reconstruction for the marine diatom Phaeodactylum tricornutum. Due to their endosymbiotic origin, diatoms possess a complex chloroplast structure which complicates the prediction of subcellular protein localization. Based on previous work we implemented a pipeline that exploits a series of bioinformatics tools to predict protein localization. The manually curated reconstructed metabolic network iLB1027_lipid accounts for 1,027 genes associated with 4,456 reactions and 2,172 metabolites distributed across six compartments. To constrain the genome-scale model, we determined the organism specific biomass composition in terms of lipids, carbohydrates, and proteins using Fourier transform infrared spectrometry. Our simulations indicate the presence of a yet unknown glutamine-ornithine shunt that could be used to transfer reducing equivalents generated by photosynthesis to the mitochondria. Furthermore, the model reflects the known biochemical composition of P. tricornutum in defined culture conditions and enables metabolic engineering strategies to improve the use of P. tricornutum for biotechnological applications.
Park, Joohae; Tefsen, Boris; Heemskerk, Marc J; Lagendijk, Ellen L; van den Hondel, Cees A M J J; van Die, Irma; Ram, Arthur F J
2015-11-02
Galactofuranose (Galf)-containing glycoconjugates are present in numerous microbes, including filamentous fungi where they are important for morphology, virulence and maintaining cell wall integrity. The incorporation of Galf-residues into galactomannan, galactomannoproteins and glycolipids is carried out by Golgi-localized Galf transferases. The nucleotide sugar donor used by these transferases (UDP-Galf) is produced in the cytoplasm and has to be transported to the lumen of the Golgi by a dedicated nucleotide sugar transporter. Based on homology with recently identified UDP-Galf-transporters in A. fumigatus and A. nidulans, two putative UDP-Galf-transporters in A. niger were found. Their function and localization were determined by gene deletions and GFP-tagging studies, respectively. The two putative UDP-Galf-transporters in A. niger are homologous to each other and are predicted to contain eleven transmembrane domains (UgtA) or ten transmembrane domains (UgtB) due to a reduced length of the C-terminal part of the UgtB protein. The presence of two putative UDP-Galf-transporters in the genome was not unique for A. niger. From the twenty Aspergillus species analysed, nine species contained two additional putative UDP-Galf-transporters. Three of the nine species were outside the Aspergillus section nigri, indicating an early duplication of UDP-Galf-transporters and subsequent loss of the UgtB copy in several aspergilli. Deletion analysis of the single and double mutants in A. niger indicated that the two putative UDP-Galf-transporters (named UgtA and UgtB) have a redundant function in UDP-Galf-transport as only the double mutant displayed a Galf-negative phenotype. The Galf-negative phenotype of the double mutant could be complemented by expressing either CFP-UgtA or CFP-UgtB fusion proteins from their endogenous promoters, indicating that both CFP-tagged proteins are functional.
Both Ugt proteins co-localize with each other as well as with the GDP-mannose nucleotide transporter, as was demonstrated by fluorescence microscopy, thereby confirming their predicted localization in the Golgi. A. niger contains two genes encoding UDP-Galf-transporters. Deletion and localization studies indicate that UgtA and UgtB have redundant functions in the biosynthesis of Galf-containing glycoconjugates.
Nakajima, Masao; Yoshino, Shigefumi; Kanekiyo, Shinsuke; Maeda, Noriko; Sakamoto, Kazuhiko; Tsunedomi, Ryoichi; Suzuki, Nobuaki; Takeda, Shigeru; Yamamoto, Shigeru; Hazama, Shoichi; Hoshii, Yoshinobu; Oga, Atsunori; Itoh, Hiroshi; Ueno, Tomio; Nagano, Hiroaki
2018-01-01
Secreted protein acidic and rich in cysteine (SPARC) is an extracellular matrix glycoprotein that may serve an important role in epithelial-mesenchymal transition. Recent studies have demonstrated that SPARC status is a prognostic indicator in various cancer types; however, its value remains unclear in gastric cancer (GC). In the present study, the localization and prognostic impact of SPARC expression were evaluated in patients with GC. Immunohistochemical analysis of SPARC expression was performed in 117 surgically resected GC specimens, and the localization of SPARC-positive cells, as well as the association between SPARC expression and clinicopathological characteristics, were evaluated. High SPARC expression was observed in 47 cases; the glycoprotein was localized in the peritumoral fibroblasts, but was rarely observed in the cytoplasm of cancer cells. Heterogeneity of SPARC expression was observed in 52 cases. High stromal SPARC expression was identified to be an independent predictor of more favorable prognosis (overall survival and recurrence free survival) in all patients (P<0.001). On subgroup analysis, this association remained significant in patients who received adjuvant chemotherapy, but not in patients who did not (P<0.001). Stromal SPARC expression predicts better prognosis in GC patients who underwent curative resection; this appears to be associated with improved response to chemotherapy. PMID:29403557
Ahmed, Ali Abdurehim; Pedersen, Carsten; Schultz-Larsen, Torsten; Kwaaitaal, Mark; Jørgensen, Hans Jørgen Lyngs; Thordal-Christensen, Hans
2015-01-01
Pathogens secrete effector proteins to establish a successful interaction with their host. Here, we describe two barley (Hordeum vulgare) powdery mildew candidate secreted effector proteins, CSEP0105 and CSEP0162, which contribute to pathogen success and appear to be required during or after haustorial formation. Silencing of either CSEP using host-induced gene silencing significantly reduced the fungal haustorial formation rate. Interestingly, both CSEPs interact with the barley small heat shock proteins, Hsp16.9 and Hsp17.5, in a yeast two-hybrid assay. Small heat shock proteins are known to stabilize several intracellular proteins, including defense-related signaling components, through their chaperone activity. CSEP0105 and CSEP0162 localized to the cytosol and the nucleus of barley epidermal cells, whereas Hsp16.9 and Hsp17.5 are cytosolic. Intriguingly, only those specific CSEPs changed localization and became restricted to the cytosol when coexpressed with Hsp16.9 and Hsp17.5, confirming the CSEP-small heat shock protein interaction. As predicted, Hsp16.9 showed chaperone activity, as it could prevent the aggregation of Escherichia coli proteins during thermal stress. Remarkably, CSEP0105 compromised this activity. These data suggest that CSEP0105 promotes virulence by interfering with the chaperone activity of a barley small heat shock protein essential for defense and stress responses. PMID:25770154
Zolfaghari Emameh, Reza; Barker, Harlan; Hytönen, Vesa P; Tolvanen, Martti E E; Parkkila, Seppo
2014-08-29
The genomes of many insect and parasite species contain beta carbonic anhydrase (β-CA) protein coding sequences. The lack of β-CA proteins in mammals makes them interesting target proteins for inhibition in treatment of some infectious diseases and pests. Many insects and parasites represent important pests for agriculture and cause enormous economic damage worldwide. Meanwhile, pollution of the environment by old pesticides, emergence of strains resistant to them, and their off-target effects are major challenges for agriculture and society. In this study, we analyzed a multiple sequence alignment of 31 β-CAs from insects, some parasites, and selected plant species relevant to agriculture and livestock husbandry. Using bioinformatics tools, a phylogenetic tree was generated and the subcellular localizations and antigenic sites of each protein were predicted. Structural models for β-CAs of Ancylostoma caninum, Ascaris suum, Trichinella spiralis, and Entamoeba histolytica were built using Pisum sativum and Mycobacterium tuberculosis β-CAs as templates. Six β-CAs of insects and parasites and six β-CAs of plants are predicted to be mitochondrial and chloroplastic, respectively, and thus may be involved in important metabolic functions. All 31 sequences showed the presence of the highly conserved β-CA active site sequence motifs, CXDXR and HXXC (C: cysteine, D: aspartic acid, R: arginine, H: histidine, X: any residue). We discovered that these two motifs are more antigenic than others. Homology models suggested that these motifs are mostly buried and thus not well accessible for recognition by antibodies. The predicted mitochondrial localization of several β-CAs and the hidden antigenic epitopes within the protein molecule suggest that they may not be considered major targets for vaccines. Instead, they are promising candidate enzymes for small-molecule inhibitors which can easily penetrate the cell membrane.
Based on current knowledge, we conclude that β-CAs are potential targets for development of small molecule pesticides or anti-parasitic agents with minimal side effects on vertebrates.
Florica, Roxana Oriana; Hipolito, Victoria; Bautista, Stephen; Anvari, Homa; Rapp, Chloe; El-Rass, Suzan; Asgharian, Alimohammad; Antonescu, Costin N; Killeen, Marie T
2017-10-01
The axons of the DA and DB classes of motor neurons fail to reach the dorsal cord in the absence of the guidance cue UNC-6/Netrin or its receptor UNC-5 in C. elegans. However, the axonal processes usually exit their cell bodies in the ventral cord in the absence of both molecules. Strains lacking functional versions of UNC-6 or UNC-5 have a low level of DA and DB motor neuron axon outgrowth defects. We found that mutations in the genes for all six of the ENU-3 proteins function to enhance the outgrowth defects of the DA and DB axons in strains lacking either UNC-6 or UNC-5. A mutation in the gene for the MIG-14/Wntless protein also enhances defects in a strain lacking either UNC-5 or UNC-6, suggesting that the ENU-3 and Wnt pathways function parallel to the Netrin pathway in directing motor neuron axon outgrowth. Our evidence suggests that the ENU-3 proteins are novel members of the Wnt pathway in nematodes. Five of the six members of the ENU-3 family are predicted to be single-pass trans-membrane proteins. The expression pattern of ENU-3.1 was consistent with plasma membrane localization. One family member, ENU-3.6, lacks the predicted signal peptide and the membrane-spanning domain. In HeLa cells ENU-3.6 had a cytoplasmic localization and caused actin dependent processes to appear. We conclude that the ENU-3 family proteins function in a pathway parallel to the UNC-6/Netrin pathway for motor neuron axon outgrowth, most likely in the Wnt pathway. Copyright © 2017 Elsevier Inc. All rights reserved.
Theoretical and computational studies in protein folding, design, and function
NASA Astrophysics Data System (ADS)
Morrissey, Michael Patrick
2000-10-01
In this work, simplified statistical models are used to understand an array of processes related to protein folding and design. In Part I, lattice models are utilized to test several theories about the statistical properties of protein-like systems. In Part II, sequence analysis and all-atom simulations are used to advance a novel theory for the behavior of a particular protein. Part I is divided into five chapters. In Chapter 2, a method of sequence design for model proteins, based on statistical mechanical first-principles, is developed. The cumulant design method uses a mean-field approximation to expand the free energy of a sequence in temperature. The method successfully designs sequences which fold to a target lattice structure at a specific temperature, a feat which was not possible using previous design methods. The next three chapters are computational studies of the double mutant cycle, which has been used experimentally to predict intra-protein interactions. Complete structure prediction is demonstrated for a model system using exhaustive, and also sub-exhaustive, double mutants. Nonadditivity of enthalpy, rather than of free energy, is proposed and demonstrated to be a superior marker for inter-residue contact. Next, a new double mutant protocol, called exchange mutation, is introduced. Although simple statistical arguments predict exchange mutation to be a more accurate contact predictor than standard mutant cycles, this hypothesis was not upheld in lattice simulations. Reasons for this inconsistency will be discussed. Finally, a multi-chain folding algorithm is introduced. Known as LINKS, this algorithm was developed to test a method of structure prediction which utilizes chain-break mutants. While structure prediction was not successful, LINKS should nevertheless be a useful tool for the study of protein-protein and protein-ligand interactions. 
The last chapter of Part I utilizes the lattice to explore the differences between standard folding, from the fully denatured state, and cotranslational folding, whereby one end of a protein is synthesized and released before the other. Cotranslational folding is shown to accelerate folding kinetics, particularly when the target backbone contains many local contacts. Additionally, cotranslation is shown capable of "guiding" a model protein into a metastable, local contact-rich state, despite the existence of a true native state of much lower energy. In Part II, a model is developed for the behavior of PrP, a unique mammalian protein which has been shown to possess two native states. The pathogenic "scrapie" state PrPSc, which has not been structurally characterized, is known to trigger conversion of the characterized endogenous conformation PrPC into additional PrPSc. Residues 144–153 are shown to form the most hydrophilic naturally occurring alpha-helix, out of a broad database with more than 10,000 candidates. The novel beta-nucleation model proposes that PrPSc is not a distinct mono-molecular state, but is rather a beta-sheet-like aggregate centered around helix-1 components of multiple PrP molecules. The remainder of Part II uses molecular dynamics simulations to support the beta-nucleation hypothesis, and to propose a system of peptide ligands which may arrest the process of prion propagation.
High Precision Prediction of Functional Sites in Protein Structures
Buturovic, Ljubomir; Wong, Mike; Tang, Grace W.; Altman, Russ B.; Petkovic, Dragutin
2014-01-01
We address the problem of assigning biological function to solved protein structures. Computational tools play a critical role in identifying potential active sites and informing screening decisions for further lab analysis. A critical parameter in the practical application of computational methods is the precision, or positive predictive value. Precision measures the level of confidence the user should have in a particular computed functional assignment. Low precision annotations lead to futile laboratory investigations and waste scarce research resources. In this paper we describe an advanced version of the protein function annotation system FEATURE, which achieved 99% precision and average recall of 95% across 20 representative functional sites. The system uses a Support Vector Machine classifier operating on the microenvironment of physicochemical features around an amino acid. We also compared performance of our method with state-of-the-art sequence-level annotator Pfam in terms of precision, recall and localization. To our knowledge, no other functional site annotator has been rigorously evaluated against these key criteria. The software and predictive models are incorporated into the WebFEATURE service at http://feature.stanford.edu/wf4.0-beta. PMID:24632601
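Precision (positive predictive value) and recall, the two criteria emphasized in this abstract, can be computed from binary labels as sketched below. The labels are toy values chosen for illustration and `precision_recall` is a hypothetical helper, not part of FEATURE.

```python
def precision_recall(true_labels, predicted_labels):
    """Compute (precision, recall) for binary labels, where 1 = functional site."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy data: 3 true positives, 1 false positive, 1 false negative.
truth = [1, 1, 1, 0, 0, 1]
predicted = [1, 1, 0, 0, 1, 1]
print(precision_recall(truth, predicted))  # (0.75, 0.75)
```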
Analysis of Ribosome Stalling and Translation Elongation Dynamics by Deep Learning.
Zhang, Sai; Hu, Hailin; Zhou, Jingtian; He, Xuan; Jiang, Tao; Zeng, Jianyang
2017-09-27
Ribosome stalling is manifested by the local accumulation of ribosomes at specific codon positions of mRNAs. Here, we present ROSE, a deep learning framework to analyze high-throughput ribosome profiling data and estimate the probability of a ribosome stalling event occurring at each genomic location. Extensive validation tests on independent data demonstrated that ROSE possessed higher prediction accuracy than conventional prediction models, with an increase in the area under the receiver operating characteristic curve by up to 18.4%. In addition, genome-wide statistical analyses showed that ROSE predictions can be well correlated with diverse putative regulatory factors of ribosome stalling. Moreover, the genome-wide ribosome stalling landscapes of both human and yeast computed by ROSE recovered the functional interplays between ribosome stalling and cotranslational events in protein biogenesis, including protein targeting by the signal recognition particles and protein secondary structure formation. Overall, our study provides a novel method to complement the ribosome profiling techniques and further decipher the complex regulatory mechanisms underlying translation elongation dynamics encoded in the mRNA sequence. Copyright © 2017 Elsevier Inc. All rights reserved.
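The area under the receiver operating characteristic curve cited in this abstract can be read as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. The sketch below computes it that way on toy scores; `roc_auc` is a hypothetical helper and the data are invented, not taken from ROSE.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))  # 0.75
```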
2010-01-01
Background: Puf proteins have important roles in controlling gene expression at the post-transcriptional level by promoting RNA decay and repressing translation. The Pumilio homology domain (PUM-HD) is a conserved region within Puf proteins that binds to RNA with sequence specificity. Although Puf proteins have been well characterized in animal and fungal systems, little is known about the structural and functional characteristics of Puf-like proteins in plants. Results: The Arabidopsis and rice genomes code for 26 and 19 Puf-like proteins, respectively, each possessing eight or fewer Puf repeats in their PUM-HD. Key amino acids in the PUM-HD of several of these proteins are conserved with those of animal and fungal homologs, whereas other plant Puf proteins demonstrate extensive variability in these amino acids. Three-dimensional modeling revealed that the predicted structure of this domain in plant Puf proteins provides a suitable surface for binding RNA. Electrophoretic gel mobility shift experiments showed that the Arabidopsis AtPum2 PUM-HD binds with high affinity to BoxB of the Drosophila Nanos Response Element I (NRE1) RNA, whereas a point mutation in the core of the NRE1 resulted in a significant reduction in binding affinity. Transient expression of several of the Arabidopsis Puf proteins as fluorescent protein fusions revealed a dynamic, punctate cytoplasmic pattern of localization for most of these proteins. The presence of predicted nuclear export signals and accumulation of AtPuf proteins in the nucleus after treatment of cells with leptomycin B demonstrated that shuttling of these proteins between the cytosol and nucleus is common among these proteins. In addition to the cytoplasmically enriched AtPum proteins, two AtPum proteins showed nuclear targeting with enrichment in the nucleolus. Conclusions: The Puf family of RNA-binding proteins in plants consists of a greater number of members than any other model species studied to date.
This, along with the amino acid variability observed within their PUM-HDs, suggests that these proteins may be involved in a wide range of post-transcriptional regulatory events that are important in providing plants with the ability to respond rapidly to changes in environmental conditions and throughout development. PMID:20214804
Computational intelligence techniques for biological data mining: An overview
NASA Astrophysics Data System (ADS)
Faye, Ibrahima; Iqbal, Muhammad Javed; Said, Abas Md; Samir, Brahim Belhaouari
2014-10-01
Computational techniques have been successfully utilized for highly accurate analysis and modeling of multifaceted, raw biological data gathered from various genome sequencing projects. These techniques are proving much more effective than traditional in-vitro experiments at coping with the constantly increasing volume of sequence data. The most critical problems that have caught researchers' attention include, but are not limited to: accurate structure and function prediction of unknown proteins, protein subcellular localization prediction, finding protein-protein interactions, protein fold recognition, and analysis of microarray gene expression data. To solve these problems, various classification and clustering techniques using machine learning have been extensively used in the published literature. These techniques include neural network algorithms, genetic algorithms, fuzzy ARTMAP, K-Means, K-NN, SVM, rough set classifiers, decision tree and HMM based algorithms. Major difficulties in applying the above algorithms include the limitations of previous feature encoding and selection methods in extracting the best features, increasing classification accuracy, and decreasing the running time overheads of the learning algorithms. The application of this research would be potentially useful in drug design and in the diagnosis of some diseases. This paper presents a concise overview of the well-known protein classification techniques.
Stoichiometry of Nck-dependent actin polymerization in living cells
Ditlev, Jonathon A.; Michalski, Paul J.; Huber, Greg; Rivera, Gonzalo M.; Mohler, William A.
2012-01-01
Regulation of actin dynamics through the Nck/N-WASp (neural Wiskott–Aldrich syndrome protein)/Arp2/3 pathway is essential for organogenesis, cell invasiveness, and pathogen infection. Although many of the proteins involved in this pathway are known, the detailed mechanism by which it functions remains undetermined. To examine the signaling mechanism, we used a two-pronged strategy involving computational modeling and quantitative experimentation. We developed predictions for Nck-dependent actin polymerization using the Virtual Cell software system. In addition, we used antibody-induced aggregation of membrane-targeted Nck SH3 domains to test these predictions and to determine how the number of molecules in Nck aggregates and the density of aggregates affected localized actin polymerization in living cells. Our results indicate that the density of Nck molecules in aggregates is a critical determinant of actin polymerization. Furthermore, results from both computational simulations and experimentation support a model in which the Nck/N-WASp/Arp2/3 stoichiometry is 4:2:1. These results provide new insight into activities involving localized actin polymerization, including tumor cell invasion, microbial pathogenesis, and T cell activation. PMID:22613834
mLASSO-Hum: A LASSO-based interpretable human-protein subcellular localization predictor.
Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan
2015-10-07
Knowing the subcellular compartments of human proteins is essential to shed light on the mechanisms of a broad range of human diseases. In computational methods for protein subcellular localization, knowledge-based methods (especially gene ontology (GO) based methods) are known to perform better than sequence-based methods. However, existing GO-based predictors often lack interpretability and suffer from overfitting due to the high dimensionality of feature vectors. To address these problems, this paper proposes an interpretable multi-label predictor, namely mLASSO-Hum, which can yield sparse and interpretable solutions for large-scale prediction of human protein subcellular localization. By using the one-vs-rest LASSO-based classifiers, 87 out of more than 8000 GO terms are found to play more significant roles in determining the subcellular localization. Based on these 87 essential GO terms, we can decide not only where a protein resides within a cell, but also why it is located there. To further exploit information from the remaining GO terms, a method based on the GO hierarchical information derived from the depth distance of GO terms is proposed. Experimental results show that mLASSO-Hum performs significantly better than state-of-the-art predictors. We also found that in addition to the GO terms from the cellular component category, GO terms from the other two categories also play important roles in the final classification decisions. For readers' convenience, the mLASSO-Hum server is available online at http://bioinfo.eie.polyu.edu.hk/mLASSOHumServer/. Copyright © 2015 Elsevier Ltd. All rights reserved.
Clock, Sarah A; Planet, Paul J; Perez, Brenda A; Figurski, David H
2008-02-01
Prokaryotic secretion relies on proteins that are widely conserved, including NTPases and secretins, and on proteins that are system specific. The Tad secretion system in Aggregatibacter actinomycetemcomitans is dedicated to the assembly and export of Flp pili, which are needed for tight adherence. Consistent with predictions that RcpA forms the multimeric outer membrane secretion channel (secretin) of the Flp pilus biogenesis apparatus, we observed the RcpA protein in multimers that were stable in the presence of detergent and found that rcpA and its closely related homologs form a novel and distinct subfamily within a well-supported gene phylogeny of the entire secretin gene superfamily. We also found that rcpA-like genes were always linked to Aggregatibacter rcpB- or Caulobacter cpaD-like genes. Using antisera, we determined the localization and gross abundances of conserved (RcpA and TadC) and unique (RcpB, RcpC, and TadD) Tad proteins. The three Rcp proteins (RcpA, RcpB, and RcpC) and TadD, a putative lipoprotein, localized to the bacterial outer membrane. RcpA, RcpC, and TadD were also found in the inner membrane, while TadC localized exclusively to the inner membrane. The RcpA secretin was necessary for wild-type abundances of RcpB and RcpC, and TadC was required for normal levels of all three Rcp proteins. TadC abundance defects were observed in rcpA and rcpC mutants. TadD production was essential for wild-type RcpA and RcpB abundances, and RcpA did not multimerize or localize to the outer membrane without the expression of TadD. These data indicate that membrane proteins TadC and TadD may influence the assembly, transport, and/or function of individual outer membrane Rcp proteins.
Akhter, Nasrin; Shehu, Amarda
2018-01-19
Due to the essential role that the three-dimensional conformation of a protein plays in regulating interactions with molecular partners, wet and dry laboratories seek biologically-active conformations of a protein to decode its function. Computational approaches are gaining prominence due to the labor and cost demands of wet laboratory investigations. Template-free methods can now compute thousands of conformations known as decoys, but selecting native conformations from the generated decoys remains challenging. Repeatedly, research has shown that the protein energy functions whose minima are sought in the generation of decoys are unreliable indicators of nativeness. The prevalent approach ignores energy altogether and clusters decoys by conformational similarity. Complementary recent efforts design protein-specific scoring functions or train machine learning models on labeled decoys. In this paper, we show that an informative consideration of energy can be carried out under the energy landscape view. Specifically, we leverage local structures known as basins in the energy landscape probed by a template-free method. We propose and compare various strategies of basin-based decoy selection that we demonstrate are superior to clustering-based strategies. The presented results point to further directions of research for improving decoy selection, including the ability to properly consider the multiplicity of native conformations of proteins.
A Quantitative Spatial Proteomics Analysis of Proteome Turnover in Human Cells*
Boisvert, François-Michel; Ahmad, Yasmeen; Gierliński, Marek; Charrière, Fabien; Lamont, Douglas; Scott, Michelle; Barton, Geoff; Lamond, Angus I.
2012-01-01
Measuring the properties of endogenous cell proteins, such as expression level, subcellular localization, and turnover rates, on a whole proteome level remains a major challenge in the postgenome era. Quantitative methods for measuring mRNA expression do not reliably predict corresponding protein levels and provide little or no information on other protein properties. Here we describe a combined pulse-labeling, spatial proteomics and data analysis strategy to characterize the expression, localization, synthesis, degradation, and turnover rates of endogenously expressed, untagged human proteins in different subcellular compartments. Using quantitative mass spectrometry and stable isotope labeling with amino acids in cell culture, a total of 80,098 peptides from 8,041 HeLa proteins were quantified, and their spatial distribution between the cytoplasm, nucleus and nucleolus determined and visualized using specialized software tools developed in PepTracker. Using information from ion intensities and rates of change in isotope ratios, protein abundance levels and protein synthesis, degradation and turnover rates were calculated for the whole cell and for the respective cytoplasmic, nuclear, and nucleolar compartments. Expression levels of endogenous HeLa proteins varied by up to seven orders of magnitude. The average turnover rate for HeLa proteins was ∼20 h. Turnover rate did not correlate with either molecular weight or net charge, but did correlate with abundance, with highly abundant proteins showing longer than average half-lives. Fast turnover proteins had overall a higher frequency of PEST motifs than slow turnover proteins but no general correlation was observed between amino or carboxyl terminal amino acid identities and turnover rates. A subset of proteins was identified that exist in pools with different turnover rates depending on their subcellular localization. 
This strongly correlated with subunits of large, multiprotein complexes, suggesting a general mechanism whereby their assembly is controlled in a different subcellular location to their main site of function. PMID:21937730
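To make the turnover figures in this abstract concrete, the sketch below relates a half-life to the fraction of a protein pool remaining over time. It assumes simple first-order (exponential) decay and reads the reported ~20 h average as a half-life; both are assumptions made for illustration, not statements from the abstract, and the function names are hypothetical.

```python
import math

def decay_constant(half_life_h):
    """First-order decay constant k (per hour) for a given half-life in hours."""
    return math.log(2) / half_life_h

def fraction_remaining(half_life_h, elapsed_h):
    """Fraction of the original protein pool left after elapsed_h hours."""
    return math.exp(-decay_constant(half_life_h) * elapsed_h)

# Assumed 20 h half-life: after two half-lives about a quarter remains.
print(fraction_remaining(20.0, 40.0))
```

Under these assumptions the printed value is approximately 0.25.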
Gustine, David D.; Barboza, Perry S.; Lawler, James P.; Arthur, Stephen M.; Shults, Brad S.; Persons, Kate; Adams, Layne G.
2011-01-01
Identifying links between nutritional condition of individuals and population trajectories greatly enhances our understanding of the ecology, conservation, and management of wildlife. For northern ungulates, the potential impacts of a changing climate to populations are predicted to be nutritionally mediated through an increase in the severity and variance in winter conditions. Foraging conditions and the availability of body protein as a store for reproduction in late winter may constrain productivity in northern ungulates, yet the link between characteristics of wintering habitats and protein status has not been established for a wild ungulate. We used a non‐invasive proxy of protein status derived from isotopes of N in excreta to evaluate the influence of winter habitats on the protein status of muskoxen in three populations in Alaska (2005–2008). Multiple regression and an information‐theoretic approach were used to compare models that evaluated the influence of population, year, and characteristics of foraging sites (components of diet and physiography) on protein status for groups of muskoxen. The observed variance in protein status among groups of muskoxen across populations and years was partially explained (45%) by local foraging conditions that affected forage availability. Protein status improved for groups of muskoxen as the amount of graminoids in the diet increased (−0.430 ± 0.31, β± 95% CI) and elevation of foraging sites decreased (0.824 ± 0.67). Resources available for reproduction in muskoxen are highly dependent upon demographic, environmental, and physiographic constraints that affect forage availability in winter. Due to their very sedentary nature in winter, muskoxen are highly susceptible to localized foraging conditions; therefore, the spatial variance in resource availability may exert a strong effect on productivity. 
Consequently, there is a clear need to account for climate–topography effects in winter at multiple scales when predicting the potential impacts of climatic shifts on population trajectories of muskoxen.
Evidence for Loss of a Partial Flagellar Glycolytic Pathway during Trypanosomatid Evolution
Brown, Robert W. B.; Collingridge, Peter W.; Gull, Keith; Rigden, Daniel J.; Ginger, Michael L.
2014-01-01
Classically viewed as a cytosolic pathway, glycolysis is increasingly recognized as a metabolic pathway exhibiting surprisingly wide-ranging variations in compartmentalization within eukaryotic cells. Trypanosomatid parasites provide an extreme view of glycolytic enzyme compartmentalization as several glycolytic enzymes are found exclusively in peroxisomes. Here, we characterize Trypanosoma brucei flagellar proteins resembling glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and phosphoglycerate kinase (PGK): we show the latter associates with the axoneme and the former is a novel paraflagellar rod component. The paraflagellar rod is an essential extra-axonemal structure in trypanosomes and related protists, providing a platform into which metabolic activities can be built. Yet, bioinformatics interrogation and structural modelling indicate neither the trypanosome PGK-like nor the GAPDH-like protein is catalytically active. Orthologs are present in a free-living ancestor of the trypanosomatids, Bodo saltans: the PGK-like protein from B. saltans also lacks key catalytic residues, but its GAPDH-like protein is predicted to be catalytically competent. We discuss the likelihood that the trypanosome GAPDH-like and PGK-like proteins constitute molecular evidence for evolutionary loss of a flagellar glycolytic pathway, either as a consequence of niche adaptation or the re-localization of glycolytic enzymes to peroxisomes and the extensive changes to glycolytic flux regulation that accompanied this re-localization. Evidence indicating loss of localized ATP provision via glycolytic enzymes therefore provides a novel contribution to an emerging theme of hidden diversity with respect to compartmentalization of the ubiquitous glycolytic pathway in eukaryotes. A possibility that trypanosome GAPDH-like protein additionally represents a degenerate example of a moonlighting protein is also discussed. PMID:25050549
Position-dependent effects of polylysine on Sec protein transport.
Liang, Fu-Cheng; Bageshwar, Umesh K; Musser, Siegfried M
2012-04-13
The bacterial Sec protein translocation system catalyzes the transport of unfolded precursor proteins across the cytoplasmic membrane. Using a recently developed real time fluorescence-based transport assay, the effects of the number and distribution of positive charges on the transport time and transport efficiency of proOmpA were examined. As expected, an increase in the number of lysine residues generally increased transport time and decreased transport efficiency. However, the observed effects were highly dependent on the polylysine position in the mature domain. In addition, a string of consecutive positive charges generally had a more significant effect on transport time and efficiency than separating the charges into two or more charged segments. Thirty positive charges distributed throughout the mature domain resulted in effects similar to 10 consecutive charges near the N terminus of the mature domain. These data support a model in which the local effects of positive charge on the translocation kinetics dominate over total thermodynamic constraints. The rapid translocation kinetics of some highly charged proOmpA mutants suggest that the charge is partially shielded from the electric field gradient during transport, possibly by the co-migration of counter ions. The transport times of precursors with multiple positively charged sequences, or "pause sites," were fairly well predicted by a local effect model. However, the kinetic profile predicted by this local effect model was not observed. Instead, the transport kinetics observed for precursors with multiple polylysine segments support a model in which translocation through the SecYEG pore is not the rate-limiting step of transport.
Position-dependent Effects of Polylysine on Sec Protein Transport*
Liang, Fu-Cheng; Bageshwar, Umesh K.; Musser, Siegfried M.
2012-01-01
The bacterial Sec protein translocation system catalyzes the transport of unfolded precursor proteins across the cytoplasmic membrane. Using a recently developed real time fluorescence-based transport assay, the effects of the number and distribution of positive charges on the transport time and transport efficiency of proOmpA were examined. As expected, an increase in the number of lysine residues generally increased transport time and decreased transport efficiency. However, the observed effects were highly dependent on the polylysine position in the mature domain. In addition, a string of consecutive positive charges generally had a more significant effect on transport time and efficiency than separating the charges into two or more charged segments. Thirty positive charges distributed throughout the mature domain resulted in effects similar to 10 consecutive charges near the N terminus of the mature domain. These data support a model in which the local effects of positive charge on the translocation kinetics dominate over total thermodynamic constraints. The rapid translocation kinetics of some highly charged proOmpA mutants suggest that the charge is partially shielded from the electric field gradient during transport, possibly by the co-migration of counter ions. The transport times of precursors with multiple positively charged sequences, or “pause sites,” were fairly well predicted by a local effect model. However, the kinetic profile predicted by this local effect model was not observed. Instead, the transport kinetics observed for precursors with multiple polylysine segments support a model in which translocation through the SecYEG pore is not the rate-limiting step of transport. PMID:22367204
A hidden Markov model derived structural alphabet for proteins.
Camproux, A C; Gautier, R; Tufféry, P
2004-06-04
Understanding and predicting protein structures depends on the complexity and the accuracy of the models used to represent them. We have set up a hidden Markov model that discretizes protein backbone conformation as a series of overlapping fragments (states) four residues in length. This approach simultaneously learns the geometry of the states and their connections. We obtain, using a statistical criterion, an optimal systematic decomposition of the conformational variability of the protein peptidic chain into 27 states with strong connection logic. This result is stable over different protein sets. Our model fits well with previous knowledge of protein architecture organisation and seems able to capture some subtle details of protein organisation, such as helix sub-level organisation schemes. Taking into account the dependence between the states results in a description of local protein structure of low complexity. On average, the model makes use of only 8.3 states among 27 to describe each position of a protein structure. Although we use short fragments, the learning process on entire protein conformations captures the logic of the assembly on a larger scale. Using such a model, the structure of proteins can be reconstructed with an average accuracy close to 1.1 Å root-mean-square deviation and for a complexity of only 3. Finally, we also observe that sequence specificity increases with the number of states of the structural alphabet. Such models can constitute a very relevant approach to the analysis of protein architecture, in particular for protein structure prediction.
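The reconstruction accuracy in this abstract is reported as a root-mean-square deviation (RMSD) between coordinates. The sketch below shows the basic calculation for two equally sized sets of 3D points. It assumes the structures are already superimposed (no optimal alignment step is performed), and both the `rmsd` helper and the coordinates are illustrative rather than taken from the paper.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumes the two structures are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Toy example: every point is displaced by 1 unit along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(reference, model))  # 1.0
```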
GenProBiS: web server for mapping of sequence variants to protein binding sites.
Konc, Janez; Skrlj, Blaz; Erzen, Nika; Kunej, Tanja; Janezic, Dusanka
2017-07-03
Discovery of potentially deleterious sequence variants is important and has wide implications for research and generation of new hypotheses in human and veterinary medicine, and drug discovery. The GenProBiS web server maps sequence variants to protein structures from the Protein Data Bank (PDB), and further to protein-protein, protein-nucleic acid, protein-compound, and protein-metal ion binding sites. The concept of a protein-compound binding site is understood in the broadest sense, which includes glycosylation and other post-translational modification sites. Binding sites were defined by local structural comparisons of whole protein structures using the Protein Binding Sites (ProBiS) algorithm and transposition of ligands from the similar binding sites found to the query protein using the ProBiS-ligands approach with new improvements introduced in GenProBiS. Binding site surfaces were generated as three-dimensional grids encompassing the space occupied by predicted ligands. The server allows intuitive visual exploration of comprehensively mapped variants, such as human somatic mis-sense mutations related to cancer and non-synonymous single nucleotide polymorphisms from 21 species, within the predicted binding sites regions for about 80 000 PDB protein structures using fast WebGL graphics. The GenProBiS web server is open and free to all users at http://genprobis.insilab.org. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Slocum, Joshua D; First, Jeremy T; Webb, Lauren J
2017-07-20
Measurement of the magnitude, direction, and functional importance of electric fields in biomolecules has been a long-standing experimental challenge. pKa shifts of titratable residues have been the most widely implemented measurements of the local electrostatic environment around the labile proton, and experimental data sets of pKa shifts in a variety of systems have been used to test and refine computational prediction capabilities of protein electrostatic fields. A more direct and increasingly popular technique to measure electric fields in proteins is Stark effect spectroscopy, where the change in absorption energy of a chromophore relative to a reference state is related to the change in electric field felt by the chromophore. While there are merits to both of these methods and they are both reporters of local electrostatic environment, they are fundamentally different measurements, and to our knowledge there has been no direct comparison of these two approaches in a single protein. We have recently demonstrated that green fluorescent protein (GFP) is an ideal model system for measuring changes in electric fields in a protein interior caused by amino acid mutations using both electronic and vibrational Stark effect chromophores. Here we report the changes in pKa of the GFP fluorophore in response to the same mutations and show that they are in excellent agreement with Stark effect measurements. This agreement in the results of orthogonal experiments reinforces our confidence in the experimental results of both Stark effect and pKa measurements and provides an excellent target data set to benchmark diverse protein electrostatics calculations. We used this experimental data set to test the pKa prediction ability of the adaptive Poisson-Boltzmann solver (APBS) and found that a simple continuum dielectric model of the GFP interior is insufficient to accurately capture the measured pKa and Stark effect shifts.
We discuss some of the limitations of this continuum-based model in this system and offer this experimentally self-consistent data set as a target benchmark for electrostatics models, which could allow for a more rigorous test of pKa prediction techniques due to the unique environment of the water-filled GFP barrel compared to traditional globular proteins.
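For orientation, the two observables compared in this abstract are usually tied to electrostatics through the following textbook relations. They are not stated in the abstract, and the symbols are introduced here only for illustration: \Delta\vec{\mu} is the chromophore's difference dipole moment, \Delta\vec{F} the change in local field, and \Delta\tilde{\nu} the shift in absorption wavenumber.

```latex
% Linear Stark effect: a change in field shifts the transition energy
\Delta E = h c\,\Delta\tilde{\nu} = -\,\Delta\vec{\mu}\cdot\Delta\vec{F}

% A pKa shift expressed as a free-energy change of deprotonation
\Delta\Delta G^{\circ} = RT\,\ln(10)\,\Delta\mathrm{p}K_a
  \approx 1.36\ \mathrm{kcal\,mol^{-1}} \times \Delta\mathrm{p}K_a
  \quad (T = 298\ \mathrm{K})
```

Under these assumptions both measurements can be placed on a common energy scale, which is the kind of comparison the abstract describes.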
Shatabda, Swakkhar; Saha, Sanjay; Sharma, Alok; Dehzangi, Abdollah
2017-12-21
Bacteriophages are viruses that can significantly affect the functioning of bacteria and can be used in phage-based therapy. The function of a bacteriophage protein in the host bacterium depends on its location in the host cell. It is therefore important to know the subcellular location of phage proteins in a host cell in order to understand their working mechanism. In this paper, we propose iPHLoc-ES, a prediction method for subcellular localization of bacteriophage proteins. We aim to solve two problems: discriminating between host-located and non-host-located phage proteins, and discriminating between the locations of host-located proteins in a host cell (membrane or cytoplasm). To do this, we extract sets of evolutionary and structural features of phage proteins and employ a Support Vector Machine (SVM) as our classifier. We also use recursive feature elimination (RFE) to reduce the number of features for effective prediction. On a standard dataset using standard evaluation criteria, our method significantly outperforms the state-of-the-art predictor. iPHLoc-ES is readily available as a standalone tool from https://github.com/swakkhar/iPHLoc-ES/ and as a web application from http://brl.uiu.ac.bd/iPHLoc-ES/. Copyright © 2017 Elsevier Ltd. All rights reserved.
iCLIP Predicts the Dual Splicing Effects of TIA-RNA Interactions
Briese, Michael; Zarnack, Kathi; Luscombe, Nicholas M.; Rot, Gregor; Zupan, Blaž; Curk, Tomaž; Ule, Jernej
2010-01-01
The regulation of alternative splicing involves interactions between RNA-binding proteins and pre-mRNA positions close to the splice sites. T-cell intracellular antigen 1 (TIA1) and TIA1-like 1 (TIAL1) locally enhance exon inclusion by recruiting U1 snRNP to 5′ splice sites. However, effects of TIA proteins on splicing of distal exons have not yet been explored. We used UV-crosslinking and immunoprecipitation (iCLIP) to find that TIA1 and TIAL1 bind at the same positions on human RNAs. Binding downstream of 5′ splice sites was used to predict the effects of TIA proteins in enhancing inclusion of proximal exons and silencing inclusion of distal exons. The predictions were validated in an unbiased manner using splice-junction microarrays, RT-PCR, and minigene constructs, which showed that TIA proteins maintain splicing fidelity and regulate alternative splicing by binding exclusively downstream of 5′ splice sites. Surprisingly, TIA binding at 5′ splice sites silenced distal cassette and variable-length exons without binding in proximity to the regulated alternative 3′ splice sites. Using transcriptome-wide high-resolution mapping of TIA-RNA interactions we evaluated the distal splicing effects of TIA proteins. These data are consistent with a model where TIA proteins shorten the time available for definition of an alternative exon by enhancing recognition of the preceding 5′ splice site. Thus, our findings indicate that changes in splicing kinetics could mediate the distal regulation of alternative splicing. PMID:21048981
Rapid evolution of cis-regulatory sequences via local point mutations
NASA Technical Reports Server (NTRS)
Stone, J. R.; Wray, G. A.
2001-01-01
Although the evolution of protein-coding sequences within genomes is well understood, the same cannot be said of the cis-regulatory regions that control transcription. Yet, changes in gene expression are likely to constitute an important component of phenotypic evolution. We simulated the evolution of new transcription factor binding sites via local point mutations. The results indicate that new binding sites appear and become fixed within populations on microevolutionary timescales under an assumption of neutral evolution. Even combinations of two new binding sites evolve very quickly. We predict that local point mutations continually generate considerable genetic variation that is capable of altering gene expression.
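A minimal sketch of the kind of simulation this abstract describes, under stated assumptions: the 6-bp motif, the 200-bp region, and one substitution per generation are hypothetical choices, not parameters from the study. It tracks only the first appearance of a site in a single lineage and does not model fixation in a population.

```python
import random

BASES = "ACGT"

def generations_until_site(motif, region_len, rng, max_generations=100_000):
    """Mutate a random sequence one base per generation until `motif` appears.

    Returns the generation at which the motif is first present, or None if it
    does not appear within max_generations.
    """
    seq = [rng.choice(BASES) for _ in range(region_len)]
    for generation in range(max_generations + 1):
        if motif in "".join(seq):
            return generation
        pos = rng.randrange(region_len)
        # Substitute a different base so that every step is a real mutation.
        seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
    return None

def mean_waiting_time(motif="TGACTC", region_len=200, trials=50, seed=1):
    """Average number of generations before the motif first appears."""
    rng = random.Random(seed)
    times = [generations_until_site(motif, region_len, rng) for _ in range(trials)]
    found = [t for t in times if t is not None]
    return sum(found) / len(found) if found else None
```

Calling `mean_waiting_time()` gives a rough sense of how quickly a short site can arise by point mutation alone in this toy setting; longer motifs or shorter regions lengthen the wait.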
Mutations in CSPP1 lead to classical Joubert syndrome.
Akizu, Naiara; Silhavy, Jennifer L; Rosti, Rasim Ozgur; Scott, Eric; Fenstermaker, Ali G; Schroth, Jana; Zaki, Maha S; Sanchez, Henry; Gupta, Neerja; Kabra, Madhulika; Kara, Majdi; Ben-Omran, Tawfeg; Rosti, Basak; Guemez-Gamboa, Alicia; Spencer, Emily; Pan, Roger; Cai, Na; Abdellateef, Mostafa; Gabriel, Stacey; Halbritter, Jan; Hildebrandt, Friedhelm; van Bokhoven, Hans; Gunel, Murat; Gleeson, Joseph G
2014-01-02
Joubert syndrome and related disorders (JSRDs) are genetically heterogeneous and characterized by a distinctive mid-hindbrain malformation. Causative mutations lead to primary cilia dysfunction, which often results in variable involvement of other organs such as the liver, retina, and kidney. We identified predicted null mutations in CSPP1 in six individuals affected by classical JSRDs. CSPP1 encodes a protein localized to centrosomes and spindle poles, as well as to the primary cilium. Despite the known interaction between CSPP1 and nephronophthisis-associated proteins, none of the affected individuals in our cohort presented with kidney disease, and further, screening of a large cohort of individuals with nephronophthisis demonstrated no mutations. CSPP1 is broadly expressed in neural tissue, and its encoded protein localizes to the primary cilium in an in vitro model of human neurogenesis. Here, we show abrogated protein levels and ciliogenesis in affected fibroblasts. Our data thus suggest that CSPP1 is involved in neural-specific functions of primary cilia. Copyright © 2014 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Automated antibody structure prediction using Accelrys tools: Results and best practices
Fasnacht, Marc; Butenhof, Ken; Goupil-Lamy, Anne; Hernandez-Guzman, Francisco; Huang, Hongwei; Yan, Lisa
2014-01-01
We describe the methodology and results from our participation in the second Antibody Modeling Assessment experiment. During the experiment we predicted the structure of eleven unpublished antibody Fv fragments. Our prediction methods centered on template-based modeling; potential templates were selected from an antibody database based on their sequence similarity to the target in the framework regions. Depending on the quality of the templates, we constructed models of the antibody framework regions using either a single-template, chimeric, or multiple-template approach. The hypervariable loop regions in the initial models were rebuilt by grafting the corresponding regions from suitable templates onto the model. For the H3 loop region, we further refined models using ab initio methods. The final models were subjected to constrained energy minimization to resolve severe local structural problems. The analysis of the submitted models shows that Accelrys tools allow for the construction of quite accurate models for the framework and the canonical CDR regions, with RMSDs to the X-ray structure on average below 1 Å for most of these regions. The results show that accurate prediction of the H3 hypervariable loops remains a challenge. Furthermore, model quality assessment of the submitted models shows that the models are of quite high quality, with local geometry assessment scores similar to those of the target X-ray structures. Proteins 2014; 82:1583–1598. © 2014 The Authors. Proteins published by Wiley Periodicals, Inc. PMID:24833271
Proteomic Approaches Identify Members of Cofilin Pathway Involved in Oral Tumorigenesis
Polachini, Giovana M.; Sobral, Lays M.; Mercante, Ana M. C.; Paes-Leme, Adriana F.; Xavier, Flávia C. A.; Henrique, Tiago; Guimarães, Douglas M.; Vidotto, Alessandra; Fukuyama, Erica E.; Góis-Filho, José F.; Cury, Patricia M.; Curioni, Otávio A.; Michaluart Jr, Pedro; Silva, Adriana M. A.; Wünsch-Filho, Victor; Nunes, Fabio D.; Leopoldino, Andréia M.; Tajara, Eloiza H.
2012-01-01
The prediction of tumor behavior for patients with oral carcinomas remains a challenge for clinicians. The presence of lymph node metastasis is the most important prognostic factor but it is limited in predicting local relapse or survival. This highlights the need for identifying biomarkers that may effectively contribute to prediction of recurrence and tumor spread. In this study, we used one- and two-dimensional gel electrophoresis, mass spectrometry and immunodetection methods to analyze protein expression in oral squamous cell carcinomas. Using a refinement for classifying oral carcinomas in regard to prognosis, we analyzed small but lymph node metastasis-positive versus large, lymph node metastasis-negative tumors in order to contribute to the molecular characterization of subgroups with risk of dissemination. Specific protein patterns favoring metastasis were observed in the “more-aggressive” group defined by the present study. This group displayed upregulation of proteins involved in migration, adhesion, angiogenesis, cell cycle regulation, anti-apoptosis and epithelial to mesenchymal transition, whereas the “less-aggressive” group was engaged in keratinocyte differentiation, epidermis development, inflammation and immune response. Besides the identification of several proteins not yet described as deregulated in oral carcinomas, the present study demonstrated for the first time the role of cofilin-1 in modulating cell invasion in oral carcinomas. PMID:23227181
Rigoutsos, Isidore; Riek, Peter; Graham, Robert M.; Novotny, Jiri
2003-01-01
One of the promising methods of protein structure prediction involves the use of amino acid sequence-derived patterns. Here we report on the creation of non-degenerate motif descriptors derived through data mining of training sets of residues taken from the transmembrane-spanning segments of polytopic proteins. These residues correspond to short regions in which there is a deviation from the regular α-helical character (i.e. π-helices, 3₁₀-helices and kinks). A ‘search engine’ derived from these motif descriptors correctly identifies, and discriminates amongst instances of the above ‘non-canonical’ helical motifs contained in the SwissProt/TrEMBL database of protein primary structures. Our results suggest that deviations from α-helicity are encoded locally in sequence patterns only about 7–9 residues long and can be determined in silico directly from the amino acid sequence. Delineation of such variations in helical habit is critical to understanding the complex structure–function relationships of polytopic proteins and for drug discovery. The success of our current methodology foretells development of similar prediction tools capable of identifying other structural motifs from sequence alone. The method described here has been implemented and is available on the World Wide Web at http://cbcsrv.watson.ibm.com/Ttkw.html. PMID:12888523
Single-molecule Protein Unfolding in Solid State Nanopores
Talaga, David S.; Li, Jiali
2009-01-01
We use single silicon nitride nanopores to study folded, partially folded and unfolded single proteins by measuring their excluded volumes. The DNA-calibrated translocation signals of β-lactoglobulin and histidine-containing phosphocarrier protein match quantitatively with that predicted by a simple sum of the partial volumes of the amino acids in the polypeptide segment inside the pore when translocation stalls due to the primary charge sequence. Our analysis suggests that the majority of the protein molecules were linear or looped during translocation and that the electrical forces present under physiologically relevant potentials can unfold proteins. Our results show that the nanopore translocation signals are sensitive enough to distinguish the folding state of a protein and distinguish between proteins based on the excluded volume of a local segment of the polypeptide chain that transiently stalls in the nanopore due to the primary sequence of charges. PMID:19530678
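The "simple sum of the partial volumes" mentioned above can be sketched as a sliding-window sum over the sequence. This is an illustration only: the residue volumes below are approximate, commonly quoted values in cubic angstroms that are assumed here and are not taken from the paper, and the window length (the number of residues inside the pore) is a hypothetical parameter.

```python
# Approximate residue volumes in cubic angstroms (assumed, illustrative values;
# substitute the partial volumes appropriate to the analysis at hand).
RESIDUE_VOLUME_A3 = {
    "G": 60.1, "A": 88.6, "S": 89.0, "C": 108.5, "D": 111.1,
    "P": 112.7, "N": 114.1, "T": 116.1, "E": 138.4, "V": 140.0,
    "Q": 143.8, "H": 153.2, "M": 162.9, "I": 166.7, "L": 166.7,
    "K": 168.6, "R": 173.4, "F": 189.9, "Y": 193.6, "W": 227.8,
}

def segment_volume(sequence, start, length):
    """Sum the residue volumes of sequence[start:start + length]."""
    segment = sequence[start:start + length]
    return sum(RESIDUE_VOLUME_A3[aa] for aa in segment)

def volume_profile(sequence, window):
    """Excluded volume of every contiguous window along the chain."""
    return [
        segment_volume(sequence, i, window)
        for i in range(len(sequence) - window + 1)
    ]
```

In this picture, a chain that stalls with a particular segment inside the pore would be expected to block a current proportional to that segment's entry in the profile.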
DeepSig: deep learning improves signal peptide detection in proteins.
Savojardo, Castrense; Martelli, Pier Luigi; Fariselli, Piero; Casadio, Rita
2018-05-15
The identification of signal peptides in protein sequences is an important step toward protein localization and function characterization. Here, we present DeepSig, an improved approach for signal peptide detection and cleavage-site prediction based on deep learning methods. Comparative benchmarks performed on an updated independent dataset of proteins show that DeepSig is the current best performing method, scoring better than other available state-of-the-art approaches on both signal peptide detection and precise cleavage-site identification. DeepSig is available as both a standalone program and a web server at https://deepsig.biocomp.unibo.it. All datasets used in this study can be obtained from the same website. Contact: pierluigi.martelli@unibo.it. Supplementary data are available at Bioinformatics online.
Gustavsson, N.; Härndahl, U.; Emanuelsson, A.; Roepstorff, P.; Sundby, C.
1999-01-01
The small heat shock proteins (sHsps), which counteract heat and oxidative stress in an unknown way, belong to a protein family of sHsps and alpha-crystallins whose members form large oligomeric complexes. The chloroplast-localized sHsp, Hsp21, contains a conserved methionine-rich sequence, predicted to form an amphipathic helix with the methionines situated along one of its sides. Here, we report how methionine sulfoxidation was detected by mass spectrometry in proteolytically cleaved peptides that were produced from recombinant Arabidopsis thaliana Hsp21, which had been treated with varying concentrations of hydrogen peroxide. Sulfoxidation of the methionine residues in the conserved amphipathic helix coincided with a significant conformational change in the Hsp21 protein oligomer. PMID:10595556
OST-HTH: a novel predicted RNA-binding domain
2010-01-01
Background: The mechanism by which the arthropod Oskar and vertebrate TDRD5/TDRD7 proteins nucleate or organize structurally related ribonucleoprotein (RNP) complexes, the polar granule and nuage, is poorly understood. Using sequence profile searches we identify a novel domain in these proteins that is widely conserved across eukaryotes and bacteria. Results: Using contextual information from domain architectures, sequence-structure superpositions and available functional information we predict that this domain is likely to adopt the winged helix-turn-helix fold and bind RNA with a potential specificity for dsRNA. We show that in eukaryotes this domain is often combined in the same polypeptide with protein-protein- or lipid-interaction domains that might play a role in anchoring these proteins to specific cytoskeletal structures. Conclusions: Thus, proteins with this domain might have a key role in the recognition and localization of dsRNA, including miRNAs, rasiRNAs and piRNAs hybridized to their targets. In other cases, this domain is fused to ubiquitin-binding, E3 ligase and ubiquitin-like domains indicating a previously under-appreciated role for ubiquitination in regulating the assembly and stability of nuage-like RNP complexes. Both bacteria and eukaryotes encode a conserved family of proteins that combines this predicted RNA-binding domain with a previously uncharacterized domain (DUF88). We present evidence that it is an RNAse belonging to the superfamily that includes the 5'->3' nucleases, PIN and NYN domains and might be recruited to degrade certain RNAs. Reviewers: This article was reviewed by Sandor Pongor and Arcady Mushegian. PMID:20302647
Takamitsu, Emi; Otsuka, Motoaki; Haebara, Tatsuki; Yano, Manami; Matsuzaki, Kanako; Kobuchi, Hirotsugu; Moriya, Koko; Utsumi, Toshihiko
2015-01-01
To identify physiologically important human N-myristoylated proteins, 90 cDNA clones predicted to encode human N-myristoylated proteins were selected from a human cDNA resource (4,369 Kazusa ORFeome project human cDNA clones) by two bioinformatic N-myristoylation prediction systems, NMT-The MYR Predictor and Myristoylator. After database searches to exclude known human N-myristoylated proteins, 37 cDNA clones were selected as potential human N-myristoylated proteins. The susceptibility of these cDNA clones to protein N-myristoylation was first evaluated using fusion proteins in which the N-terminal ten amino acid residues were fused to an epitope-tagged model protein. Then, protein N-myristoylation of the gene products of full-length cDNAs was evaluated by metabolic labeling experiments both in an insect cell-free protein synthesis system and in transfected human cells. As a result, the products of 13 cDNA clones (FBXL7, PPM1B, SAMM50, PLEKHN, AIFM3, C22orf42, STK32A, FAM131C, DRICH1, MCC1, HID1, P2RX5, STK32B) were found to be human N-myristoylated proteins. Analysis of the role of protein N-myristoylation on the intracellular localization of SAMM50, a mitochondrial outer membrane protein, revealed that protein N-myristoylation was required for proper targeting of SAMM50 to mitochondria. Thus, the strategy used in this study is useful for the identification of physiologically important human N-myristoylated proteins from human cDNA resources. PMID:26308446
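The first screening step above can be illustrated with a coarse pre-filter. This sketch is not the NMT-The MYR Predictor or Myristoylator algorithm; it only checks the well-known prerequisite that a glycine follows the initiator methionine, and it extracts the N-terminal ten residues that the abstract says were fused to a model protein. The function names and the example sequences are hypothetical.

```python
def n_terminal_tag(protein_seq, n=10):
    """Return the first n residues, the region fused to the model protein."""
    return protein_seq[:n]

def has_n_terminal_glycine(protein_seq):
    """Coarse prerequisite for N-myristoylation: Gly follows the initiator Met."""
    return protein_seq.startswith("MG")

def prefilter(clones):
    """Map clone name -> N-terminal tag for clones passing the Met-Gly check."""
    return {
        name: n_terminal_tag(seq)
        for name, seq in clones.items()
        if has_n_terminal_glycine(seq)
    }

# Hypothetical sequences, for illustration only.
example_clones = {
    "cloneA": "MGSSKSKPKDPSQRRR",
    "cloneB": "MASTKLLVQ",
}
candidates = prefilter(example_clones)
```

Passing this filter is necessary but far from sufficient, which is why the study follows prediction with metabolic labeling experiments.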
Hooper, Cornelia M; Castleden, Ian R; Aryamanesh, Nader; Jacoby, Richard P; Millar, A Harvey
2016-01-01
Barley, wheat, rice and maize provide the bulk of human nutrition and have extensive industrial use as agricultural products. The genomes of these crops each contain >40,000 genes encoding proteins; however, the major genome databases for these species lack annotation information of protein subcellular location for >80% of these gene products. We address this gap by constructing a compendium of crop protein subcellular locations called crop Proteins with Annotated Locations (cropPAL). Subcellular location is most commonly determined by fluorescent protein tagging of live cells or mass spectrometry detection in subcellular purifications, but can also be predicted from amino acid sequence or protein expression patterns. The cropPAL database collates 556 published studies from >300 research institutes in >30 countries, and compiles eight pre-computed subcellular predictions for all Hordeum vulgare, Triticum aestivum, Oryza sativa and Zea mays protein sequences. The data collection, including metadata for proteins and published studies, can be accessed through a search portal http://crop-PAL.org. The subcellular localization information housed in cropPAL helps to depict plant cells as compartmentalized protein networks that can be investigated for improving crop yield and quality, and developing new biotechnological solutions to agricultural challenges. © The Author 2015. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Morino, Kazuko; Kimizu, Mayumi; Fujiwara, Masayuki
2016-01-01
Reactive oxygen species (ROS) production is an early event in the immune response of plants. ROS production affects the redox-based modification of cysteine residues in redox proteins, which contribute to protein functions such as enzymatic activity, protein-protein interactions, oligomerization, and intracellular localization. Thus, the sensitivity of cysteine residues to changes in the cellular redox status is critical to the immune response of plants. We used disulfide proteomics to identify immune response-related redox proteins. Total protein was extracted from rice cultured cells expressing constitutively active or dominant-negative OsRac1, which is a key regulator of the immune response in rice, and from rice cultured cells that were treated with probenazole, which is an activator of the plant immune response, in the presence of the thiol group-specific fluorescent probe monobromobimane (mBBr), which served as a tag for reduced proteins in differential-display two-dimensional gel electrophoresis. The mBBr fluorescence was detected by using a charge-coupled device system, and total protein spots were detected using Coomassie brilliant blue staining. Both sets of protein spots were analyzed with gel image software and identified using mass spectrometry. Possible disulfide bonds were identified using disulfide bond prediction software. Subcellular localization and bimolecular fluorescence complementation analysis were performed on one of the identified proteins: Oryza sativa cold shock protein 2 (OsCSP2). We identified seven proteins carrying potential redox-sensitive cysteine residues. Two of them were oxidized in cultured cells expressing DN-OsRac1, which indicates that these two proteins would be inactivated through the inhibition of the OsRac1 signaling pathway. One of the two oxidized proteins, OsCSP2, contains 197 amino acid residues and six cysteine residues.
Site-directed mutagenesis of these cysteine residues revealed that a Cys 140 mutation causes mislocalization of a green fluorescent protein fusion protein in the root cells of rice. Bimolecular fluorescence complementation analysis revealed that OsCSP2 is localized in the nucleus as a homodimer in rice root cells. The findings of the study indicate that redox-sensitive cysteine modification would contribute to the immune response in rice.
Molecular properties of the N-terminal extension of the fission yeast kinesin-5, Cut7.
Edamatsu, M
2016-02-11
Kinesin-5 plays an essential role in spindle formation and function, and serves as a potential target for anti-cancer drugs. The aim of this study was to elucidate the molecular properties of the N-terminal extension of the Schizosaccharomyces pombe kinesin-5, Cut7. This extension is rich in charged amino acids and predicted to be intrinsically disordered. In S. pombe cells, a Cut7 construct lacking half the N-terminal extension failed to localize along the spindle microtubules and formed a monopolar spindle. However, a construct lacking the entire N-terminal extension exhibited normal localization and formed a typical bipolar spindle. In addition, in vitro analyses revealed that the truncated Cut7 constructs demonstrated similar motile velocities and directionalities as the wild-type motor protein, but the microtubule landing rates were significantly reduced. These findings suggest that the N-terminal extension is not required for normal Cut7 intracellular localization or function, but alters the microtubule-binding properties of this protein in vitro.
Protein-protein docking using region-based 3D Zernike descriptors
2009-01-01
Background: Protein-protein interactions are a pivotal component of many biological processes and mediate a variety of functions. Knowing the tertiary structure of a protein complex is therefore essential for understanding the interaction mechanism. However, experimental techniques to solve the structure of the complex are often found to be difficult. To this end, computational protein-protein docking approaches can provide a useful alternative to address this issue. Prediction of docking conformations relies on methods that effectively capture shape features of the participating proteins while giving due consideration to conformational changes that may occur. Results: We present a novel protein docking algorithm based on the use of 3D Zernike descriptors as regional features of molecular shape. The key motivation of using these descriptors is their invariance to transformation, in addition to a compact representation of local surface shape characteristics. Docking decoys are generated using geometric hashing, which are then ranked by a scoring function that incorporates a buried surface area and a novel geometric complementarity term based on normals associated with the 3D Zernike shape description. Our docking algorithm was tested on both bound and unbound cases in the ZDOCK benchmark 2.0 dataset. In 74% of the bound docking predictions, our method was able to find a near-native solution (interface Cα RMSD ≤ 2.5 Å) within the top 1000 ranks. For unbound docking, among the 60 complexes for which our algorithm returned at least one hit, 60% of the cases were ranked within the top 2000. Comparison with existing shape-based docking algorithms shows that our method has a better performance than the others in unbound docking while remaining competitive for bound docking cases. Conclusion: We show for the first time that the 3D Zernike descriptors are adept in capturing shape complementarity at the protein-protein interface and useful for protein docking prediction.
Rigorous benchmark studies show that our docking approach has a superior performance compared to existing methods. PMID:20003235
Protein-protein docking using region-based 3D Zernike descriptors.
Venkatraman, Vishwesh; Yang, Yifeng D; Sael, Lee; Kihara, Daisuke
2009-12-09
Protein-protein interactions are a pivotal component of many biological processes and mediate a variety of functions. Knowing the tertiary structure of a protein complex is therefore essential for understanding the interaction mechanism. However, experimental techniques to solve the structure of the complex are often found to be difficult. To this end, computational protein-protein docking approaches can provide a useful alternative to address this issue. Prediction of docking conformations relies on methods that effectively capture shape features of the participating proteins while giving due consideration to conformational changes that may occur. We present a novel protein docking algorithm based on the use of 3D Zernike descriptors as regional features of molecular shape. The key motivation of using these descriptors is their invariance to transformation, in addition to a compact representation of local surface shape characteristics. Docking decoys are generated using geometric hashing, which are then ranked by a scoring function that incorporates a buried surface area and a novel geometric complementarity term based on normals associated with the 3D Zernike shape description. Our docking algorithm was tested on both bound and unbound cases in the ZDOCK benchmark 2.0 dataset. In 74% of the bound docking predictions, our method was able to find a near-native solution (interface Cα RMSD ≤ 2.5 Å) within the top 1000 ranks. For unbound docking, among the 60 complexes for which our algorithm returned at least one hit, 60% of the cases were ranked within the top 2000. Comparison with existing shape-based docking algorithms shows that our method has a better performance than the others in unbound docking while remaining competitive for bound docking cases. We show for the first time that the 3D Zernike descriptors are adept in capturing shape complementarity at the protein-protein interface and useful for protein docking prediction.
Rigorous benchmark studies show that our docking approach has a superior performance compared to existing methods.
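The evaluation criterion quoted above (a near-native decoy with interface Cα RMSD ≤ 2.5 Å within the top N ranks) can be sketched as follows. This is a hedged illustration, not the authors' benchmark code: the function names are hypothetical, and `rmsd` assumes the two coordinate sets are already superposed and listed in matching atom order.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points, no fitting."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

def first_hit_rank(ranked_rmsds, cutoff=2.5):
    """1-based rank of the first decoy with RMSD <= cutoff, or None."""
    for rank, value in enumerate(ranked_rmsds, start=1):
        if value <= cutoff:
            return rank
    return None

def success_rate(complexes, top_n=1000, cutoff=2.5):
    """Fraction of complexes whose first hit falls within the top_n ranks."""
    hits = 0
    for ranked_rmsds in complexes.values():
        rank = first_hit_rank(ranked_rmsds, cutoff)
        if rank is not None and rank <= top_n:
            hits += 1
    return hits / len(complexes)
```

Here `complexes` maps a complex identifier to the interface RMSDs of its decoys in score order, so the 74% figure in the abstract corresponds to `success_rate(..., top_n=1000)` over the bound cases.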
Moraxella catarrhalis synthesizes an autotransporter that is an acid phosphatase.
Hoopman, Todd C; Wang, Wei; Brautigam, Chad A; Sedillo, Jennifer L; Reilly, Thomas J; Hansen, Eric J
2008-02-01
Moraxella catarrhalis O35E was shown to synthesize a 105-kDa protein that has similarity to both acid phosphatases and autotransporters. The N-terminal portion of the M. catarrhalis acid phosphatase A (MapA) was most similar (the BLAST probability score was 10⁻¹⁰) to bacterial class A nonspecific acid phosphatases. The central region of the MapA protein had similarity to passenger domains of other autotransporter proteins, whereas the C-terminal portion of MapA resembled the translocation domain of conventional autotransporters. Cloning and expression of the M. catarrhalis mapA gene in Escherichia coli confirmed the presence of acid phosphatase activity in the MapA protein. The MapA protein was shown to be localized to the outer membrane of M. catarrhalis and was not detected either in the soluble cytoplasmic fraction from disrupted M. catarrhalis cells or in the spent culture supernatant fluid from M. catarrhalis. Use of the predicted MapA translocation domain in a fusion construct with the passenger domain from another predicted M. catarrhalis autotransporter confirmed the translocation ability of this MapA domain. Inactivation of the mapA gene in M. catarrhalis strain O35E reduced the acid phosphatase activity expressed by this organism, and this mutation could be complemented in trans with the wild-type mapA gene. Nucleotide sequence analysis of the mapA gene from six M. catarrhalis strains showed that this protein was highly conserved among strains of this pathogen. Site-directed mutagenesis of a critical histidine residue (H233A) in the predicted active site of the acid phosphatase domain in MapA eliminated acid phosphatase activity in the recombinant MapA protein. This is the first description of an autotransporter protein that expresses acid phosphatase activity.
Moraxella catarrhalis Synthesizes an Autotransporter That Is an Acid Phosphatase▿
Hoopman, Todd C.; Wang, Wei; Brautigam, Chad A.; Sedillo, Jennifer L.; Reilly, Thomas J.; Hansen, Eric J.
2008-01-01
Moraxella catarrhalis O35E was shown to synthesize a 105-kDa protein that has similarity to both acid phosphatases and autotransporters. The N-terminal portion of the M. catarrhalis acid phosphatase A (MapA) was most similar (the BLAST probability score was 10^-10) to bacterial class A nonspecific acid phosphatases. The central region of the MapA protein had similarity to passenger domains of other autotransporter proteins, whereas the C-terminal portion of MapA resembled the translocation domain of conventional autotransporters. Cloning and expression of the M. catarrhalis mapA gene in Escherichia coli confirmed the presence of acid phosphatase activity in the MapA protein. The MapA protein was shown to be localized to the outer membrane of M. catarrhalis and was not detected either in the soluble cytoplasmic fraction from disrupted M. catarrhalis cells or in the spent culture supernatant fluid from M. catarrhalis. Use of the predicted MapA translocation domain in a fusion construct with the passenger domain from another predicted M. catarrhalis autotransporter confirmed the translocation ability of this MapA domain. Inactivation of the mapA gene in M. catarrhalis strain O35E reduced the acid phosphatase activity expressed by this organism, and this mutation could be complemented in trans with the wild-type mapA gene. Nucleotide sequence analysis of the mapA gene from six M. catarrhalis strains showed that this protein was highly conserved among strains of this pathogen. Site-directed mutagenesis of a critical histidine residue (H233A) in the predicted active site of the acid phosphatase domain in MapA eliminated acid phosphatase activity in the recombinant MapA protein. This is the first description of an autotransporter protein that expresses acid phosphatase activity. PMID:18065547
Liu, Tingli; Ye, Wenwu; Ru, Yanyan; Yang, Xinyu; Gu, Biao; Tao, Kai; Lu, Shan; Dong, Suomeng; Zheng, Xiaobo; Shan, Weixing; Wang, Yuanchao; Dou, Daolong
2011-01-01
Phytophthora sojae encodes hundreds of putative host cytoplasmic effectors with conserved FLAK motifs following signal peptides, termed crinkling- and necrosis-inducing proteins (CRN) or Crinkler. Their functions and mechanisms in pathogenesis are mostly unknown. Here, we identify a group of five P. sojae-specific CRN-like genes with high levels of sequence similarity, of which three are putative pseudogenes. Functional analysis shows that the two functional genes encode proteins with predicted nuclear localization signals that induce contrasting responses when expressed in Nicotiana benthamiana and soybean (Glycine max). PsCRN63 induces cell death, while PsCRN115 suppresses cell death elicited by the P. sojae necrosis-inducing protein (PsojNIP) or PsCRN63. Expression of CRN fragments with deleted signal peptides and FLAK motifs demonstrates that the carboxyl-terminal portions of PsCRN63 or PsCRN115 are sufficient for their activities. However, the predicted nuclear localization signal is required for PsCRN63 to induce cell death but not for PsCRN115 to suppress cell death. Furthermore, silencing of the PsCRN63 and PsCRN115 genes in P. sojae stable transformants leads to a reduction of virulence on soybean. Intriguingly, the silenced transformants lose the ability to suppress host cell death and callose deposition on inoculated plants. These results suggest a role for CRN effectors in the suppression of host defense responses.
ModFOLD6: an accurate web server for the global and local quality estimation of 3D protein models.
Maghrabi, Ali H A; McGuffin, Liam J
2017-07-03
Methods that reliably estimate the likely similarity between the predicted and native structures of proteins have become essential for driving the acceptance and adoption of three-dimensional protein models by life scientists. ModFOLD6 is the latest version of our leading resource for Estimates of Model Accuracy (EMA), which uses a pioneering hybrid quasi-single model approach. The ModFOLD6 server integrates scores from three pure-single model methods and three quasi-single model methods using a neural network to estimate local quality scores. Additionally, the server provides three options for producing global score estimates, depending on the requirements of the user: (i) ModFOLD6_rank, which is optimized for ranking/selection, (ii) ModFOLD6_cor, which is optimized for correlations of predicted and observed scores and (iii) ModFOLD6 global for balanced performance. The ModFOLD6 methods rank among the top few for EMA, according to independent blind testing by the CASP12 assessors. The ModFOLD6 server is also continuously automatically evaluated as part of the CAMEO project, where significant performance gains have been observed compared to our previous server and other publicly available servers. The ModFOLD6 server is freely available at: http://www.reading.ac.uk/bioinf/ModFOLD/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Binding site and affinity prediction of general anesthetics to protein targets using docking.
Liu, Renyu; Perez-Aguilar, Jose Manuel; Liang, David; Saven, Jeffery G
2012-05-01
Background: The protein targets for general anesthetics remain unclear. A tool to predict anesthetic binding for potential binding targets is needed. In this study, we explored whether a computational method, AutoDock, could serve as such a tool. Methods: High-resolution crystal data of water-soluble proteins (cytochrome C, apoferritin, and human serum albumin) and a membrane protein (a pentameric ligand-gated ion channel from Gloeobacter violaceus, GLIC) were used. Isothermal titration calorimetry (ITC) experiments were performed to determine anesthetic affinity in solution conditions for apoferritin. Docking calculations were performed using DockingServer with the Lamarckian genetic algorithm and the Solis and Wets local search method (https://www.dockingserver.com/web). Twenty general anesthetics were docked into apoferritin. The predicted binding constants were compared with those obtained from ITC experiments for potential correlations. In the case of apoferritin, details of the binding site and their interactions were compared with recent co-crystallization data. Docking calculations for six general anesthetics currently used in clinical settings (isoflurane, sevoflurane, desflurane, halothane, propofol, and etomidate) with known EC50 were also performed in all tested proteins. The binding constants derived from docking experiments were compared with known EC50s and octanol/water partition coefficients for the six general anesthetics. Results: All 20 general anesthetics docked unambiguously into the anesthetic binding site identified in the crystal structure of apoferritin. The binding constants for 20 anesthetics obtained from the docking calculations correlate significantly with those obtained from ITC experiments (p=0.04). In the case of GLIC, the identified anesthetic binding sites in the crystal structure are among the docking predicted binding sites, but not the top ranked site.
Docking calculations suggest a most probable binding site located in the extracellular domain of GLIC. The predicted affinities correlated significantly with the known EC50s for the six commonly used anesthetics in GLIC for the site identified in the experimental crystal data (p=0.006). However, predicted affinities in apoferritin, human serum albumin, and cytochrome C did not correlate with these six anesthetics’ known experimental EC50s. A weak correlation between the predicted affinities and the octanol/water partition coefficients was observed for the sites in GLIC. Conclusion: We demonstrated that anesthetic binding sites and relative affinities can be predicted using docking calculations in an automatic docking server (AutoDock) for both water-soluble and membrane proteins. Correlation of predicted affinity and EC50 for six commonly used general anesthetics was only observed in GLIC, a member of a protein family relevant to anesthetic mechanism. PMID:22392968
EST-PAC a web package for EST annotation and protein sequence prediction
Strahm, Yvan; Powell, David; Lefèvre, Christophe
2006-01-01
With the decreasing cost of DNA sequencing technology and the vast diversity of biological resources, researchers increasingly face the basic challenge of annotating a larger number of expressed sequence tags (ESTs) from a variety of species. This typically consists of a series of repetitive tasks, which should be automated and easy to use. The results of these annotation tasks need to be stored and organized in a consistent way. All these operations should be self-installing, platform independent, easy to customize and amenable to using distributed bioinformatics resources available on the Internet. In order to address these issues, we present EST-PAC, a web-oriented multi-platform software package for expressed sequence tag (EST) annotation. EST-PAC provides a solution for the administration of EST and protein sequence annotations accessible through a web interface. Three aspects of EST annotation are automated: 1) searching local or remote biological databases for sequence similarities using Blast services, 2) predicting protein coding sequence from EST data and 3) annotating predicted protein sequences with functional domain predictions. In practice, EST-PAC integrates the BLASTALL suite, EST-Scan2 and HMMER in a relational database system accessible through a simple web interface. EST-PAC also takes advantage of the relational database to allow consistent storage, powerful queries of results and management of the annotation process. The system allows users to customize annotation strategies and provides an open-source data-management environment for research and education in bioinformatics. PMID:17147782
Jones, David T; Singh, Tanya; Kosciolek, Tomasz; Tetchner, Stuart
2015-04-01
Recent developments of statistical techniques to infer direct evolutionary couplings between residue pairs have rendered covariation-based contact prediction a viable means for accurate 3D modelling of proteins, with no information other than the sequence required. To extend the usefulness of contact prediction, we have designed a new meta-predictor (MetaPSICOV) which combines three distinct approaches for inferring covariation signals from multiple sequence alignments, considers a broad range of other sequence-derived features and, uniquely, a range of metrics which describe both the local and global quality of the input multiple sequence alignment. Finally, we use a two-stage predictor, where the second stage filters the output of the first stage. This two-stage predictor is additionally evaluated on its ability to accurately predict the long range network of hydrogen bonds, including correctly assigning the donor and acceptor residues. Using the original PSICOV benchmark set of 150 protein families, MetaPSICOV achieves a mean precision of 0.54 for top-L predicted long range contacts, around 60% higher than PSICOV and around 40% better than CCMpred. In de novo protein structure prediction using FRAGFOLD, MetaPSICOV is able to improve the TM-scores of models by a median of 0.05 compared with PSICOV. Lastly, for predicting long range hydrogen bonding, MetaPSICOV-HB achieves a precision of 0.69 for the top-L/10 hydrogen bonds compared with just 0.26 for the baseline MetaPSICOV. MetaPSICOV is freely available as a web server at http://bioinf.cs.ucl.ac.uk/MetaPSICOV. Raw data (predicted contact lists and 3D models) and source code can be downloaded from http://bioinf.cs.ucl.ac.uk/downloads/MetaPSICOV. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
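The "top-L" and "top-L/10" precision figures quoted in the MetaPSICOV abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `top_l_precision`, the input layout, and the default sequence-separation cutoff of 24 residues for "long range" are assumptions chosen for illustration (the abstract does not state the cutoff). Precision is taken here over the pairs actually selected, which is also an assumption.

```python
def top_l_precision(predicted, true_contacts, seq_len, min_sep=24, factor=1.0):
    """Precision of the highest-scoring long-range contact predictions.

    predicted:     list of (i, j, score) residue-pair predictions (hypothetical layout)
    true_contacts: set of (i, j) pairs with i < j observed in the native structure
    seq_len:       sequence length L
    min_sep:       minimum |i - j| for a pair to count as long range (assumed cutoff)
    factor:        1.0 evaluates the top L pairs, 0.1 the top L/10 pairs
    """
    # Keep only long-range pairs and normalise each pair so that i < j.
    long_range = [(min(i, j), max(i, j), s)
                  for i, j, s in predicted if abs(i - j) >= min_sep]
    # Rank by predicted score, best first.
    long_range.sort(key=lambda t: t[2], reverse=True)
    n_top = max(1, int(seq_len * factor))
    top = long_range[:n_top]
    if not top:
        return 0.0
    hits = sum((i, j) in true_contacts for i, j, _ in top)
    return hits / len(top)
```

For example, `top_l_precision(preds, natives, seq_len=150, factor=0.1)` would score the 15 most confident long-range pairs under these assumptions.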
Boyd, Chelsea D.; Smith, T. Jarrod; El-Kirat-Chatel, Sofiane; Newell, Peter D.; Dufrêne, Yves F.
2014-01-01
The localization of the LapA protein to the cell surface is a key step required by Pseudomonas fluorescens Pf0-1 to irreversibly attach to a surface and form a biofilm. LapA is a member of a diverse family of predicted bacterial adhesins, and although lacking a high degree of sequence similarity, family members do share common predicted domains. Here, using mutational analysis, we determine the significance of each domain feature of LapA in relation to its export and localization to the cell surface and function in biofilm formation. Our previous work showed that the N terminus of LapA is required for cleavage by the periplasmic cysteine protease LapG and release of the adhesin from the cell surface under conditions unfavorable for biofilm formation. We define an additional critical region of the N terminus of LapA required for LapG proteolysis. Furthermore, our results suggest that the domains within the C terminus of LapA are not absolutely required for biofilm formation, export, or localization to the cell surface, with the exception of the type I secretion signal, which is required for LapA export and cell surface localization. In contrast, deletion of the central repetitive region of LapA, consisting of 37 repeats of 100 amino acids, results in an inability to form a biofilm. We also used single-molecule atomic force microscopy to further characterize the role of these domains in biofilm formation on hydrophobic and hydrophilic surfaces. These studies represent the first detailed analysis of the domains of the LapA family of biofilm adhesin proteins. PMID:24837291
Loo, Lit-Hsin; Laksameethanasan, Danai; Tung, Yi-Ling
2014-03-01
Protein subcellular localization is a major determinant of protein function. However, this important protein feature is often described in terms of discrete and qualitative categories of subcellular compartments, and therefore it has limited applications in quantitative protein function analyses. Here, we present Protein Localization Analysis and Search Tools (PLAST), an automated analysis framework for constructing and comparing quantitative signatures of protein subcellular localization patterns based on microscopy images. PLAST produces human-interpretable protein localization maps that quantitatively describe the similarities in the localization patterns of proteins and major subcellular compartments, without requiring manual assignment or supervised learning of these compartments. Using the budding yeast Saccharomyces cerevisiae as a model system, we show that PLAST is more accurate than existing, qualitative protein localization annotations in identifying known co-localized proteins. Furthermore, we demonstrate that PLAST can reveal protein localization-function relationships that are not obvious from these annotations. First, we identified proteins that have similar localization patterns and participate in closely-related biological processes, but do not necessarily form stable complexes with each other or localize at the same organelles. Second, we found an association between spatial and functional divergences of proteins during evolution. Surprisingly, as proteins with common ancestors evolve, they tend to develop more diverged subcellular localization patterns, but still occupy similar numbers of compartments. This suggests that divergence of protein localization might be more frequently due to the development of more specific localization patterns over ancestral compartments than the occupation of new compartments. 
PLAST enables systematic and quantitative analyses of protein localization-function relationships, and will be useful to elucidate protein functions and how these functions were acquired in cells from different organisms or species. A public web interface of PLAST is available at http://plast.bii.a-star.edu.sg. PMID:24603469
Transcriptomic analysis of the autophagy machinery in crustaceans.
Suwansa-Ard, Saowaros; Kankuan, Wilairat; Thongbuakaew, Tipsuda; Saetan, Jirawat; Kornthong, Napamanee; Kruangkum, Thanapong; Khornchatri, Kanjana; Cummins, Scott F; Isidoro, Ciro; Sobhon, Prasert
2016-08-09
The giant freshwater prawn, Macrobrachium rosenbergii, is a decapod crustacean that is commercially important as a food source. Farming of commercial crustaceans requires an efficient management strategy because the animals are easily subjected to stress and diseases during culture. Autophagy, a stress response process, is well-documented and conserved in most animals, yet it is poorly studied in crustaceans. In this study, we have performed an in silico search for transcripts encoding autophagy-related (Atg) proteins within various tissue transcriptomes of M. rosenbergii. Basic Local Alignment Search Tool (BLAST) search using previously known Atg proteins as queries revealed 41 transcripts encoding homologous M. rosenbergii Atg proteins. Among these Atg proteins, we selected commonly used autophagy markers, including Beclin 1, vacuolar protein sorting (Vps) 34, microtubule-associated proteins 1A/1B light chain 3B (MAP1LC3B), p62/sequestosome 1 (SQSTM1), and lysosomal-associated membrane protein 1 (Lamp-1) for further sequence analyses using comparative alignment and protein structural prediction. We found that crustacean autophagy marker proteins contain conserved motifs typical of other animal Atg proteins. Western blotting using commercial antibodies raised against human Atg marker proteins indicated their presence in various M. rosenbergii tissues, while immunohistochemistry localized Atg marker proteins within ovarian tissue, specifically late stage oocytes. This study demonstrates that the molecular components of the autophagic process are conserved in crustaceans and are comparable to those of the autophagic process in mammals. Furthermore, it provides a foundation for further studies of autophagy in crustaceans that may lead to a better understanding of reproduction- and stress-related autophagy, which will enable efficient aquaculture practices.
Yam, Xue Yan; Birago, Cecilia; Fratini, Federica; Di Girolamo, Francesco; Raggi, Carla; Sargiacomo, Massimo; Bachi, Angela; Berry, Laurence; Fall, Gamou; Currà, Chiara; Pizzi, Elisabetta; Breton, Catherine Braun; Ponzi, Marta
2013-01-01
Intracellular pathogens contribute to a significant proportion of infectious diseases worldwide. The successful strategy of evading the immune system by hiding inside host cells is common to all the microorganism classes, which exploit membrane microdomains, enriched in cholesterol and sphingolipids, to invade and colonize the host cell. These assemblies, with distinct biochemical properties, can be isolated by means of flotation in sucrose density gradient centrifugation because they are insoluble in nonionic detergents at low temperature. We analyzed the protein and lipid contents of detergent-resistant membranes from erythrocytes infected by Plasmodium falciparum, the most deadly human malaria parasite. Proteins associated with membrane microdomains of trophic parasite blood stages (trophozoites) include an abundance of chaperones, molecules involved in vesicular trafficking, and enzymes implicated in host hemoglobin degradation. About 60% of the identified proteins contain a predicted localization signal suggesting a role of membrane microdomains in protein sorting/trafficking. To validate our proteomic data, we raised antibodies against six Plasmodium proteins not characterized previously. All the selected candidates were recovered in floating low-density fractions after density gradient centrifugation. The analyzed proteins localized either to internal organelles, such as the mitochondrion and the endoplasmic reticulum, or to exported membrane structures, the parasitophorous vacuole membrane and Maurer's clefts, implicated in targeting parasite proteins to the host erythrocyte cytosol or surface. The relative abundance of cholesterol and phospholipid species varies in gradient fractions containing detergent-resistant membranes, suggesting heterogeneity in the lipid composition of the isolated microdomain population. 
This study is the first report showing the presence of cholesterol-rich microdomains with distinct properties and subcellular localization in trophic stages of Plasmodium falciparum. PMID:24045696
NASA Astrophysics Data System (ADS)
Keskin, Ozlem; Ma, Buyong; Rogale, Kristina; Gunasekaran, K.; Nussinov, Ruth
2005-06-01
Understanding and ultimately predicting protein associations is immensely important for functional genomics and drug design. Here, we propose that binding sites have preferred organizations. First, the hot spots cluster within densely packed 'hot regions'. Within these regions, they form networks of interactions. Thus, hot spots located within a hot region contribute cooperatively to the stability of the complex. However, the contributions of separate, independent hot regions are additive. Moreover, hot spots are often already pre-organized in the unbound (free) protein states. Describing a binding site through independent local hot regions has implications for binding site definition, design and parametrization for prediction. The compactness and cooperativity emphasize the similarity between binding and folding. This proposition is grounded in computation and experiment. It explains why summation of the interactions may over-estimate the stability of the complex. Furthermore, statistically, charge-charge coupling of the hot spots is disfavored. However, since within the highly packed regions the solvent is screened, the electrostatic contributions are strengthened. Thus, we propose a new description of protein binding sites: a site consists of (one or a few) self-contained cooperative regions. Since the residue hot spots are those conserved by evolution, proteins binding multiple partners at the same sites are expected to use all or some combination of these regions.
Effective Design of Multifunctional Peptides by Combining Compatible Functions
Diener, Christian; Garza Ramos Martínez, Georgina; Moreno Blas, Daniel; Castillo González, David A.; Corzo, Gerardo; Castro-Obregon, Susana; Del Rio, Gabriel
2016-01-01
Multifunctionality is a common trait of many natural proteins and peptides, yet the rules to generate such multifunctionality remain unclear. We propose that the rules defining some protein/peptide functions are compatible. To explore this hypothesis, we trained a computational method to predict cell-penetrating peptides at the sequence level and learned that antimicrobial peptides and DNA-binding proteins are compatible with the rules of our predictor. Based on this finding, we expected that designing peptides for CPP activity may render AMP and DNA-binding activities. To test this prediction, we designed peptides that embedded two independent functional domains (nuclear localization and yeast pheromone activity), linked by optimizing their composition to fit the rules characterizing cell-penetrating peptides. These peptides presented effective cell penetration, DNA-binding, pheromone and antimicrobial activities, thus confirming the effectiveness of our computational approach to design multifunctional peptides with potential therapeutic uses. Our computational implementation is available at http://bis.ifc.unam.mx/en/software/dcf. PMID:27096600
Proteomic analysis of Toxocara canis excretory and secretory (TES) proteins.
Sperotto, Rita Leal; Kremer, Frederico Schmitt; Aires Berne, Maria Elisabeth; Costa de Avila, Luciana F; da Silva Pinto, Luciano; Monteiro, Karina Mariante; Caumo, Karin Silva; Ferreira, Henrique Bunselmeyer; Berne, Natália; Borsuk, Sibele
2017-01-01
Toxocariasis is a neglected disease, and its main etiological agent is the nematode Toxocara canis. Serological diagnosis is performed by an enzyme-linked immunosorbent assay using T. canis excretory and secretory (TES) antigens produced by in vitro cultivation of larvae. Identification of TES proteins can be useful for the development of new diagnostic strategies since few TES components have been described so far. Herein, we report the results obtained by proteomic analysis of TES proteins using a liquid chromatography-tandem mass spectrometry (LC-MS/MS) approach. TES fractions were separated by one-dimensional SDS-PAGE and analyzed by LC-MS/MS. The MS/MS spectra were compared with a database of protein sequences deduced from the genome sequence of T. canis, and a total of 19 proteins were identified. Classification according to the signal peptide prediction using the SignalP server showed that seven of the identified proteins were extracellular, 10 had cytoplasmic or nuclear localization, while the subcellular localization of two proteins was unknown. Analysis of molecular functions by BLAST2GO showed that the majority of the gene ontology (GO) terms associated with the proteins present in the TES sample were associated with binding functions, including but not limited to protein binding (GO:0005515), inorganic ion binding (GO:0043167), and organic cyclic compound binding (GO:0097159). This study provides additional information about the exoproteome of T. canis, which can lead to the development of new strategies for diagnostics or vaccination. Copyright © 2016 Elsevier B.V. All rights reserved.
Denis-Larose, Claude; Bergeron, Hélène; Labbé, Diane; Greer, Charles W.; Hawari, Jalal; Grossman, Matthew J.; Sankey, Bruce M.; Lau, Peter C. K.
1998-01-01
The replication region of a 100-kb desulfurization plasmid (pSOX) from Rhodococcus sp. strain X309 was localized to a 4-kb KpnI fragment, and its sequence was determined. The amino acid sequence of one of the predicted open reading frames (ORFs) was related to the putative replication (Rep) protein sequences of the mycobacterial pLR7 family of plasmids. Three of the five predicted ORF products were identified by radiolabelling with the Escherichia coli T7 polymerase/promoter system. In E. coli, the Rep protein of pSOX was apparently synthesized in a shortened form, 21.3 kDa instead of the predicted 41.3 kDa, as a result of an internal initiation. This situation is reminiscent of that for some bacterial Rep proteins. A shuttle plasmid was constructed with the pSOX origin, pBluescript II KS−, and the chloramphenicol resistance (Cmr) gene from pRF29. This new shuttle plasmid was used to demonstrate expression of the Bacillus subtilis sacB gene in a strain of Rhodococcus, rendering it sensitive to the presence of sucrose. PMID:9797291
X-ray diffraction from nonuniformly stretched helical molecules
DOE Office of Scientific and Technical Information (OSTI.GOV)
Prodanovic, Momcilo; Irving, Thomas C.; Mijailovich, Srboljub M.
2016-04-18
The fibrous proteins in living cells are exposed to mechanical forces interacting with other subcellular structures. X-ray fiber diffraction is often used to assess deformation and movement of these proteins, but the analysis has been limited to the theory for fibrous molecular systems that exhibit helical symmetry. However, this approach cannot adequately interpret X-ray data from fibrous protein assemblies where the local strain varies along the fiber length owing to interactions of its molecular constituents with their binding partners. To resolve this problem, a theoretical formalism has been developed for predicting the diffraction from individual helical molecular structures nonuniformly strained along their lengths. This represents a critical first step towards modeling complex dynamical systems consisting of multiple helical structures using spatially explicit, multi-scale Monte Carlo simulations where predictions are compared with experimental data in a 'forward' process to iteratively generate ever more realistic models. Here the effects of nonuniform strains and the helix length on the resulting magnitude and phase of diffraction patterns are quantitatively assessed. Examples of the predicted diffraction patterns of nonuniformly deformed double-stranded DNA and actin filaments in contracting muscle are presented to demonstrate the feasibility of this theoretical approach.
Simoncini, David; Schiex, Thomas; Zhang, Kam Y J
2017-05-01
Conformational search space exploration remains a major bottleneck for protein structure prediction methods. Population-based meta-heuristics typically enable the possibility to control the search dynamics and to tune the balance between local energy minimization and search space exploration. EdaFold is a fragment-based approach that can guide search by periodically updating the probability distribution over the fragment libraries used during model assembly. We implement the EdaFold algorithm as a Rosetta protocol and provide two different probability update policies: a cluster-based variation (EdaRose_c) and an energy-based one (EdaRose_en). We analyze the search dynamics of our new Rosetta protocols and show that EdaRose_c is able to provide predictions with lower Cα RMSD to the native structure than EdaRose_en and Rosetta AbInitio Relax protocol. Our software is freely available as a C++ patch for the Rosetta suite and can be downloaded from http://www.riken.jp/zhangiru/software/. Our protocols can easily be extended in order to create alternative probability update policies and generate new search dynamics. Proteins 2017; 85:852-858. © 2016 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.
Najmanovich, Rafael
2013-01-01
IsoCleft Finder is a web-based tool for the detection of local geometric and chemical similarities between potential small-molecule binding cavities and a non-redundant dataset of ligand-bound known small-molecule binding-sites. The non-redundant dataset developed as part of this study is composed of 7339 entries representing unique Pfam/PDB-ligand (hetero group code) combinations with known levels of cognate ligand similarity. The query cavity can be uploaded by the user or detected automatically by the system using existing PDB entries as well as user-provided structures in PDB format. In all cases, the user can refine the definition of the cavity interactively via a browser-based Jmol 3D molecular visualization interface. Furthermore, users can restrict the search to a subset of the dataset using a cognate-similarity threshold. Local structural similarities are detected using the IsoCleft software and ranked according to two criteria (number of atoms in common and Tanimoto score of local structural similarity) and the associated Z-score and p-value measures of statistical significance. The results, including predicted ligands, target proteins, similarity scores, number of atoms in common, etc., are shown in a powerful interactive graphical interface. This interface permits the visualization of target ligands superimposed on the query cavity and additionally provides a table of pairwise ligand topological similarities. Similarities between top scoring ligands serve as an additional tool to judge the quality of the results obtained. We present several examples where IsoCleft Finder provides useful functional information. IsoCleft Finder results are complementary to existing approaches for the prediction of protein function from structure, rational drug design and x-ray crystallography. IsoCleft Finder can be found at: http://bcb.med.usherbrooke.ca/isocleftfinder. PMID:24555058
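The two ranking criteria mentioned above (number of atoms in common and a Tanimoto score) can be illustrated with a minimal sketch. The abstract does not give the exact formula used by IsoCleft, so the standard set-based Tanimoto coefficient is assumed here; the function names and the hit tuple layout are hypothetical.

```python
def tanimoto(common: int, n_query: int, n_target: int) -> float:
    """Set-based Tanimoto coefficient: common / (n_query + n_target - common).

    Assumption: IsoCleft's score follows this standard form; the abstract
    does not state the formula.
    """
    if common < 0 or common > min(n_query, n_target):
        raise ValueError("common must lie between 0 and the smaller atom count")
    union = n_query + n_target - common
    if union <= 0:
        raise ValueError("at least one cavity must contain atoms")
    return common / union


def rank_hits(hits):
    """Sort hypothetical hits (name, common, n_query, n_target) by Tanimoto
    score, then by atoms in common, both descending."""
    return sorted(
        hits,
        key=lambda h: (tanimoto(h[1], h[2], h[3]), h[1]),
        reverse=True,
    )
```

For instance, a query cavity of 20 atoms sharing 10 atoms with a 30-atom binding site gives 10 / (20 + 30 - 10) = 0.25.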
Kasson, Peter M.; Hess, Berk; Lindahl, Erik
2013-01-01
Cellular lipid membranes are spatially inhomogeneous soft materials. Materials properties such as pressure and surface tension thus show important microscopic-scale variation that is critical to many biological functions. We present a means to calculate pressure and surface tension in a 3D-resolved manner within molecular-dynamics simulations and show how such measurements can yield important insight. We also present the first corrections to local virial and pressure fields to account for the constraints typically used in lipid simulations that otherwise cause problems in highly oriented systems such as bilayers. Based on simulations of an asymmetric bacterial ion channel in a POPC bilayer, we demonstrate how 3D-resolved pressure can probe for both short-range and long-range effects from the protein on the membrane environment. We also show how surface tension is a sensitive metric for inter-leaflet equilibrium and can be used to detect even subtle imbalances between bilayer leaflets in a membrane-protein simulation. Since surface tension is known to modulate the function of many proteins, this effect is an important consideration for predictions of ion channel function. We outline a strategy by which our local pressure measurements, which we make available within a version of the GROMACS simulation package, may be used to design optimally equilibrated membrane-protein simulations. PMID:23318532
1996-01-01
Mutations in the Caenorhabditis elegans gene unc-89 result in nematodes having disorganized muscle structure in which thick filaments are not organized into A-bands, and there are no M-lines. Beginning with a partial cDNA from the C. elegans sequencing project, we have cloned and sequenced the unc-89 gene. An unc-89 allele, st515, was found to contain an 84-bp deletion and a 10-bp duplication, resulting in an in-frame stop codon within predicted unc-89 coding sequence. Analysis of the complete coding sequence for unc-89 predicts a novel 6,632 amino acid polypeptide consisting of sequence motifs which have been implicated in protein-protein interactions. UNC-89 begins with 67 residues of unique sequences, SH3, dbl/CDC24, and PH domains, 7 immunoglobulin (Ig) domains, a putative KSP-containing multiphosphorylation domain, and ends with 46 Ig domains. A polyclonal antiserum raised to a portion of unc-89 encoded sequence reacts to a twitchin-sized polypeptide from wild type, but truncated polypeptides from st515 and from the amber allele e2338. By immunofluorescent microscopy, this antiserum localizes to the middle of A-bands, consistent with UNC-89 being a structural component of the M-line. Previous studies indicate that myofilament lattice assembly begins with positional cues laid down in the basement membrane and muscle cell membrane. We propose that the intracellular protein UNC-89 responds to these signals, localizes, and then participates in assembling an M-line. PMID:8603916
QuickVina: accelerating AutoDock Vina using gradient-based heuristics for global optimization.
Handoko, Stephanus Daniel; Ouyang, Xuchang; Su, Chinh Tran To; Kwoh, Chee Keong; Ong, Yew Soon
2012-01-01
Predicting binding between a macromolecule and a small molecule is a crucial phase in the field of rational drug design. AutoDock Vina, one of the most widely used docking programs, released in 2009, uses an empirical scoring function to evaluate the binding affinity between the molecules and employs the iterated local search global optimizer for global optimization, achieving a significantly improved speed and better accuracy of the binding mode prediction compared with its predecessor, AutoDock 4. In this paper, we propose further improvement in the local search algorithm of Vina by heuristically preventing some intermediate points from undergoing local search. Our improved version of Vina, dubbed QVina, achieved a maximum acceleration of about 25 times with an average speed-up of 8.34 times compared to the original Vina when tested on a set of 231 protein-ligand complexes, while keeping the optimal scores mostly identical. Using our heuristics, a larger number of different ligands can be quickly screened against a given receptor within the same time frame.
MetaMQAP: a meta-server for the quality assessment of protein models.
Pawlowski, Marcin; Gajda, Michal J; Matlak, Ryszard; Bujnicki, Janusz M
2008-09-29
Computational models of protein structure are usually inaccurate and exhibit significant deviations from the true structure. The utility of models depends on the degree of these deviations. A number of predictive methods have been developed to discriminate between the globally incorrect and approximately correct models. However, only a few methods predict correctness of different parts of computational models. Several Model Quality Assessment Programs (MQAPs) have been developed to detect local inaccuracies in unrefined crystallographic models, but it is not known if they are useful for computational models, which usually exhibit different and much more severe errors. The ability to identify local errors in models was tested for eight MQAPs: VERIFY3D, PROSA, BALA, ANOLEA, PROVE, TUNE, REFINER, PROQRES on 8251 models from the CASP-5 and CASP-6 experiments, by calculating the Spearman's rank correlation coefficients between per-residue scores of these methods and local deviations between C-alpha atoms in the models vs. experimental structures. As a reference, we calculated the value of correlation between the local deviations and trivial features that can be calculated for each residue directly from the models, i.e. solvent accessibility, depth in the structure, and the number of local and non-local neighbours. We found that absolute correlations of scores returned by the MQAPs and local deviations were poor for all methods. In addition, scores of PROQRES and several other MQAPs strongly correlate with 'trivial' features. Therefore, we developed MetaMQAP, a meta-predictor based on a multivariate regression model, which uses scores of the above-mentioned methods, but in which trivial parameters are controlled. MetaMQAP predicts the absolute deviation (in Angströms) of individual C-alpha atoms between the model and the unknown true structure as well as global deviations (expressed as root mean square deviation and GDT_TS scores). 
Local model accuracy predicted by MetaMQAP shows an impressive correlation coefficient of 0.7 with true deviations from native structures, a significant improvement over all constituent primary MQAP scores. The global MetaMQAP score is correlated with model GDT_TS on the level of 0.89. Finally, we compared our method with the MQAPs that scored best in the 7th edition of CASP, using CASP7 server models (not included in the MetaMQAP training set) as the test data. In our benchmark, MetaMQAP is outperformed only by PCONS6 and method QA_556 - methods that require comparison of multiple alternative models and score each of them depending on its similarity to other models. MetaMQAP is however the best among methods capable of evaluating just single models. We implemented the MetaMQAP as a web server available for free use by all academic users at the URL https://genesilico.pl/toolkit/
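The benchmark described above rests on Spearman's rank correlation between per-residue scores and per-residue C-alpha deviations. The following is a minimal, dependency-free sketch of that statistic (Pearson correlation computed on average ranks, so ties are handled); the function names are illustrative and are not taken from the MetaMQAP code.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)
```

In a setting like the one described, `x` would hold the per-residue scores from one quality-assessment method and `y` the corresponding C-alpha deviations from the experimental structure.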
Jones, David T; Kandathil, Shaun M
2018-04-26
In addition to substitution frequency data from protein sequence alignments, many state-of-the-art methods for contact prediction rely on additional sources of information, or features, of protein sequences in order to predict residue-residue contacts, such as solvent accessibility, predicted secondary structure, and scores from other contact prediction methods. It is unclear how much of this information is needed to achieve state-of-the-art results. Here, we show that using deep neural network models, simple alignment statistics contain sufficient information to achieve state-of-the-art precision. Our prediction method, DeepCov, uses fully convolutional neural networks operating on amino-acid pair frequency or covariance data derived directly from sequence alignments, without using global statistical methods such as sparse inverse covariance or pseudolikelihood estimation. Comparisons against CCMpred and MetaPSICOV2 show that using pairwise covariance data calculated from raw alignments as input allows us to match or exceed the performance of both of these methods. Almost all of the achieved precision is obtained when considering relatively local windows (around 15 residues) around any member of a given residue pairing; larger window sizes have comparable performance. Assessment on a set of shallow sequence alignments (fewer than 160 effective sequences) indicates that the new method is substantially more precise than CCMpred and MetaPSICOV2 in this regime, suggesting that improved precision is attainable on smaller sequence families. Overall, the performance of DeepCov is competitive with the state of the art, and our results demonstrate that global models, which employ features from all parts of the input alignment when predicting individual contacts, are not strictly needed in order to attain precise contact predictions. DeepCov is freely available at https://github.com/psipred/DeepCov.
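As an illustration of the pairwise covariance input described above, the sketch below computes, for two alignment columns i and j, the quantity f_ij(a, b) - f_i(a) * f_j(b) for every residue pair (a, b). It is a simplified assumption: no sequence weighting or pseudocounts are applied, and the actual DeepCov preprocessing may differ. The function name is hypothetical.

```python
from collections import Counter
from itertools import product


def pair_covariance(alignment, i, j, alphabet):
    """Covariance of residue occurrences between alignment columns i and j.

    alignment: list of equal-length aligned sequences (strings).
    Returns {(a, b): f_ij(a, b) - f_i(a) * f_j(b)} with raw, unweighted
    frequencies (an assumption made for this sketch).
    """
    n = len(alignment)
    if n == 0:
        raise ValueError("alignment must contain at least one sequence")
    f_i = Counter(seq[i] for seq in alignment)
    f_j = Counter(seq[j] for seq in alignment)
    f_ij = Counter((seq[i], seq[j]) for seq in alignment)
    return {
        (a, b): f_ij[(a, b)] / n - (f_i[a] / n) * (f_j[b] / n)
        for a, b in product(alphabet, repeat=2)
    }
```

For the toy alignment `["AC", "AC", "GT", "GT"]`, columns 0 and 1 co-vary perfectly: the pair (A, C) has covariance 0.5 - 0.5 * 0.5 = 0.25, while (A, T) has 0 - 0.25 = -0.25.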
Wierk, Jannika Katharina; Langbehn, Annette; Kamper, Maria; Richter, Stefanie; Burda, Paul-Christian; Heussler, Volker Theo; Deschermeier, Christina
2013-01-01
Mitogen-activated protein kinases (MAPKs) regulate key signaling events in eukaryotic cells. In the genomes of protozoan Plasmodium parasites, the causative agents of malaria, two genes encoding kinases with significant homology to other eukaryotic MAPKs have been identified (mapk1, mapk2). In this work, we show that both genes are transcribed during Plasmodium berghei liver stage development, and analyze expression and subcellular localization of the PbMAPK1 protein in liver stage parasites. Live cell imaging of transgenic parasites expressing GFP-tagged PbMAPK1 revealed a nuclear localization of PbMAPK1 in the early schizont stage mediated by nuclear localization signals in the C-terminal domain. In contrast, a distinct localization of PbMAPK1 in comma/ring-shaped structures in proximity to the parasite’s nuclei and the invaginating parasite membrane was observed during the cytomere stage of parasite development as well as in immature blood stage schizonts. The PbMAPK1 localization was found to be independent of integrity of a motif putatively involved in ATP binding, integrity of the putative activation motif and the presence of a predicted coiled-coil domain in the C-terminal domain. Although PbMAPK1 knock out parasites showed normal liver stage development, the kinase may still fulfill a dual function in both schizogony and merogony of liver stage parasites regulated by its dynamic and stage-dependent subcellular localization. PMID:23544094
Khan, Shujaat; Naseem, Imran; Togneri, Roberto; Bennamoun, Mohammed
2018-01-01
In extreme cold weather, living organisms produce Antifreeze Proteins (AFPs) to counter the otherwise lethal intracellular formation of ice. Structures and sequences of various AFPs exhibit a high degree of heterogeneity; consequently, the prediction of AFPs is considered to be a challenging task. In this research, we propose to handle this arduous manifold learning task using the notion of localized processing. In particular, an AFP sequence is segmented into two sub-segments, each of which is analyzed for amino acid and di-peptide compositions. We propose to use only the most significant features using the concept of information gain (IG) followed by a random forest classification approach. The proposed RAFP-Pred achieved an excellent performance on a number of standard datasets. We report a high Youden's index (sensitivity+specificity-1) value of 0.75 on the standard independent test data set, outperforming the AFP-PseAAC, AFP_PSSM, AFP-Pred, and iAFP by a margin of 0.05, 0.06, 0.14, and 0.68, respectively. The verification rate on the UniProtKB dataset is found to be 83.19 percent, which is substantially superior to the 57.18 percent reported for the iAFP method.
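Youden's index, defined in the abstract as sensitivity + specificity - 1, can be computed directly from confusion-matrix counts. The sketch below is illustrative only; the counts in the usage note are made up and are not results from the paper.

```python
def youden_index(tp: int, fn: int, tn: int, fp: int) -> float:
    """Youden's J = sensitivity + specificity - 1.

    tp/fn: true positives and false negatives among actual positives;
    tn/fp: true negatives and false positives among actual negatives.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1
```

With hypothetical counts of 90 true positives, 10 false negatives, 85 true negatives and 15 false positives, sensitivity is 0.90 and specificity is 0.85, so J is approximately 0.75. The index ranges from -1 to 1, with 0 corresponding to a classifier no better than chance.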
Zhu, Lin; Guo, Wei-Li; Lu, Canyi; Huang, De-Shuang
2016-12-01
Although the newly available ChIP-seq data provides immense opportunities for comparative study of regulatory activities across different biological conditions, due to cost, time or sample material availability, it is not always possible for researchers to obtain binding profiles for every protein in every sample of interest, which considerably limits the power of integrative studies. Recently, by leveraging related information from measured data, Ernst et al. proposed ChromImpute for predicting additional ChIP-seq and other types of datasets, and demonstrated that the imputed signal tracks accurately approximate the experimentally measured signals and thereby could potentially enhance the power of integrative analysis. Despite the success of ChromImpute, in this paper we reexamine its learning process and show that its performance may degrade substantially, and that it may sometimes even fail to output a prediction, when the available data is scarce. This limitation could hurt its applicability to important predictive tasks, such as the imputation of TF binding data. To alleviate this problem, we propose a novel method called Local Sensitive Unified Embedding (LSUE) for imputing new ChIP-seq datasets. In LSUE, the ChIP-seq data compendium is fused together by mapping proteins, samples, and genomic positions simultaneously into the Euclidean space, thereby making their underlying associations directly evaluable using simple calculations. In contrast to ChromImpute, which mainly makes use of the local correlations between available datasets, LSUE can better estimate the overall data structure by formulating the representation learning of all involved entities as a single unified optimization problem. Meanwhile, a novel form of local sensitive low rank regularization is also proposed to further improve the performance of LSUE. Experimental evaluations on the ENCODE TF ChIP-seq data illustrate the performance of the proposed model. 
The code of LSUE is available at https://github.com/ekffar/LSUE.
When a domain isn’t a domain, and why it’s important to properly filter proteins in databases
Towse, Clare-Louise; Daggett, Valerie
2013-01-01
Summary Membership in a protein domain database does not a domain make; a feature we realized when generating a consensus view of protein fold space with our Consensus Domain Dictionary (CDD). This dictionary was used to select representative structures for characterization of the protein dynameome: the Dynameomics initiative. Through this endeavor we rejected a surprising 40% of the 1695 folds in the CDD as being non-autonomous folding units. Although some of this was due to the challenges of grouping similar fold topologies, the dissonance between the cataloguing and structural qualification of protein domains remains surprising. Another potential factor is previously overlooked intrinsic disorder; predicted estimates suggest 40% of proteins to have either local or global disorder. One thing is clear, filtering a structural database and ensuring a consistent definition for protein domains is crucial, and caution is prescribed when generalizations of globular domains are drawn from unfiltered protein domain datasets. PMID:23108912
Protein targeting and integration signal for the chloroplastic outer envelope membrane.
Li, H M; Chen, L J
1996-01-01
Most proteins in chloroplasts are encoded by the nuclear genome and synthesized in the cytosol. With the exception of most outer envelope membrane proteins, nuclear-encoded chloroplastic proteins are synthesized with N-terminal extensions that contain the chloroplast targeting information of these proteins. Most outer membrane proteins, however, are synthesized without extensions in the cytosol. Therefore, it is not clear where the chloroplastic outer membrane targeting information resides within these polypeptides. We have analyzed a chloroplastic outer membrane protein, OEP14 (outer envelope membrane protein of 14 kD, previously named OM14), and localized its outer membrane targeting and integration signal to the first 30 amino acids of the protein. This signal consists of a positively charged N-terminal portion followed by a hydrophobic core, bearing resemblance to the signal peptides of proteins targeted to the endoplasmic reticulum. However, a chimeric protein containing this signal fused to a passenger protein did not integrate into the endoplasmic reticulum membrane. Furthermore, membrane topology analysis indicated that the signal inserts into the chloroplastic outer membrane in an orientation opposite to that predicted by the "positive inside" rule. PMID:8953775
Proteins of the Glycine Decarboxylase Complex in the Hydrogenosome of Trichomonas vaginalis
Mukherjee, Mandira; Brown, Mark T.; McArthur, Andrew G.; Johnson, Patricia J.
2006-01-01
Trichomonas vaginalis is a unicellular eukaryote that lacks mitochondria and contains a specialized organelle, the hydrogenosome, involved in carbohydrate metabolism and iron-sulfur cluster assembly. We report the identification of two glycine cleavage H proteins and a dihydrolipoamide dehydrogenase (L protein) of the glycine decarboxylase complex in T. vaginalis with predicted N-terminal hydrogenosomal presequences. Immunofluorescence analyses reveal that both H and L proteins are localized in hydrogenosomes, providing the first evidence for amino acid metabolism in this organelle. All three proteins were expressed in Escherichia coli and purified to homogeneity. The experimental Km of L protein for the two H proteins were 2.6 μM and 3.7 μM, consistent with both H proteins serving as substrates of L protein. Analyses using purified hydrogenosomes showed that endogenous H proteins exist as monomers and endogenous L protein as a homodimer in their native states. Phylogenetic analyses of L proteins revealed that the T. vaginalis homologue shares a common ancestry with dihydrolipoamide dehydrogenases from the firmicute bacteria, indicating its acquisition via a horizontal gene transfer event independent of the origins of mitochondria and hydrogenosomes. PMID:17158739
Musite, a tool for global prediction of general and kinase-specific phosphorylation sites.
Gao, Jianjiong; Thelen, Jay J; Dunker, A Keith; Xu, Dong
2010-12-01
Reversible protein phosphorylation is one of the most pervasive post-translational modifications, regulating diverse cellular processes in various organisms. High throughput experimental studies using mass spectrometry have identified many phosphorylation sites, primarily from eukaryotes. However, the vast majority of phosphorylation sites remain undiscovered, even in well studied systems. Because mass spectrometry-based experimental approaches for identifying phosphorylation events are costly, time-consuming, and biased toward abundant proteins and proteotypic peptides, in silico prediction of phosphorylation sites is potentially a useful alternative strategy for whole proteome annotation. Because of various limitations, current phosphorylation site prediction tools were not well designed for comprehensive assessment of proteomes. Here, we present a novel software tool, Musite, specifically designed for large scale predictions of both general and kinase-specific phosphorylation sites. We collected phosphoproteomics data in multiple organisms from several reliable sources and used them to train prediction models by a comprehensive machine-learning approach that integrates local sequence similarities to known phosphorylation sites, protein disorder scores, and amino acid frequencies. Application of Musite on several proteomes yielded tens of thousands of phosphorylation site predictions at a high stringency level. Cross-validation tests show that Musite achieves some improvement over existing tools in predicting general phosphorylation sites, and it is at least comparable with those for predicting kinase-specific phosphorylation sites. In Musite V1.0, we have trained general prediction models for six organisms and kinase-specific prediction models for 13 kinases or kinase families. 
Although the current pretrained models were not correlated with any particular cellular conditions, Musite provides a unique functionality for training customized prediction models (including condition-specific models) from users' own data. In addition, with its easily extensible open source application programming interface, Musite is aimed at being an open platform for community-based development of machine learning-based phosphorylation site prediction applications. Musite is available at http://musite.sourceforge.net/.
3D Protein structure prediction with genetic tabu search algorithm
2010-01-01
Background Protein structure prediction (PSP) has important applications in different fields, such as drug design, disease prediction, and so on. In protein structure prediction, there are two important issues. The first one is the design of the structure model and the second one is the design of the optimization technology. Because of the complexity of the realistic protein structure, the structure model adopted in this paper is a simplified model, which is called off-lattice AB model. After the structure model is assumed, optimization technology is needed for searching the best conformation of a protein sequence based on the assumed structure model. However, PSP is an NP-hard problem even if the simplest model is assumed. Thus, many algorithms have been developed to solve the global optimization problem. In this paper, a hybrid algorithm, which combines genetic algorithm (GA) and tabu search (TS) algorithm, is developed to complete this task. Results In order to develop an efficient optimization algorithm, several improved strategies are developed for the proposed genetic tabu search algorithm. The combined use of these strategies can improve the efficiency of the algorithm. In these strategies, tabu search introduced into the crossover and mutation operators can improve the local search capability, the adoption of variable population size strategy can maintain the diversity of the population, and the ranking selection strategy can improve the possibility of an individual with low energy value entering into next generation. Experiments are performed with Fibonacci sequences and real protein sequences. Experimental results show that the lowest energy obtained by the proposed GATS algorithm is lower than that obtained by previous methods. Conclusions The hybrid algorithm has the advantages from both genetic algorithm and tabu search algorithm. 
It makes use of the advantage of multiple search points in genetic algorithm, and can overcome poor hill-climbing capability in the conventional genetic algorithm by using the flexible memory functions of TS. Compared with some previous algorithms, GATS algorithm has better performance in global optimization and can predict 3D protein structure more effectively. PMID:20522256
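To make the optimization target concrete, the sketch below generates the Fibonacci test sequences and evaluates a conformation's energy under the off-lattice AB model. The abstract does not state the energy function, so the commonly cited form is assumed: a bending term (1 - cos θ)/4 per interior residue plus a Lennard-Jones-like term 4(r^-12 - C r^-6) over non-adjacent pairs, with C = 1 for AA, 0.5 for BB and -0.5 for AB pairs. Function names are illustrative, and the sketch assumes unit-length bonds are enforced by the caller.

```python
from math import dist, sqrt  # math.dist requires Python 3.8+


def fibonacci_sequence(k: int) -> str:
    """S_0 = 'A', S_1 = 'B', S_{k+1} = S_{k-1} + S_k (string concatenation)."""
    prev, cur = "A", "B"
    if k == 0:
        return prev
    for _ in range(k - 1):
        prev, cur = cur, prev + cur
    return cur


def coupling(a: str, b: str) -> float:
    """Assumed AB-model coefficients: AA attractive, BB weakly attractive, AB repulsive."""
    if a == b:
        return 1.0 if a == "A" else 0.5
    return -0.5


def ab_energy(coords, seq) -> float:
    """Energy of a 3D conformation (list of (x, y, z) tuples) for sequence seq."""
    n = len(seq)
    bend = 0.0
    for i in range(1, n - 1):
        u = [q - p for p, q in zip(coords[i - 1], coords[i])]
        v = [q - p for p, q in zip(coords[i], coords[i + 1])]
        norm = sqrt(sum(c * c for c in u)) * sqrt(sum(c * c for c in v))
        cos_theta = sum(p * q for p, q in zip(u, v)) / norm
        bend += (1.0 - cos_theta) / 4.0
    pair = 0.0
    for i in range(n - 2):
        for j in range(i + 2, n):  # skip bonded neighbours
            r = dist(coords[i], coords[j])
            pair += 4.0 * (r ** -12 - coupling(seq[i], seq[j]) * r ** -6)
    return bend + pair
```

A search method such as the GA and tabu search hybrid described above would call an energy function of this kind on each candidate conformation and keep the lowest-energy ones; `fibonacci_sequence(6)` yields the 13-residue sequence `ABBABBABABBAB`.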
Friso, Giulia; Giacomelli, Lisa; Ytterberg, A Jimmy; Peltier, Jean-Benoit; Rudella, Andrea; Sun, Qi; Wijk, Klaas J van
2004-02-01
An extensive analysis of the Arabidopsis thaliana peripheral and integral thylakoid membrane proteome was performed by sequential extractions with salt, detergent, and organic solvents, followed by multidimensional protein separation steps (reverse-phase HPLC and one- and two-dimensional electrophoresis gels), different enzymatic and nonenzymatic protein cleavage techniques, mass spectrometry, and bioinformatics. Altogether, 154 proteins were identified, of which 76 (49%) were alpha-helical integral membrane proteins. Twenty-seven new proteins without known function but with predicted chloroplast transit peptides were identified, of which 17 (63%) are integral membrane proteins. These new proteins, likely important in thylakoid biogenesis, include two rubredoxins, a potential metallochaperone, and a new DnaJ-like protein. The data were integrated with our analysis of the lumenal-enriched proteome. We identified 83 out of 100 known proteins of the thylakoid localized photosynthetic apparatus, including several new paralogues and some 20 proteins involved in protein insertion, assembly, folding, or proteolysis. An additional 16 proteins are involved in translation, demonstrating that the thylakoid membrane surface is an important site for protein synthesis. The high coverage of the photosynthetic apparatus and the identification of known hydrophobic proteins with low expression levels, such as cpSecE, Ohp1, and Ohp2, indicate an excellent dynamic resolution of the analysis. The sequential extraction process proved very helpful to validate transmembrane prediction. Our data also were cross-correlated to chloroplast subproteome analyses by other laboratories. All data are deposited in a new curated plastid proteome database (PPDB) with multiple search functions (http://cbsusrv01.tc.cornell.edu/users/ppdb/). This PPDB will serve as an expandable resource for the plant community.
Analysis of the gravitaxis signal transduction chain in Euglena gracilis
NASA Astrophysics Data System (ADS)
Nasir, Adeel
Abstract Euglena gracilis is a photosynthetic, eukaryotic flagellate. It can adopt autotrophic and heterotrophic modes of growth and responds to different stimuli, which makes it an organism of choice for different research disciplines. It swims to reach a suitable niche by employing different stimuli such as oxygen, light, gravity and different chemicals. Among these stimuli, light and gravity are the most important. Phototaxis (locomotion under light stimulus) and gravitaxis (locomotion under gravity stimulus) synergistically help cells to attain an optimal niche in the environment. However, in the complete absence of light or under scarcity of detectable light, cells can depend entirely on gravity to find their swimming path. Therefore gravity has certain advantages over other stimuli. Unlike for the phototactic signal transduction chain of Euglena gracilis, no clear primary gravity receptor has been identified in Euglena cells so far. However, there is some convincing evidence that TRP-like channels act as a primary gravity receptor in Euglena gracilis. The use of different inhibitors pointed to the involvement of protein kinase and calmodulin proteins in the signal transduction chain of Euglena gracilis. Recently, a specific calmodulin (Calmodulin 2) and a protein kinase (PKA) have been identified as potential candidates of the gravitactic signal transduction chain. Further characterization and investigation of these candidates was required. Therefore a combination of biochemical and genetic techniques was employed to localize proteins in cells and also to find interacting partners. For localization studies, specific antibodies were raised and characterized. Specificity of antibodies was validated by knockdown mutants, in vitro-translated proteins and heterologously expressed proteins. Cell fractionation studies, involving separation of the cell body and flagella for western blot analysis, and confocal immunofluorescence studies were performed for subcellular localization. 
In order to find interacting partners, a yeast two-hybrid screen was conducted using a commercially synthesized cDNA library for Euglena gracilis. For both protein kinase and calmodulin some putative interacting partners were found. These plausible candidates are being subjected to further validation studies to verify the protein-protein interactions. In addition, some differential expression studies are also performed for these proteins to evaluate their expression levels under conditions which are known to affect gravitaxis in Euglena gracilis. Taken together, these data are in good agreement with some existing predictions of protein localization, and at the same time provide new insights for further studies.
Boareto, Marcelo; Yamagishi, Michel E B; Caticha, Nestor; Leite, Vitor B P
2012-10-01
In protein databases there is a substantial number of proteins structurally determined but without function annotation. Understanding the relationship between function and structure can be useful to predict function on a large scale. We have analyzed the similarities in global physicochemical parameters for a set of enzymes which were classified according to the four Enzyme Commission (EC) hierarchical levels. Using relevance theory we introduced a distance between proteins in the space of physicochemical characteristics. This was done by minimizing a cost function of the metric tensor built to reflect the EC classification system. Using an unsupervised clustering method on a set of 1025 enzymes, we obtained no relevant clustering formation compatible with EC classification. The distance distributions between enzymes from the same EC group and from different EC groups were compared by histograms. Such analysis was also performed using sequence alignment similarity as a distance. Our results suggest that global structure parameters are not sufficient to segregate enzymes according to EC hierarchy. This indicates that features essential for function are local rather than global. Consequently, methods for predicting function based on global attributes should not obtain high accuracy in main EC classes prediction without relying on similarities between enzymes from training and validation datasets. Furthermore, these results are consistent with a substantial number of studies suggesting that function evolves fundamentally by recruitment, i.e., the same protein motif or fold can be used to perform different enzymatic functions and a few specific amino acids (AAs) are actually responsible for enzyme activity. These essential amino acids should belong to active sites and an effective method for predicting function should be able to recognize them.
Ferrada, Evandro; Vergara, Ismael A; Melo, Francisco
2007-01-01
The correct discrimination between native and near-native protein conformations is essential for achieving accurate computer-based protein structure prediction. However, this has proven to be a difficult task, since currently available physical energy functions, empirical potentials and statistical scoring functions are still limited in achieving this goal consistently. In this work, we assess and compare the ability of different full atom knowledge-based potentials to discriminate between native protein structures and near-native protein conformations generated by comparative modeling. Using a benchmark of 152 near-native protein models and their corresponding native structures that encompass several different folds, we demonstrate that the incorporation of close non-bonded pairwise atom terms improves the discriminating power of the empirical potentials. Since the direct and unbiased derivation of close non-bonded terms from current experimental data is not possible, we obtained and used those terms from the corresponding pseudo-energy functions of a non-local knowledge-based potential. It is shown that this methodology significantly improves the discrimination between native and near-native protein conformations, suggesting that a proper description of close non-bonded terms is important to achieve a more complete and accurate description of native protein conformations. Some external knowledge-based energy functions that are widely used in model assessment performed poorly, indicating that the benchmark of models and the specific discrimination task tested in this work constitutes a difficult challenge.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shen, Enzhi; Lei, Yan; Liu, Qian
2009-04-15
A novel protein that associates with interphase nucleus and mitotic apparatus (INMAP) was identified by screening HeLa cDNA expression library with an autoimmune serum followed by tandem mass spectrometry. Its complete cDNA sequence of 1.818 kb encodes 343 amino acids with predicted molecular mass of 38.2 kDa and numerous phosphorylation sites. The sequence is identical with nucleotides 1-1800 bp of an unnamed gene (GenBank accession no. (7022388)) and highly homologous with the 3'-terminal sequence of POLR3B. A monoclonal antibody against INMAP reacted with similar proteins in S. cerevisiae, Mel and HeLa cells, suggesting that it is a conserved protein. Confocal microscopy using either GFP-INMAP fusion protein or labeling with the monoclonal antibody revealed that the protein localizes as distinct dots in the interphase nucleus, but during mitosis associates closely with the spindle. Double immunolabeling using specific antibodies showed that the INMAP co-localizes with α-tubulin, γ-tubulin, and NuMA. INMAP also co-immunoprecipitated with these proteins in their native state. Stable overexpression of INMAP in HeLa cell lines leads to defects in the spindle, mitotic arrest, formation of polycentrosomal and multinuclear cells, inhibition of growth, and apoptosis. We propose that INMAP is a novel protein that plays an essential role in spindle formation and cell-cycle progression.
Licht, J D; Hanna-Rose, W; Reddy, J C; English, M A; Ro, M; Grossel, M; Shaknovich, R; Hansen, U
1994-01-01
We previously demonstrated that the Drosophila Krüppel protein is a transcriptional repressor with separable DNA-binding and transcriptional repression activities. In this study, the minimal amino (N)-terminal repression region of the Krüppel protein was defined by transferring regions of the Krüppel protein to a heterologous DNA-binding protein, the lacI protein. Fusion of a predicted alpha-helical region from amino acids 62 to 92 in the N terminus of the Krüppel protein was sufficient to transfer repression activity. This putative alpha-helix has several hydrophobic surfaces, as well as a glutamine-rich surface. Mutants containing multiple amino acid substitutions of the glutamine residues demonstrated that this putative alpha-helical region is essential for repression activity of a Krüppel protein containing the entire N-terminal and DNA-binding regions. Furthermore, one point mutant with only a single glutamine on this surface altered to lysine abolished the ability of the Krüppel protein to repress, indicating the importance of the amino acid at residue 86 for repression. The N terminus also contained an adjacent activation region localized between amino acids 86 and 117. Finally, in accordance with predictions from primary amino acid sequence similarity, a repression region from the Drosophila even-skipped protein, which was six times more potent than that of the Krüppel protein in the mammalian cells, was characterized. This segment included a hydrophobic stretch of 11 consecutive alanine residues and a proline-rich region. Images PMID:8196644
Ritchie, Andrew W; Webb, Lauren J
2015-11-05
Biological function emerges in large part from the interactions of biomacromolecules in the complex and dynamic environment of the living cell. For this reason, macromolecular interactions in biological systems are now a major focus of interest throughout the biochemical and biophysical communities. The affinity and specificity of macromolecular interactions are the result of both structural and electrostatic factors. Significant advances have been made in characterizing structural features of stable protein-protein interfaces through the techniques of modern structural biology, but much less is understood about how electrostatic factors promote and stabilize specific functional macromolecular interactions over all possible choices presented to a given molecule in a crowded environment. In this Feature Article, we describe how vibrational Stark effect (VSE) spectroscopy is being applied to measure electrostatic fields at protein-protein interfaces, focusing on measurements of guanosine triphosphate (GTP)-binding proteins of the Ras superfamily binding with structurally related but functionally distinct downstream effector proteins. In VSE spectroscopy, spectral shifts of a probe oscillator's energy are related directly to that probe's local electrostatic environment. By performing this experiment repeatedly throughout a protein-protein interface, an experimental map of measured electrostatic fields generated at that interface is determined. These data can be used to rationalize selective binding of similarly structured proteins in both in vitro and in vivo environments. Furthermore, these data can be used to compare to computational predictions of electrostatic fields to explore the level of simulation detail that is necessary to accurately predict our experimental findings.
Prediction of the in planta Phakopsora pachyrhizi secretome and potential effector families.
de Carvalho, Mayra C da C G; Costa Nascimento, Leandro; Darben, Luana M; Polizel-Podanosqui, Adriana M; Lopes-Caitar, Valéria S; Qi, Mingsheng; Rocha, Carolina S; Carazzolle, Marcelo Falsarella; Kuwahara, Márcia K; Pereira, Goncalo A G; Abdelnoor, Ricardo V; Whitham, Steven A; Marcelino-Guimarães, Francismar C
2017-04-01
Asian soybean rust (ASR), caused by the obligate biotrophic fungus Phakopsora pachyrhizi, can cause losses greater than 80%. Despite its economic importance, there is no soybean cultivar with durable ASR resistance. In addition, the P. pachyrhizi genome is not yet available. However, the availability of other rust genomes, as well as the development of sample enrichment strategies and bioinformatics tools, has improved our knowledge of the ASR secretome and its potential effectors. In this context, we used a combination of laser capture microdissection (LCM), RNAseq and a bioinformatics pipeline to identify a total of 36 350 P. pachyrhizi contigs expressed in planta and a predicted secretome of 851 proteins. Some of the predicted secreted proteins had characteristics of candidate effectors: small size, cysteine rich, do not contain PFAM domains (except those associated with pathogenicity) and strongly expressed in planta. A comparative analysis of the predicted secreted proteins present in Pucciniales species identified new members of soybean rust and new Pucciniales- or P. pachyrhizi-specific families (tribes). Members of some families were strongly up-regulated during early infection, starting with initial infection through haustorium formation. Effector candidates selected from two of these families were able to suppress immunity in transient assays, and were localized in the plant cytoplasm and nuclei. These experiments support our bioinformatics predictions and show that these families contain members that have functions consistent with P. pachyrhizi effectors.
Siamer, Sabrina; Gaubert, Stéphane; Boureau, Tristan; Brisset, Marie-Noëlle; Barny, Marie-Anne
2013-05-01
The bacterium Erwinia amylovora causes fire blight, an invasive disease that threatens apple trees, pear trees and other plants of the Rosaceae family. Erwinia amylovora pathogenicity relies on a type III secretion system and on a single effector DspA/E. This effector belongs to the widespread AvrE family of effectors whose biological function is unknown. In this manuscript, we performed a bioinformatic analysis of DspA/E- and AvrE-related effectors. Motif search identified nuclear localization signals, peroxisome targeting signals, endoplasmic reticulum membrane retention signals and leucine zipper motifs, but none of these motifs were present in all the AvrE-related effectors analysed. Protein threading analysis, however, predicted a conserved double β-propeller domain in the N-terminal part of all the analysed effector sequences. We then performed a random pentapeptide mutagenesis of DspA/E, which led to the characterization of 13 new altered proteins with a five-amino-acid insertion. Eight harboured the insertion inside the predicted β-propeller domain and six of these eight insertions impaired DspA/E stability or function. Conversely, the two remaining insertions generated proteins that were functional and abundantly secreted in the supernatant, suggesting that these two insertions stabilized the protein.
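Note: the motif search mentioned in the abstract above can be illustrated with a small pattern scan. The sketch below is not the authors' pipeline; the two patterns are simplified textbook consensus forms chosen here as assumptions (a monopartite nuclear localization signal K-[K/R]-X-[K/R] and a C-terminal peroxisome targeting signal [S/A/C]-[K/R/H]-[L/M]), and the names MOTIFS and find_motifs are hypothetical.

```python
import re

# Simplified consensus patterns (assumptions for illustration only;
# real motif databases use richer, context-dependent definitions).
MOTIFS = {
    "NLS_monopartite": re.compile(r"K[KR].[KR]"),      # K-(K/R)-X-(K/R)
    "PTS1_C_terminal": re.compile(r"[SAC][KRH][LM]$"),  # tripeptide at the C terminus
}

def find_motifs(protein):
    """Return {motif name: [(0-based start, matched residues), ...]}."""
    protein = protein.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # finditer reports non-overlapping matches, scanning left to right
        hits[name] = [(m.start(), m.group()) for m in pattern.finditer(protein)]
    return hits

if __name__ == "__main__":
    # Toy sequence containing an SV40-like basic stretch and a C-terminal SKL
    print(find_motifs("MAPKKKRKVGSKL"))
```

A hit from such a scan only flags a candidate; as the abstract notes, none of the motifs was shared by all AvrE-related effectors, so pattern matches alone would not establish function.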
Krepl, Miroslav; Cléry, Antoine; Blatter, Markus; Allain, Frederic H.T.; Sponer, Jiri
2016-01-01
RNA recognition motif (RRM) proteins represent an abundant class of proteins playing key roles in RNA biology. We present a joint atomistic molecular dynamics (MD) and experimental study of two RRM-containing proteins bound with their single-stranded target RNAs, namely the Fox-1 and SRSF1 complexes. The simulations are used in conjunction with NMR spectroscopy to interpret and expand the available structural data. We accumulate more than 50 μs of simulations and show that the MD method is robust enough to reliably describe the structural dynamics of the RRM–RNA complexes. The simulations predict unanticipated specific participation of Arg142 at the protein–RNA interface of the SRSF1 complex, which is subsequently confirmed by NMR and ITC measurements. Several segments of the protein–RNA interface may involve competition between dynamical local substates rather than firmly formed interactions, which is indirectly consistent with the primary NMR data. We demonstrate that the simulations can be used to interpret the NMR atomistic models and can provide qualified predictions. Finally, we propose a protocol for ‘MD-adapted structure ensemble’ as a way to integrate the simulation predictions and expand upon the deposited NMR structures. Unbiased μs-scale atomistic MD could become a technique routinely complementing the NMR measurements of protein–RNA complexes. PMID:27193998
Cheng, Xiang; Zhao, Shu-Guang; Lin, Wei-Zhong; Xiao, Xuan; Chou, Kuo-Chen
2017-11-15
Cells are deemed the basic unit of life. However, many important functions of cells, as well as their growth and reproduction, are performed via the protein molecules located at their different organelles or locations. Facing explosive growth of protein sequences, we are challenged to develop fast and effective methods to annotate their subcellular localization. However, this is by no means an easy task. In particular, mounting evidence indicates that proteins have a multi-label feature, meaning that they may simultaneously exist at, or move between, two or more different subcellular location sites. Unfortunately, most of the existing computational methods can only be used to deal with single-label proteins. Although the 'iLoc-Animal' predictor developed recently is quite powerful and can also deal with animal proteins with multiple locations, its prediction quality needs to be improved, particularly in enhancing the absolute true rate and reducing the absolute false rate. Here we propose a new predictor called 'pLoc-mAnimal', which is superior to iLoc-Animal. When tested by the most rigorous cross-validation on the same high-quality benchmark dataset, the absolute true success rate achieved by the new predictor is 37% higher and the absolute false rate is four times lower in comparison with the state-of-the-art predictor. To maximize the convenience of most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc-mAnimal/, by which users can easily get their desired results without the need to go through the complicated mathematics involved.
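Note: the abstract above reports an "absolute true rate" and an "absolute false rate" without defining them. The sketch below assumes the definitions commonly used for multi-label predictors: absolute true is the fraction of proteins whose predicted location set equals the experimental set exactly, and absolute false is the number of missed plus spurious labels, averaged over proteins and divided by the total number of location classes. The function names and the toy data are hypothetical and are not taken from the pLoc-mAnimal paper.

```python
def absolute_true(true_sets, pred_sets):
    """Fraction of proteins whose predicted label set matches the true set exactly."""
    exact = sum(1 for t, p in zip(true_sets, pred_sets) if t == p)
    return exact / len(true_sets)

def absolute_false(true_sets, pred_sets, n_labels):
    """Missed plus spurious labels per protein, normalised by the number of classes."""
    # |union| - |intersection| counts labels present in exactly one of the two sets
    wrong = sum(len(t | p) - len(t & p) for t, p in zip(true_sets, pred_sets))
    return wrong / (len(true_sets) * n_labels)

if __name__ == "__main__":
    # Toy data: four proteins, four possible locations (illustrative only)
    truth = [{"nucleus"}, {"cytoplasm", "nucleus"}, {"mitochondrion"}, {"membrane"}]
    preds = [{"nucleus"}, {"cytoplasm"}, {"mitochondrion"}, {"nucleus"}]
    print(absolute_true(truth, preds))       # 2 of 4 exact matches
    print(absolute_false(truth, preds, 4))   # 3 wrong labels / (4 proteins * 4 classes)
```

Under these assumed definitions a partially correct prediction (second protein) counts as a failure for absolute true, which is why the measure is stricter than per-label accuracy.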
Tollefson, Ann E.; Ying, Baoling; Doronin, Konstantin; Sidor, Peter D.; Wold, William S. M.
2007-01-01
A short open reading frame named the “U exon,” located on the adenovirus (Ad) l-strand (for leftward transcription) between the early E3 region and the fiber gene, is conserved in mastadenoviruses. We have observed that Ad5 mutants with large deletions in E3 that infringe on the U exon display a mild growth defect, as well as an aberrant Ad E2 DNA-binding protein (DBP) intranuclear localization pattern and an apparent failure to organize replication centers during late infection. Mutants in which the U exon DNA is reconstructed have a reversed phenotype. Chow et al. (L. T. Chow et al., J. Mol. Biol. 134:265-303, 1979) described mRNAs initiating in the region of the U exon and spliced to downstream sequences in the late DBP mRNA leader and the DBP-coding region. We have cloned this mRNA (as cDNA) from Ad5 late mRNA; the predicted protein is 217 amino acids, initiating in the U exon and continuing in frame in the DBP leader and in the DBP-coding region but in a different reading frame from DBP. Polyclonal and monoclonal antibodies generated against the predicted U exon protein (UXP) showed that UXP is ∼24K in size by immunoblot and is a late protein. At 18 to 24 h postinfection, UXP is strongly associated with nucleoli and is found throughout the nucleus; later, UXP is associated with the periphery of replication centers, suggesting a function relevant to Ad DNA replication or RNA transcription. UXP is expressed by all four species C Ads. When expressed in transient transfections, UXP complements the aberrant DBP localization pattern of UXP-negative Ad5 mutants. Our data indicate that UXP is a previously unrecognized protein derived from a novel late l-strand transcription unit. PMID:17881437
Jagadish, Nirmala; Rana, Ritu; Selvi, Ramasamy; Mishra, Deepshikha; Garg, Manoj; Yadav, Shikha; Herr, John C.; Okumura, Katsuzumi; Hasegawa, Akiko; Koyama, Koji; Suri, Anil
2005-01-01
We report a novel SPAG9 (sperm-associated antigen 9) protein having structural homology with JNK (c-Jun N-terminal kinase)-interacting protein 3. SPAG9, a single copy gene mapped to the human chromosome 17q21.33 syntenic with location of mouse chromosome 11, was earlier shown to be expressed exclusively in testis [Shankar, Mohapatra and Suri (1998) Biochem. Biophys. Res. Commun. 243, 561–565]. The SPAG9 amino acid sequence analysis revealed identity with the JNK-binding domain and predicted coiled-coil, leucine zipper and transmembrane domains. The secondary structure analysis predicted an α-helical structure for SPAG9 that was confirmed by CD spectra. Microsequencing of higher-order aggregates of recombinant SPAG9 by tandem MS confirmed the amino acid sequence and mono atomic mass of 83.9 kDa. Transient expression of SPAG9 and its deletion mutants revealed that both leucine zipper with extended coiled-coil domains and transmembrane domain of SPAG9 were essential for dimerization and proper localization. Studies of MAPK (mitogen-activated protein kinase) interactions demonstrated that SPAG9 interacted with higher binding affinity to JNK3 and JNK2 compared with JNK1. No interaction was observed with p38α or extracellular-signal-regulated kinase pathways. Polyclonal antibodies raised against recombinant SPAG9 recognized native protein in human sperm extracts and localized specifically on the acrosomal compartment of intact human spermatozoa. Acrosome-reacted spermatozoa demonstrated SPAG9 immunofluorescence, indicating its retention on the equatorial segment after the acrosome reaction. Further, anti-SPAG9 antibodies inhibited the binding of human spermatozoa to intact human oocytes as well as to matched hemizona. This is the first report of sperm-associated JNK-binding protein that may have a role in spermatozoa–egg interaction. PMID:15693750
Jagadish, Nirmala; Rana, Ritu; Selvi, Ramasamy; Mishra, Deepshikha; Garg, Manoj; Yadav, Shikha; Herr, John C; Okumura, Katsuzumi; Hasegawa, Akiko; Koyama, Koji; Suri, Anil
2005-07-01
We report a novel SPAG9 (sperm-associated antigen 9) protein having structural homology with JNK (c-Jun N-terminal kinase)-interacting protein 3. SPAG9, a single copy gene mapped to the human chromosome 17q21.33 syntenic with location of mouse chromosome 11, was earlier shown to be expressed exclusively in testis [Shankar, Mohapatra and Suri (1998) Biochem. Biophys. Res. Commun. 243, 561-565]. The SPAG9 amino acid sequence analysis revealed identity with the JNK-binding domain and predicted coiled-coil, leucine zipper and transmembrane domains. The secondary structure analysis predicted an alpha-helical structure for SPAG9 that was confirmed by CD spectra. Microsequencing of higher-order aggregates of recombinant SPAG9 by tandem MS confirmed the amino acid sequence and mono atomic mass of 83.9 kDa. Transient expression of SPAG9 and its deletion mutants revealed that both leucine zipper with extended coiled-coil domains and transmembrane domain of SPAG9 were essential for dimerization and proper localization. Studies of MAPK (mitogen-activated protein kinase) interactions demonstrated that SPAG9 interacted with higher binding affinity to JNK3 and JNK2 compared with JNK1. No interaction was observed with p38alpha or extracellular-signal-regulated kinase pathways. Polyclonal antibodies raised against recombinant SPAG9 recognized native protein in human sperm extracts and localized specifically on the acrosomal compartment of intact human spermatozoa. Acrosome-reacted spermatozoa demonstrated SPAG9 immunofluorescence, indicating its retention on the equatorial segment after the acrosome reaction. Further, anti-SPAG9 antibodies inhibited the binding of human spermatozoa to intact human oocytes as well as to matched hemizona. This is the first report of sperm-associated JNK-binding protein that may have a role in spermatozoa-egg interaction.
Effect of antimicrobial preservatives on partial protein unfolding and aggregation
Hutchings, Regina L.; Singh, Surinder M.; Cabello-Villegas, Javier; Mallela, Krishna M. G.
2014-01-01
One-third of protein formulations are multi-dose. These require antimicrobial preservatives (APs); however, some APs have been shown to cause protein aggregation. Our previous work on a model protein cytochrome c indicated that partial protein unfolding, rather than complete unfolding, triggers aggregation. Here, we examined the relative strength of five commonly used APs on such unfolding and aggregation, and explored whether stabilizing the aggregation “hot-spot” reduces such aggregation. All APs induced protein aggregation in the order m-cresol > phenol > benzyl alcohol > phenoxyethanol > chlorobutanol. All these enhanced the partial protein unfolding that includes a local region which was predicted to be the aggregation “hot-spot”. The extent of destabilization correlated with the extent of aggregation. Further, we show that stabilizing the “hot-spot” reduces aggregation induced by all five APs. These results indicate that m-cresol causes the most protein aggregation, whereas chlorobutanol causes the least protein aggregation. The same protein region acts as the “hot-spot” for aggregation induced by different APs, implying that developing strategies to prevent protein aggregation induced by one AP will also work for others. PMID:23169345
Rausch, Felix; Schicht, Martin; Bräuer, Lars; Paulsen, Friedrich; Brandt, Wolfgang
2014-11-01
Surfactant proteins are well known from the human lung where they are responsible for the stability and flexibility of the pulmonary surfactant system. They are able to influence the surface tension of the gas-liquid interface specifically by directly interacting with single lipids. This work describes the generation of reliable protein structure models to support the experimental characterization of two novel putative surfactant proteins called SP-G and SP-H. The obtained protein models were complemented by predicted posttranslational modifications and placed in a lipid model system mimicking the pulmonary surface. Molecular dynamics simulations of these protein-lipid systems showed the stability of the protein models and the formation of interactions between protein surface and lipid head groups on an atomic scale. Thereby, interaction interface and strength seem to be dependent on orientation and posttranslational modification of the protein. The here presented modeling was fundamental for experimental localization studies and the simulations showed that SP-G and SP-H are theoretically able to interact with lipid systems and thus are members of the surfactant protein family.
Doppelt-Azeroual, Olivia; Delfaud, François; Moriaud, Fabrice; de Brevern, Alexandre G
2010-04-01
Ligand-protein interactions are essential for biological processes, and precise characterization of protein binding sites is crucial to understand protein functions. MED-SuMo is a powerful technology to localize similar local regions on protein surfaces. Its heuristic is based on a 3D representation of macromolecules using specific surface chemical features associating chemical characteristics with geometrical properties. MED-SMA is an automated and fast method to classify binding sites. It is based on MED-SuMo technology, which builds a similarity graph, and it uses the Markov Clustering algorithm. Purine binding sites are well studied as drug targets. Here, purine binding sites of the Protein DataBank (PDB) are classified. Proteins potentially inhibited or activated through the same mechanism are gathered. Results are analyzed according to PROSITE annotations and to carefully refined functional annotations extracted from the PDB. As expected, binding sites associated with related mechanisms are gathered, for example, the Small GTPases. Nevertheless, protein kinases from different Kinome families are also found together, for example, Aurora-A and CDK2 proteins which are inhibited by the same drugs. Representative examples of different clusters are presented. The effectiveness of the MED-SMA approach is demonstrated as it gathers binding sites of proteins with similar structure-activity relationships. Moreover, an efficient new protocol associates structures absent of cocrystallized ligands to the purine clusters enabling those structures to be associated with a specific binding mechanism. Applications of this classification by binding mode similarity include target-based drug design and prediction of cross-reactivity and therefore potential toxic side effects.
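Note: the abstract above states that MED-SMA applies the Markov Clustering algorithm to a similarity graph. The sketch below shows the generic expansion/inflation loop of Markov clustering on a small similarity matrix in plain Python. It is an illustration under stated assumptions, not the MED-SMA implementation: the inflation value, iteration count, self-loop weight and the function names are all chosen here for the example.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def mat_mul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def markov_cluster(similarity, inflation=2.0, iterations=50):
    """Cluster nodes of a symmetric, non-negative similarity matrix."""
    n = len(similarity)
    # Self-loops (weight 1.0, an assumption) keep every column non-empty
    m = [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = mat_mul(m, m)                                  # expansion: random-walk step
        m = [[v ** inflation for v in row] for row in m]   # inflation: sharpen strong flows
        m = normalize_columns(m)
    # Each row that still carries flow lists the members of one cluster
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6) for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)

if __name__ == "__main__":
    # Toy graph: sites 0-2 are mutually similar, sites 3-4 are mutually similar
    sim = [
        [0, 1, 1, 0, 0],
        [1, 0, 1, 0, 0],
        [1, 1, 0, 0, 0],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 1, 0],
    ]
    print(markov_cluster(sim))
```

In a binding-site setting the matrix entries would be pairwise site-similarity scores; a higher inflation value generally yields more, smaller clusters.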
Doppelt-Azeroual, Olivia; Delfaud, François; Moriaud, Fabrice; de Brevern, Alexandre G
2010-01-01
Ligand–protein interactions are essential for biological processes, and precise characterization of protein binding sites is crucial to understand protein functions. MED-SuMo is a powerful technology to localize similar local regions on protein surfaces. Its heuristic is based on a 3D representation of macromolecules using specific surface chemical features associating chemical characteristics with geometrical properties. MED-SMA is an automated and fast method to classify binding sites. It is based on MED-SuMo technology, which builds a similarity graph, and it uses the Markov Clustering algorithm. Purine binding sites are well studied as drug targets. Here, purine binding sites of the Protein DataBank (PDB) are classified. Proteins potentially inhibited or activated through the same mechanism are gathered. Results are analyzed according to PROSITE annotations and to carefully refined functional annotations extracted from the PDB. As expected, binding sites associated with related mechanisms are gathered, for example, the Small GTPases. Nevertheless, protein kinases from different Kinome families are also found together, for example, Aurora-A and CDK2 proteins which are inhibited by the same drugs. Representative examples of different clusters are presented. The effectiveness of the MED-SMA approach is demonstrated as it gathers binding sites of proteins with similar structure-activity relationships. Moreover, an efficient new protocol associates structures absent of cocrystallized ligands to the purine clusters enabling those structures to be associated with a specific binding mechanism. Applications of this classification by binding mode similarity include target-based drug design and prediction of cross-reactivity and therefore potential toxic side effects. PMID:20162627
Water polygons in high-resolution protein crystal structures.
Lee, Jonas; Kim, Sung-Hou
2009-07-01
We have analyzed the interstitial water (ISW) structures in 1500 protein crystal structures deposited in the Protein Data Bank that have greater than 1.5 Å resolution with less than 90% sequence similarity with each other. We observed varieties of polygonal water structures composed of three to eight water molecules. These polygons may represent the time- and space-averaged structures of "stable" water oligomers present in liquid water, and their presence as well as relative population may be relevant in understanding physical properties of liquid water at a given temperature. On an average, 13% of ISWs are localized enough to be visible by X-ray diffraction. Of those, averages of 78% are water molecules in the first water layer on the protein surface. Of the localized ISWs beyond the first layer, almost half of them form water polygons such as trigons, tetragons, as well as expected pentagons, hexagons, higher polygons, partial dodecahedrons, and disordered networks. Most of the octagons and nanogons are formed by fusion of smaller polygons. The trigons are most commonly observed. We suggest that our observation provides an experimental basis for including these water polygon structures in correlating and predicting various water properties in liquid state.
Water polygons in high-resolution protein crystal structures
Lee, Jonas; Kim, Sung-Hou
2009-01-01
We have analyzed the interstitial water (ISW) structures in 1500 protein crystal structures deposited in the Protein Data Bank that have greater than 1.5 Å resolution with less than 90% sequence similarity with each other. We observed varieties of polygonal water structures composed of three to eight water molecules. These polygons may represent the time- and space-averaged structures of “stable” water oligomers present in liquid water, and their presence as well as relative population may be relevant in understanding physical properties of liquid water at a given temperature. On an average, 13% of ISWs are localized enough to be visible by X-ray diffraction. Of those, averages of 78% are water molecules in the first water layer on the protein surface. Of the localized ISWs beyond the first layer, almost half of them form water polygons such as trigons, tetragons, as well as expected pentagons, hexagons, higher polygons, partial dodecahedrons, and disordered networks. Most of the octagons and nanogons are formed by fusion of smaller polygons. The trigons are most commonly observed. We suggest that our observation provides an experimental basis for including these water polygon structures in correlating and predicting various water properties in liquid state. PMID:19551896
Predicting the host of influenza viruses based on the word vector.
Xu, Beibei; Tan, Zhiying; Li, Kenli; Jiang, Taijiao; Peng, Yousong
2017-01-01
Newly emerging influenza viruses continue to threaten public health. A rapid determination of the host range of newly discovered influenza viruses would assist in early assessment of their risk. Here, we attempted to predict the host of influenza viruses using the Support Vector Machine (SVM) classifier based on the word vector, a new representation and feature extraction method for biological sequences. The results show that the length of the word within the word vector, the sequence type (DNA or protein) and the species from which the sequences were derived for generating the word vector all influence the performance of models in predicting the host of influenza viruses. In nearly all cases, the models built on the surface proteins hemagglutinin (HA) and neuraminidase (NA) (or their genes) produced better results than internal influenza proteins (or their genes). The best performance was achieved when the model was built on the HA gene based on word vectors (words of three-letters long) generated from DNA sequences of the influenza virus. This results in accuracies of 99.7% for avian, 96.9% for human and 90.6% for swine influenza viruses. Compared to the method of sequence homology best-hit searches using the Basic Local Alignment Search Tool (BLAST), the word vector-based models still need further improvements in predicting the host of influenza A viruses.
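Note: the abstract above builds word vectors from "words" of three letters taken from DNA sequences. The sketch below shows only the first step, cutting a sequence into words and counting them. It assumes overlapping words with a stride of one, which the abstract does not specify; the embedding training and the SVM classifier are not reproduced, and the function names are hypothetical.

```python
from collections import Counter

def kmer_words(sequence, k=3):
    """Split a sequence into overlapping words of length k (stride 1, an assumption)."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def kmer_counts(sequence, k=3):
    """Count how often each word occurs; a simple stand-in for a learned word vector."""
    return Counter(kmer_words(sequence, k))

if __name__ == "__main__":
    fragment = "ATGAAGGCA"  # toy fragment, not a real HA gene sequence
    print(kmer_words(fragment))
    print(kmer_counts("ATGATG"))
```

The resulting word lists are the kind of "sentences" that a word-embedding model could be trained on before the vectors are passed to a classifier.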
Rowe, Casey J; Tang, Fiona; Hughes, Maria Celia B; Rodero, Mathieu P; Malt, Maryrose; Lambie, Duncan; Barbour, Andrew; Hayward, Nicholas K; Smithers, B Mark; Green, Adele C; Khosrotehrani, Kiarash
2016-08-01
Sentinel lymph node status is a major prognostic marker in locally invasive cutaneous melanoma. However, this procedure is not always feasible, requires advanced logistics and carries rare but significant morbidity. Previous studies have linked markers of tumour biology to patient survival. In this study, we aimed to combine the predictive value of established biomarkers in addition to clinical parameters as indicators of survival in addition to or instead of sentinel node biopsy in a cohort of high-risk melanoma patients. Patients with locally invasive melanomas undergoing sentinel lymph node biopsy were ascertained and prospectively followed. Information on mortality was validated through the National Death Index. Immunohistochemistry was used to analyse proteins previously reported to be associated with melanoma survival, namely Ki67, p16 and CD163. Evaluation and multivariate analyses according to REMARK criteria were used to generate models to predict disease-free and melanoma-specific survival. A total of 189 patients with available archival material of their primary tumour were analysed. Our study sample was representative of the entire cohort (N = 559). Average Breslow thickness was 2.5 mm. Thirty-two (17%) patients in the study sample died from melanoma during the follow-up period. A prognostic score was developed and was strongly predictive of survival, independent of sentinel node status. The score allowed classification of risk of melanoma death in sentinel node-negative patients. Combining clinicopathological factors and established biomarkers allows prediction of outcome in locally invasive melanoma and might be implemented in addition to or in cases when sentinel node biopsy cannot be performed. © 2016 UICC.
Conformational responses to changes in the state of ionization of titrable groups in proteins
NASA Astrophysics Data System (ADS)
Richman, Daniel Eric
Electrostatic energy links the structural properties of proteins with some of their important biological functions, including catalysis, energy transduction, and binding and recognition. Accurate calculation of electrostatic energy is essential for predicting and for analyzing function from structure. All proteins have many ionizable residues at the protein-water interface. These groups tend to have ionization equilibria (pKa values) shifted slightly relative to their values in water. In contrast, groups buried in the hydrophobic interior usually have highly anomalous pKa values. These shifts are what structure-based calculations have to reproduce to allow examination of contributions from electrostatics to stability, solubility and interactions of proteins. Electrostatic energies are challenging to calculate accurately because proteins are heterogeneous dielectric materials. Any individual ionizable group can experience very different local environments with different dielectric properties. The studies in this thesis examine the hypothesis that proteins reorganize concomitant with changes in their state of ionization. It appears that the pKa value measured experimentally reflects the average of pKa values experienced in the different electrostatic environments corresponding to different conformational microstates. Current computational models fail to sample conformational reorganization of the backbone correctly. Staphylococcal nuclease (SNase) was used as a model protein in nuclear magnetic resonance (NMR) spectroscopy studies to characterize the conformational rearrangements of the protein coupled to changes in the ionization state of titrable groups. One set of experiments tests the hypothesis that proton binding to surface Asp and Glu side chains drives local unfolding by stabilizing less-native, more water-solvated conformations in which the side chains have normalized pKa values.
Increased backbone flexibility in the ps-ns timescale, hydrogen bond (H-bond) breaking on at least the μs timescale, and segmental unfolding were detected near titrating groups as pH decreased into the acidic range. The study identified local structural features and stabilities that modulate the magnitude of electrostatic effects. The data demonstrate that computational approaches to pKa calculations for surface groups must account for local fluctuations spanning a wide range of timescales. A comparative NMR spectroscopy study with the L25K and L125K variants of SNase, each with a Lys residue buried in the hydrophobic interior of the protein, determined locations, timescales, and amplitudes of backbone conformational reorganization coupled with ionization of the buried Lys residues. The L25K protein exhibited an ensemble of local fluctuations of the beta barrel in the hundreds of μs timescale and an ensemble of subglobally unfolded beta-barrel states in the hundreds of ms timescale with strong pH dependence. The L125K protein exhibited fluctuations of the helix around site 125 in the μs timescale, with negligible pH dependence. These data illustrate the diverse timescales and local structural properties of conformational reorganization coupled to ionization of buried groups, and the challenge to structure-based electrostatics calculations, which must capture these long-timescale processes.
Moriceau, Lucille; Jomat, Lucile; Bressanelli, Stéphane; Alcaide-Loridan, Catherine; Jupin, Isabelle
2017-01-01
Turnip yellow mosaic virus (TYMV) is a positive-strand RNA virus infecting plants. The TYMV 140K replication protein is a key organizer of viral replication complex (VRC) assembly, being responsible for recruitment of the viral polymerase and for targeting the VRCs to the chloroplast envelope where viral replication takes place. However, the structural requirements determining the subcellular localization and membrane association of this essential viral protein have not yet been defined. In this study, we investigated determinants for the in vivo chloroplast targeting of the TYMV 140K replication protein. Subcellular localization studies of deletion mutants identified a 41-residue internal sequence as the chloroplast targeting domain (CTD) of TYMV 140K; this sequence is sufficient to target GFP to the chloroplast envelope. The CTD appears to be located in the C-terminal extension of the methyltransferase domain—a region shared by 140K and its mature cleavage product 98K, which behaves as an integral membrane protein during infection. We predicted the CTD to fold into two amphipathic α-helices—a folding that was confirmed in vitro by circular dichroism spectroscopy analyses of a synthetic peptide. The importance for subcellular localization of the integrity of these amphipathic helices, and the function of 140K/98K, was demonstrated by performing amino acid substitutions that affected chloroplast targeting, membrane association and viral replication. These results establish a short internal α-helical peptide as an unusual signal for targeting proteins to the chloroplast envelope membrane, and provide new insights into membrane targeting of viral replication proteins—a universal feature of positive-strand RNA viruses. PMID:29312393
Gomase, Virendra S; Chitlange, Nikhilkumar R; Changbhale, Smruti S; Kale, Karbhari V
2013-08-01
Brugia malayi is a threadlike nematode that causes swelling of lymphatic organs, a condition known as lymphatic filariasis; to date, nothing has been developed that effectively addresses the disease. In this analysis we predicted suitable antigenic peptides from the Brugia malayi antigen protein for peptide vaccine design against lymphatic filariasis, based on the cross-protection phenomenon, as an ample immune response can be generated with a single protein subunit. We found that MHC class II binding peptides of the Brugia malayi antigen protein are important determinants against the diseased condition. The analysis shows that the Brugia malayi antigen protein has 505 amino acids, which yield 497 nonamers. In this assay, we predicted MHC-I binding peptides for the 8mer_H2_Db (optimal score 15.966), 9mer_H2_Db (optimal score 15.595), 10mer_H2_Db (optimal score 19.405) and 11mer_H2_Db (optimal score 23.801) alleles. We also predicted the SVM-based MHCII-IAb nonamers 51-FQQIDPLDA, 442-FAAIACLVH, 206-YLNPFGHQF, 167-WYVIMAACY, 367-YAMIVIRLL, 434-LVITTAANF, 176-LDSYCLWKP, 435-VITTAANFA, 364-WPGYAMIVI (optimal score 13.963); MHCII-IAd nonamers 52-QQIDPLDAE, 171-MAACYLDSY, 239-QWRSVILCN, 168-YVIMAACYL, 3-QYLSVHSLS, 322-EILLHAKVV, 417-LGIIASFVS, 396-KAIFLAHFG, 167-WYVIMAACY, 269-LALHCINVI, 93-FINKAAPKQ, 259-NCIIVLKAF, 79-QGVLLIIPR, 22-TILQRSQAI, 63-RGFVYGNVS, 109-NISSLAFET (optimal score 16.748); and MHCII-IAg7 nonamers 171-MAACYLDSY, 73-KIVNGAQGV, 259-NCIIVLKAF, 209-PFGHQFSFE, 102-SCDTLLKNI, 25-QRSQAIRIV, 444-AIACLVHLF, 88-SLVNGFINK, 252-FPRHQLLNC, 471-RFVLANDNE, 52-QQIDPLDAE, 469-HRRFVLAND, 457-SNRHYFLAD, 362-KSWPGYAMI, 476-NDNEGEDFE, 370-IVIRLLQAL (optimal score 19.847), which represent potential binders from the Brugia malayi antigen protein. The method integrates prediction of MHC class I binding proteasomal C-terminal cleavage peptides and eighteen potential antigenic peptides at average propensity 1.063 having highest local hydrophilicity.
Thus a small antigen fragment can induce immune response against whole antigen. This approach can be applied for designing subunit and synthetic peptide vaccines.
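The figures of 505 residues and 497 nonamers follow from a sliding window of width 9 moved one residue at a time (505 - 9 + 1 = 497). The short Python sketch below illustrates the arithmetic and the 1-based position labels used in the peptide lists. The helper names are hypothetical, and the ten-residue fragment is reassembled from the two overlapping peptides reported at positions 51 and 52.

```python
def count_windows(protein_length, window=9):
    """Number of contiguous peptides of a given width in a sequence."""
    return max(protein_length - window + 1, 0)


def enumerate_peptides(sequence, window=9, first_position=1):
    """Return (position, peptide) pairs using 1-based start positions."""
    return [
        (first_position + i, sequence[i:i + window])
        for i in range(len(sequence) - window + 1)
    ]


print(count_windows(505))  # nonamers in a 505-residue protein

# Fragment spanning residues 51-60, rebuilt from the reported peptides
# 51-FQQIDPLDA and 52-QQIDPLDAE (an illustrative reconstruction).
for position, peptide in enumerate_peptides("FQQIDPLDAE", first_position=51):
    print(f"{position}-{peptide}")
```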
Generalized rules for the optimization of elastic network models
NASA Astrophysics Data System (ADS)
Lezon, Timothy; Eyal, Eran; Bahar, Ivet
2009-03-01
Elastic network models (ENMs) are widely employed for approximating the coarse-grained equilibrium dynamics of proteins using only a few parameters. An area of current focus is improving the predictive accuracy of ENMs by fine-tuning their force constants to fit specific systems. Here we introduce a set of general rules for assigning ENM force constants to residue pairs. Using a novel method, we construct ENMs that optimally reproduce experimental residue covariances from NMR models of 68 proteins. We analyze the optimal interactions in terms of amino acid types, pair distances and local protein structures to identify key factors in determining the effective spring constants. When applied to several unrelated globular proteins, our method shows an improved correlation with experiment over a standard ENM. We discuss the physical interpretation of our findings as well as its implications in the fields of protein folding and dynamics.
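For orientation, one common baseline form of an ENM connects every residue pair within a distance cutoff by a spring with a single uniform force constant. The Python sketch below builds the corresponding connectivity (Kirchhoff) matrix from Cα coordinates. The 7 Å cutoff, the uniform constant `gamma` and the toy coordinates are assumptions chosen for illustration. The residue-specific rules introduced in the study are not reproduced here, because the abstract does not state them.

```python
import math


def kirchhoff_matrix(coords, cutoff=7.0, gamma=1.0):
    """Connectivity matrix of a uniform-spring network.

    Off-diagonal entries are -gamma for residue pairs within the cutoff;
    each diagonal entry is the sum of spring constants attached to that node.
    """
    n = len(coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                matrix[i][j] = matrix[j][i] = -gamma
                matrix[i][i] += gamma
                matrix[j][j] += gamma
    return matrix


# Toy chain: three residues spaced 3.8 Å apart plus one distant residue.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
for row in kirchhoff_matrix(toy_coords):
    print(row)
```

Fitting force constants to experimental covariances would amount to replacing the single `gamma` with pair-dependent values, for example as a function of distance or residue type.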
Computer-based prediction of mitochondria-targeting peptides.
Martelli, Pier Luigi; Savojardo, Castrense; Fariselli, Piero; Tasco, Gianluca; Casadio, Rita
2015-01-01
Computational methods are invaluable when protein sequences, directly derived from genomic data, need functional and structural annotation. Subcellular localization is a feature necessary for understanding the protein role and the compartment where the mature protein is active and very difficult to characterize experimentally. Mitochondrial proteins encoded on the cytosolic ribosomes carry specific patterns in the precursor sequence from where it is possible to recognize a peptide targeting the protein to its final destination. Here we discuss to which extent it is feasible to develop computational methods for detecting mitochondrial targeting peptides in the precursor sequences and benchmark our and other methods on the human mitochondrial proteins endowed with experimentally characterized targeting peptides. Furthermore, we illustrate our newly implemented web server and its usage on the whole human proteome in order to infer mitochondrial targeting peptides, their cleavage sites, and whether the targeting peptide regions contain or not arginine-rich recurrent motifs. By this, we add some other 2,800 human proteins to the 124 ones already experimentally annotated with a mitochondrial targeting peptide.
Identifying Unstable Regions of Proteins Involved in Misfolding Diseases
NASA Astrophysics Data System (ADS)
Guest, Will; Cashman, Neil; Plotkin, Steven
2009-05-01
Protein misfolding is a necessary step in the pathogenesis of many diseases, including Creutzfeldt-Jakob disease (CJD) and familial amyotrophic lateral sclerosis (fALS). Identifying unstable structural elements in their causative proteins elucidates the early events of misfolding and presents targets for inhibition of the disease process. An algorithm was developed to calculate the Gibbs free energy of unfolding for all sequence-contiguous regions of a protein using three methods to parameterize energy changes: a modified Gō model, changes in solvent-accessible surface area, and all-atom molecular dynamics. The entropic effects of disulfide bonds and post-translational modifications are treated analytically. It incorporates a novel method for finding local dielectric constants inside a protein to accurately handle charge effects. We have predicted the unstable parts of prion protein and superoxide dismutase 1, the proteins involved in CJD and fALS respectively, and have used these regions as epitopes to prepare antibodies that are specific to the misfolded conformation and show promise as therapeutic agents.
Protein localization as a principal feature of the etiology and comorbidity of genetic diseases
Park, Solip; Yang, Jae-Seong; Shin, Young-Eun; Park, Juyong; Jang, Sung Key; Kim, Sanguk
2011-01-01
Proteins targeting the same subcellular localization tend to participate in mutual protein–protein interactions (PPIs) and are often functionally associated. Here, we investigated the relationship between disease-associated proteins and their subcellular localizations, based on the assumption that protein pairs associated with phenotypically similar diseases are more likely to be connected via subcellular localization. The spatial constraints from subcellular localization significantly strengthened the disease associations of the proteins connected by subcellular localizations. In particular, certain disease types were more prevalent in specific subcellular localizations. We analyzed the enrichment of disease phenotypes within subcellular localizations, and found that there exists a significant correlation between disease classes and subcellular localizations. Furthermore, we found that two diseases displayed high comorbidity when disease-associated proteins were connected via subcellular localization. We newly explained 7584 disease pairs by using the context of protein subcellular localization, which had not been identified using shared genes or PPIs only. Our result establishes a direct correlation between protein subcellular localization and disease association, and helps to understand the mechanism of human disease progression. PMID:21613983
Bandyopadhyay, Boudhayan; Goldenzweig, Adi; Unger, Tamar; Adato, Orit; Fleishman, Sarel J; Unger, Ron; Horovitz, Amnon
2017-12-15
The GroE chaperonin system in Escherichia coli comprises GroEL and GroES and facilitates ATP-dependent protein folding in vivo and in vitro. Proteins with very similar sequences and structures can differ in their dependence on GroEL for efficient folding. One potential but unverified source for GroEL dependence is frustration, wherein not all interactions in the native state are optimized energetically, thereby potentiating slow folding and misfolding. Here, we chose enhanced green fluorescent protein as a model system and subjected it to random mutagenesis, followed by screening for variants whose in vivo folding displays increased or decreased GroEL dependence. We confirmed the altered GroEL dependence of these variants with in vitro folding assays. Strikingly, mutations at positions predicted to be highly frustrated were found to correlate with decreased GroEL dependence. Conversely, mutations at positions with low frustration were found to correlate with increased GroEL dependence. Further support for this finding was obtained by showing that folding of an enhanced green fluorescent protein variant designed computationally to have reduced frustration is indeed less GroEL-dependent. Our results indicate that changes in local frustration also affect partitioning in vivo between spontaneous and chaperonin-mediated folding. Hence, the design of minimally frustrated sequences can reduce chaperonin dependence and improve protein expression levels. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
Human stanniocalcin-1 interacts with nuclear and cytoplasmic proteins and acts as a SUMO E3 ligase.
dos Santos, Marcos Tadeu; Trindade, Daniel Maragno; Gonçalves, Kaliandra de Almeida; Bressan, Gustavo Costa; Anastassopoulos, Filipe; Yunes, José Andres; Kobarg, Jörg
2011-01-01
Human stanniocalcin-1 (STC1) is a glycoprotein that has been implicated in different physiological processes, including angiogenesis, apoptosis and carcinogenesis. Here we identified STC1 as a putative molecular marker for the leukemic bone marrow microenvironment and identified new interacting protein partners for STC1. Seven selected interactions retrieved from yeast two-hybrid screens were confirmed by GST pull-down assays in vitro. The N-terminal region was mapped as the region that mediates the interaction with cytoplasmic, mitochondrial and nuclear proteins. STC1 interacts with SUMO-1 and several proteins that have been shown to be SUMOylated and localized to SUMOylation-related nuclear bodies. Although STC1 interacts with SUMO-1 and has a high theoretical prediction score for a SUMOylation site, endogenous co-immunoprecipitation and in vitro SUMOylation assays with the purified recombinant protein could not detect STC1 SUMOylation. However, when we tested STC1 for SUMO E3 ligase activity, we found in an in vitro assay that it significantly increases the SUMOylation of two other proteins. Confocal microscopic subcellular localization studies using both transfected cells and specific antibodies for endogenous STC1 revealed a cytoplasmic and nuclear deposition, the latter in the form of specific dot-like substructures resembling SUMOylation-related nuclear bodies. Together, these findings suggest a new role for STC1 in SUMOylation pathways, in nuclear bodies.
PSOVina: The hybrid particle swarm optimization algorithm for protein-ligand docking.
Ng, Marcus C K; Fong, Simon; Siu, Shirley W I
2015-06-01
Protein-ligand docking is an essential step in the modern drug discovery process. The challenge here is to accurately predict and efficiently optimize the position and orientation of ligands in the binding pocket of a target protein. In this paper, we present a new method called PSOVina, which combines the particle swarm optimization (PSO) algorithm with the efficient Broyden-Fletcher-Goldfarb-Shanno (BFGS) local search method adopted in AutoDock Vina to tackle the conformational search problem in docking. Using a diverse data set of 201 protein-ligand complexes from the PDBbind database and a full set of ligands and decoys for four representative targets from the directory of useful decoys (DUD) virtual screening data set, we assessed the docking performance of PSOVina in comparison to the original Vina program. Our results showed that PSOVina achieves a remarkable execution time reduction of 51-60% without compromising the prediction accuracies in the docking and virtual screening experiments. This improvement in time efficiency makes PSOVina a better choice of docking tool in large-scale protein-ligand docking applications. Our work lays the foundation for the future development of swarm-based algorithms in molecular docking programs. PSOVina is freely available to non-commercial users at http://cbbio.cis.umac.mo.
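The general idea of pairing a swarm-based global search with a local refinement step can be sketched in a few lines of Python. This is not the PSOVina implementation. The objective is a toy sphere function standing in for a docking score, the local step is a simple derivative-free pattern search used in place of BFGS, and every name and parameter value below (`hybrid_pso`, `local_refine`, inertia 0.7, acceleration 1.5) is an assumption chosen for illustration.

```python
import random


def sphere(x):
    """Toy objective standing in for a docking score; minimum 0 at the origin."""
    return sum(v * v for v in x)


def local_refine(f, x, step=0.5, iters=30):
    """Derivative-free pattern search (a stand-in for a BFGS local search)."""
    x = list(x)
    fx = f(x)
    for _ in range(iters):
        improved = False
        for d in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[d] += delta
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
                    break
        if not improved:
            step *= 0.5  # shrink the step once no coordinate move helps
    return x, fx


def hybrid_pso(f, dim=2, n_particles=10, n_iters=20, bound=5.0, seed=0):
    """Global-best PSO whose best position is locally refined every iteration."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(p) for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = list(pbest[g]), pbest_val[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration weights (assumed values)
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = list(pos[i]), val
            if val < gbest_val:
                gbest, gbest_val = list(pos[i]), val
        # Local refinement of the current global best position.
        cand, cand_val = local_refine(f, gbest)
        if cand_val < gbest_val:
            gbest, gbest_val = cand, cand_val
    return gbest, gbest_val


best_position, best_value = hybrid_pso(sphere)
print(best_position, best_value)
```

In a docking setting the position vector would encode ligand translation, orientation and torsions, and the objective would be the scoring function; those parts are outside the scope of this sketch.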
NASA Astrophysics Data System (ADS)
Virrueta, A.; Gaines, J.; O'Hern, C. S.; Regan, L.
2015-03-01
Current research in the O'Hern and Regan laboratories focuses on the development of hard-sphere models with stereochemical constraints for protein structure prediction as an alternative to molecular dynamics methods that utilize knowledge-based corrections in their force-fields. Beginning with simple hydrophobic dipeptides like valine, leucine, and isoleucine, we have shown that our model is able to reproduce the side-chain dihedral angle distributions derived from sets of high-resolution protein crystal structures. However, methionine remains an exception - our model yields a chi-3 side-chain dihedral angle distribution that is relatively uniform from 60 to 300 degrees, while the observed distribution displays peaks at 60, 180, and 300 degrees. Our goal is to resolve this discrepancy by considering clashes with neighboring residues, and averaging the reduced distribution of allowable methionine structures taken from a set of crystallized proteins. We will also re-evaluate the electron density maps from which these protein structures are derived to ensure that the methionines and their local environments are correctly modeled. This work will ultimately serve as a tool for computing side-chain entropy and protein stability. A. V. is supported by an NSF Graduate Research Fellowship and a Ford Foundation Fellowship. J. G. is supported by NIH training Grant NIH-5T15LM007056-28.
Validating a Coarse-Grained Potential Energy Function through Protein Loop Modelling
MacDonald, James T.; Kelley, Lawrence A.; Freemont, Paul S.
2013-01-01
Coarse-grained (CG) methods for sampling protein conformational space have the potential to increase computational efficiency by reducing the degrees of freedom. The gain in computational efficiency of CG methods often comes at the expense of non-protein-like local conformational features. This could cause problems when transitioning to full atom models in a hierarchical framework. Here, a CG potential energy function was validated by applying it to the problem of loop prediction. A novel method to sample the conformational space of backbone atoms was benchmarked using a standard test set consisting of 351 distinct loops. This method used a sequence-independent CG potential energy function representing the protein using α-carbon positions only and sampling conformations with a Monte Carlo simulated annealing based protocol. Backbone atoms were added using a method previously described and then gradient minimised in the Rosetta force field. Despite the CG potential energy function being sequence-independent, the method performed similarly to methods that explicitly use either fragments of known protein backbones with similar sequences or residue-specific φ/ψ-maps to restrict the search space. The method was also able to predict with sub-Angstrom accuracy two out of seven loops from recently solved crystal structures of proteins with low sequence and structure similarity to previously deposited structures in the PDB. The ability to sample realistic loop conformations directly from a potential energy function enables the incorporation of additional geometric restraints and the use of more advanced sampling methods in a way that is not possible to do easily with fragment replacement methods and also enables multi-scale simulations for protein design and protein structure prediction. These restraints could be derived from experimental data or could be design restraints in the case of computational protein design.
C++ source code is available for download from http://www.sbg.bio.ic.ac.uk/phyre2/PD2/. PMID:23824634
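As a reminder of what a Monte Carlo simulated annealing protocol involves, the Python sketch below applies the Metropolis acceptance rule with a geometrically decreasing temperature to a one-dimensional toy energy. It is not the authors' C++ code. The energy function, the cooling schedule and all parameter values are illustrative assumptions, and a real loop sampler would move many backbone degrees of freedom rather than a single coordinate.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng):
    """Always accept downhill moves; accept uphill moves with probability exp(-dE/T)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def anneal(energy, x0, n_steps=2000, t_start=1.0, t_end=0.01, step=0.5, seed=0):
    """Simulated annealing on a scalar coordinate with geometric cooling."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for k in range(n_steps):
        # Temperature falls geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (k / (n_steps - 1))
        trial = x + rng.uniform(-step, step)
        e_trial = energy(trial)
        if metropolis_accept(e_trial - e, t, rng):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


def toy_energy(x):
    """Double-well toy potential with a slight tilt (hypothetical)."""
    return (x * x - 1.0) ** 2 + 0.3 * x


best_x, best_e = anneal(toy_energy, x0=2.0)
print(best_x, best_e)
```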
Conservation of Matrix Attachment Region-Binding Filament-Like Protein 1 among Higher Plants
Harder, Patricia A.; Silverstein, Rebecca A.; Meier, Iris
2000-01-01
The interaction of chromatin with the nuclear matrix via matrix attachment regions (MARs) on the DNA is considered to be of fundamental importance for higher-order chromatin organization and the regulation of gene expression. We have previously isolated a novel nuclear matrix-localized protein (MFP1) from tomato (Lycopersicon esculentum) that preferentially binds to MAR DNA. Tomato MFP1 has a predicted filament-protein-like structure and is associated with the nuclear envelope via an N-terminal targeting domain. Based on the antigenic relationship, we report here that MFP1 is conserved in a large number of dicot and monocot species. Several cDNAs were cloned from tobacco (Nicotiana tabacum) and shown to correspond to two tobacco MFP1 genes. Comparison of the primary and predicted secondary structures of MFP1 from tomato, tobacco, and Arabidopsis indicates a high degree of conservation of the N-terminal targeting domain, the overall putative coiled-coil structure of the protein, and the C-terminal DNA-binding domain. In addition, we show that tobacco MFP1 is regulated in an organ-specific and developmental fashion, and that this regulation occurs at the level of transcription or RNA stability. PMID:10631266
Girard, Pierre-Marie; Graindorge, Dany; Smirnova, Violetta; Rigolet, Pascal; Francesconi, Stefania; Scanlon, Susan; Sage, Evelyne
2013-01-01
In vertebrates, XRCC3 is one of the five Rad51 paralogs that plays a central role in homologous recombination (HR), a key pathway for maintaining genomic stability. While investigating the potential role of human XRCC3 (hXRCC3) in the inhibition of DNA replication induced by UVA radiation, we discovered that hXRCC3 cysteine residues are oxidized following photosensitization by UVA. Our in silico prediction of the hXRCC3 structure suggests that 6 out of 8 cysteines are potentially accessible to the solvent and therefore potentially exposed to ROS attack. By non-reducing SDS-PAGE we show that many different oxidants induce hXRCC3 oxidation that is monitored in Chinese hamster ovarian (CHO) cells by increased electrophoretic mobility of the protein and in human cells by a slight decrease of its immunodetection. In both cell types, hXRCC3 oxidation was reversed in few minutes by cellular reducing systems. Depletion of intracellular glutathione prevents hXRCC3 oxidation only after UVA exposure though depending on the type of photosensitizer. In addition, we show that hXRCC3 expressed in CHO cells localizes both in the cytoplasm and in the nucleus. Mutating all hXRCC3 cysteines to serines (XR3/S protein) does not affect the subcellular localization of the protein even after exposure to camptothecin (CPT), which typically induces DNA damages that require HR to be repaired. However, cells expressing mutated XR3/S protein are sensitive to CPT, thus highlighting a defect of the mutant protein in HR. In marked contrast to CPT treatment, oxidative stress induces relocalization at the chromatin fraction of both wild-type and mutated protein, even though survival is not affected. Collectively, our results demonstrate that the DNA repair protein hXRCC3 is a target of ROS induced by environmental factors and raise the possibility that the redox environment might participate in regulating the HR pathway. PMID:24116071
Global analyses of Ceratocystis cacaofunesta mitochondria: from genome to proteome.
Ambrosio, Alinne Batista; do Nascimento, Leandro Costa; Oliveira, Bruno V; Teixeira, Paulo José P L; Tiburcio, Ricardo A; Toledo Thomazella, Daniela P; Leme, Adriana F P; Carazzolle, Marcelo F; Vidal, Ramon O; Mieczkowski, Piotr; Meinhardt, Lyndel W; Pereira, Gonçalo A G; Cabrera, Odalys G
2013-02-11
The ascomycete fungus Ceratocystis cacaofunesta is the causal agent of wilt disease in cacao, which results in significant economic losses in the affected producing areas. Despite the economic importance of the Ceratocystis complex of species, no genomic data are available for any of its members. Given that mitochondria play important roles in fungal virulence and the susceptibility/resistance of fungi to fungicides, we performed the first functional analysis of this organelle in Ceratocystis using integrated "omics" approaches. The C. cacaofunesta mitochondrial genome (mtDNA) consists of a single, 103,147-bp circular molecule, making this the second largest mtDNA among the Sordariomycetes. Bioinformatics analysis revealed the presence of 15 conserved genes and 37 intronic open reading frames in C. cacaofunesta mtDNA. Here, we predicted the mitochondrial proteome (mtProt) of C. cacaofunesta, which is comprised of 1,124 polypeptides - 52 proteins that are mitochondrially encoded and 1,072 that are nuclearly encoded. Transcriptome analysis revealed 33 probable novel genes. Comparisons among the Gene Ontology results of the predicted mtProt of C. cacaofunesta, Neurospora crassa and Saccharomyces cerevisiae revealed no significant differences. Moreover, C. cacaofunesta mitochondria were isolated, and the mtProt was subjected to mass spectrometric analysis. The experimental proteome validated 27% of the predicted mtProt. Our results confirmed the existence of 110 hypothetical proteins and 7 novel proteins of which 83 and 1, respectively, had putative mitochondrial localization. The present study provides the first partial genomic analysis of a species of the Ceratocystis genus and the first predicted mitochondrial protein inventory of a phytopathogenic fungus. 
In addition to the known mitochondrial role in pathogenicity, our results demonstrated that the global function analysis of this organelle is similar in pathogenic and non-pathogenic fungi, suggesting that its relevance in the lifestyle of these organisms should be based on a small number of specific proteins and/or with respect to differential gene regulation. In this regard, particular interest should be directed towards mitochondrial proteins with unknown function and the novel protein that might be specific to this species. Further functional characterization of these proteins could enhance our understanding of the role of mitochondria in phytopathogenicity.
Global analyses of Ceratocystis cacaofunesta mitochondria: from genome to proteome
2013-01-01
Background The ascomycete fungus Ceratocystis cacaofunesta is the causal agent of wilt disease in cacao, which results in significant economic losses in the affected producing areas. Despite the economic importance of the Ceratocystis complex of species, no genomic data are available for any of its members. Given that mitochondria play important roles in fungal virulence and the susceptibility/resistance of fungi to fungicides, we performed the first functional analysis of this organelle in Ceratocystis using integrated “omics” approaches. Results The C. cacaofunesta mitochondrial genome (mtDNA) consists of a single, 103,147-bp circular molecule, making this the second largest mtDNA among the Sordariomycetes. Bioinformatics analysis revealed the presence of 15 conserved genes and 37 intronic open reading frames in C. cacaofunesta mtDNA. Here, we predicted the mitochondrial proteome (mtProt) of C. cacaofunesta, which is comprised of 1,124 polypeptides - 52 proteins that are mitochondrially encoded and 1,072 that are nuclearly encoded. Transcriptome analysis revealed 33 probable novel genes. Comparisons among the Gene Ontology results of the predicted mtProt of C. cacaofunesta, Neurospora crassa and Saccharomyces cerevisiae revealed no significant differences. Moreover, C. cacaofunesta mitochondria were isolated, and the mtProt was subjected to mass spectrometric analysis. The experimental proteome validated 27% of the predicted mtProt. Our results confirmed the existence of 110 hypothetical proteins and 7 novel proteins of which 83 and 1, respectively, had putative mitochondrial localization. Conclusions The present study provides the first partial genomic analysis of a species of the Ceratocystis genus and the first predicted mitochondrial protein inventory of a phytopathogenic fungus. 
In addition to the known mitochondrial role in pathogenicity, our results demonstrated that the global function analysis of this organelle is similar in pathogenic and non-pathogenic fungi, suggesting that its relevance in the lifestyle of these organisms should be based on a small number of specific proteins and/or with respect to differential gene regulation. In this regard, particular interest should be directed towards mitochondrial proteins with unknown function and the novel protein that might be specific to this species. Further functional characterization of these proteins could enhance our understanding of the role of mitochondria in phytopathogenicity. PMID:23394930
The prediction of palmitoylation site locations using a multiple feature extraction method.
Shi, Shao-Ping; Sun, Xing-Yu; Qiu, Jian-Ding; Suo, Sheng-Bao; Chen, Xiang; Huang, Shu-Yun; Liang, Ru-Ping
2013-03-01
As an extremely important and ubiquitous post-translational lipid modification, palmitoylation plays a significant role in a variety of biological and physiological processes. Unlike other lipid modifications, protein palmitoylation and depalmitoylation are highly dynamic and can regulate both protein function and localization. The dynamic nature of palmitoylation is poorly understood because of the limitations in current assay methods. The in vivo or in vitro experimental identification of palmitoylation sites is both time consuming and expensive. Due to the large volume of protein sequences generated in the post-genomic era, it is extraordinarily important in both basic research and drug discovery to rapidly identify the attributes of a new protein's palmitoylation sites. In this work, a new computational method, WAP-Palm, combining multiple feature extraction, has been developed to predict the palmitoylation sites of proteins. The performance of the WAP-Palm model is measured herein and was found to have a sensitivity of 81.53%, a specificity of 90.45%, an accuracy of 85.99% and a Matthews correlation coefficient of 72.26% in 10-fold cross-validation test. The results obtained from both the cross-validation and independent tests suggest that the WAP-Palm model might facilitate the identification and annotation of protein palmitoylation locations. The online service is available at http://bioinfo.ncu.edu.cn/WAP-Palm.aspx. Copyright © 2013 Elsevier Inc. All rights reserved.
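The four figures reported for WAP-Palm are standard confusion-matrix statistics. A minimal sketch of how they relate is given below; the counts are hypothetical and assume a balanced evaluation set (equal numbers of palmitoylation and non-palmitoylation sites), which the abstract does not state. Under that assumption the reported sensitivity and specificity approximately reproduce the reported accuracy and Matthews correlation coefficient.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, accuracy and MCC from confusion counts."""
    sensitivity = tp / (tp + fn)          # fraction of true sites recovered
    specificity = tn / (tn + fp)          # fraction of non-sites rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts: 10,000 positives and 10,000 negatives, chosen only so
# that sensitivity and specificity match the rates quoted in the abstract.
metrics = classification_metrics(tp=8153, fn=1847, tn=9045, fp=955)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With unbalanced classes the same sensitivity and specificity would give a different accuracy and MCC, so the agreement here is suggestive rather than a statement about the actual dataset.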
Mariño, Karina; Güther, M. Lucia Sampaio; Wernimont, Amy K.; Qiu, Wei; Hui, Raymond; Ferguson, Michael A. J.
2011-01-01
A gene predicted to encode Trypanosoma brucei glucosamine 6-phosphate N-acetyltransferase (TbGNA1; EC 2.3.1.4) was cloned and expressed in Escherichia coli. The recombinant protein was enzymatically active, and its high-resolution crystal structure was obtained at 1.86 Å. Endogenous TbGNA1 protein was localized to the peroxisome-like microbody, the glycosome. A bloodstream-form T. brucei GNA1 conditional null mutant was constructed and shown to be unable to sustain growth in vitro under nonpermissive conditions, demonstrating that there are no metabolic or nutritional routes to UDP-GlcNAc other than via GlcNAc-6-phosphate. Analysis of the protein glycosylation phenotype of the TbGNA1 mutant under nonpermissive conditions revealed that poly-N-acetyllactosamine structures were greatly reduced in the parasite and that the glycosylation profile of the principal parasite surface coat component, the variant surface glycoprotein (VSG), was modified. The significance of results and the potential of TbGNA1 as a novel drug target for African sleeping sickness are discussed. PMID:21531872
SpidermiR: An R/Bioconductor Package for Integrative Analysis with miRNA Data.
Cava, Claudia; Colaprico, Antonio; Bertoli, Gloria; Graudenzi, Alex; Silva, Tiago C; Olsen, Catharina; Noushmehr, Houtan; Bontempi, Gianluca; Mauri, Giancarlo; Castiglioni, Isabella
2017-01-27
Gene Regulatory Networks (GRNs) control many biological systems, but how such network coordination is shaped is still unknown. GRNs can be subdivided into basic connections that describe how the network members interact, e.g., co-expression, physical interaction, co-localization, genetic influence, pathways, and shared protein domains. The important regulatory mechanisms of these networks involve miRNAs. We developed an R/Bioconductor package, SpidermiR, which offers easy access to both GRNs and miRNAs to the end user, and integrates this information with differentially expressed genes obtained from The Cancer Genome Atlas. Specifically, SpidermiR allows the users to: (i) query and download GRNs and miRNAs from validated and predicted repositories; (ii) integrate miRNAs with GRNs in order to obtain miRNA-gene-gene and miRNA-protein-protein interactions, and to analyze miRNA GRNs in order to identify miRNA-gene communities; and (iii) graphically visualize the results of the analyses. These analyses can be performed through a single interface and without the need for any downloads. The full data sets are then rapidly integrated and processed locally.
Maric-Biresev, Jelena; Hunn, Julia P; Krut, Oleg; Helms, J Bernd; Martens, Sascha; Howard, Jonathan C
2016-04-20
The interferon-γ (IFN-γ)-inducible immunity-related GTPase (IRG), Irgm1, plays an essential role in restraining activation of the IRG pathogen resistance system. However, the loss of Irgm1 in mice also causes a dramatic but unexplained susceptibility phenotype upon infection with a variety of pathogens, including many not normally controlled by the IRG system. This phenotype is associated with lymphopenia, hemopoietic collapse, and death of the mouse. We show that the three regulatory IRG proteins (GMS sub-family), including Irgm1, each of which localizes to distinct sets of endocellular membranes, play an important role during the cellular response to IFN-γ, each protecting specific membranes from off-target activation of effector IRG proteins (GKS sub-family). In the absence of Irgm1, which is localized mainly at lysosomal and Golgi membranes, activated GKS proteins load onto lysosomes, and are associated with reduced lysosomal acidity and failure to process autophagosomes. Another GMS protein, Irgm3, is localized to endoplasmic reticulum (ER) membranes; in the Irgm3-deficient mouse, activated GKS proteins are found at the ER. The Irgm3-deficient mouse does not show the drastic phenotype of the Irgm1 mouse. In the Irgm1/Irgm3 double knock-out mouse, activated GKS proteins associate with lipid droplets, but not with lysosomes, and the Irgm1/Irgm3(-/-) does not have the generalized immunodeficiency phenotype expected from its Irgm1 deficiency. The membrane targeting properties of the three GMS proteins to specific endocellular membranes prevent accumulation of activated GKS protein effectors on the corresponding membranes and thus enable GKS proteins to distinguish organellar cellular membranes from the membranes of pathogen vacuoles. 
Our data suggest that the generalized lymphomyeloid collapse that occurs in Irgm1(-/-) mice upon infection with a variety of pathogens may be due to lysosomal damage caused by off-target activation of GKS proteins on lysosomal membranes and consequent failure of autophagosomal processing.
Hopkins, Julia F.; Spencer, David F.; Laboissiere, Sylvie; Neilson, Jonathan A.D.; Eveleigh, Robert J.M.; Durnford, Dion G.; Gray, Michael W.; Archibald, John M.
2012-01-01
Chlorarachniophytes are unicellular marine algae with plastids (chloroplasts) of secondary endosymbiotic origin. Chlorarachniophyte cells retain the remnant nucleus (nucleomorph) and cytoplasm (periplastidial compartment, PPC) of the green algal endosymbiont from which their plastid was derived. To characterize the diversity of nucleus-encoded proteins targeted to the chlorarachniophyte plastid, nucleomorph, and PPC, we isolated plastid–nucleomorph complexes from the model chlorarachniophyte Bigelowiella natans and subjected them to high-pressure liquid chromatography-tandem mass spectrometry. Our proteomic analysis, the first of its kind for a nucleomorph-bearing alga, resulted in the identification of 324 proteins with 95% confidence. Approximately 50% of these proteins have predicted bipartite leader sequences at their amino termini. Nucleus-encoded proteins make up >90% of the proteins identified. With respect to biological function, plastid-localized light-harvesting proteins were well represented, as were proteins involved in chlorophyll biosynthesis. Phylogenetic analyses revealed that many, but by no means all, of the proteins identified in our proteomic screen are of apparent green algal ancestry, consistent with the inferred evolutionary origin of the plastid and nucleomorph in chlorarachniophytes. PMID:23221610
Localization of spindle checkpoint proteins in cells undergoing mitosis with unreplicated genomes.
Johnson, Mary Kathrine; Cooksey, Amanda M; Wise, Dwayne A
2008-11-01
CHO cells can be arrested with hydroxyurea at the beginning of the DNA synthesis phase of the cell cycle. Subsequent treatment with the xanthine, caffeine, induces cells to bypass the S-phase checkpoint and enter unscheduled mitosis [Schlegel and Pardee,1986, Science 232:1264-1266]. These treated cells build a normal spindle and distribute kinetochores, unattached to chromosomes, to their daughter cells [Brinkley et al.,1988, Nature 336:251-254; Zinkowski et al.,1991, J Cell Biol 113:1091-1110; Wise and Brinkley,1997, Cell Motil Cytoskeleton 36:291-302; Balczon et al.,2003, Chromosoma 112:96-102]. To investigate how these cells distribute kinetochores to daughter cells, we analyzed the spindle checkpoint components, Mad2, CENP-E, and the 3F3 phosphoepitope, using immunofluorescence and digital microscopy. Even though the kinetochores were unpaired and DNA was fragmented, the tension, alignment, and motor components of the checkpoint were found to be present and localized as predicted in prometaphase and metaphase. This unusual mitosis proves that a cell can successfully localize checkpoint proteins and divide even when kinetochores are unpaired and fragmented. (c) 2008 Wiley-Liss, Inc.
NASA Astrophysics Data System (ADS)
Scott, Richard; Khan, Faisal M.; Zeineh, Jack; Donovan, Michael; Fernandez, Gerardo
2015-03-01
Immunofluorescent (IF) image analysis of tissue pathology has proven to be extremely valuable and robust in developing prognostic assessments of disease, particularly in prostate cancer. There have been significant advances in the literature in quantitative biomarker expression as well as characterization of glandular architectures in discrete gland rings. However, while biomarker and glandular morphometric features have been combined as separate predictors in multivariate models, there is a lack of integrative features for biomarkers co-localized within specific morphological sub-types; for example the evaluation of androgen receptor (AR) expression within Gleason 3 glands only. In this work we propose a novel framework employing multiple techniques to generate integrated metrics of morphology and biomarker expression. We demonstrate the utility of the approaches in predicting clinical disease progression in images from 326 prostate biopsies and 373 prostatectomies. Our proposed integrative approaches yield significant improvements over existing IF image feature metrics. This work presents some of the first algorithms for generating innovative characteristics in tissue diagnostics that integrate co-localized morphometry and protein biomarker expression.
Proteins of Unknown Biochemical Function: A Persistent Problem and a Roadmap to Help Overcome It.
Niehaus, Thomas D; Thamm, Antje M K; de Crécy-Lagard, Valérie; Hanson, Andrew D
2015-11-01
The number of sequenced genomes is rapidly increasing, but functional annotation of the genes in these genomes lags far behind. Even in Arabidopsis (Arabidopsis thaliana), only approximately 40% of enzyme- and transporter-encoding genes have credible functional annotations, and this number is even lower in nonmodel plants. Functional characterization of unknown genes is a challenge, but various databases (e.g. for protein localization and coexpression) can be mined to provide clues. If homologous microbial genes exist (and about one-half the genes encoding unknown enzymes and transporters in Arabidopsis have microbial homologs), cross-kingdom comparative genomics can powerfully complement plant-based data. Multiple lines of evidence can strengthen predictions and warrant experimental characterization. In some cases, relatively quick tests in genetically tractable microbes can determine whether a prediction merits biochemical validation, which is costly and demands specialized skills. © 2015 American Society of Plant Biologists. All Rights Reserved.
Pulver, Rebecca; Heisel, Timothy; Gonia, Sara; Robins, Robert; Norton, Jennifer; Haynes, Paula
2013-01-01
The extremely elongated morphology of fungal hyphae is dependent on the cell's ability to assemble and maintain polarized growth machinery over multiple cell cycles. The different morphologies of the fungus Candida albicans make it an excellent model organism in which to study the spatiotemporal requirements for constitutive polarized growth and the generation of different cell shapes. In C. albicans, deletion of the landmark protein Rsr1 causes defects in morphogenesis that are not predicted from study of the orthologous protein in the related yeast Saccharomyces cerevisiae, thus suggesting that Rsr1 has expanded functions during polarized growth in C. albicans. Here, we show that Rsr1 activity localizes to hyphal tips by the differential localization of the Rsr1 GTPase-activating protein (GAP), Bud2, and guanine nucleotide exchange factor (GEF), Bud5. In addition, we find that Rsr1 is needed to maintain the focused localization of hyphal polarity structures and proteins, including Bem1, a marker of the active GTP-bound form of the Rho GTPase, Cdc42. Further, our results indicate that tip-localized Cdc42 clusters are associated with the cell's ability to express a hyphal transcriptional program and that the ability to generate a focused Cdc42 cluster in early hyphae (germ tubes) is needed to maintain hyphal morphogenesis over time. We propose that in C. albicans, Rsr1 “fine-tunes” the distribution of Cdc42 activity and that self-organizing (Rsr1-independent) mechanisms of polarized growth are not sufficient to generate narrow cell shapes or to provide feedback to the transcriptional program during hyphal morphogenesis. PMID:23223038
Sagar, Mamta; Pandey, Neetesh; Qamar, Naseha; Singh, Brijendra; Shukla, Akanksha
2015-03-01
The long-chain fatty acids incorporated into plant lipids are derived from the iterative addition of C2 units, which are provided by malonyl-CoA to an acyl-CoA after interactions with 3-ketoacyl-CoA synthase (KCS), found in several plants. This study provides functional characterization of three 3-ketoacyl-CoA synthase-like proteins in Vitis vinifera (one protein) and Oryza brachyantha (two proteins). Sequence analysis reveals that a protein of Oryza brachyantha shows 96% similarity to a hypothetical protein in Sorghum bicolor; a total of 11 homologs were predicted in Sorghum bicolor. Conserved domain prediction confirms the presence of FAE1/Type III polyketide synthase-like protein, Thiolase-like, subgroup; Thiolase-like and 3-Oxoacyl-ACP synthase III, C-terminal and chalcone synthase-like domains, but the very-long-chain 3-ketoacyl-CoA domain is absent. All three proteins were found to have the chalcone and stilbene synthases C-terminal domain, which is similar to the domain of thiolase and β-ketoacyl synthase. The N-terminal domain is absent in the J3M9Z7 protein of Oryza brachyantha and the F6HH63 protein of Vitis vinifera. Differences in the N-terminal domain are responsible for distinct activity. The J3MF16 protein of Oryza brachyantha contains both the N-terminal and C-terminal domains and was characterized using annotation of these domains. Domains Gcs (Streptomyces coelicolor) and chalcone-stilbene synthases (KAS) in 2-pyrone synthase (Gerbera hybrida) and chalcone synthase 2 (Medicago sativa) were found to be present in the three proteins. This similarity points toward the anthocyanin biosynthetic process. Similarity to chalcone synthase 2 reveals a possible role in naringenin and chalcone synthase-like activity. In the 3-ketoacyl-CoA synthase of Oryza brachyantha (J3MF16), the active-site residues C-240, H-407 and N-447 are present; these residues are common to all three proteins, at different positions. 
Structural variations among the dimer interface, product binding site and malonyl-CoA binding sites were predicted in localized combinations of conserved residues.
An Extended Proteome Map of the Lysosomal Membrane Reveals Novel Potential Transporters
Chapel, Agnès; Kieffer-Jaquinod, Sylvie; Sagné, Corinne; Verdon, Quentin; Ivaldi, Corinne; Mellal, Mourad; Thirion, Jaqueline; Jadot, Michel; Bruley, Christophe; Garin, Jérôme; Gasnier, Bruno; Journet, Agnès
2013-01-01
Lysosomes are membrane-bound endocytic organelles that play a major role in degrading cell macromolecules and recycling their building blocks. A comprehensive knowledge of the lysosome function requires an extensive description of its content, an issue partially addressed by previous proteomic analyses. However, the proteins underlying many lysosomal membrane functions, including numerous membrane transporters, remain unidentified. We performed a comparative, semi-quantitative proteomic analysis of rat liver lysosome-enriched and lysosome-nonenriched membranes and used spectral counts to evaluate the relative abundance of proteins. Among a total of 2,385 identified proteins, 734 proteins were significantly enriched in the lysosomal fraction, including 207 proteins already known or predicted as endo-lysosomal and 94 proteins without any known or predicted subcellular localization. The remaining 433 proteins had been previously assigned to other subcellular compartments but may in fact reside on lysosomes either predominantly or as a secondary location. Many membrane-associated complexes implicated in diverse processes such as degradation, membrane trafficking, lysosome biogenesis, lysosome acidification, signaling, and nutrient sensing were enriched in the lysosomal fraction. They were identified to an unprecedented extent as most, if not all, of their subunits were found and retained by our screen. Numerous transporters were also identified, including 46 novel potentially lysosomal proteins. We expressed 12 candidates in HeLa cells and observed that most of them colocalized with the lysosomal marker LAMP1, thus confirming their lysosomal residency. This list of candidate lysosomal proteins substantially increases our knowledge of the lysosomal membrane and provides a basis for further characterization of lysosomal functions. PMID:23436907
Cocco, Simona; Monasson, Remi; Weigt, Martin
2013-01-01
Various approaches have explored the covariation of residues in multiple-sequence alignments of homologous proteins to extract functional and structural information. Among those are principal component analysis (PCA), which identifies the most correlated groups of residues, and direct coupling analysis (DCA), a global inference method based on the maximum entropy principle, which aims at predicting residue-residue contacts. In this paper, inspired by the statistical physics of disordered systems, we introduce the Hopfield-Potts model to naturally interpolate between these two approaches. The Hopfield-Potts model allows us to identify relevant ‘patterns’ of residues from the knowledge of the eigenmodes and eigenvalues of the residue-residue correlation matrix. We show how the computation of such statistical patterns makes it possible to accurately predict residue-residue contacts with a much smaller number of parameters than DCA. This dimensional reduction allows us to avoid overfitting and to extract contact information from multiple-sequence alignments of reduced size. In addition, we show that low-eigenvalue correlation modes, discarded by PCA, are important to recover structural information: the corresponding patterns are highly localized, that is, they are concentrated in few sites, which we find to be in close contact in the three-dimensional protein fold. PMID:23990764
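The statement that low-eigenvalue patterns are "concentrated in few sites" can be made quantitative with a localization measure. The sketch below uses the inverse participation ratio (IPR), a common choice for eigenvector localization; it is offered as an illustration and is an assumption here, since the abstract does not say which measure the authors used. The function names and the example vectors are hypothetical.

```python
def inverse_participation_ratio(pattern):
    """IPR = sum(x^4) / (sum(x^2))^2; ranges from 1/N (spread out) to 1 (one site)."""
    norm_sq = sum(x * x for x in pattern)
    if norm_sq == 0:
        raise ValueError("pattern must contain at least one non-zero entry")
    return sum(x ** 4 for x in pattern) / norm_sq ** 2


def effective_sites(pattern):
    """Approximate number of sites carrying the pattern (1 / IPR)."""
    return 1.0 / inverse_participation_ratio(pattern)


# Hypothetical patterns over 100 alignment sites.
extended = [1.0] * 100                      # weight spread evenly over all sites
localized = [0.0] * 100
for site in (12, 13, 57):                   # weight concentrated on three sites
    localized[site] = 1.0

print(effective_sites(extended))
print(effective_sites(localized))
```

A pattern with a small effective number of sites would be a candidate for the kind of localized mode the abstract associates with residues in close three-dimensional contact.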
Dennison, Jennifer B.; Shahmoradgoli, Maria; Liu, Wenbin; Ju, Zhenlin; Meric-Bernstam, Funda; Perou, Charles M.; Sahin, Aysegul A.; Welm, Alana; Oesterreich, Steffi; Sikora, Matthew J.; Brown, Robert E.; Mills, Gordon B.
2016-01-01
Purpose The current study evaluated associative effects of breast cancer cells with the tumor microenvironment and its influence on tumor behavior. Experimental design Formalin-fixed paraffin embedded tissue and matched protein lysates were evaluated from two independent breast cancer patient data sets (TCGA and MD Anderson). Reverse-phase protein arrays (RPPA) were utilized to create a proteomics signature to define breast tumor subtypes. Expression patterns of cell lines and normal breast tissues were utilized to determine markers that were differentially expressed in stroma and cancer cells. Protein localization and stromal contents were evaluated for matched cases by imaging. Results A subtype of breast cancers designated “Reactive,” previously identified by RPPA that was not predicted by mRNA profiling, was extensively characterized. These tumors were primarily estrogen receptor (ER)-positive/human epidermal growth factor receptor (HER)2-negative, low-risk cancers as determined by enrichment of low-grade nuclei, lobular or tubular histopathology, and the luminal A subtype by PAM50. Reactive breast cancers contained high numbers of stromal cells and the highest extracellular matrix content typically without infiltration of immune cells. For ER-positive/HER2-negative cancers, the Reactive classification predicted favorable clinical outcomes in the TCGA cohort (HR = 0.36, P < 0.05). Conclusions A protein stromal signature in breast cancers is associated with a highly differentiated phenotype. The stromal compartment content and proteins are an extended phenotype not predicted by mRNA expression that could be utilized to sub-classify ER-positive/HER2-negative breast cancers. PMID:27172895
Kumar, Avishek; Campitelli, Paul; Thorpe, M F; Ozkan, S Banu
2015-12-01
The most successful protein structure prediction methods to date have been template-based modeling (TBM) or homology modeling, which predicts protein structure based on experimental structures. These high accuracy predictions sometimes retain structural errors due to incorrect templates or a lack of accurate templates in the case of low sequence similarity, making these structures inadequate in drug-design studies or molecular dynamics simulations. We have developed a new physics based approach to the protein refinement problem by mimicking the mechanism of chaperones that rehabilitate misfolded proteins. The template structure is unfolded by selectively (targeted) pulling on different portions of the protein using the geometric based technique FRODA, and then refolded using hierarchically restrained replica exchange molecular dynamics simulations (hr-REMD). FRODA unfolding is used to create a diverse set of topologies for surveying near native-like structures from a template and to provide a set of persistent contacts to be employed during re-folding. We have tested our approach on 13 previous CASP targets and observed that this method of folding an ensemble of partially unfolded structures, through the hierarchical addition of contact restraints (that is, first local and then nonlocal interactions), leads to a refolding of the structure along with refinement in most cases (12/13). Although this approach yields refined models through advancement in sampling, the task of blind selection of the best refined models still needs to be solved. Overall, the method can be useful for improved sampling for low resolution models where certain portions of the structure are incorrectly modeled. © 2015 Wiley Periodicals, Inc.
Palma-Guerrero, Javier; Zhao, Jiuhai; Gonçalves, A. Pedro; Starr, Trevor L.
2015-01-01
The molecular mechanisms of membrane merger during somatic cell fusion in eukaryotic species are poorly understood. In the filamentous fungus Neurospora crassa, somatic cell fusion occurs between genetically identical germinated asexual spores (germlings) and between hyphae to form the interconnected network characteristic of a filamentous fungal colony. In N. crassa, two proteins have been identified to function at the step of membrane fusion during somatic cell fusion: PRM1 and LFD-1. The absence of either one of these two proteins results in an increase of germling pairs arrested during cell fusion with tightly appressed plasma membranes and an increase in the frequency of cell lysis of adhered germlings. The level of cell lysis in ΔPrm1 or Δlfd-1 germlings is dependent on the extracellular calcium concentration. An available transcriptional profile data set was used to identify genes encoding predicted transmembrane proteins that showed reduced expression levels in germlings cultured in the absence of extracellular calcium. From these analyses, we identified a mutant (lfd-2, for late fusion defect-2) that showed a calcium-dependent cell lysis phenotype. lfd-2 encodes a protein with a Fringe domain and showed endoplasmic reticulum and Golgi membrane localization. The deletion of an additional gene predicted to encode a low-affinity calcium transporter, fig1, also resulted in a strain that showed a calcium-dependent cell lysis phenotype. Genetic analyses showed that LFD-2 and FIG1 likely function in separate pathways to regulate aspects of membrane merger and repair during cell fusion. PMID:25595444
DuMond, Jenna F; He, Yi; Burg, Maurice B; Ferraris, Joan D
2015-11-01
Hypertonicity stimulates Nuclear Factor of Activated T-cells 5 (NFAT5) nuclear localization and transactivating activity. Many transcription factors are known to contain intrinsically disordered regions (IDRs) which become more structured with local environmental changes such as osmolality, temperature and tonicity. The transactivating domain of NFAT5 is predicted to be intrinsically disordered under normal tonicity, and under high NaCl, the activity of this domain is increased. To study the binding of co-regulatory proteins at IDRs a cDNA construct expressing the NFAT5 TAD was created and transformed into Escherichia coli cells. Transformed E. coli cells were mass produced by fermentation and extracted by cell lysis to release the NFAT5 TAD. The NFAT5 TAD was subsequently purified using a His-tag column, cation exchange chromatography as well as hydrophobic interaction chromatography and then characterized by mass spectrometry (MS). Published by Elsevier Inc.
GPS-SNO: computational prediction of protein S-nitrosylation sites with a modified GPS algorithm.
Xue, Yu; Liu, Zexian; Gao, Xinjiao; Jin, Changjiang; Wen, Longping; Yao, Xuebiao; Ren, Jian
2010-06-24
As one of the most important and ubiquitous post-translational modifications (PTMs) of proteins, S-nitrosylation plays important roles in a variety of biological processes, including the regulation of cellular dynamics and plasticity. Identification of S-nitrosylated substrates with their exact sites is crucial for understanding the molecular mechanisms of S-nitrosylation. In contrast with labor-intensive and time-consuming experimental approaches, prediction of S-nitrosylation sites using computational methods could provide convenience and increased speed. In this work, we developed novel software, GPS-SNO 1.0, for the prediction of S-nitrosylation sites. We greatly improved our previously developed algorithm and released the GPS 3.0 algorithm for GPS-SNO. By comparison, the prediction performance of the GPS 3.0 algorithm was better than that of other methods, with an accuracy of 75.80%, a sensitivity of 53.57% and a specificity of 80.14%. As an application of GPS-SNO 1.0, we predicted putative S-nitrosylation sites for hundreds of potentially S-nitrosylated substrates for which the exact S-nitrosylation sites had not been experimentally determined. In this regard, GPS-SNO 1.0 should prove to be a useful tool for experimentalists. The online service and local packages of GPS-SNO were implemented in JAVA and are freely available at: http://sno.biocuckoo.org/.
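Because accuracy is a weighted average of sensitivity and specificity, the three GPS-SNO figures jointly imply a class balance for the evaluation set. The sketch below works this out, assuming all three values were computed on the same data (the abstract does not say so explicitly); the function name is hypothetical.

```python
def implied_positive_fraction(accuracy, sensitivity, specificity):
    """Solve accuracy = p * sensitivity + (1 - p) * specificity for p."""
    if sensitivity == specificity:
        raise ValueError("class balance is undetermined when Sn equals Sp")
    return (specificity - accuracy) / (specificity - sensitivity)


# Values quoted in the abstract, expressed as fractions.
positive_fraction = implied_positive_fraction(
    accuracy=0.7580, sensitivity=0.5357, specificity=0.8014
)
print(f"implied fraction of positive sites: {positive_fraction:.3f}")
```

Under this assumption roughly one site in six would be a positive, which would explain why the overall accuracy sits much closer to the specificity than to the sensitivity.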
A benchmark testing ground for integrating homology modeling and protein docking.
Bohnuud, Tanggis; Luo, Lingqi; Wodak, Shoshana J; Bonvin, Alexandre M J J; Weng, Zhiping; Vajda, Sandor; Schueler-Furman, Ora; Kozakov, Dima
2017-01-01
Protein docking procedures carry out the task of predicting the structure of a protein-protein complex starting from the known structures of the individual protein components. More often than not, however, the structure of one or both components is not known, but can be derived by homology modeling on the basis of known structures of related proteins deposited in the Protein Data Bank (PDB). Thus, the problem is to develop methods that optimally integrate homology modeling and docking with the goal of predicting the structure of a complex directly from the amino acid sequences of its component proteins. One possibility is to use the best available homology modeling and docking methods. However, the models built for the individual subunits often differ to a significant degree from the bound conformation in the complex, often much more so than the differences observed between free and bound structures of the same protein, and therefore additional conformational adjustments, both at the backbone and side chain levels need to be modeled to achieve an accurate docking prediction. In particular, even homology models of overall good accuracy frequently include localized errors that unfavorably impact docking results. The predicted reliability of the different regions in the model can also serve as a useful input for the docking calculations. Here we present a benchmark dataset that should help to explore and solve combined modeling and docking problems. This dataset comprises a subset of the experimentally solved 'target' complexes from the widely used Docking Benchmark from the Weng Lab (excluding antibody-antigen complexes). This subset is extended to include the structures from the PDB related to those of the individual components of each complex, and hence represent potential templates for investigating and benchmarking integrated homology modeling and docking approaches. 
Template sets can be dynamically customized by specifying ranges in sequence similarity and in PDB release dates, or using other filtering options, such as excluding sets of specific structures from the template list. Multiple sequence alignments, as well as structural alignments of the templates to their corresponding subunits in the target are also provided. The resource is accessible online or can be downloaded at http://cluspro.org/benchmark, and is updated on a weekly basis in synchrony with new PDB releases. Proteins 2016; 85:10-16. © 2016 Wiley Periodicals, Inc.
Xue, Yi; Skrynnikov, Nikolai R
2014-01-01
Currently, the best existing molecular dynamics (MD) force fields cannot accurately reproduce the global free-energy minimum which realizes the experimental protein structure. As a result, long MD trajectories tend to drift away from the starting coordinates (e.g., crystallographic structures). To address this problem, we have devised a new simulation strategy aimed at protein crystals. An MD simulation of protein crystal is essentially an ensemble simulation involving multiple protein molecules in a crystal unit cell (or a block of unit cells). To ensure that average protein coordinates remain correct during the simulation, we introduced crystallography-based restraints into the MD protocol. Because these restraints are aimed at the ensemble-average structure, they have only minimal impact on conformational dynamics of the individual protein molecules. So long as the average structure remains reasonable, the proteins move in a native-like fashion as dictated by the original force field. To validate this approach, we have used the data from solid-state NMR spectroscopy, which is the orthogonal experimental technique uniquely sensitive to protein local dynamics. The new method has been tested on the well-established model protein, ubiquitin. The ensemble-restrained MD simulations produced lower crystallographic R factors than conventional simulations; they also led to more accurate predictions for crystallographic temperature factors, solid-state chemical shifts, and backbone order parameters. The predictions for 15N R1 relaxation rates are at least as accurate as those obtained from conventional simulations. Taken together, these results suggest that the presented trajectories may be among the most realistic protein MD simulations ever reported. In this context, the ensemble restraints based on high-resolution crystallographic data can be viewed as protein-specific empirical corrections to the standard force fields. PMID:24452989
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harima, Yoko, E-mail: harima@takii.kmu.ac.jp; Ikeda, Koshi; Utsunomiya, Keita
Purpose: To determine pretreatment serum protein levels for generally applicable measurement to predict chemoradiation treatment outcomes in patients with locally advanced squamous cell cervical carcinoma (CC). Methods and Materials: In a screening study, measurements were conducted twice. At first, 6 serum samples from CC patients (3 with no evidence of disease [NED] and 3 with cancer-caused death [CD]) and 2 from healthy controls were tested. Next, 12 serum samples from different CC patients (8 NED, 4 CD) and 4 from healthy controls were examined. Subsequently, 28 different CC patients (18 NED, 10 CD) and 9 controls were analyzed in the validation study. Protein chips were treated with the sample sera, and the serum protein pattern was detected by surface-enhanced laser desorption and ionization–time-of-flight mass spectrometry (SELDI-TOF MS). Then, single MS-based peptide mass fingerprinting (PMF) and tandem MS (MS/MS)-based peptide/protein identification methods were used to identify the protein corresponding to the detected peak. A turbidimetric assay was then used to measure the levels of the protein that showed the best match with this peptide peak. Results: The same peak at 8918 m/z was identified in both screening studies. Neither the screening study nor the validation study showed significant differences in the appearance of this peak between the controls and NED. However, the intensity of the peak in CD was significantly lower than that of controls and NED in both pilot studies (P=.02, P=.04) and the validation study (P=.01, P=.001). The protein showing the best match with this peptide peak at 8918 m/z was identified as apolipoprotein C-II (ApoC-II) using PMF and MS/MS methods. Turbidimetric assay showed that the mean serum levels of ApoC-II tended to decrease in the CD group when compared with the NED group (P=.078). Conclusion: ApoC-II could be used as a biomarker for predicting and estimating the radiation treatment outcome of patients with CC.
Lyons, James; Dehzangi, Abdollah; Heffernan, Rhys; Sharma, Alok; Paliwal, Kuldip; Sattar, Abdul; Zhou, Yaoqi; Yang, Yuedong
2014-10-30
Because of a nearly constant distance between two neighbouring Cα atoms, the local backbone structure of proteins can be represented accurately by the angle between C(αi-1)-C(αi)-C(αi+1) (θ) and a dihedral angle rotated about the C(αi)-C(αi+1) bond (τ). θ and τ angles, as the representative of structural properties of three to four amino-acid residues, offer a description of backbone conformations that is complementary to φ and ψ angles (single residue) and secondary structures (>3 residues). Here, we report the first machine-learning technique for sequence-based prediction of θ and τ angles. Predicted angles based on an independent test have a mean absolute error of 9° for θ and 34° for τ with a distribution on the θ-τ plane close to that of native values. The average root-mean-square distance of 10-residue fragment structures constructed from predicted θ and τ angles is only 1.9Å from their corresponding native structures. Predicted θ and τ angles are expected to be complementary to predicted φ and ψ angles and secondary structures for use in model validation and template-based as well as template-free structure prediction. The deep neural network learning technique is available as an on-line server called Structural Property prediction with Integrated DEep neuRal network (SPIDER) at http://sparks-lab.org. Copyright © 2014 Wiley Periodicals, Inc.
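The geometric definitions of θ and τ given in this abstract can be illustrated with a short sketch. The Python below is only an illustration of how the two angles could be measured from Cα coordinates; it is not the SPIDER predictor, and the function names, the coordinate format (x, y, z tuples), and the right-handed sign convention for τ are assumptions made here.

```python
import math

# Small vector helpers on (x, y, z) tuples (hypothetical names).
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def theta(ca_prev, ca, ca_next):
    """Angle in degrees at Ca(i), formed by Ca(i-1)-Ca(i)-Ca(i+1)."""
    u = _sub(ca_prev, ca)
    v = _sub(ca_next, ca)
    cos_t = _dot(u, v) / (_norm(u) * _norm(v))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def tau(ca_prev, ca, ca_next, ca_next2):
    """Dihedral in degrees about the Ca(i)-Ca(i+1) bond.

    Assumes a right-handed sign convention; result lies in (-180, 180].
    """
    b1 = _sub(ca, ca_prev)
    b2 = _sub(ca_next, ca)
    b3 = _sub(ca_next2, ca_next)
    n1 = _cross(b1, b2)
    n2 = _cross(b2, b3)
    b2_len = _norm(b2)
    b2_hat = (b2[0] / b2_len, b2[1] / b2_len, b2[2] / b2_len)
    y = _dot(b2_hat, _cross(n1, n2))
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

For a chain of Cα coordinates `ca`, θ at residue i would use `ca[i-1], ca[i], ca[i+1]`, and τ would additionally use `ca[i+2]`, which is why each angle pair summarizes three to four residues.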
Hsu, Jack C-C; Reid, David W; Hoffman, Alyson M; Sarkar, Devanand; Nicchitta, Christopher V
2018-05-01
Astrocyte elevated gene-1 (AEG-1), an oncogene whose overexpression promotes tumor cell proliferation, angiogenesis, invasion, and enhanced chemoresistance, is thought to function primarily as a scaffolding protein, regulating PI3K/Akt and Wnt/β-catenin signaling pathways. Here we report that AEG-1 is an endoplasmic reticulum (ER) resident integral membrane RNA-binding protein (RBP). Examination of the AEG-1 RNA interactome by HITS-CLIP and PAR-CLIP methodologies revealed a high enrichment for endomembrane organelle-encoding transcripts, most prominently those encoding ER resident proteins, and within this cohort, for integral membrane protein-encoding RNAs. Cluster mapping of the AEG-1/RNA interaction sites demonstrated a normalized rank order interaction of coding sequence >5' untranslated region, with 3' untranslated region interactions only weakly represented. Intriguingly, AEG-1/membrane protein mRNA interaction sites clustered downstream from encoded transmembrane domains, suggestive of a role in membrane protein biogenesis. Secretory and cytosolic protein-encoding mRNAs were also represented in the AEG-1 RNA interactome, with the latter category notably enriched in genes functioning in mRNA localization, translational regulation, and RNA quality control. Bioinformatic analyses of RNA-binding motifs and predicted secondary structure characteristics indicate that AEG-1 lacks established RNA-binding sites though shares the property of high intrinsic disorder commonly seen in RBPs. These data implicate AEG-1 in the localization and regulation of secretory and membrane protein-encoding mRNAs and provide a framework for understanding AEG-1 function in health and disease. © 2018 Hsu et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Proteomic Analysis of the Soybean Symbiosome Identifies New Symbiotic Proteins*
Clarke, Victoria C.; Loughlin, Patrick C.; Gavrin, Aleksandr; Chen, Chi; Brear, Ella M.; Day, David A.; Smith, Penelope M.C.
2015-01-01
Legumes form a symbiosis with rhizobia in which the plant provides an energy source to the rhizobia bacteria that it uses to fix atmospheric nitrogen. This nitrogen is provided to the legume plant, allowing it to grow without the addition of nitrogen fertilizer. As part of the symbiosis, the bacteria in the infected cells of a new root organ, the nodule, are surrounded by a plant-derived membrane, the symbiosome membrane, which becomes the interface between the symbionts. Fractions containing the symbiosome membrane (SM) and material from the lumen of the symbiosome (peribacteroid space or PBS) were isolated from soybean root nodules and analyzed using nongel proteomic techniques. Bicarbonate stripping and chloroform-methanol extraction of isolated SM were used to reduce complexity of the samples and enrich for hydrophobic integral membrane proteins. One hundred and ninety-seven proteins were identified as components of the SM, with an additional fifteen proteins identified from peripheral membrane and PBS protein fractions. Proteins involved in a range of cellular processes such as metabolism, protein folding and degradation, membrane trafficking, and solute transport were identified. These included a number of proteins previously localized to the SM, such as aquaglyceroporin nodulin 26, sulfate transporters, remorin, and Rab7 homologs. Among the proteome were a number of putative transporters for compounds such as sulfate, calcium, hydrogen ions, peptide/dicarboxylate, and nitrate, as well as transporters for which the substrate is not easy to predict. Analysis of the promoter activity for six genes encoding putative SM proteins showed nodule specific expression, with five showing expression only in infected cells. Localization of two proteins was confirmed using GFP-fusion experiments. The data have been deposited to the ProteomeXchange with identifier PXD001132. This proteome will provide a rich resource for the study of the legume-rhizobium symbiosis. 
PMID:25724908
Shen, Hong-Bin; Chou, Kuo-Chen
2007-04-20
Proteins may simultaneously exist at, or move between, two or more different subcellular locations. Proteins with multiple locations or dynamic features of this kind are particularly interesting because they may have some very special biological functions intriguing to investigators in both basic research and drug discovery. For instance, among the 6408 human protein entries that have experimentally observed subcellular location annotations in the Swiss-Prot database (version 50.7, released 19-Sept-2006), 973 (approximately 15%) have multiple location sites. The number of total human protein entries (except those annotated with "fragment" or those with less than 50 amino acids) in the same database is 14,370, meaning a gap of (14,370-6408)=7962 entries for which no knowledge is available about their subcellular locations. Although one can use a computational approach to predict the desired information for the gap, so far all the existing methods for predicting human protein subcellular localization are limited to the case of a single location site. To overcome this barrier, a new ensemble classifier, named Hum-mPLoc, was developed that can be used to deal with the case of multiple location sites as well. Hum-mPLoc is freely accessible to the public as a web server at http://202.120.37.186/bioinf/hum-multi. Meanwhile, for the convenience of people working in the relevant areas, Hum-mPLoc has been used to identify all human protein entries in the Swiss-Prot database that do not have subcellular location annotations or are annotated as being uncertain. The large-scale results thus obtained have been deposited in a downloadable file prepared with Microsoft Excel and named "Tab_Hum-mPLoc.xls". This file is available at the same website and will be updated twice a year to include new entries of human proteins and reflect the continuous development of Hum-mPLoc.
Feedback Interactions of Polymerized Actin with the Cell Membrane: Waves, Pulses, and Oscillations
NASA Astrophysics Data System (ADS)
Carlsson, Anders
Polymerized filaments of the protein actin have crucial functions in cell migration, and in bending the cell membrane to drive endocytosis or the formation of protrusions. The nucleation and polymerization of actin filaments are controlled by upstream agents in the cell membrane, including nucleation-promoting factors (NPFs) that activate the Arp2/3 complex to form new branches on pre-existing filaments. But polymerized actin (F-actin) also feeds back on the assembly of NPFs. We explore the effects of the resulting feedback loop of F-actin and NPFs on two phenomena: actin pulses that drive endocytosis in yeast, and actin waves traveling along the membrane of several cell types. In our model of endocytosis in yeast, the actin network is grown explicitly in three dimensions, exerts a negative feedback interaction on localized patch of NPFs in the membrane, and bends the membrane by exerting a distribution of forces. This model explains observed actin and NPF pulse dynamics, and the effects of several interventions including i) NPF mutations, ii) inhibition of actin polymerization, and iii) deletion of a protein that allows F-actin to bend the cell membrane. The model predicts that mutation of the active region of an NPF will enhance the accumulation of that NPF, and we confirm this prediction by quantitative fluorescence microscopy. For actin waves, we treat a similar model, with NPFs distributed over a larger region of the cell membrane. This model naturally generates actin waves, and predicts a transition from wave behavior to spatially localized oscillations when NPFs are confined to a small region. We also predict a transition from waves to static polarization as the negative-feedback coupling between F-actin and the NPFs is reduced. Supported by NIGMS Grant R01 GM107667.
Lin, Cheng-Yi; Lin, Ching-Yih; Chang, I-Wei; Sheu, Ming-Jen; Li, Chien-Feng; Lee, Sung-Wei; Lin, Li-Ching; Lee, Ying-En; He, Hong-Lin
2015-01-01
Neoadjuvant concurrent chemoradiotherapy (CCRT) followed by surgery is the mainstay of treatment for locally advanced rectal cancer. Several heparin-binding associated proteins have been reported to play a critical role in cancer progression. However, the clinical relevance of such proteins and their associations with CCRT response in rectal cancer have yet to be fully elucidated. The analysis of a public transcriptome of rectal cancer indicated that thrombospondin 2 (THBS2) is a predictive factor for CCRT response. Immunohistochemical analyses were conducted to evaluate the expression of THBS2 in pretreatment biopsy specimens from rectal cancer patients without distant metastasis. Furthermore, the relationships between THBS2 expression and various clinicopathological factors or survival were analyzed. Low expression of THBS2 was significantly associated with advanced pretreatment tumor (P<0.001) and nodal status (P=0.004), post-treatment tumor (P<0.001) and nodal status (P<0.001), increased vascular invasion (P=0.003), increased perineural invasion (P=0.023) and inferior tumor regression grade (P=0.015). In univariate analysis, low THBS2 expression predicted worse outcomes for disease-free survival, local recurrence-free survival and metastasis-free survival (all P<0.001). In multivariate analysis, low expression of THBS2 still served as a negative prognostic factor for disease-free survival (Hazard ratio=3.057, P=0.002) and metastasis-free survival (Hazard ratio=3.362, P=0.012). Low THBS2 expression was correlated with advanced disease status and low tumor regression after preoperative CCRT, and it acted as an independent negative prognostic factor in rectal cancer. THBS2 may represent a predictive biomarker for CCRT response in rectal cancer.
Livingston, B T; Shaw, R; Bailey, A; Wilt, F
1991-12-01
In order to investigate the role of proteins in the formation of mineralized tissues during development, we have isolated a cDNA that encodes a protein that is a component of the organic matrix of the skeletal spicule of the sea urchin, Lytechinus pictus. The expression of the RNA encoding this protein is regulated over development and is localized to the descendants of the micromere lineage. Comparison of the sequence of this cDNA to homologous cDNAs from other species of urchin reveals that the protein is basic and contains three conserved structural motifs: a signal peptide, a proline-rich region, and an unusual region composed of a series of direct repeats. Studies on the protein encoded by this cDNA confirm the predicted reading frame deduced from the nucleotide sequence and show that the protein is secreted and not glycosylated. Comparison of the amino acid sequence to databases reveals that the repeat domain is similar to proteins that form a unique beta-spiral supersecondary structure.
De Souza, Colin P.; Hashmi, Shahr B.; Osmani, Aysha H.; Osmani, Stephen A.
2014-01-01
Filamentous fungi occupy critical environmental niches and have numerous beneficial industrial applications but devastating effects as pathogens and agents of food spoilage. As regulators of essentially all biological processes protein kinases have been intensively studied but how they regulate the often unique biology of filamentous fungi is not completely understood. Significant understanding of filamentous fungal biology has come from the study of the model organism Aspergillus nidulans using a combination of molecular genetics, biochemistry, cell biology and genomic approaches. Here we describe dual localization-affinity purification (DLAP) tags enabling endogenous N or C-terminal protein tagging for localization and biochemical studies in A. nidulans. To establish DLAP tag utility we endogenously tagged 17 protein kinases for analysis by live cell imaging and affinity purification. Proteomic analysis of purifications by mass spectrometry confirmed association of the CotA and NimXCdk1 kinases with known binding partners and verified a predicted interaction of the SldABub1/R1 spindle assembly checkpoint kinase with SldBBub3. We demonstrate that the single TOR kinase of A. nidulans locates to vacuoles and vesicles, suggesting that the function of endomembranes as major TOR cellular hubs is conserved in filamentous fungi. Comparative analysis revealed 7 kinases with mitotic specific locations including An-Cdc7 which unexpectedly located to mitotic spindle pole bodies (SPBs), the first such localization described for this family of DNA replication kinases. We show that the SepH septation kinase locates to SPBs specifically in the basal region of apical cells in a biphasic manner during mitosis and again during septation. This results in gradients of SepH between G1 SPBs which shift along hyphae as each septum forms. 
We propose that SepH regulates the septation initiation network (SIN) specifically at SPBs in the basal region of G1 cells and that localized gradients of SIN activity promote asymmetric septation. PMID:24599037
Panni, Simona; Montecchi-Palazzi, Luisa; Kiemer, Lars; Cabibbo, Andrea; Paoluzi, Serena; Santonico, Elena; Landgraf, Christiane; Volkmer-Engert, Rudolf; Bachi, Angela; Castagnoli, Luisa; Cesareni, Gianni
2011-01-01
Large-scale interaction studies contribute the largest fraction of protein interaction information in databases. However, co-purification of non-specific or indirect ligands often results in data sets that are affected by a considerable number of false positives. For the fraction of interactions mediated by short linear peptides, we present here a combined experimental and computational strategy for ranking the reliability of the inferred partners. We apply this strategy to the family of 14-3-3 domains. We have first characterized the recognition specificity of this domain family, largely confirming the results of previous analyses, while revealing new features of the preferred sequence context of 14-3-3 phospho-peptide partners. Notably, a proline next to the carboxy side of the phospho-amino acid functions as a potent inhibitor of 14-3-3 binding. The position-specific information about residue preference was encoded in a scoring matrix and two regular expressions. The integration of these three features in a single predictive model outperforms publicly available prediction tools. Next we have combined, by a naïve Bayesian approach, these "peptide features" with "protein features", such as protein co-expression and co-localization. Our approach provides an orthogonal reliability assessment and maps with high confidence the 14-3-3 peptide target on the partner proteins. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
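The naïve Bayesian combination of "peptide features" and "protein features" mentioned in this abstract can be sketched in general terms. The Python below is a generic illustration of naïve Bayes evidence combination and not the authors' model: the function name, the prior, and the likelihood ratios are hypothetical values chosen only to show the arithmetic. Each feature contributes a likelihood ratio, P(feature | true interaction) / P(feature | false positive), and the features are assumed to be conditionally independent.

```python
import math

def naive_bayes_posterior(prior, likelihood_ratios):
    """Combine independent evidence into a posterior probability.

    prior: prior probability that a candidate interaction is genuine (0 < prior < 1).
    likelihood_ratios: one ratio per feature, each > 0
        (hypothetical inputs, e.g. from a motif score or a co-expression bin).
    """
    # Work in log-odds so that independent features simply add.
    log_odds = math.log(prior / (1.0 - prior))
    for lr in likelihood_ratios:
        log_odds += math.log(lr)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical example: a weak prior, one supportive peptide feature,
# and one supportive protein feature.
posterior = naive_bayes_posterior(0.1, [3.0, 3.0])
```

A likelihood ratio above 1 raises the reliability estimate and one below 1 lowers it, so a feature such as an unfavourable residue next to the phospho-site would enter with a ratio below 1 under this scheme.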
Xu, L; Paulsen, J; Yoo, Y; Goodwin, E B; Strome, S
2001-01-01
The maternal-effect sterile (MES) proteins are maternally supplied regulators of germline development in Caenorhabditis elegans. In the hermaphrodite progeny from mes mutant mothers, the germline dies during larval development. On the basis of the similarities of MES-2 and MES-6 to known transcriptional regulators and on the basis of the effects of mes mutations on transgene expression in the germline, the MES proteins are predicted to be transcriptional repressors. One of the MES proteins, MES-3, is a novel protein with no recognizable motifs. In this article we show that MES-3 is localized in the nuclei of embryos and germ cells, consistent with its predicted role in transcriptional regulation. Its distribution in the germline and in early embryos does not depend on the wild-type functions of the other MES proteins. However, its nuclear localization in midstage embryos and its persistence in the primordial germ cells depend on wild-type MES-2 and MES-6. These results are consistent with biochemical data showing that MES-2, MES-3, and MES-6 associate in a complex in embryos. The distribution of MES-3 in the adult germline is regulated by the translational repressor GLD-1: MES-3 is absent from the region of the germline where GLD-1 is known to be present, MES-3 is overexpressed in the germline of gld-1 mutants, and GLD-1 specifically binds the mes-3 3' untranslated region (3' UTR). Analysis of temperature-shifted mes-3(bn21ts) worms and embryos indicates that MES-3 function is required in the mother's germline and during embryogenesis to ensure subsequent normal germline development. We propose that MES-3 acts epigenetically to induce a germline state that is inherited through both meiosis and mitosis and that is essential for survival of the germline. PMID:11729149