Sample records for background predicting protein

  1. A traveling salesman approach for predicting protein functions

    PubMed Central

    Johnson, Olin; Liu, Jing

    2006-01-01

    Background Protein-protein interaction information can be used to predict unknown protein functions and to help study biological pathways. Results Here we present a new approach utilizing the classic Traveling Salesman Problem to study the protein-protein interactions and to predict protein functions in budding yeast Saccharomyces cerevisiae. We apply the global optimization tool from combinatorial optimization algorithms to cluster the yeast proteins based on the global protein interaction information. We then use this clustering information to help us predict protein functions. We use our algorithm together with the direct neighbor algorithm [1] on characterized proteins and compare the prediction accuracy of the two methods. We show our algorithm can produce better predictions than the direct neighbor algorithm, which only considers the immediate neighbors of the query protein. Conclusion Our method is a promising one to be used as a general tool to predict functions of uncharacterized proteins and a successful sample of using computer science knowledge and algorithms to study biological problems. PMID:17147783

  2. Computational prediction of human salivary proteins from blood circulation and application to diagnostic biomarker identification.

    PubMed

    Wang, Jiaxin; Liang, Yanchun; Wang, Yan; Cui, Juan; Liu, Ming; Du, Wei; Xu, Ying

    2013-01-01

    Proteins can move from blood circulation into salivary glands through active transportation, passive diffusion or ultrafiltration, some of which are then released into saliva and hence can potentially serve as biomarkers for diseases if accurately identified. We present a novel computational method for predicting salivary proteins that come from circulation. The basis for the prediction is a set of physiochemical and sequence features we found to be discerning between human proteins known to be movable from circulation to saliva and proteins deemed to be not in saliva. A classifier was trained based on these features using a support-vector machine to predict protein secretion into saliva. The classifier achieved 88.56% average recall and 90.76% average precision in 10-fold cross-validation on the training data, indicating that the selected features are informative. Considering the possibility that our negative training data may not be highly reliable (i.e., proteins predicted to be not in saliva), we have also trained a ranking method, aiming to rank the known salivary proteins from circulation as the highest among the proteins in the general background, based on the same features. This prediction capability can be used to predict potential biomarker proteins for specific human diseases when coupled with the information of differentially expressed proteins in diseased versus healthy control tissues and a prediction capability for blood-secretory proteins. Using such integrated information, we predicted 31 candidate biomarker proteins in saliva for breast cancer.

  3. Computational Prediction of Human Salivary Proteins from Blood Circulation and Application to Diagnostic Biomarker Identification

    PubMed Central

    Wang, Jiaxin; Liang, Yanchun; Wang, Yan; Cui, Juan; Liu, Ming; Du, Wei; Xu, Ying

    2013-01-01

    Proteins can move from blood circulation into salivary glands through active transportation, passive diffusion or ultrafiltration, some of which are then released into saliva and hence can potentially serve as biomarkers for diseases if accurately identified. We present a novel computational method for predicting salivary proteins that come from circulation. The basis for the prediction is a set of physiochemical and sequence features we found to be discerning between human proteins known to be movable from circulation to saliva and proteins deemed to be not in saliva. A classifier was trained based on these features using a support-vector machine to predict protein secretion into saliva. The classifier achieved 88.56% average recall and 90.76% average precision in 10-fold cross-validation on the training data, indicating that the selected features are informative. Considering the possibility that our negative training data may not be highly reliable (i.e., proteins predicted to be not in saliva), we have also trained a ranking method, aiming to rank the known salivary proteins from circulation as the highest among the proteins in the general background, based on the same features. This prediction capability can be used to predict potential biomarker proteins for specific human diseases when coupled with the information of differentially expressed proteins in diseased versus healthy control tissues and a prediction capability for blood-secretory proteins. Using such integrated information, we predicted 31 candidate biomarker proteins in saliva for breast cancer. PMID:24324552

  4. SeqRate: sequence-based protein folding type classification and rates prediction

    PubMed Central

    2010-01-01

    Background Protein folding rate is an important property of a protein. Predicting protein folding rate is useful for understanding protein folding process and guiding protein design. Most previous methods of predicting protein folding rate require the tertiary structure of a protein as an input. And most methods do not distinguish the different kinetic nature (two-state folding or multi-state folding) of the proteins. Here we developed a method, SeqRate, to predict both protein folding kinetic type (two-state versus multi-state) and real-value folding rate using sequence length, amino acid composition, contact order, contact number, and secondary structure information predicted from only protein sequence with support vector machines. Results We systematically studied the contributions of individual features to folding rate prediction. On a standard benchmark dataset, the accuracy of folding kinetic type classification is 80%. The Pearson correlation coefficient and the mean absolute difference between predicted and experimental folding rates (sec-1) in the base-10 logarithmic scale are 0.81 and 0.79 for two-state protein folders, and 0.80 and 0.68 for three-state protein folders. SeqRate is the first sequence-based method for protein folding type classification and its accuracy of fold rate prediction is improved over previous sequence-based methods. Its performance can be further enhanced with additional information, such as structure-based geometric contacts, as inputs. Conclusions Both the web server and software of predicting folding rate are publicly available at http://casp.rnet.missouri.edu/fold_rate/index.html. PMID:20438647

  5. Neighborhood Walkable Urban Form and C-Reactive Protein

    EPA Science Inventory

    Background: Walkable urban form predicts physical activity and lower body mass index, which lower C-reactive protein (CRP). However, urban form is also related to pollution, noise, social and health behavior, crowding, and other stressors, which may complement or contravene walka...

  6. Exploiting Amino Acid Composition for Predicting Protein-Protein Interactions

    PubMed Central

    Roy, Sushmita; Martinez, Diego; Platero, Harriett; Lane, Terran; Werner-Washburne, Margaret

    2009-01-01

    Background Computational prediction of protein interactions typically use protein domains as classifier features because they capture conserved information of interaction surfaces. However, approaches relying on domains as features cannot be applied to proteins without any domain information. In this paper, we explore the contribution of pure amino acid composition (AAC) for protein interaction prediction. This simple feature, which is based on normalized counts of single or pairs of amino acids, is applicable to proteins from any sequenced organism and can be used to compensate for the lack of domain information. Results AAC performed at par with protein interaction prediction based on domains on three yeast protein interaction datasets. Similar behavior was obtained using different classifiers, indicating that our results are a function of features and not of classifiers. In addition to yeast datasets, AAC performed comparably on worm and fly datasets. Prediction of interactions for the entire yeast proteome identified a large number of novel interactions, the majority of which co-localized or participated in the same processes. Our high confidence interaction network included both well-studied and uncharacterized proteins. Proteins with known function were involved in actin assembly and cell budding. Uncharacterized proteins interacted with proteins involved in reproduction and cell budding, thus providing putative biological roles for the uncharacterized proteins. Conclusion AAC is a simple, yet powerful feature for predicting protein interactions, and can be used alone or in conjunction with protein domains to predict new and validate existing interactions. More importantly, AAC alone performs at par with existing, but more complex, features indicating the presence of sequence-level information that is predictive of interaction, but which is not necessarily restricted to domains. PMID:19936254

  7. Designing and benchmarking the MULTICOM protein structure prediction system

    PubMed Central

    2013-01-01

    Background Predicting protein structure from sequence is one of the most significant and challenging problems in bioinformatics. Numerous bioinformatics techniques and tools have been developed to tackle almost every aspect of protein structure prediction ranging from structural feature prediction, template identification and query-template alignment to structure sampling, model quality assessment, and model refinement. How to synergistically select, integrate and improve the strengths of the complementary techniques at each prediction stage and build a high-performance system is becoming a critical issue for constructing a successful, competitive protein structure predictor. Results Over the past several years, we have constructed a standalone protein structure prediction system MULTICOM that combines multiple sources of information and complementary methods at all five stages of the protein structure prediction process including template identification, template combination, model generation, model assessment, and model refinement. The system was blindly tested during the ninth Critical Assessment of Techniques for Protein Structure Prediction (CASP9) in 2010 and yielded very good performance. In addition to studying the overall performance on the CASP9 benchmark, we thoroughly investigated the performance and contributions of each component at each stage of prediction. Conclusions Our comprehensive and comparative study not only provides useful and practical insights about how to select, improve, and integrate complementary methods to build a cutting-edge protein structure prediction system but also identifies a few new sources of information that may help improve the design of a protein structure prediction system. Several components used in the MULTICOM system are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/. PMID:23442819

  8. An inventory of the Aspergillus niger secretome by combining in silico predictions with shotgun proteomics data

    PubMed Central

    2010-01-01

    Background The ecological niche occupied by a fungal species, its pathogenicity and its usefulness as a microbial cell factory to a large degree depends on its secretome. Protein secretion usually requires the presence of a N-terminal signal peptide (SP) and by scanning for this feature using available highly accurate SP-prediction tools, the fraction of potentially secreted proteins can be directly predicted. However, prediction of a SP does not guarantee that the protein is actually secreted and current in silico prediction methods suffer from gene-model errors introduced during genome annotation. Results A majority rule based classifier that also evaluates signal peptide predictions from the best homologs of three neighbouring Aspergillus species was developed to create an improved list of potential signal peptide containing proteins encoded by the Aspergillus niger genome. As a complement to these in silico predictions, the secretome associated with growth and upon carbon source depletion was determined using a shotgun proteomics approach. Overall, some 200 proteins with a predicted signal peptide were identified to be secreted proteins. Concordant changes in the secretome state were observed as a response to changes in growth/culture conditions. Additionally, two proteins secreted via a non-classical route operating in A. niger were identified. Conclusions We were able to improve the in silico inventory of A. niger secretory proteins by combining different gene-model predictions from neighbouring Aspergilli and thereby avoiding prediction conflicts associated with inaccurate gene-models. The expected accuracy of signal peptide prediction for proteins that lack homologous sequences in the proteomes of related species is 85%. An experimental validation of the predicted proteome confirmed in silico predictions. PMID:20959013

  9. Prediction of heterotrimeric protein complexes by two-phase learning using neighboring kernels

    PubMed Central

    2014-01-01

    Background Protein complexes play important roles in biological systems such as gene regulatory networks and metabolic pathways. Most methods for predicting protein complexes try to find protein complexes with size more than three. It, however, is known that protein complexes with smaller sizes occupy a large part of whole complexes for several species. In our previous work, we developed a method with several feature space mappings and the domain composition kernel for prediction of heterodimeric protein complexes, which outperforms existing methods. Results We propose methods for prediction of heterotrimeric protein complexes by extending techniques in the previous work on the basis of the idea that most heterotrimeric protein complexes are not likely to share the same protein with each other. We make use of the discriminant function in support vector machines (SVMs), and design novel feature space mappings for the second phase. As the second classifier, we examine SVMs and relevance vector machines (RVMs). We perform 10-fold cross-validation computational experiments. The results suggest that our proposed two-phase methods and SVM with the extended features outperform the existing method NWE, which was reported to outperform other existing methods such as MCL, MCODE, DPClus, CMC, COACH, RRW, and PPSampler for prediction of heterotrimeric protein complexes. Conclusions We propose two-phase prediction methods with the extended features, the domain composition kernel, SVMs and RVMs. The two-phase method with the extended features and the domain composition kernel using SVM as the second classifier is particularly useful for prediction of heterotrimeric protein complexes. PMID:24564744

  10. Plasma proteins predict conversion to dementia from prodromal disease

    PubMed Central

    Hye, Abdul; Riddoch-Contreras, Joanna; Baird, Alison L.; Ashton, Nicholas J.; Bazenet, Chantal; Leung, Rufina; Westman, Eric; Simmons, Andrew; Dobson, Richard; Sattlecker, Martina; Lupton, Michelle; Lunnon, Katie; Keohane, Aoife; Ward, Malcolm; Pike, Ian; Zucht, Hans Dieter; Pepin, Danielle; Zheng, Wei; Tunnicliffe, Alan; Richardson, Jill; Gauthier, Serge; Soininen, Hilkka; Kłoszewska, Iwona; Mecocci, Patrizia; Tsolaki, Magda; Vellas, Bruno; Lovestone, Simon

    2014-01-01

    Background The study aimed to validate previously discovered plasma biomarkers associated with AD, using a design based on imaging measures as surrogate for disease severity and assess their prognostic value in predicting conversion to dementia. Methods Three multicenter cohorts of cognitively healthy elderly, mild cognitive impairment (MCI), and AD participants with standardized clinical assessments and structural neuroimaging measures were used. Twenty-six candidate proteins were quantified in 1148 subjects using multiplex (xMAP) assays. Results Sixteen proteins correlated with disease severity and cognitive decline. Strongest associations were in the MCI group with a panel of 10 proteins predicting progression to AD (accuracy 87%, sensitivity 85%, and specificity 88%). Conclusions We have identified 10 plasma proteins strongly associated with disease severity and disease progression. Such markers may be useful for patient selection for clinical trials and assessment of patients with predisease subjective memory complaints. PMID:25012867

  11. HomPPI: a class of sequence homology based protein-protein interface prediction methods

    PubMed Central

    2011-01-01

    Background Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. Results We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. The partner-specific classifier, PS-HomPPI can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the target can be reliably identified. The HomPPI web server is available at http://homppi.cs.iastate.edu/. Conclusions Sequence homology-based methods offer a class of computationally efficient and reliable approaches for predicting the protein-protein interface residues that participate in either obligate or transient interactions. For query proteins involved in transient interactions, the reliability of interface residue prediction can be improved by exploiting knowledge of putative interaction partners. PMID:21682895

  12. Characterizing informative sequence descriptors and predicting binding affinities of heterodimeric protein complexes

    PubMed Central

    2015-01-01

    Background Protein-protein interactions (PPIs) are involved in various biological processes, and underlying mechanism of the interactions plays a crucial role in therapeutics and protein engineering. Most machine learning approaches have been developed for predicting the binding affinity of protein-protein complexes based on structure and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. Results This work proposes a support vector machine (SVM) based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity. SVM-BAC yielded the training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, better than existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (Pkd) of 200 heterodimeric protein complexes. Prediction performance of a Jackknife test was the correlation coefficient of 0.34 and mean absolute error of 1.4. We further analyze three informative physicochemical properties according to their contribution to prediction performance. Results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn. Conclusions The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features to classify and predict binding affinity of heterodimeric protein complexes. The characterization analysis revealed that the average numbers of beta turns and hydrogen bonds at protein-protein interfaces in high binding affinity complexes are more than those in low binding affinity complexes. PMID:26681483

  13. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier.

    PubMed

    Kulmanov, Maxat; Khan, Mohammed Asif; Hoehndorf, Robert; Wren, Jonathan

    2018-02-15

    A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein-protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations. Web server: http://deepgo.bio2vec.net, Source code: https://github.com/bio-ontology-research-group/deepgo. robert.hoehndorf@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  14. Structure prediction of polyglutamine disease proteins: comparison of methods

    PubMed Central

    2014-01-01

    Background The expansion of polyglutamine (poly-Q) repeats in several unrelated proteins is associated with at least ten neurodegenerative diseases. The length of the poly-Q regions plays an important role in the progression of the diseases. The number of glutamines (Q) is inversely related to the onset age of these polyglutamine diseases, and the expansion of poly-Q repeats has been associated with protein misfolding. However, very little is known about the structural changes induced by the expansion of the repeats. Computational methods can provide an alternative to determine the structure of these poly-Q proteins, but it is important to evaluate their performance before large scale prediction work is done. Results In this paper, two popular protein structure prediction programs, I-TASSER and Rosetta, have been used to predict the structure of the N-terminal fragment of a protein associated with Huntington's disease with 17 glutamines. Results show that both programs have the ability to find the native structures, but I-TASSER performs better for the overall task. Conclusions Both I-TASSER and Rosetta can be used for structure prediction of proteins with poly-Q repeats. Knowledge of poly-Q structure may significantly contribute to development of therapeutic strategies for poly-Q diseases. PMID:25080018

  15. Three-dimensional (3D) structure prediction of the American and African oil-palms β-ketoacyl-[ACP] synthase-II protein by comparative modelling

    PubMed Central

    Wang, Edina; Chinni, Suresh; Bhore, Subhash Janardhan

    2014-01-01

    Background: The fatty-acid profile of the vegetable oils determines its properties and nutritional value. Palm-oil obtained from the African oil-palm [Elaeis guineensis Jacq. (Tenera)] contains 44% palmitic acid (C16:0), but, palm-oil obtained from the American oilpalm [Elaeis oleifera] contains only 25% C16:0. In part, the b-ketoacyl-[ACP] synthase II (KASII) [EC: 2.3.1.179] protein is responsible for the high level of C16:0 in palm-oil derived from the African oil-palm. To understand more about E. guineensis KASII (EgKASII) and E. oleifera KASII (EoKASII) proteins, it is essential to know its structures. Hence, this study was undertaken. Objective: The objective of this study was to predict three-dimensional (3D) structure of EgKASII and EoKASII proteins using molecular modelling tools. Materials and Methods: The amino-acid sequences for KASII proteins were retrieved from the protein database of National Center for Biotechnology Information (NCBI), USA. The 3D structures were predicted for both proteins using homology modelling and ab-initio technique approach of protein structure prediction. The molecular dynamics (MD) simulation was performed to refine the predicted structures. The predicted structure models were evaluated and root mean square deviation (RMSD) and root mean square fluctuation (RMSF) values were calculated. Results: The homology modelling showed that EgKASII and EoKASII proteins are 78% and 74% similar with Streptococcus pneumonia KASII and Brucella melitensis KASII, respectively. The EgKASII and EoKASII structures predicted by using ab-initio technique approach shows 6% and 9% deviation to its structures predicted by homology modelling, respectively. The structure refinement and validation confirmed that the predicted structures are accurate. Conclusion: The 3D structures for EgKASII and EoKASII proteins were predicted. However, further research is essential to understand the interaction of EgKASII and EoKASII proteins with its substrates. PMID:24748752

  16. Predicting β-turns and their types using predicted backbone dihedral angles and secondary structures

    PubMed Central

    2010-01-01

    Background β-turns are secondary structure elements usually classified as coil. Their prediction is important, because of their role in protein folding and their frequent occurrence in protein chains. Results We have developed a novel method that predicts β-turns and their types using information from multiple sequence alignments, predicted secondary structures and, for the first time, predicted dihedral angles. Our method uses support vector machines, a supervised classification technique, and is trained and tested on three established datasets of 426, 547 and 823 protein chains. We achieve a Matthews correlation coefficient of up to 0.49, when predicting the location of β-turns, the highest reported value to date. Moreover, the additional dihedral information improves the prediction of β-turn types I, II, IV, VIII and "non-specific", achieving correlation coefficients up to 0.39, 0.33, 0.27, 0.14 and 0.38, respectively. Our results are more accurate than other methods. Conclusions We have created an accurate predictor of β-turns and their types. Our method, called DEBT, is available online at http://comp.chem.nottingham.ac.uk/debt/. PMID:20673368

  17. Enhancing the prediction of protein pairings between interacting families using orthology information

    PubMed Central

    Izarzugaza, Jose MG; Juan, David; Pons, Carles; Pazos, Florencio; Valencia, Alfonso

    2008-01-01

    Background It has repeatedly been shown that interacting protein families tend to have similar phylogenetic trees. These similarities can be used to predicting the mapping between two families of interacting proteins (i.e. which proteins from one family interact with which members of the other). The correct mapping will be that which maximizes the similarity between the trees. The two families may eventually comprise orthologs and paralogs, if members of the two families are present in more than one organism. This fact can be exploited to restrict the possible mappings, simply by impeding links between proteins of different organisms. We present here an algorithm to predict the mapping between families of interacting proteins which is able to incorporate information regarding orthologues, or any other assignment of proteins to "classes" that may restrict possible mappings. Results For the first time in methods for predicting mappings, we have tested this new approach on a large number of interacting protein domains in order to statistically assess its performance. The method accurately predicts around 80% in the most favourable cases. We also analysed in detail the results of the method for a well defined case of interacting families, the sensor and kinase components of the Ntr-type two-component system, for which up to 98% of the pairings predicted by the method were correct. Conclusion Based on the well established relationship between tree similarity and interactions we developed a method for predicting the mapping between two interacting families using genomic information alone. The program is available through a web interface. PMID:18215279

  18. SMOQ: a tool for predicting the absolute residue-specific quality of a single protein model with support vector machines

    PubMed Central

    2014-01-01

    Background It is important to predict the quality of a protein structural model before its native structure is known. The method that can predict the absolute local quality of individual residues in a single protein model is rare, yet particularly needed for using, ranking and refining protein models. Results We developed a machine learning tool (SMOQ) that can predict the distance deviation of each residue in a single protein model. SMOQ uses support vector machines (SVM) with protein sequence and structural features (i.e. basic feature set), including amino acid sequence, secondary structures, solvent accessibilities, and residue-residue contacts to make predictions. We also trained a SVM model with two new additional features (profiles and SOV scores) on 20 CASP8 targets and found that including them can only improve the performance when real deviations between native and model are higher than 5Å. The SMOQ tool finally released uses the basic feature set trained on 85 CASP8 targets. Moreover, SMOQ implemented a way to convert predicted local quality scores into a global quality score. SMOQ was tested on the 84 CASP9 single-domain targets. The average difference between the residue-specific distance deviation predicted by our method and the actual distance deviation on the test data is 2.637Å. The global quality prediction accuracy of the tool is comparable to other good tools on the same benchmark. Conclusion SMOQ is a useful tool for protein single model quality assessment. Its source code and executable are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/. PMID:24776231

  19. Predicting Drug-Target Interaction Networks Based on Functional Groups and Biological Features

    PubMed Central

    Shi, Xiao-He; Hu, Le-Le; Kong, Xiangyin; Cai, Yu-Dong; Chou, Kuo-Chen

    2010-01-01

    Background Study of drug-target interaction networks is an important topic for drug development. It is both time-consuming and costly to determine compound-protein interactions or potential drug-target interactions by experiments alone. As a complement, the in silico prediction methods can provide us with very useful information in a timely manner. Methods/Principal Findings To realize this, drug compounds are encoded with functional groups and proteins encoded by biological features including biochemical and physicochemical properties. The optimal feature selection procedures are adopted by means of the mRMR (Maximum Relevance Minimum Redundancy) method. Instead of classifying the proteins as a whole family, target proteins are divided into four groups: enzymes, ion channels, G-protein- coupled receptors and nuclear receptors. Thus, four independent predictors are established using the Nearest Neighbor algorithm as their operation engine, with each to predict the interactions between drugs and one of the four protein groups. As a result, the overall success rates by the jackknife cross-validation tests achieved with the four predictors are 85.48%, 80.78%, 78.49%, and 85.66%, respectively. Conclusion/Significance Our results indicate that the network prediction system thus established is quite promising and encouraging. PMID:20300175

  20. Bacillus anthracis secretome time course under host-simulated conditions and identification of immunogenic proteins

    PubMed Central

    Walz, Alexander; Mujer, Cesar V; Connolly, Joseph P; Alefantis, Tim; Chafin, Ryan; Dake, Clarissa; Whittington, Jessica; Kumar, Srikanta P; Khan, Akbar S; DelVecchio, Vito G

    2007-01-01

    Background The secretion time course of Bacillus anthracis strain RA3R (pXO1+/pXO2-) during early, mid, and late log phase were investigated under conditions that simulate those encountered in the host. All of the identified proteins were analyzed by different software algorithms to characterize their predicted mode of secretion and cellular localization. In addition, immunogenic proteins were identified using sera from humans with cutaneous anthrax. Results A total of 275 extracellular proteins were identified by a combination of LC MS/MS and MALDI-TOF MS. All of the identified proteins were analyzed by SignalP, SecretomeP, PSORT, LipoP, TMHMM, and PROSITE to characterize their predicted mode of secretion, cellular localization, and protein domains. Fifty-three proteins were predicted by SignalP to harbor the cleavable N-terminal signal peptides and were therefore secreted via the classical Sec pathway. Twenty-three proteins were predicted by SecretomeP for secretion by the alternative Sec pathway characterized by the lack of typical export signal. In contrast to SignalP and SecretomeP predictions, PSORT predicted 171 extracellular proteins, 7 cell wall-associated proteins, and 6 cytoplasmic proteins. Moreover, 51 proteins were predicted by LipoP to contain putative Sec signal peptides (38 have SpI sites), lipoprotein signal peptides (13 have SpII sites), and N-terminal membrane helices (9 have transmembrane helices). The TMHMM algorithm predicted 25 membrane-associated proteins with one to ten transmembrane helices. Immunogenic proteins were also identified using sera from patients who have recovered from anthrax. The charge variants (83 and 63 kDa) of protective antigen (PA) were the most immunodominant secreted antigens, followed by charge variants of enolase and transketolase. Conclusion This is the first description of the time course of protein secretion for the pathogen Bacillus anthracis. Time course studies of protein secretion and accumulation may be relevant in elucidation of the progression of pathogenicity, identification of therapeutics and diagnostic markers, and vaccine development. This study also adds to the continuously growing list of identified Bacillus anthracis secretome proteins. PMID:17662140

  1. Graph pyramids for protein function prediction

    PubMed Central

    2015-01-01

    Background Uncovering the hidden organizational characteristics and regularities among biological sequences is the key issue for detailed understanding of an underlying biological phenomenon. Thus pattern recognition from nucleic acid sequences is an important affair for protein function prediction. As proteins from the same family exhibit similar characteristics, homology based approaches predict protein functions via protein classification. But conventional classification approaches mostly rely on the global features by considering only strong protein similarity matches. This leads to significant loss of prediction accuracy. Methods Here we construct the Protein-Protein Similarity (PPS) network, which captures the subtle properties of protein families. The proposed method considers the local as well as the global features, by examining the interactions among 'weakly interacting proteins' in the PPS network and by using hierarchical graph analysis via the graph pyramid. Different underlying properties of the protein families are uncovered by operating the proposed graph based features at various pyramid levels. Results Experimental results on benchmark data sets show that the proposed hierarchical voting algorithm using graph pyramid helps to improve computational efficiency as well the protein classification accuracy. Quantitatively, among 14,086 test sequences, on an average the proposed method misclassified only 21.1 sequences whereas baseline BLAST score based global feature matching method misclassified 362.9 sequences. With each correctly classified test sequence, the fast incremental learning ability of the proposed method further enhances the training model. Thus it has achieved more than 96% protein classification accuracy using only 20% per class training data. PMID:26044522

  2. PREAL: prediction of allergenic protein by maximum Relevance Minimum Redundancy (mRMR) feature selection

    PubMed Central

    2013-01-01

    Background Assessment of potential allergenicity of protein is necessary whenever transgenic proteins are introduced into the food chain. Bioinformatics approaches in allergen prediction have evolved appreciably in recent years to increase sophistication and performance. However, what are the critical features for protein's allergenicity have been not fully investigated yet. Results We presented a more comprehensive model in 128 features space for allergenic proteins prediction by integrating various properties of proteins, such as biochemical and physicochemical properties, sequential features and subcellular locations. The overall accuracy in the cross-validation reached 93.42% to 100% with our new method. Maximum Relevance Minimum Redundancy (mRMR) method and Incremental Feature Selection (IFS) procedure were applied to obtain which features are essential for allergenicity. Results of the performance comparisons showed the superior of our method to the existing methods used widely. More importantly, it was observed that the features of subcellular locations and amino acid composition played major roles in determining the allergenicity of proteins, particularly extracellular/cell surface and vacuole of the subcellular locations for wheat and soybean. To facilitate the allergen prediction, we implemented our computational method in a web application, which can be available at http://gmobl.sjtu.edu.cn/PREAL/index.php. Conclusions Our new approach could improve the accuracy of allergen prediction. And the findings may provide novel insights for the mechanism of allergies. PMID:24565053

  3. Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL): adapting the Partial Phylogenetic Profiling algorithm to scan sequences for signatures that predict protein function

    PubMed Central

    2010-01-01

    Background Comparative genomics methods such as phylogenetic profiling can mine powerful inferences from inherently noisy biological data sets. We introduce Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL), a method that applies the Partial Phylogenetic Profiling (PPP) approach locally within a protein sequence to discover short sequence signatures associated with functional sites. The approach is based on the basic scoring mechanism employed by PPP, namely the use of binomial distribution statistics to optimize sequence similarity cutoffs during searches of partitioned training sets. Results Here we illustrate and validate the ability of the SIMBAL method to find functionally relevant short sequence signatures by application to two well-characterized protein families. In the first example, we partitioned a family of ABC permeases using a metabolic background property (urea utilization). Thus, the TRUE set for this family comprised members whose genome of origin encoded a urea utilization system. By moving a sliding window across the sequence of a permease, and searching each subsequence in turn against the full set of partitioned proteins, the method found which local sequence signatures best correlated with the urea utilization trait. Mapping of SIMBAL "hot spots" onto crystal structures of homologous permeases reveals that the significant sites are gating determinants on the cytosolic face rather than, say, docking sites for the substrate-binding protein on the extracellular face. In the second example, we partitioned a protein methyltransferase family using gene proximity as a criterion. In this case, the TRUE set comprised those methyltransferases encoded near the gene for the substrate RF-1. SIMBAL identifies sequence regions that map onto the substrate-binding interface while ignoring regions involved in the methyltransferase reaction mechanism in general. Neither method for training set construction requires any prior experimental characterization. Conclusions SIMBAL shows that, in functionally divergent protein families, selected short sequences often significantly outperform their full-length parent sequence for making functional predictions by sequence similarity, suggesting avenues for improved functional classifiers. When combined with structural data, SIMBAL affords the ability to localize and model functional sites. PMID:20102603

  4. Computational prediction of hinge axes in proteins

    PubMed Central

    2014-01-01

    Background A protein's function is determined by the wide range of motions exhibited by its 3D structure. However, current experimental techniques are not able to reliably provide the level of detail required for elucidating the exact mechanisms of protein motion essential for effective drug screening and design. Computational tools are instrumental in the study of the underlying structure-function relationship. We focus on a special type of proteins called "hinge proteins" which exhibit a motion that can be interpreted as a rotation of one domain relative to another. Results This work proposes a computational approach that uses the geometric structure of a single conformation to predict the feasible motions of the protein and is founded in recent work from rigidity theory, an area of mathematics that studies flexibility properties of general structures. Given a single conformational state, our analysis predicts a relative axis of motion between two specified domains. We analyze a dataset of 19 structures known to exhibit this hinge-like behavior. For 15, the predicted axis is consistent with a motion to a second, known conformation. We present a detailed case study for three proteins whose dynamics have been well-studied in the literature: calmodulin, the LAO binding protein and the Bence-Jones protein. Conclusions Our results show that incorporating rigidity-theoretic analyses can lead to effective computational methods for understanding hinge motions in macromolecules. This initial investigation is the first step towards a new tool for probing the structure-dynamics relationship in proteins. PMID:25080829

  5. Coiled-coil protein composition of 22 proteomes – differences and common themes in subcellular infrastructure and traffic control

    PubMed Central

    Rose, Annkatrin; Schraegle, Shannon J; Stahlberg, Eric A; Meier, Iris

    2005-01-01

    Background Long alpha-helical coiled-coil proteins are involved in diverse organizational and regulatory processes in eukaryotic cells. They provide cables and networks in the cyto- and nucleoskeleton, molecular scaffolds that organize membrane systems and tissues, motors, levers, rotating arms, and possibly springs. Mutations in long coiled-coil proteins have been implemented in a growing number of human diseases. Using the coiled-coil prediction program MultiCoil, we have previously identified all long coiled-coil proteins from the model plant Arabidopsis thaliana and have established a searchable Arabidopsis coiled-coil protein database. Results Here, we have identified all proteins with long coiled-coil domains from 21 additional fully sequenced genomes. Because regions predicted to form coiled-coils interfere with sequence homology determination, we have developed a sequence comparison and clustering strategy based on masking predicted coiled-coil domains. Comparing and grouping all long coiled-coil proteins from 22 genomes, the kingdom-specificity of coiled-coil protein families was determined. At the same time, a number of proteins with unknown function could be grouped with already characterized proteins from other organisms. Conclusion MultiCoil predicts proteins with extended coiled-coil domains (more than 250 amino acids) to be largely absent from bacterial genomes, but present in archaea and eukaryotes. The structural maintenance of chromosomes proteins and their relatives are the only long coiled-coil protein family clearly conserved throughout all kingdoms, indicating their ancient nature. Motor proteins, membrane tethering and vesicle transport proteins are the dominant eukaryote-specific long coiled-coil proteins, suggesting that coiled-coil proteins have gained functions in the increasingly complex processes of subcellular infrastructure maintenance and trafficking control of the eukaryotic cell. PMID:16288662

  6. Antibody-protein interactions: benchmark datasets and prediction tools evaluation

    PubMed Central

    Ponomarenko, Julia V; Bourne, Philip E

    2007-01-01

    Background The ability to predict antibody binding sites (aka antigenic determinants or B-cell epitopes) for a given protein is a precursor to new vaccine design and diagnostics. Among the various methods of B-cell epitope identification X-ray crystallography is one of the most reliable methods. Using these experimental data computational methods exist for B-cell epitope prediction. As the number of structures of antibody-protein complexes grows, further interest in prediction methods using 3D structure is anticipated. This work aims to establish a benchmark for 3D structure-based epitope prediction methods. Results Two B-cell epitope benchmark datasets inferred from the 3D structures of antibody-protein complexes were defined. The first is a dataset of 62 representative 3D structures of protein antigens with inferred structural epitopes. The second is a dataset of 82 structures of antibody-protein complexes containing different structural epitopes. Using these datasets, eight web-servers developed for antibody and protein binding sites prediction have been evaluated. In no method did performance exceed a 40% precision and 46% recall. The values of the area under the receiver operating characteristic curve for the evaluated methods were about 0.6 for ConSurf, DiscoTope, and PPI-PRED methods and above 0.65 but not exceeding 0.70 for protein-protein docking methods when the best of the top ten models for the bound docking were considered; the remaining methods performed close to random. The benchmark datasets are included as a supplement to this paper. Conclusion It may be possible to improve epitope prediction methods through training on datasets which include only immune epitopes and through utilizing more features characterizing epitopes, for example, the evolutionary conservation score. Notwithstanding, overall poor performance may reflect the generality of antigenicity and hence the inability to decipher B-cell epitopes as an intrinsic feature of the protein. It is an open question as to whether ultimately discriminatory features can be found. PMID:17910770

  7. Domain fusion analysis by applying relational algebra to protein sequence and domain databases

    PubMed Central

    Truong, Kevin; Ikura, Mitsuhiko

    2003-01-01

    Background Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages on these efforts will become increasingly powerful. Results This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypothesis. Results can be viewed at . Conclusion As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time. PMID:12734020

  8. Allergic sensitization: screening methods

    PubMed Central

    2014-01-01

    Experimental in silico, in vitro, and rodent models for screening and predicting protein sensitizing potential are discussed, including whether there is evidence of new sensitizations and allergies since the introduction of genetically modified crops in 1996, the importance of linear versus conformational epitopes, and protein families that become allergens. Some common challenges for predicting protein sensitization are addressed: (a) exposure routes; (b) frequency and dose of exposure; (c) dose-response relationships; (d) role of digestion, food processing, and the food matrix; (e) role of infection; (f) role of the gut microbiota; (g) influence of the structure and physicochemical properties of the protein; and (h) the genetic background and physiology of consumers. The consensus view is that sensitization screening models are not yet validated to definitively predict the de novo sensitizing potential of a novel protein. However, they would be extremely useful in the discovery and research phases of understanding the mechanisms of food allergy development, and may prove fruitful to provide information regarding potential allergenicity risk assessment of future products on a case by case basis. These data and findings were presented at a 2012 international symposium in Prague organized by the Protein Allergenicity Technical Committee of the International Life Sciences Institute’s Health and Environmental Sciences Institute. PMID:24739743

  9. Predicting beta-turns in proteins using support vector machines with fractional polynomials

    PubMed Central

    2013-01-01

    Background β-turns are secondary structure type that have essential role in molecular recognition, protein folding, and stability. They are found to be the most common type of non-repetitive structures since 25% of amino acids in protein structures are situated on them. Their prediction is considered to be one of the crucial problems in bioinformatics and molecular biology, which can provide valuable insights and inputs for the fold recognition and drug design. Results We propose an approach that combines support vector machines (SVMs) and logistic regression (LR) in a hybrid prediction method, which we call (H-SVM-LR) to predict β-turns in proteins. Fractional polynomials are used for LR modeling. We utilize position specific scoring matrices (PSSMs) and predicted secondary structure (PSS) as features. Our simulation studies show that H-SVM-LR achieves Qtotal of 82.87%, 82.84%, and 82.32% on the BT426, BT547, and BT823 datasets respectively. These values are the highest among other β-turns prediction methods that are based on PSSMs and secondary structure information. H-SVM-LR also achieves favorable performance in predicting β-turns as measured by the Matthew's correlation coefficient (MCC) on these datasets. Furthermore, H-SVM-LR shows good performance when considering shape strings as additional features. Conclusions In this paper, we present a comprehensive approach for β-turns prediction. Experiments show that our proposed approach achieves better performance compared to other competing prediction methods. PMID:24565438

  10. SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences

    PubMed Central

    Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

    2008-01-01

    Background Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. Results SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. Conclusion The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods. PMID:18452616

  11. Mapping Protein Interactions between Dengue Virus and Its Human and Insect Hosts

    PubMed Central

    Doolittle, Janet M.; Gomez, Shawn M.

    2011-01-01

    Background Dengue fever is an increasingly significant arthropod-borne viral disease, with at least 50 million cases per year worldwide. As with other viral pathogens, dengue virus is dependent on its host to perform the bulk of functions necessary for viral survival and replication. To be successful, dengue must manipulate host cell biological processes towards its own ends, while avoiding elimination by the immune system. Protein-protein interactions between the virus and its host are one avenue through which dengue can connect and exploit these host cellular pathways and processes. Methodology/Principal Findings We implemented a computational approach to predict interactions between Dengue virus (DENV) and both of its hosts, Homo sapiens and the insect vector Aedes aegypti. Our approach is based on structural similarity between DENV and host proteins and incorporates knowledge from the literature to further support a subset of the predictions. We predict over 4,000 interactions between DENV and humans, as well as 176 interactions between DENV and A. aegypti. Additional filtering based on shared Gene Ontology cellular component annotation reduced the number of predictions to approximately 2,000 for humans and 18 for A. aegypti. Of 19 experimentally validated interactions between DENV and humans extracted from the literature, this method was able to predict nearly half (9). Additional predictions suggest specific interactions between virus and host proteins relevant to interferon signaling, transcriptional regulation, stress, and the unfolded protein response. Conclusions/Significance Dengue virus manipulates cellular processes to its advantage through specific interactions with the host's protein interaction network. The interaction networks presented here provide a set of hypothesis for further experimental investigation into the DENV life cycle as well as potential therapeutic targets. PMID:21358811

  12. Stringent homology-based prediction of H. sapiens-M. tuberculosis H37Rv protein-protein interactions

    PubMed Central

    2014-01-01

    Background H. sapiens-M. tuberculosis H37Rv protein-protein interaction (PPI) data are essential for understanding the infection mechanism of the formidable pathogen M. tuberculosis H37Rv. Computational prediction is an important strategy to fill the gap in experimental H. sapiens-M. tuberculosis H37Rv PPI data. Homology-based prediction is frequently used in predicting both intra-species and inter-species PPIs. However, some limitations are not properly resolved in several published works that predict eukaryote-prokaryote inter-species PPIs using intra-species template PPIs. Results We develop a stringent homology-based prediction approach by taking into account (i) differences between eukaryotic and prokaryotic proteins and (ii) differences between inter-species and intra-species PPI interfaces. We compare our stringent homology-based approach to a conventional homology-based approach for predicting host-pathogen PPIs, based on cellular compartment distribution analysis, disease gene list enrichment analysis, pathway enrichment analysis and functional category enrichment analysis. These analyses support the validity of our prediction result, and clearly show that our approach has better performance in predicting H. sapiens-M. tuberculosis H37Rv PPIs. Using our stringent homology-based approach, we have predicted a set of highly plausible H. sapiens-M. tuberculosis H37Rv PPIs which might be useful for many of related studies. Based on our analysis of the H. sapiens-M. tuberculosis H37Rv PPI network predicted by our stringent homology-based approach, we have discovered several interesting properties which are reported here for the first time. We find that both host proteins and pathogen proteins involved in the host-pathogen PPIs tend to be hubs in their own intra-species PPI network. Also, both host and pathogen proteins involved in host-pathogen PPIs tend to have longer primary sequence, tend to have more domains, tend to be more hydrophilic, etc. And the protein domains from both host and pathogen proteins involved in host-pathogen PPIs tend to have lower charge, and tend to be more hydrophilic. Conclusions Our stringent homology-based prediction approach provides a better strategy in predicting PPIs between eukaryotic hosts and prokaryotic pathogens than a conventional homology-based approach. The properties we have observed from the predicted H. sapiens-M. tuberculosis H37Rv PPI network are useful for understanding inter-species host-pathogen PPI networks and provide novel insights for host-pathogen interaction studies. Reviewers This article was reviewed by Michael Gromiha, Narayanaswamy Srinivasan and Thomas Dandekar. PMID:24708540

  13. Designing and evaluating the MULTICOM protein local and global model quality prediction methods in the CASP10 experiment

    PubMed Central

    2014-01-01

    Background Protein model quality assessment is an essential component of generating and using protein structural models. During the Tenth Critical Assessment of Techniques for Protein Structure Prediction (CASP10), we developed and tested four automated methods (MULTICOM-REFINE, MULTICOM-CLUSTER, MULTICOM-NOVEL, and MULTICOM-CONSTRUCT) that predicted both local and global quality of protein structural models. Results MULTICOM-REFINE was a clustering approach that used the average pairwise structural similarity between models to measure the global quality and the average Euclidean distance between a model and several top ranked models to measure the local quality. MULTICOM-CLUSTER and MULTICOM-NOVEL were two new support vector machine-based methods of predicting both the local and global quality of a single protein model. MULTICOM-CONSTRUCT was a new weighted pairwise model comparison (clustering) method that used the weighted average similarity between models in a pool to measure the global model quality. Our experiments showed that the pairwise model assessment methods worked better when a large portion of models in the pool were of good quality, whereas single-model quality assessment methods performed better on some hard targets when only a small portion of models in the pool were of reasonable quality. Conclusions Since digging out a few good models from a large pool of low-quality models is a major challenge in protein structure prediction, single model quality assessment methods appear to be poised to make important contributions to protein structure modeling. The other interesting finding was that single-model quality assessment scores could be used to weight the models by the consensus pairwise model comparison method to improve its accuracy. PMID:24731387

  14. Lack of Spartin Protein in Troyer Syndrome

    PubMed Central

    Bakowska, Joanna C.; Wang, Heng; Xin, Baozhong; Sumner, Charlotte J.; Blackstone, Craig

    2017-01-01

    Background Hereditary spastic paraplegias (SPG1-SPG33) are characterized by progressive spastic weakness of the lower limbs. A nucleotide deletion (1110delA) in the (SPG20; OMIM 275900) spartin gene is the origin of autosomal recessive Troyer syndrome. This mutation is predicted to cause premature termination of the spartin protein. However, it remains unknown whether this truncated spartin protein is absent or is present and partially functional in patients. Objective To determine whether the truncated spartin protein is present or absent in cells derived from patients with Troyer syndrome. Design Case report. Setting Academic research. Patients We describe a new family with Troyer syndrome due to the 1110delA mutation. Main Outcome Measures We cultured primary fibroblasts and generated lymphoblasts from affected individuals, carriers, and control subjects and subjected these cells to immunoblot analyses. Results Spartin protein is undetectable in several cell lines derived from patients with Troyer syndrome. Conclusions Our data suggest that Troyer syndrome results from complete loss of spartin protein rather than from the predicted partly functional fragment. This may reflect increased protein degradation or impaired translation. PMID:18413476

  15. Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis

    PubMed Central

    2013-01-01

    Background Protein-protein interactions (PPIs) play crucial roles in the execution of various cellular processes and form the basis of biological mechanisms. Although large amount of PPIs data for different species has been generated by high-throughput experimental techniques, current PPI pairs obtained with experimental methods cover only a fraction of the complete PPI networks, and further, the experimental methods for identifying PPIs are both time-consuming and expensive. Hence, it is urgent and challenging to develop automated computational methods to efficiently and accurately predict PPIs. Results We present here a novel hierarchical PCA-EELM (principal component analysis-ensemble extreme learning machine) model to predict protein-protein interactions only using the information of protein sequences. In the proposed method, 11188 protein pairs retrieved from the DIP database were encoded into feature vectors by using four kinds of protein sequences information. Focusing on dimension reduction, an effective feature extraction method PCA was then employed to construct the most discriminative new feature set. Finally, multiple extreme learning machines were trained and then aggregated into a consensus classifier by majority voting. The ensembling of extreme learning machine removes the dependence of results on initial random weights and improves the prediction performance. Conclusions When performed on the PPI data of Saccharomyces cerevisiae, the proposed method achieved 87.00% prediction accuracy with 86.15% sensitivity at the precision of 87.59%. Extensive experiments are performed to compare our method with state-of-the-art techniques Support Vector Machine (SVM). Experimental results demonstrate that proposed PCA-EELM outperforms the SVM method by 5-fold cross-validation. Besides, PCA-EELM performs faster than PCA-SVM based method. Consequently, the proposed approach can be considered as a new promising and powerful tools for predicting PPI with excellent performance and less time. PMID:23815620

  16. Prediction of GCRV virus-host protein interactome based on structural motif-domain interactions.

    PubMed

    Zhang, Aidi; He, Libo; Wang, Yaping

    2017-03-02

    Grass carp hemorrhagic disease, caused by grass carp reovirus (GCRV), is the most fatal causative agent in grass carp aquaculture. Protein-protein interactions between virus and host are one avenue through which GCRV can trigger infection and induce disease. Experimental approaches for the detection of host-virus interactome have many inherent limitations, and studies on protein-protein interactions between GCRV and its host remain rare. In this study, based on known motif-domain interaction information, we systematically predicted the GCRV virus-host protein interactome by using motif-domain interaction pair searching strategy. These proteins derived from different domain families and were predicted to interact with different motif patterns in GCRV. JAM-A protein was successfully predicted to interact with motifs of GCRV Sigma1-like protein, and shared the similar binding mode compared with orthoreovirus. Differentially expressed genes during GCRV infection process were extracted and mapped to our predicted interactome, the overlapped genes displayed different tissue expression distributions on the whole, the overall expression level in intestinal is higher than that of other three tissues, which may suggest that the functions of these genes are more active in intestinal. Function annotation and pathway enrichment analysis revealed that the host targets were largely involved in signaling pathway and immune pathway, such as interferon-gamma signaling pathway, VEGF signaling pathway, EGF receptor signaling pathway, B cell activation, and T cell activation. Although the predicted PPIs may contain some false positives due to limited data resource and poor research background in non-model species, the computational method still provide reasonable amount of interactions, which can be further validated by high throughput experiments. The findings of this work will contribute to the development of system biology for GCRV infectious diseases, and help guide the identification of novel receptors of GCRV in its host.

  17. Physicochemical code for quinary protein interactions in Escherichia coli

    PubMed Central

    Mu, Xin; Choi, Seongil; Lang, Lisa; Mowray, David; Danielsson, Jens; Oliveberg, Mikael

    2017-01-01

    How proteins sense and navigate the cellular interior to find their functional partners remains poorly understood. An intriguing aspect of this search is that it relies on diffusive encounters with the crowded cellular background, made up of protein surfaces that are largely nonconserved. The question is then if/how this protein search is amenable to selection and biological control. To shed light on this issue, we examined the motions of three evolutionary divergent proteins in the Escherichia coli cytoplasm by in-cell NMR. The results show that the diffusive in-cell motions, after all, follow simplistic physical−chemical rules: The proteins reveal a common dependence on (i) net charge density, (ii) surface hydrophobicity, and (iii) the electric dipole moment. The bacterial protein is here biased to move relatively freely in the bacterial interior, whereas the human counterparts more easily stick. Even so, the in-cell motions respond predictably to surface mutation, allowing us to tune and intermix the protein’s behavior at will. The findings show how evolution can swiftly optimize the diffuse background of protein encounter complexes by just single-point mutations, and provide a rational framework for adjusting the cytoplasmic motions of individual proteins, e.g., for rescuing poor in-cell NMR signals and for optimizing protein therapeutics. PMID:28536196

  18. Optimizing expression of the pregnancy malaria vaccine candidate, VAR2CSA in Pichia pastoris

    PubMed Central

    Avril, Marion; Hathaway, Marianne J; Cartwright, Megan M; Gose, Severin O; Narum, David L; Smith, Joseph D

    2009-01-01

    Background VAR2CSA is the main candidate for a vaccine against pregnancy-associated malaria, but vaccine development is complicated by the large size and complex disulfide bonding pattern of the protein. Recent X-ray crystallographic information suggests that domain boundaries of VAR2CSA Duffy binding-like (DBL) domains may be larger than previously predicted and include two additional cysteine residues. This study investigated whether longer constructs would improve VAR2CSA recombinant protein secretion from Pichia pastoris and if domain boundaries were applicable across different VAR2CSA alleles. Methods VAR2CSA sequences were bioinformatically analysed to identify the predicted C11 and C12 cysteine residues at the C-termini of DBL domains and revised N- and C-termimal domain boundaries were predicted in VAR2CSA. Multiple construct boundaries were systematically evaluated for protein secretion in P. pastoris and secreted proteins were tested as immunogens. Results From a total of 42 different VAR2CSA constructs, 15 proteins (36%) were secreted. Longer construct boundaries, including the predicted C11 and C12 cysteine residues, generally improved expression of poorly or non-secreted domains and permitted expression of all six VAR2CSA DBL domains. However, protein secretion was still highly empiric and affected by subtle differences in domain boundaries and allelic variation between VAR2CSA sequences. Eleven of the secreted proteins were used to immunize rabbits. Antibodies reacted with CSA-binding infected erythrocytes, indicating that P. pastoris recombinant proteins possessed native protein epitopes. Conclusion These findings strengthen emerging data for a revision of DBL domain boundaries in var-encoded proteins and may facilitate pregnancy malaria vaccine development. PMID:19563628

  19. FPGA accelerator for protein secondary structure prediction based on the GOR algorithm

    PubMed Central

    2011-01-01

    Background Protein is an important molecule that performs a wide range of functions in biological systems. Recently, the protein folding attracts much more attention since the function of protein can be generally derived from its molecular structure. The GOR algorithm is one of the most successful computational methods and has been widely used as an efficient analysis tool to predict secondary structure from protein sequence. However, the execution time is still intolerable with the steep growth in protein database. Recently, FPGA chips have emerged as one promising application accelerator to accelerate bioinformatics algorithms by exploiting fine-grained custom design. Results In this paper, we propose a complete fine-grained parallel hardware implementation on FPGA to accelerate the GOR-IV package for 2D protein structure prediction. To improve computing efficiency, we partition the parameter table into small segments and access them in parallel. We aggressively exploit data reuse schemes to minimize the need for loading data from external memory. The whole computation structure is carefully pipelined to overlap the sequence loading, computing and back-writing operations as much as possible. We implemented a complete GOR desktop system based on an FPGA chip XC5VLX330. Conclusions The experimental results show a speedup factor of more than 430x over the original GOR-IV version and 110x speedup over the optimized version with multi-thread SIMD implementation running on a PC platform with AMD Phenom 9650 Quad CPU for 2D protein structure prediction. However, the power consumption is only about 30% of that of current general-propose CPUs. PMID:21342582

  20. Exploiting protein flexibility to predict the location of allosteric sites

    PubMed Central

    2012-01-01

    Background Allostery is one of the most powerful and common ways of regulation of protein activity. However, for most allosteric proteins identified to date the mechanistic details of allosteric modulation are not yet well understood. Uncovering common mechanistic patterns underlying allostery would allow not only a better academic understanding of the phenomena, but it would also streamline the design of novel therapeutic solutions. This relatively unexplored therapeutic potential and the putative advantages of allosteric drugs over classical active-site inhibitors fuel the attention allosteric-drug research is receiving at present. A first step to harness the regulatory potential and versatility of allosteric sites, in the context of drug-discovery and design, would be to detect or predict their presence and location. In this article, we describe a simple computational approach, based on the effect allosteric ligands exert on protein flexibility upon binding, to predict the existence and position of allosteric sites on a given protein structure. Results By querying the literature and a recently available database of allosteric sites, we gathered 213 allosteric proteins with structural information that we further filtered into a non-redundant set of 91 proteins. We performed normal-mode analysis and observed significant changes in protein flexibility upon allosteric-ligand binding in 70% of the cases. These results agree with the current view that allosteric mechanisms are in many cases governed by changes in protein dynamics caused by ligand binding. Furthermore, we implemented an approach that achieves 65% positive predictive value in identifying allosteric sites within the set of predicted cavities of a protein (stricter parameters set, 0.22 sensitivity), by combining the current analysis on dynamics with previous results on structural conservation of allosteric sites. We also analyzed four biological examples in detail, revealing that this simple coarse-grained methodology is able to capture the effects triggered by allosteric ligands already described in the literature. Conclusions We introduce a simple computational approach to predict the presence and position of allosteric sites in a protein based on the analysis of changes in protein normal modes upon the binding of a coarse-grained ligand at predicted cavities. Its performance has been demonstrated using a newly curated non-redundant set of 91 proteins with reported allosteric properties. The software developed in this work is available upon request from the authors. PMID:23095452

  1. IntNetDB v1.0: an integrated protein-protein interaction network database generated by a probabilistic model

    PubMed Central

    Xia, Kai; Dong, Dong; Han, Jing-Dong J

    2006-01-01

    Background Although protein-protein interaction (PPI) networks have been explored by various experimental methods, the maps so built are still limited in coverage and accuracy. To further expand the PPI network and to extract more accurate information from existing maps, studies have been carried out to integrate various types of functional relationship data. A frequently updated database of computationally analyzed potential PPIs to provide biological researchers with rapid and easy access to analyze original data as a biological network is still lacking. Results By applying a probabilistic model, we integrated 27 heterogeneous genomic, proteomic and functional annotation datasets to predict PPI networks in human. In addition to previously studied data types, we show that phenotypic distances and genetic interactions can also be integrated to predict PPIs. We further built an easy-to-use, updatable integrated PPI database, the Integrated Network Database (IntNetDB) online, to provide automatic prediction and visualization of PPI network among genes of interest. The networks can be visualized in SVG (Scalable Vector Graphics) format for zooming in or out. IntNetDB also provides a tool to extract topologically highly connected network neighborhoods from a specific network for further exploration and research. Using the MCODE (Molecular Complex Detections) algorithm, 190 such neighborhoods were detected among all the predicted interactions. The predicted PPIs can also be mapped to worm, fly and mouse interologs. Conclusion IntNetDB includes 180,010 predicted protein-protein interactions among 9,901 human proteins and represents a useful resource for the research community. Our study has increased prediction coverage by five-fold. IntNetDB also provides easy-to-use network visualization and analysis tools that allow biological researchers unfamiliar with computational biology to access and analyze data over the internet. The web interface of IntNetDB is freely accessible at . Visualization requires Mozilla version 1.8 (or higher) or Internet Explorer with installation of SVGviewer. PMID:17112386

  2. Predicting protein concentrations with ELISA microarray assays, monotonic splines and Monte Carlo simulation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Daly, Don S.; Anderson, Kevin K.; White, Amanda M.

    Background: A microarray of enzyme-linked immunosorbent assays, or ELISA microarray, predicts simultaneously the concentrations of numerous proteins in a small sample. These predictions, however, are uncertain due to processing error and biological variability. Making sound biological inferences as well as improving the ELISA microarray process require require both concentration predictions and creditable estimates of their errors. Methods: We present a statistical method based on monotonic spline statistical models, penalized constrained least squares fitting (PCLS) and Monte Carlo simulation (MC) to predict concentrations and estimate prediction errors in ELISA microarray. PCLS restrains the flexible spline to a fit of assay intensitymore » that is a monotone function of protein concentration. With MC, both modeling and measurement errors are combined to estimate prediction error. The spline/PCLS/MC method is compared to a common method using simulated and real ELISA microarray data sets. Results: In contrast to the rigid logistic model, the flexible spline model gave credible fits in almost all test cases including troublesome cases with left and/or right censoring, or other asymmetries. For the real data sets, 61% of the spline predictions were more accurate than their comparable logistic predictions; especially the spline predictions at the extremes of the prediction curve. The relative errors of 50% of comparable spline and logistic predictions differed by less than 20%. Monte Carlo simulation rendered acceptable asymmetric prediction intervals for both spline and logistic models while propagation of error produced symmetric intervals that diverged unrealistically as the standard curves approached horizontal asymptotes. Conclusions: The spline/PCLS/MC method is a flexible, robust alternative to a logistic/NLS/propagation-of-error method to reliably predict protein concentrations and estimate their errors. The spline method simplifies model selection and fitting, and reliably estimates believable prediction errors. For the 50% of the real data sets fit well by both methods, spline and logistic predictions are practically indistinguishable, varying in accuracy by less than 15%. The spline method may be useful when automated prediction across simultaneous assays of numerous proteins must be applied routinely with minimal user intervention.« less

  3. High precision multi-genome scale reannotation of enzyme function by EFICAz

    PubMed Central

    Arakaki, Adrian K; Tian, Weidong; Skolnick, Jeffrey

    2006-01-01

    Background The functional annotation of most genes in newly sequenced genomes is inferred from similarity to previously characterized sequences, an annotation strategy that often leads to erroneous assignments. We have performed a reannotation of 245 genomes using an updated version of EFICAz, a highly precise method for enzyme function prediction. Results Based on our three-field EC number predictions, we have obtained lower-bound estimates for the average enzyme content in Archaea (29%), Bacteria (30%) and Eukarya (18%). Most annotations added in KEGG from 2005 to 2006 agree with EFICAz predictions made in 2005. The coverage of EFICAz predictions is significantly higher than that of KEGG, especially for eukaryotes. Thousands of our novel predictions correspond to hypothetical proteins. We have identified a subset of 64 hypothetical proteins with low sequence identity to EFICAz training enzymes, whose biochemical functions have been recently characterized and find that in 96% (84%) of the cases we correctly identified their three-field (four-field) EC numbers. For two of the 64 hypothetical proteins: PA1167 from Pseudomonas aeruginosa, an alginate lyase (EC 4.2.2.3) and Rv1700 of Mycobacterium tuberculosis H37Rv, an ADP-ribose diphosphatase (EC 3.6.1.13), we have detected annotation lag of more than two years in databases. Two examples are presented where EFICAz predictions act as hypothesis generators for understanding the functional roles of hypothetical proteins: FLJ11151, a human protein overexpressed in cancer that EFICAz identifies as an endopolyphosphatase (EC 3.6.1.10), and MW0119, a protein of Staphylococcus aureus strain MW2 that we propose as candidate virulence factor based on its EFICAz predicted activity, sphingomyelin phosphodiesterase (EC 3.1.4.12). Conclusion Our results suggest that we have generated enzyme function annotations of high precision and recall. These predictions can be mined and correlated with other information sources to generate biologically significant hypotheses and can be useful for comparative genome analysis and automated metabolic pathway reconstruction. PMID:17166279

  4. Highly Sensitive Detection of Individual HEAT and ARM Repeats with HHpred and COACH

    PubMed Central

    Kippert, Fred; Gerloff, Dietlind L.

    2009-01-01

    Background HEAT and ARM repeats occur in a large number of eukaryotic proteins. As these repeats are often highly diverged, the prediction of HEAT or ARM domains can be challenging. Except for the most clear-cut cases, identification at the individual repeat level is indispensable, in particular for determining domain boundaries. However, methods using single sequence queries do not have the sensitivity required to deal with more divergent repeats and, when applied to proteins with known structures, in some cases failed to detect a single repeat. Methodology and Principal Findings Testing algorithms which use multiple sequence alignments as queries, we found two of them, HHpred and COACH, to detect HEAT and ARM repeats with greatly enhanced sensitivity. Calibration against experimentally determined structures suggests the use of three score classes with increasing confidence in the prediction, and prediction thresholds for each method. When we applied a new protocol using both HHpred and COACH to these structures, it detected 82% of HEAT repeats and 90% of ARM repeats, with the minimum for a given protein of 57% for HEAT repeats and 60% for ARM repeats. Application to bona fide HEAT and ARM proteins or domains indicated that similar numbers can be expected for the full complement of HEAT/ARM proteins. A systematic screen of the Protein Data Bank for false positive hits revealed their number to be low, in particular for ARM repeats. Double false positive hits for a given protein were rare for HEAT and not at all observed for ARM repeats. In combination with fold prediction and consistency checking (multiple sequence alignments, secondary structure prediction, and position analysis), repeat prediction with the new HHpred/COACH protocol dramatically improves prediction in the twilight zone of fold prediction methods, as well as the delineation of HEAT/ARM domain boundaries. Significance A protocol is presented for the identification of individual HEAT or ARM repeats which is straightforward to implement. It provides high sensitivity at a low false positive rate and will therefore greatly enhance the accuracy of predictions of HEAT and ARM domains. PMID:19777061

  5. Prediction and analysis of protein solubility using a novel scoring card method with dipeptide composition

    PubMed Central

    2012-01-01

    Background Existing methods for predicting protein solubility on overexpression in Escherichia coli advance performance by using ensemble classifiers such as two-stage support vector machine (SVM) based classifiers and a number of feature types such as physicochemical properties, amino acid and dipeptide composition, accompanied with feature selection. It is desirable to develop a simple and easily interpretable method for predicting protein solubility, compared to existing complex SVM-based methods. Results This study proposes a novel scoring card method (SCM) by using dipeptide composition only to estimate solubility scores of sequences for predicting protein solubility. SCM calculates the propensities of 400 individual dipeptides to be soluble using statistic discrimination between soluble and insoluble proteins of a training data set. Consequently, the propensity scores of all dipeptides are further optimized using an intelligent genetic algorithm. The solubility score of a sequence is determined by the weighted sum of all propensity scores and dipeptide composition. To evaluate SCM by performance comparisons, four data sets with different sizes and variation degrees of experimental conditions were used. The results show that the simple method SCM with interpretable propensities of dipeptides has promising performance, compared with existing SVM-based ensemble methods with a number of feature types. Furthermore, the propensities of dipeptides and solubility scores of sequences can provide insights to protein solubility. For example, the analysis of dipeptide scores shows high propensity of α-helix structure and thermophilic proteins to be soluble. Conclusions The propensities of individual dipeptides to be soluble are varied for proteins under altered experimental conditions. For accurately predicting protein solubility using SCM, it is better to customize the score card of dipeptide propensities by using a training data set under the same specified experimental conditions. The proposed method SCM with solubility scores and dipeptide propensities can be easily applied to the protein function prediction problems that dipeptide composition features play an important role. Availability The used datasets, source codes of SCM, and supplementary files are available at http://iclab.life.nctu.edu.tw/SCM/. PMID:23282103

  6. Jenner-predict server: prediction of protein vaccine candidates (PVCs) in bacteria based on host-pathogen interactions

    PubMed Central

    2013-01-01

    Background Subunit vaccines based on recombinant proteins have been effective in preventing infectious diseases and are expected to meet the demands of future vaccine development. Computational approach, especially reverse vaccinology (RV) method has enormous potential for identification of protein vaccine candidates (PVCs) from a proteome. The existing protective antigen prediction software and web servers have low prediction accuracy leading to limited applications for vaccine development. Besides machine learning techniques, those software and web servers have considered only protein’s adhesin-likeliness as criterion for identification of PVCs. Several non-adhesin functional classes of proteins involved in host-pathogen interactions and pathogenesis are known to provide protection against bacterial infections. Therefore, knowledge of bacterial pathogenesis has potential to identify PVCs. Results A web server, Jenner-Predict, has been developed for prediction of PVCs from proteomes of bacterial pathogens. The web server targets host-pathogen interactions and pathogenesis by considering known functional domains from protein classes such as adhesin, virulence, invasin, porin, flagellin, colonization, toxin, choline-binding, penicillin-binding, transferring-binding, fibronectin-binding and solute-binding. It predicts non-cytosolic proteins containing above domains as PVCs. It also provides vaccine potential of PVCs in terms of their possible immunogenicity by comparing with experimentally known IEDB epitopes, absence of autoimmunity and conservation in different strains. Predicted PVCs are prioritized so that only few prospective PVCs could be validated experimentally. The performance of web server was evaluated against known protective antigens from diverse classes of bacteria reported in Protegen database and datasets used for VaxiJen server development. The web server efficiently predicted known vaccine candidates reported from Streptococcus pneumoniae and Escherichia coli proteomes. The Jenner-Predict server outperformed NERVE, Vaxign and VaxiJen methods. It has sensitivity of 0.774 and 0.711 for Protegen and VaxiJen dataset, respectively while specificity of 0.940 has been obtained for the latter dataset. Conclusions Better prediction accuracy of Jenner-Predict web server signifies that domains involved in host-pathogen interactions and pathogenesis are better criteria for prediction of PVCs. The web server has successfully predicted maximum known PVCs belonging to different functional classes. Jenner-Predict server is freely accessible at http://117.211.115.67/vaccine/home.html PMID:23815072

  7. SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition

    PubMed Central

    Melvin, Iain; Ie, Eugene; Kuang, Rui; Weston, Jason; Stafford, William Noble; Leslie, Christina

    2007-01-01

    Background Predicting a protein's structural class from its amino acid sequence is a fundamental problem in computational biology. Much recent work has focused on developing new representations for protein sequences, called string kernels, for use with support vector machine (SVM) classifiers. However, while some of these approaches exhibit state-of-the-art performance at the binary protein classification problem, i.e. discriminating between a particular protein class and all other classes, few of these studies have addressed the real problem of multi-class superfamily or fold recognition. Moreover, there are only limited software tools and systems for SVM-based protein classification available to the bioinformatics community. Results We present a new multi-class SVM-based protein fold and superfamily recognition system and web server called SVM-Fold, which can be found at . Our system uses an efficient implementation of a state-of-the-art string kernel for sequence profiles, called the profile kernel, where the underlying feature representation is a histogram of inexact matching k-mer frequencies. We also employ a novel machine learning approach to solve the difficult multi-class problem of classifying a sequence of amino acids into one of many known protein structural classes. Binary one-vs-the-rest SVM classifiers that are trained to recognize individual structural classes yield prediction scores that are not comparable, so that standard "one-vs-all" classification fails to perform well. Moreover, SVMs for classes at different levels of the protein structural hierarchy may make useful predictions, but one-vs-all does not try to combine these multiple predictions. To deal with these problems, our method learns relative weights between one-vs-the-rest classifiers and encodes information about the protein structural hierarchy for multi-class prediction. In large-scale benchmark results based on the SCOP database, our code weighting approach significantly improves on the standard one-vs-all method for both the superfamily and fold prediction in the remote homology setting and on the fold recognition problem. Moreover, our code weight learning algorithm strongly outperforms nearest-neighbor methods based on PSI-BLAST in terms of prediction accuracy on every structure classification problem we consider. Conclusion By combining state-of-the-art SVM kernel methods with a novel multi-class algorithm, the SVM-Fold system delivers efficient and accurate protein fold and superfamily recognition. PMID:17570145

  8. Stringent DDI-based Prediction of H. sapiens-M. tuberculosis H37Rv Protein-Protein Interactions

    PubMed Central

    2013-01-01

    Background H. sapiens-M. tuberculosis H37Rv protein-protein interaction (PPI) data are very important information to illuminate the infection mechanism of M. tuberculosis H37Rv. But current H. sapiens-M. tuberculosis H37Rv PPI data are very scarce. This seriously limits the study of the interaction between this important pathogen and its host H. sapiens. Computational prediction of H. sapiens-M. tuberculosis H37Rv PPIs is an important strategy to fill in the gap. Domain-domain interaction (DDI) based prediction is one of the frequently used computational approaches in predicting both intra-species and inter-species PPIs. However, the performance of DDI-based host-pathogen PPI prediction has been rather limited. Results We develop a stringent DDI-based prediction approach with emphasis on (i) differences between the specific domain sequences on annotated regions of proteins under the same domain ID and (ii) calculation of the interaction strength of predicted PPIs based on the interacting residues in their interaction interfaces. We compare our stringent DDI-based approach to a conventional DDI-based approach for predicting PPIs based on gold standard intra-species PPIs and coherent informative Gene Ontology terms assessment. The assessment results show that our stringent DDI-based approach achieves much better performance in predicting PPIs than the conventional approach. Using our stringent DDI-based approach, we have predicted a small set of reliable H. sapiens-M. tuberculosis H37Rv PPIs which could be very useful for a variety of related studies. We also analyze the H. sapiens-M. tuberculosis H37Rv PPIs predicted by our stringent DDI-based approach using cellular compartment distribution analysis, functional category enrichment analysis and pathway enrichment analysis. The analyses support the validity of our prediction result. Also, based on an analysis of the H. sapiens-M. tuberculosis H37Rv PPI network predicted by our stringent DDI-based approach, we have discovered some important properties of domains involved in host-pathogen PPIs. We find that both host and pathogen proteins involved in host-pathogen PPIs tend to have more domains than proteins involved in intra-species PPIs, and these domains have more interaction partners than domains on proteins involved in intra-species PPI. Conclusions The stringent DDI-based prediction approach reported in this work provides a stringent strategy for predicting host-pathogen PPIs. It also performs better than a conventional DDI-based approach in predicting PPIs. We have predicted a small set of accurate H. sapiens-M. tuberculosis H37Rv PPIs which could be very useful for a variety of related studies. PMID:24564941

  9. Predicting domain-domain interaction based on domain profiles with feature selection and support vector machines

    PubMed Central

    2010-01-01

    Background Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles. Results In this paper, we propose a computational method to predict DDI using support vector machines (SVMs), based on domains represented as interaction profile hidden Markov models (ipHMM) where interacting residues in domains are explicitly modeled according to the three dimensional structural information available at the Protein Data Bank (PDB). Features about the domains are extracted first as the Fisher scores derived from the ipHMM and then selected using singular value decomposition (SVD). Domain pairs are represented by concatenating their selected feature vectors, and classified by a support vector machine trained on these feature vectors. The method is tested by leave-one-out cross validation experiments with a set of interacting protein pairs adopted from the 3DID database. The prediction accuracy has shown significant improvement as compared to InterPreTS (Interaction Prediction through Tertiary Structure), an existing method for PPI prediction that also uses the sequences and complexes of known 3D structure. Conclusions We show that domain-domain interaction prediction can be significantly enhanced by exploiting information inherent in the domain profiles via feature selection based on Fisher scores, singular value decomposition and supervised learning based on support vector machines. Datasets and source code are freely available on the web at http://liao.cis.udel.edu/pub/svdsvm. Implemented in Matlab and supported on Linux and MS Windows. PMID:21034480

  10. Intrinsic disorder in Viral Proteins Genome-Linked: experimental and predictive analyses

    PubMed Central

    Hébrard, Eugénie; Bessin, Yannick; Michon, Thierry; Longhi, Sonia; Uversky, Vladimir N; Delalande, François; Van Dorsselaer, Alain; Romero, Pedro; Walter, Jocelyne; Declerk, Nathalie; Fargette, Denis

    2009-01-01

    Background VPgs are viral proteins linked to the 5' end of some viral genomes. Interactions between several VPgs and eukaryotic translation initiation factors eIF4Es are critical for plant infection. However, VPgs are not restricted to phytoviruses, being also involved in genome replication and protein translation of several animal viruses. To date, structural data are still limited to small picornaviral VPgs. Recently three phytoviral VPgs were shown to be natively unfolded proteins. Results In this paper, we report the bacterial expression, purification and biochemical characterization of two phytoviral VPgs, namely the VPgs of Rice yellow mottle virus (RYMV, genus Sobemovirus) and Lettuce mosaic virus (LMV, genus Potyvirus). Using far-UV circular dichroism and size exclusion chromatography, we show that RYMV and LMV VPgs are predominantly or partly unstructured in solution, respectively. Using several disorder predictors, we show that both proteins are predicted to possess disordered regions. We next extend theses results to 14 VPgs representative of the viral diversity. Disordered regions were predicted in all VPg sequences whatever the genus and the family. Conclusion Based on these results, we propose that intrinsic disorder is a common feature of VPgs. The functional role of intrinsic disorder is discussed in light of the biological roles of VPgs. PMID:19220875

  11. 7TMRmine: a Web server for hierarchical mining of 7TMR proteins

    PubMed Central

    Lu, Guoqing; Wang, Zhifang; Jones, Alan M; Moriyama, Etsuko N

    2009-01-01

    Background Seven-transmembrane region-containing receptors (7TMRs) play central roles in eukaryotic signal transduction. Due to their biomedical importance, thorough mining of 7TMRs from diverse genomes has been an active target of bioinformatics and pharmacogenomics research. The need for new and accurate 7TMR/GPCR prediction tools is paramount with the accelerated rate of acquisition of diverse sequence information. Currently available and often used protein classification methods (e.g., profile hidden Markov Models) are highly accurate for identifying their membership information among already known 7TMR subfamilies. However, these alignment-based methods are less effective for identifying remote similarities, e.g., identifying proteins from highly divergent or possibly new 7TMR families. In this regard, more sensitive (e.g., alignment-free) methods are needed to complement the existing protein classification methods. A better strategy would be to combine different classifiers, from more specific to more sensitive methods, to identify a broader spectrum of 7TMR protein candidates. Description We developed a Web server, 7TMRmine, by integrating alignment-free and alignment-based classifiers specifically trained to identify candidate 7TMR proteins as well as transmembrane (TM) prediction methods. This new tool enables researchers to easily assess the distribution of GPCR functionality in diverse genomes or individual newly-discovered proteins. 7TMRmine is easily customized and facilitates exploratory analysis of diverse genomes. Users can integrate various alignment-based, alignment-free, and TM-prediction methods in any combination and in any hierarchical order. Sixteen classifiers (including two TM-prediction methods) are available on the 7TMRmine Web server. Not only can the 7TMRmine tool be used for 7TMR mining, but also for general TM-protein analysis. Users can submit protein sequences for analysis, or explore pre-analyzed results for multiple genomes. The server currently includes prediction results and the summary statistics for 68 genomes. Conclusion 7TMRmine facilitates the discovery of 7TMR proteins. By combining prediction results from different classifiers in a multi-level filtering process, prioritized sets of 7TMR candidates can be obtained for further investigation. 7TMRmine can be also used as a general TM-protein classifier. Comparisons of TM and 7TMR protein distributions among 68 genomes revealed interesting differences in evolution of these protein families among major eukaryotic phyla. PMID:19538753

  12. A novel strategy for classifying the output from an in silico vaccine discovery pipeline for eukaryotic pathogens using machine learning algorithms

    PubMed Central

    2013-01-01

    Background An in silico vaccine discovery pipeline for eukaryotic pathogens typically consists of several computational tools to predict protein characteristics. The aim of the in silico approach to discovering subunit vaccines is to use predicted characteristics to identify proteins which are worthy of laboratory investigation. A major challenge is that these predictions are inherent with hidden inaccuracies and contradictions. This study focuses on how to reduce the number of false candidates using machine learning algorithms rather than relying on expensive laboratory validation. Proteins from Toxoplasma gondii, Plasmodium sp., and Caenorhabditis elegans were used as training and test datasets. Results The results show that machine learning algorithms can effectively distinguish expected true from expected false vaccine candidates (with an average sensitivity and specificity of 0.97 and 0.98 respectively), for proteins observed to induce immune responses experimentally. Conclusions Vaccine candidates from an in silico approach can only be truly validated in a laboratory. Given any in silico output and appropriate training data, the number of false candidates allocated for validation can be dramatically reduced using a pool of machine learning algorithms. This will ultimately save time and money in the laboratory. PMID:24180526

  13. Blocking Protein kinase C signaling pathway: mechanistic insights into the anti-leishmanial activity of prospective herbal drugs from Withania somnifera

    PubMed Central

    2012-01-01

    Background Leishmaniasis is caused by several species of leishmania protozoan and is one of the major vector-born diseases after malaria and sleeping sickness. Toxicity of available drugs and drug resistance development by protozoa in recent years has made Leishmaniasis cure difficult and challenging. This urges the need to discover new antileishmanial-drug targets and antileishmanial-drug development. Results Tertiary structure of leishmanial protein kinase C was predicted and found stable with a RMSD of 5.8Å during MD simulations. Natural compound withaferin A inhibited the predicted protein at its active site with -28.47 kcal/mol binding free energy. Withanone was also found to inhibit LPKC with good binding affinity of -22.57 kcal/mol. Both withaferin A and withanone were found stable within the binding pocket of predicted protein when MD simulations of ligand-bound protein complexes were carried out to examine the consistency of interactions between the two. Conclusions Leishmanial protein kinase C (LPKC) has been identified as a potential target to develop drugs against Leishmaniasis. We modelled and refined the tertiary structure of LPKC using computational methods such as homology modelling and molecular dynamics simulations. This structure of LPKC was used to reveal mode of inhibition of two previous experimentally reported natural compounds from Withania somnifera - withaferin A and withanone. PMID:23281834

  14. PharmDock: a pharmacophore-based docking program

    PubMed Central

    2014-01-01

    Background Protein-based pharmacophore models are enriched with the information of potential interactions between ligands and the protein target. We have shown in a previous study that protein-based pharmacophore models can be applied for ligand pose prediction and pose ranking. In this publication, we present a new pharmacophore-based docking program PharmDock that combines pose sampling and ranking based on optimized protein-based pharmacophore models with local optimization using an empirical scoring function. Results Tests of PharmDock on ligand pose prediction, binding affinity estimation, compound ranking and virtual screening yielded comparable or better performance to existing and widely used docking programs. The docking program comes with an easy-to-use GUI within PyMOL. Two features have been incorporated in the program suite that allow for user-defined guidance of the docking process based on previous experimental data. Docking with those features demonstrated superior performance compared to unbiased docking. Conclusion A protein pharmacophore-based docking program, PharmDock, has been made available with a PyMOL plugin. PharmDock and the PyMOL plugin are freely available from http://people.pharmacy.purdue.edu/~mlill/software/pharmdock. PMID:24739488

  15. Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae

    PubMed Central

    Reguly, Teresa; Breitkreutz, Ashton; Boucher, Lorrie; Breitkreutz, Bobby-Joe; Hon, Gary C; Myers, Chad L; Parsons, Ainslie; Friesen, Helena; Oughtred, Rose; Tong, Amy; Stark, Chris; Ho, Yuen; Botstein, David; Andrews, Brenda; Boone, Charles; Troyanskya, Olga G; Ideker, Trey; Dolinski, Kara; Batada, Nizar N; Tyers, Mike

    2006-01-01

    Background The study of complex biological networks and prediction of gene function has been enabled by high-throughput (HTP) methods for detection of genetic and protein interactions. Sparse coverage in HTP datasets may, however, distort network properties and confound predictions. Although a vast number of well substantiated interactions are recorded in the scientific literature, these data have not yet been distilled into networks that enable system-level inference. Results We describe here a comprehensive database of genetic and protein interactions, and associated experimental evidence, for the budding yeast Saccharomyces cerevisiae, as manually curated from over 31,793 abstracts and online publications. This literature-curated (LC) dataset contains 33,311 interactions, on the order of all extant HTP datasets combined. Surprisingly, HTP protein-interaction datasets currently achieve only around 14% coverage of the interactions in the literature. The LC network nevertheless shares attributes with HTP networks, including scale-free connectivity and correlations between interactions, abundance, localization, and expression. We find that essential genes or proteins are enriched for interactions with other essential genes or proteins, suggesting that the global network may be functionally unified. This interconnectivity is supported by a substantial overlap of protein and genetic interactions in the LC dataset. We show that the LC dataset considerably improves the predictive power of network-analysis approaches. The full LC dataset is available at the BioGRID () and SGD () databases. Conclusion Comprehensive datasets of biological interactions derived from the primary literature provide critical benchmarks for HTP methods, augment functional prediction, and reveal system-level attributes of biological networks. PMID:16762047

  16. Optimizing physical energy functions for protein folding.

    PubMed

    Fujitsuka, Yoshimi; Takada, Shoji; Luthey-Schulten, Zaida A; Wolynes, Peter G

    2004-01-01

    We optimize a physical energy function for proteins with the use of the available structural database and perform three benchmark tests of the performance: (1) recognition of native structures in the background of predefined decoy sets of Levitt, (2) de novo structure prediction using fragment assembly sampling, and (3) molecular dynamics simulations. The energy parameter optimization is based on the energy landscape theory and uses a Monte Carlo search to find a set of parameters that seeks the largest ratio deltaE(s)/DeltaE for all proteins in a training set simultaneously. Here, deltaE(s) is the stability gap between the native and the average in the denatured states and DeltaE is the energy fluctuation among these states. Some of the energy parameters optimized are found to show significant correlation with experimentally observed quantities: (1) In the recognition test, the optimized function assigns the lowest energy to either the native or a near-native structure among many decoy structures for all the proteins studied. (2) Structure prediction with the fragment assembly sampling gives structure models with root mean square deviation less than 6 A in one of the top five cluster centers for five of six proteins studied. (3) Structure prediction using molecular dynamics simulation gives poorer performance, implying the importance of having a more precise description of local structures. The physical energy function solely inferred from a structural database neither utilizes sequence information from the family of the target nor the outcome of the secondary structure prediction but can produce the correct native fold for many small proteins. Copyright 2003 Wiley-Liss, Inc.

  17. Baking a mass-spectrometry data PIE with McMC and simulated annealing: predicting protein post-translational modifications from integrated top-down and bottom-up data.

    PubMed

    Jefferys, Stuart R; Giddings, Morgan C

    2011-03-15

    Post-translational modifications are vital to the function of proteins, but are hard to study, especially since several modified isoforms of a protein may be present simultaneously. Mass spectrometers are a great tool for investigating modified proteins, but the data they provide is often incomplete, ambiguous and difficult to interpret. Combining data from multiple experimental techniques-especially bottom-up and top-down mass spectrometry-provides complementary information. When integrated with background knowledge this allows a human expert to interpret what modifications are present and where on a protein they are located. However, the process is arduous and for high-throughput applications needs to be automated. This article explores a data integration methodology based on Markov chain Monte Carlo and simulated annealing. Our software, the Protein Inference Engine (the PIE) applies these algorithms using a modular approach, allowing multiple types of data to be considered simultaneously and for new data types to be added as needed. Even for complicated data representing multiple modifications and several isoforms, the PIE generates accurate modification predictions, including location. When applied to experimental data collected on the L7/L12 ribosomal protein the PIE was able to make predictions consistent with manual interpretation for several different L7/L12 isoforms using a combination of bottom-up data with experimentally identified intact masses. Software, demo projects and source can be downloaded from http://pie.giddingslab.org/

  18. Membrane expression of MRP-1, but not MRP-1 splicing or Pgp expression, predicts survival in patients with ESFT

    PubMed Central

    Roundhill, E; Burchill, S

    2013-01-01

    Background: Primary Ewing's sarcoma family of tumours (ESFTs) may respond to chemotherapy, although many patients experience subsequent disease recurrence and relapse. The survival of ESFT cells following chemotherapy has been attributed to the development of resistant disease, possibly through the expression of ABC transporter proteins. Methods: MRP-1 and Pgp mRNA and protein expression in primary ESFTs was determined by quantitative reverse-transcriptase PCR (RT-qPCR) and immunohistochemistry, respectively, and alternative splicing of MRP-1 by RT-PCR. Results: We observed MRP-1 protein expression in 92% (43 out of 47) of primary ESFTs, and cell membrane MRP-1 was highly predictive of both overall survival (P<0.0001) and event-free survival (P<0.0001). Alternative splicing of MRP-1 was detected in primary ESFTs, although the pattern of splicing variants was not predictive of patient outcome, with the exception of loss of exon 9 in six patients, which predicted relapse (P=0.041). Pgp protein was detected in 6% (38 out of 44) of primary ESFTs and was not associated with patient survival. Conclusion: For the first time we have established that cell membrane expression of MRP-1 or loss of exon 9 is predictive of outcome but not the number of splicing events or expression of Pgp, and both may be valuable factors for the stratification of patients for more intensive therapy. PMID:23799853

  19. Predicting Functions of Proteins in Mouse Based on Weighted Protein-Protein Interaction Network and Protein Hybrid Properties

    PubMed Central

    Shi, Xiaohe; Lu, Wen-Cong; Cai, Yu-Dong; Chou, Kuo-Chen

    2011-01-01

    Background With the huge amount of uncharacterized protein sequences generated in the post-genomic age, it is highly desirable to develop effective computational methods for quickly and accurately predicting their functions. The information thus obtained would be very useful for both basic research and drug development in a timely manner. Methodology/Principal Findings Although many efforts have been made in this regard, most of them were based on either sequence similarity or protein-protein interaction (PPI) information. However, the former often fails to work if a query protein has no or very little sequence similarity to any function-known proteins, while the latter had similar problem if the relevant PPI information is not available. In view of this, a new approach is proposed by hybridizing the PPI information and the biochemical/physicochemical features of protein sequences. The overall first-order success rates by the new predictor for the functions of mouse proteins on training set and test set were 69.1% and 70.2%, respectively, and the success rate covered by the results of the top-4 order from a total of 24 orders was 65.2%. Conclusions/Significance The results indicate that the new approach is quite promising that may open a new avenue or direction for addressing the difficult and complicated problem. PMID:21283518

  20. Bhageerath-H: A homology/ab initio hybrid server for predicting tertiary structures of monomeric soluble proteins

    PubMed Central

    2014-01-01

    Background The advent of human genome sequencing project has led to a spurt in the number of protein sequences in the databanks. Success of structure based drug discovery severely hinges on the availability of structures. Despite significant progresses in the area of experimental protein structure determination, the sequence-structure gap is continually widening. Data driven homology based computational methods have proved successful in predicting tertiary structures for sequences sharing medium to high sequence similarities. With dwindling similarities of query sequences, advanced homology/ ab initio hybrid approaches are being explored to solve structure prediction problem. Here we describe Bhageerath-H, a homology/ ab initio hybrid software/server for predicting protein tertiary structures with advancing drug design attempts as one of the goals. Results Bhageerath-H web-server was validated on 75 CASP10 targets which showed TM-scores ≥0.5 in 91% of the cases and Cα RMSDs ≤5Å from the native in 58% of the targets, which is well above the CASP10 water mark. Comparison with some leading servers demonstrated the uniqueness of the hybrid methodology in effectively sampling conformational space, scoring best decoys and refining low resolution models to high and medium resolution. Conclusion Bhageerath-H methodology is web enabled for the scientific community as a freely accessible web server. The methodology is fielded in the on-going CASP11 experiment. PMID:25521245

  1. Proteomic analysis of sea urchin (Strongylocentrotus purpuratus) spicule matrix

    PubMed Central

    2010-01-01

    Background The sea urchin embryo has been an important model organism in developmental biology for more than a century. This is due to its relatively simple construction, translucent appearance, and the possibility to follow the fate of individual cells as development to the pluteus larva proceeds. Because the larvae contain tiny calcitic skeletal elements, the spicules, they are also important model organisms for biomineralization research. Similar to other biominerals the spicule contains an organic matrix, which is thought to play an important role in its formation. However, only few spicule matrix proteins were identified previously. Results Using mass spectrometry-based methods we have identified 231 proteins in the matrix of the S. purpuratus spicule matrix. Approximately two thirds of the identified proteins are either known or predicted to be extracellular proteins or transmembrane proteins with large ectodomains. The ectodomains may have been solubilized by partial proteolysis and subsequently integrated into the growing spicule. The most abundant protein of the spicule matrix is SM50. SM50-related proteins, SM30-related proteins, MSP130 and related proteins, matrix metalloproteases and carbonic anhydrase are among the most abundant components. Conclusions The spicule matrix is a relatively complex mixture of proteins not only containing matrix-specific proteins with a function in matrix assembly or mineralization, but also: 1) proteins possibly important for the formation of the continuous membrane delineating the mineralization space; 2) proteins for secretory processes delivering proteinaceous or non-proteinaceous precursors; 3) or proteins reflecting signaling events at the cell/matrix interface. Comparison of the proteomes of different skeletal matrices allows prediction of proteins of general importance for mineralization in sea urchins, such as SM50, SM30-E, SM29 or MSP130. The comparisons also help point out putative tissue-specific proteins, such as tooth phosphodontin or specific spicule matrix metalloproteases of the MMP18/19 group. Furthermore, the direct sequence analysis of peptides by MS/MS validates many predicted genes and confirms the existence of the corresponding proteins. PMID:20565753

  2. Data mining of enzymes using specific peptides

    PubMed Central

    2009-01-01

    Background Predicting the function of a protein from its sequence is a long-standing challenge of bioinformatic research, typically addressed using either sequence-similarity or sequence-motifs. We employ the novel motif method that consists of Specific Peptides (SPs) that are unique to specific branches of the Enzyme Commission (EC) functional classification. We devise the Data Mining of Enzymes (DME) methodology that allows for searching SPs on arbitrary proteins, determining from its sequence whether a protein is an enzyme and what the enzyme's EC classification is. Results We extract novel SP sets from Swiss-Prot enzyme data. Using a training set of July 2006, and test sets of July 2008, we find that the predictive power of SPs, both for true-positives (enzymes) and true-negatives (non-enzymes), depends on the coverage length of all SP matches (the number of amino-acids matched on the protein sequence). DME is quite different from BLAST. Comparing the two on an enzyme test set of July 2008, we find that DME has lower recall. On the other hand, DME can provide predictions for proteins regarded by BLAST as having low homologies with known enzymes, thus supplying complementary information. We test our method on a set of proteins belonging to 10 bacteria, dated July 2008, establishing the usefulness of the coverage-length cutoff to determine true-negatives. Moreover, sifting through our predictions we find that some of them have been substantiated by Swiss-Prot annotations by July 2009. Finally we extract, for production purposes, a novel SP set trained on all Swiss-Prot enzymes as of July 2009. This new set increases considerably the recall of DME. The new SP set is being applied to three metagenomes: Sargasso Sea with over 1,000,000 proteins, producing predictions of over 220,000 enzymes, and two human gut metagenomes. The outcome of these analyses can be characterized by the enzymatic profile of the metagenomes, describing the relative numbers of enzymes observed for different EC categories. Conclusions Employing SPs for predicting enzymatic activity of proteins works well once one utilizes coverage-length criteria. In our analysis, L ≥ 7 has led to highly accurate results. PMID:20034383

  3. Phylogenomic analyses of malaria parasites and evolution of their exported proteins

    PubMed Central

    2011-01-01

    Background Plasmodium falciparum is the most malignant agent of human malaria. It belongs to the taxon Laverania, which includes other ape-infecting Plasmodium species. The origin of the Laverania is still debated. P. falciparum exports pathogenicity-related proteins into the host cell using the Plasmodium export element (PEXEL). Predictions based on the presence of a PEXEL motif suggest that more than 300 proteins are exported by P. falciparum, while there are many fewer exported proteins in non-Laverania. Results A whole-genome approach was applied to resolve the phylogeny of eight Plasmodium species and four outgroup taxa. By using 218 orthologous proteins we received unanimous support for a sister group position of Laverania and avian malaria parasites. This observation was corroborated by the analyses of 28 exported proteins with orthologs present in all Plasmodium species. Most interestingly, several deviations from the P. falciparum PEXEL motif were found to be present in the orthologous sequences of non-Laverania. Conclusion Our phylogenomic analyses strongly support the hypotheses that the Laverania have been founded by a single Plasmodium species switching from birds to African great apes or vice versa. The deviations from the canonical PEXEL motif in orthologs may explain the comparably low number of exported proteins that have been predicted in non-Laverania. PMID:21676252

  4. Network-based prediction and knowledge mining of disease genes

    PubMed Central

    2015-01-01

    Background In recent years, high-throughput protein interaction identification methods have generated a large amount of data. When combined with the results from other in vivo and in vitro experiments, a complex set of relationships between biological molecules emerges. The growing popularity of network analysis and data mining has allowed researchers to recognize indirect connections between these molecules. Due to the interdependent nature of network entities, evaluating proteins in this context can reveal relationships that may not otherwise be evident. Methods We examined the human protein interaction network as it relates to human illness using the Disease Ontology. After calculating several topological metrics, we trained an alternating decision tree (ADTree) classifier to identify disease-associated proteins. Using a bootstrapping method, we created a tree to highlight conserved characteristics shared by many of these proteins. Subsequently, we reviewed a set of non-disease-associated proteins that were misclassified by the algorithm with high confidence and searched for evidence of a disease relationship. Results Our classifier was able to predict disease-related genes with 79% area under the receiver operating characteristic (ROC) curve (AUC), which indicates the tradeoff between sensitivity and specificity and is a good predictor of how a classifier will perform on future data sets. We found that a combination of several network characteristics including degree centrality, disease neighbor ratio, eccentricity, and neighborhood connectivity help to distinguish between disease- and non-disease-related proteins. Furthermore, the ADTree allowed us to understand which combinations of strongly predictive attributes contributed most to protein-disease classification. In our post-processing evaluation, we found several examples of potential novel disease-related proteins and corresponding literature evidence. In addition, we showed that first- and second-order neighbors in the PPI network could be used to identify likely disease associations. Conclusions We analyzed the human protein interaction network and its relationship to disease and found that both the number of interactions with other proteins and the disease relationship of neighboring proteins helped to determine whether a protein had a relationship to disease. Our classifier predicted many proteins with no annotated disease association to be disease-related, which indicated that these proteins have network characteristics that are similar to disease-related proteins and may therefore have disease associations not previously identified. By performing a post-processing step after the prediction, we were able to identify evidence in literature supporting this possibility. This method could provide a useful filter for experimentalists searching for new candidate protein targets for drug repositioning and could also be extended to include other network and data types in order to refine these predictions. PMID:26043920

  5. GASP: Gapped Ancestral Sequence Prediction for proteins

    PubMed Central

    Edwards, Richard J; Shields, Denis C

    2004-01-01

    Background The prediction of ancestral protein sequences from multiple sequence alignments is useful for many bioinformatics analyses. Predicting ancestral sequences is not a simple procedure and relies on accurate alignments and phylogenies. Several algorithms exist based on Maximum Parsimony or Maximum Likelihood methods but many current implementations are unable to process residues with gaps, which may represent insertion/deletion (indel) events or sequence fragments. Results Here we present a new algorithm, GASP (Gapped Ancestral Sequence Prediction), for predicting ancestral sequences from phylogenetic trees and the corresponding multiple sequence alignments. Alignments may be of any size and contain gaps. GASP first assigns the positions of gaps in the phylogeny before using a likelihood-based approach centred on amino acid substitution matrices to assign ancestral amino acids. Important outgroup information is used by first working down from the tips of the tree to the root, using descendant data only to assign probabilities, and then working back up from the root to the tips using descendant and outgroup data to make predictions. GASP was tested on a number of simulated datasets based on real phylogenies. Prediction accuracy for ungapped data was similar to three alternative algorithms tested, with GASP performing better in some cases and worse in others. Adding simple insertions and deletions to the simulated data did not have a detrimental effect on GASP accuracy. Conclusions GASP (Gapped Ancestral Sequence Prediction) will predict ancestral sequences from multiple protein alignments of any size. Although not as accurate in all cases as some of the more sophisticated maximum likelihood approaches, it can process a wide range of input phylogenies and will predict ancestral sequences for gapped and ungapped residues alike. PMID:15350199

  6. Lunch eating predicts weight-loss effectiveness in carriers of the common allele at PERILIPIN1: the ONTIME (Obesity, Nutrigenetics, Timing, Mediterranean) study

    USDA-ARS?s Scientific Manuscript database

    Background: We propose that eating lunch late impairs the mobilization of fat from adipose tissue, particularly in carriers of PERILIPIN1 (PLIN1) variants. Objective: The aim was to test the hypothesis that PLIN1, a circadian lipid-stabilizing protein in the adipocyte, interacts with the timing of ...

  7. OST-HTH: a novel predicted RNA-binding domain

    PubMed Central

    2010-01-01

    Background The mechanism by which the arthropod Oskar and vertebrate TDRD5/TDRD7 proteins nucleate or organize structurally related ribonucleoprotein (RNP) complexes, the polar granule and nuage, is poorly understood. Using sequence profile searches we identify a novel domain in these proteins that is widely conserved across eukaryotes and bacteria. Results Using contextual information from domain architectures, sequence-structure superpositions and available functional information we predict that this domain is likely to adopt the winged helix-turn-helix fold and bind RNA with a potential specificity for dsRNA. We show that in eukaryotes this domain is often combined in the same polypeptide with protein-protein- or lipid- interaction domains that might play a role in anchoring these proteins to specific cytoskeletal structures. Conclusions Thus, proteins with this domain might have a key role in the recognition and localization of dsRNA, including miRNAs, rasiRNAs and piRNAs hybridized to their targets. In other cases, this domain is fused to ubiquitin-binding, E3 ligase and ubiquitin-like domains indicating a previously under-appreciated role for ubiquitination in regulating the assembly and stability of nuage-like RNP complexes. Both bacteria and eukaryotes encode a conserved family of proteins that combines this predicted RNA-binding domain with a previously uncharacterized domain (DUF88). We present evidence that it is an RNAse belonging to the superfamily that includes the 5'->3' nucleases, PIN and NYN domains and might be recruited to degrade certain RNAs. Reviewers This article was reviewed by Sandor Pongor and Arcady Mushegian. PMID:20302647

  8. Proteus: a random forest classifier to predict disorder-to-order transitioning binding regions in intrinsically disordered proteins

    NASA Astrophysics Data System (ADS)

    Basu, Sankar; Söderquist, Fredrik; Wallner, Björn

    2017-05-01

    The focus of the computational structural biology community has taken a dramatic shift over the past one-and-a-half decades from the classical protein structure prediction problem to the possible understanding of intrinsically disordered proteins (IDP) or proteins containing regions of disorder (IDPR). The current interest lies in the unraveling of a disorder-to-order transitioning code embedded in the amino acid sequences of IDPs/IDPRs. Disordered proteins are characterized by an enormous amount of structural plasticity which makes them promiscuous in binding to different partners, multi-functional in cellular activity and atypical in folding energy landscapes resembling partially folded molten globules. Also, their involvement in several deadly human diseases (e.g. cancer, cardiovascular and neurodegenerative diseases) makes them attractive drug targets, and important for a biochemical understanding of the disease(s). The study of the structural ensemble of IDPs is rather difficult, in particular for transient interactions. When bound to a structured partner, an IDPR adapts an ordered conformation in the complex. The residues that undergo this disorder-to-order transition are called protean residues, generally found in short contiguous stretches and the first step in understanding the modus operandi of an IDP/IDPR would be to predict these residues. There are a few available methods which predict these protean segments from their amino acid sequences; however, their performance reported in the literature leaves clear room for improvement. With this background, the current study presents `Proteus', a random forest classifier that predicts the likelihood of a residue undergoing a disorder-to-order transition upon binding to a potential partner protein. The prediction is based on features that can be calculated using the amino acid sequence alone. Proteus compares favorably with existing methods predicting twice as many true positives as the second best method (55 vs. 27%) with a much higher precision on an independent data set. The current study also sheds some light on a possible `disorder-to-order' transitioning consensus, untangled, yet embedded in the amino acid sequence of IDPs. Some guidelines have also been suggested for proceeding with a real-life structural modeling involving an IDPR using Proteus.

  9. Penicillin-Binding Protein Transpeptidase Signatures for Tracking and Predicting β-Lactam Resistance Levels in Streptococcus pneumoniae

    PubMed Central

    Metcalf, Benjamin J.; Chochua, Sopio; Li, Zhongya; Gertz, Robert E.; Walker, Hollis; Hawkins, Paulina A.; Tran, Theresa; Whitney, Cynthia G.; McGee, Lesley; Beall, Bernard W.

    2016-01-01

    ABSTRACT β-Lactam antibiotics are the drugs of choice to treat pneumococcal infections. The spread of β-lactam-resistant pneumococci is a major concern in choosing an effective therapy for patients. Systematically tracking β-lactam resistance could benefit disease surveillance. Here we developed a classification system in which a pneumococcal isolate is assigned to a “PBP type” based on sequence signatures in the transpeptidase domains (TPDs) of the three critical penicillin-binding proteins (PBPs), PBP1a, PBP2b, and PBP2x. We identified 307 unique PBP types from 2,528 invasive pneumococcal isolates, which had known MICs to six β-lactams based on broth microdilution. We found that increased β-lactam MICs strongly correlated with PBP types containing divergent TPD sequences. The PBP type explained 94 to 99% of variation in MICs both before and after accounting for genomic backgrounds defined by multilocus sequence typing, indicating that genomic backgrounds made little independent contribution to β-lactam MICs at the population level. We further developed and evaluated predictive models of MICs based on PBP type. Compared to microdilution MICs, MICs predicted by PBP type showed essential agreement (MICs agree within 1 dilution) of >98%, category agreement (interpretive results agree) of >94%, a major discrepancy (sensitive isolate predicted as resistant) rate of <3%, and a very major discrepancy (resistant isolate predicted as sensitive) rate of <2% for all six β-lactams. Thus, the PBP transpeptidase signatures are robust indicators of MICs to different β-lactam antibiotics in clinical pneumococcal isolates and serve as an accurate alternative to phenotypic susceptibility testing. PMID:27302760

  10. Isolation of anticancer drug TAXOL from Pestalotiopsis breviseta with apoptosis and B-Cell lymphoma protein docking studies

    PubMed Central

    Kathiravan, G.; Sureban, Sripathi M.; Sree, Harsha N.; Bhuvaneshwari, V.; Kramony, Evelin

    2012-01-01

    Background: Extraction and investigation of TAXOL from Pestalotiopsis breviseta (Sacc.) using protein docking, which is a computational technique that samples conformations of small molecules in protein-binding sites. Scoring functions are used to assess which of these conformations best complements the protein binding site and active site prediction. Materials and Methods: Coelomycetous fungi P. breviseta (Sacc.) Steyaert was screened for the production of TAXOL, an anticancer drug. Results: TAXOL production was confirmed by the following methods: Ultraviolet (UV) spectroscopic analysis, Infrared analysis, High performance liquid chromatography analysis (HPLC), and Liquid chromatography mass spectrum (LC-MASS). TAXOL produced by the fungi was compared with authentic TAXOL, and protein docking studies were performed. Conclusion: The BCL2 protein of human origin showed a higher affinity toward the compound paclitaxel. It has the binding energy value of −13.0061 (KJ/Mol) with four hydrogen bonds. PMID:24808664

  11. Pathway-Specific Aggregate Biomarker Risk Score Is Associated With Burden of Coronary Artery Disease and Predicts Near-Term Risk of Myocardial Infarction and Death

    PubMed Central

    Ghasemzadeh, Nima; Hayek, Salim S.; Ko, Yi-An; Eapen, Danny J.; Patel, Riyaz S.; Manocha, Pankaj; Kassem, Hatem Al; Khayata, Mohamed; Veledar, Emir; Kremastinos, Dimitrios; Thorball, Christian W.; Pielak, Tomasz; Sikora, Sergey; Zafari, A. Maziar; Lerakis, Stamatios; Sperling, Laurence; Vaccarino, Viola; Epstein, Stephen E.; Quyyumi, Arshed A.

    2018-01-01

    Background Inflammation, coagulation, and cell stress contribute to atherosclerosis and its adverse events. A biomarker risk score (BRS) based on the circulating levels of biomarkers C-reactive protein, fibrin degradation products, and heat shock protein-70 representing these 3 pathways was a strong predictor of future outcomes. We investigated whether soluble urokinase plasminogen activator receptor (suPAR), a marker of immune activation, is predictive of outcomes independent of the aforementioned markers and whether its addition to a 3-BRS improves risk reclassification. Methods and Results C-reactive protein, fibrin degradation product, heat shock protein-70, and suPAR were measured in 3278 patients undergoing coronary angiography. The BRS was calculated by counting the number of biomarkers above a cutoff determined using the Youden’s index. Survival analyses were performed using models adjusted for traditional risk factors. A high suPAR level ≥3.5 ng/mL was associated with all-cause death and myocardial infarction (hazard ratio, 1.83; 95% confidence interval, 1.43–2.35) after adjustment for risk factors, C-reactive protein, fibrin degradation product, and heat shock protein-70. Addition of suPAR to the 3-BRS significantly improved the C statistic, integrated discrimination improvement, and net reclassification index for the primary outcome. A BRS of 1, 2, 3, or 4 was associated with a 1.81-, 2.59-, 6.17-, and 8.80-fold increase, respectively, in the risk of death and myocardial infarction. The 4-BRS was also associated with severity of coronary artery disease and composite end points. Conclusions SuPAR is independently predictive of adverse outcomes, and its addition to a 3-BRS comprising C-reactive protein, fibrin degradation product, and heat shock protein-70 improved risk reclassification. The clinical utility of using a 4-BRS for risk prediction and management of patients with coronary artery disease warrants further study. PMID:28280039

  12. Network inference reveals novel connections in pathways regulating growth and defense in the yeast salt response.

    PubMed

    MacGilvray, Matthew E; Shishkova, Evgenia; Chasman, Deborah; Place, Michael; Gitter, Anthony; Coon, Joshua J; Gasch, Audrey P

    2018-05-01

    Cells respond to stressful conditions by coordinating a complex, multi-faceted response that spans many levels of physiology. Much of the response is coordinated by changes in protein phosphorylation. Although the regulators of transcriptome changes during stress are well characterized in Saccharomyces cerevisiae, the upstream regulatory network controlling protein phosphorylation is less well dissected. Here, we developed a computational approach to infer the signaling network that regulates phosphorylation changes in response to salt stress. We developed an approach to link predicted regulators to groups of likely co-regulated phospho-peptides responding to stress, thereby creating new edges in a background protein interaction network. We then use integer linear programming (ILP) to integrate wild type and mutant phospho-proteomic data and predict the network controlling stress-activated phospho-proteomic changes. The network we inferred predicted new regulatory connections between stress-activated and growth-regulating pathways and suggested mechanisms coordinating metabolism, cell-cycle progression, and growth during stress. We confirmed several network predictions with co-immunoprecipitations coupled with mass-spectrometry protein identification and mutant phospho-proteomic analysis. Results show that the cAMP-phosphodiesterase Pde2 physically interacts with many stress-regulated transcription factors targeted by PKA, and that reduced phosphorylation of those factors during stress requires the Rck2 kinase that we show physically interacts with Pde2. Together, our work shows how a high-quality computational network model can facilitate discovery of new pathway interactions during osmotic stress.

  13. Protein biomarkers in vernix with potential to predict the development of atopic eczema in early childhood

    PubMed Central

    Holm, T; Rutishauser, D; Kai-Larsen, Y; Lyutvinskiy, Y; Stenius, F; Zubarev, R A; Agerberth, B; Alm, J; Scheynius, A

    2014-01-01

    Background Atopic eczema (AE) is a chronic inflammatory skin disease, which has increased in prevalence. Evidence points toward lifestyle as a major risk factor. AE is often the first symptom early in life later followed by food allergy, asthma, and allergic rhinitis. Thus, there is a great need to find early, preferentially noninvasive, biomarkers to identify individuals that are predisposed to AE with the goal to prevent disease development. Objective To investigate whether the protein abundances in vernix can predict later development of AE. Methods Vernix collected at birth from 34 newborns within the Assessment of Lifestyle and Allergic Disease During INfancy (ALADDIN) birth cohort was included in the study. At 2 years of age, 18 children had developed AE. Vernix proteins were identified and quantified with liquid chromatography coupled to tandem mass spectrometry. Results We identified and quantified 203 proteins in all vernix samples. An orthogonal projections to latent structures-discriminant analysis (OPLS-DA) model was found with R2 = 0.85, Q2 = 0.39, and discrimination power between the AE and healthy group of 73.5%. Polyubiquitin-C and calmodulin-like protein 5 showed strong negative correlation to the AE group, with a correlation coefficient of 0.73 and 0.68, respectively, and a P-value of 8.2 E-7 and 1.8 E-5, respectively. For these two proteins, the OPLS-DA model showed a prediction accuracy of 91.2%. Conclusion The protein abundances in vernix, and particularly that of polyubiquitin-C and calmodulin-like protein 5, are promising candidates as biomarkers for the identification of newborns predisposed to develop AE. PMID:24205894

  14. Experimental validation of FINDSITEcomb virtual ligand screening results for eight proteins yields novel nanomolar and micromolar binders

    PubMed Central

    2014-01-01

    Background Identification of ligand-protein binding interactions is a critical step in drug discovery. Experimental screening of large chemical libraries, in spite of their specific role and importance in drug discovery, suffer from the disadvantages of being random, time-consuming and expensive. To accelerate the process, traditional structure- or ligand-based VLS approaches are combined with experimental high-throughput screening, HTS. Often a single protein or, at most, a protein family is considered. Large scale VLS benchmarking across diverse protein families is rarely done, and the reported success rate is very low. Here, we demonstrate the experimental HTS validation of a novel VLS approach, FINDSITEcomb, across a diverse set of medically-relevant proteins. Results For eight different proteins belonging to different fold-classes and from diverse organisms, the top 1% of FINDSITEcomb’s VLS predictions were tested, and depending on the protein target, 4%-47% of the predicted ligands were shown to bind with μM or better affinities. In total, 47 small molecule binders were identified. Low nanomolar (nM) binders for dihydrofolate reductase and protein tyrosine phosphatases (PTPs) and micromolar binders for the other proteins were identified. Six novel molecules had cytotoxic activity (<10 μg/ml) against the HCT-116 colon carcinoma cell line and one novel molecule had potent antibacterial activity. Conclusions We show that FINDSITEcomb is a promising new VLS approach that can assist drug discovery. PMID:24936211

  15. Protein-protein docking using region-based 3D Zernike descriptors

    PubMed Central

    2009-01-01

    Background Protein-protein interactions are a pivotal component of many biological processes and mediate a variety of functions. Knowing the tertiary structure of a protein complex is therefore essential for understanding the interaction mechanism. However, experimental techniques to solve the structure of the complex are often found to be difficult. To this end, computational protein-protein docking approaches can provide a useful alternative to address this issue. Prediction of docking conformations relies on methods that effectively capture shape features of the participating proteins while giving due consideration to conformational changes that may occur. Results We present a novel protein docking algorithm based on the use of 3D Zernike descriptors as regional features of molecular shape. The key motivation of using these descriptors is their invariance to transformation, in addition to a compact representation of local surface shape characteristics. Docking decoys are generated using geometric hashing, which are then ranked by a scoring function that incorporates a buried surface area and a novel geometric complementarity term based on normals associated with the 3D Zernike shape description. Our docking algorithm was tested on both bound and unbound cases in the ZDOCK benchmark 2.0 dataset. In 74% of the bound docking predictions, our method was able to find a near-native solution (interface C-αRMSD ≤ 2.5 Å) within the top 1000 ranks. For unbound docking, among the 60 complexes for which our algorithm returned at least one hit, 60% of the cases were ranked within the top 2000. Comparison with existing shape-based docking algorithms shows that our method has a better performance than the others in unbound docking while remaining competitive for bound docking cases. Conclusion We show for the first time that the 3D Zernike descriptors are adept in capturing shape complementarity at the protein-protein interface and useful for protein docking prediction. Rigorous benchmark studies show that our docking approach has a superior performance compared to existing methods. PMID:20003235

  16. Morphine Regulated Synaptic Networks Revealed by Integrated Proteomics and Network Analysis*

    PubMed Central

    Stockton, Steven D.; Gomes, Ivone; Liu, Tong; Moraje, Chandrakala; Hipólito, Lucia; Jones, Matthew R.; Ma'ayan, Avi; Morón, Jose A.; Li, Hong; Devi, Lakshmi A.

    2015-01-01

    Despite its efficacy, the use of morphine for the treatment of chronic pain remains limited because of the rapid development of tolerance, dependence and ultimately addiction. These undesired effects are thought to be because of alterations in synaptic transmission and neuroplasticity within the reward circuitry including the striatum. In this study we used subcellular fractionation and quantitative proteomics combined with computational approaches to investigate the morphine-induced protein profile changes at the striatal postsynaptic density. Over 2,600 proteins were identified by mass spectrometry analysis of subcellular fractions enriched in postsynaptic density associated proteins from saline or morphine-treated striata. Among these, the levels of 34 proteins were differentially altered in response to morphine. These include proteins involved in G-protein coupled receptor signaling, regulation of transcription and translation, chaperones, and protein degradation pathways. The altered expression levels of several of these proteins was validated by Western blotting analysis. Using Genes2Fans software suite we connected the differentially expressed proteins with proteins identified within the known background protein-protein interaction network. This led to the generation of a network consisting of 116 proteins with 40 significant intermediates. To validate this, we confirmed the presence of three proteins predicted to be significant intermediates: caspase-3, receptor-interacting serine/threonine protein kinase 3 and NEDD4 (an E3-ubiquitin ligase identified as a neural precursor cell expressed developmentally down-regulated protein 4). Because this morphine-regulated network predicted alterations in proteasomal degradation, we examined the global ubiquitination state of postsynaptic density proteins and found it to be substantially altered. Together, these findings suggest a role for protein degradation and for the ubiquitin/proteasomal system in the etiology of opiate dependence and addiction. PMID:26149443

  17. Transcriptomic analysis of Arabidopsis developing stems: a close-up on cell wall genes

    PubMed Central

    Minic, Zoran; Jamet, Elisabeth; San-Clemente, Hélène; Pelletier, Sandra; Renou, Jean-Pierre; Rihouey, Christophe; Okinyo, Denis PO; Proux, Caroline; Lerouge, Patrice; Jouanin, Lise

    2009-01-01

    Background Different strategies (genetics, biochemistry, and proteomics) can be used to study proteins involved in cell biogenesis. The availability of the complete sequences of several plant genomes allowed the development of transcriptomic studies. Although the expression patterns of some Arabidopsis thaliana genes involved in cell wall biogenesis were identified at different physiological stages, detailed microarray analysis of plant cell wall genes has not been performed on any plant tissues. Using transcriptomic and bioinformatic tools, we studied the regulation of cell wall genes in Arabidopsis stems, i.e. genes encoding proteins involved in cell wall biogenesis and genes encoding secreted proteins. Results Transcriptomic analyses of stems were performed at three different developmental stages, i.e., young stems, intermediate stage, and mature stems. Many genes involved in the synthesis of cell wall components such as polysaccharides and monolignols were identified. A total of 345 genes encoding predicted secreted proteins with moderate or high level of transcripts were analyzed in details. The encoded proteins were distributed into 8 classes, based on the presence of predicted functional domains. Proteins acting on carbohydrates and proteins of unknown function constituted the two most abundant classes. Other proteins were proteases, oxido-reductases, proteins with interacting domains, proteins involved in signalling, and structural proteins. Particularly high levels of expression were established for genes encoding pectin methylesterases, germin-like proteins, arabinogalactan proteins, fasciclin-like arabinogalactan proteins, and structural proteins. Finally, the results of this transcriptomic analyses were compared with those obtained through a cell wall proteomic analysis from the same material. Only a small proportion of genes identified by previous proteomic analyses were identified by transcriptomics. Conversely, only a few proteins encoded by genes having moderate or high level of transcripts were identified by proteomics. Conclusion Analysis of the genes predicted to encode cell wall proteins revealed that about 345 genes had moderate or high levels of transcripts. Among them, we identified many new genes possibly involved in cell wall biogenesis. The discrepancies observed between results of this transcriptomic study and a previous proteomic study on the same material revealed post-transcriptional mechanisms of regulation of expression of genes encoding cell wall proteins. PMID:19149885

  18. Binding Site and Affinity Prediction of General Anesthetics to Protein Targets Using Docking

    PubMed Central

    Liu, Renyu; Perez-Aguilar, Jose Manuel; Liang, David; Saven, Jeffery G.

    2012-01-01

    Background The protein targets for general anesthetics remain unclear. A tool to predict anesthetic binding for potential binding targets is needed. In this study, we explore whether a computational method, AutoDock, could serve as such a tool. Methods High-resolution crystal data of water soluble proteins (cytochrome C, apoferritin and human serum albumin), and a membrane protein (a pentameric ligand-gated ion channel from Gloeobacter violaceus, GLIC) were used. Isothermal titration calorimetry (ITC) experiments were performed to determine anesthetic affinity in solution conditions for apoferritin. Docking calculations were performed using DockingServer with the Lamarckian genetic algorithm and the Solis and Wets local search method (https://www.dockingserver.com/web). Twenty general anesthetics were docked into apoferritin. The predicted binding constants are compared with those obtained from ITC experiments for potential correlations. In the case of apoferritin, details of the binding site and their interactions were compared with recent co-crystallization data. Docking calculations for six general anesthetics currently used in clinical settings (isoflurane, sevoflurane, desflurane, halothane, propofol, and etomidate) with known EC50 were also performed in all tested proteins. The binding constants derived from docking experiments were compared with known EC50s and octanol/water partition coefficients for the six general anesthetics. Results All 20 general anesthetics docked unambiguously into the anesthetic binding site identified in the crystal structure of apoferritin. The binding constants for 20 anesthetics obtained from the docking calculations correlate significantly with those obtained from ITC experiments (p=0.04). In the case of GLIC, the identified anesthetic binding sites in the crystal structure are among the docking predicted binding sites, but not the top ranked site. Docking calculations suggest a most probable binding site located in the extracellular domain of GLIC. The predicted affinities correlated significantly with the known EC50s for the six commonly used anesthetics in GLIC for the site identified in the experimental crystal data (p=0.006). However, predicted affinities in apoferritin, human serum albumin, and cytochrome C did not correlate with these six anesthetics’ known experimental EC50s. A weak correlation between the predicted affinities and the octanol/water partition coefficients was observed for the sites in GLIC. Conclusion We demonstrated that anesthetic binding sites and relative affinities can be predicted using docking calculations in an automatic docking server (Autodock) for both water soluble and membrane proteins. Correlation of predicted affinity and EC50 for six commonly used general anesthetics was only observed in GLIC, a member of a protein family relevant to anesthetic mechanism. PMID:22392968

  19. Pfarao: a web application for protein family analysis customized for cytoskeletal and motor proteins (CyMoBase)

    PubMed Central

    Odronitz, Florian; Kollmar, Martin

    2006-01-01

    Background Annotation of protein sequences of eukaryotic organisms is crucial for the understanding of their function in the cell. Manual annotation is still by far the most accurate way to correctly predict genes. The classification of protein sequences, their phylogenetic relation and the assignment of function involves information from various sources. This often leads to a collection of heterogeneous data, which is hard to track. Cytoskeletal and motor proteins consist of large and diverse superfamilies comprising up to several dozen members per organism. Up to date there is no integrated tool available to assist in the manual large-scale comparative genomic analysis of protein families. Description Pfarao (Protein Family Application for Retrieval, Analysis and Organisation) is a database driven online working environment for the analysis of manually annotated protein sequences and their relationship. Currently, the system can store and interrelate a wide range of information about protein sequences, species, phylogenetic relations and sequencing projects as well as links to literature and domain predictions. Sequences can be imported from multiple sequence alignments that are generated during the annotation process. A web interface allows to conveniently browse the database and to compile tabular and graphical summaries of its content. Conclusion We implemented a protein sequence-centric web application to store, organize, interrelate, and present heterogeneous data that is generated in manual genome annotation and comparative genomics. The application has been developed for the analysis of cytoskeletal and motor proteins (CyMoBase) but can easily be adapted for any protein. PMID:17134497

  20. Selection of organisms for the co-evolution-based study of protein interactions

    PubMed Central

    2011-01-01

    Background The prediction and study of protein interactions and functional relationships based on similarity of phylogenetic trees, exemplified by the mirrortree and related methodologies, is being widely used. Although dependence between the performance of these methods and the set of organisms used to build the trees was suspected, so far nobody assessed it in an exhaustive way, and, in general, previous works used as many organisms as possible. In this work we asses the effect of using different sets of organism (chosen according with various phylogenetic criteria) on the performance of this methodology in detecting protein interactions of different nature. Results We show that the performance of three mirrortree-related methodologies depends on the set of organisms used for building the trees, and it is not always directly related to the number of organisms in a simple way. Certain subsets of organisms seem to be more suitable for the predictions of certain types of interactions. This relationship between type of interaction and optimal set of organism for detecting them makes sense in the light of the phylogenetic distribution of the organisms and the nature of the interactions. Conclusions In order to obtain an optimal performance when predicting protein interactions, it is recommended to use different sets of organisms depending on the available computational resources and data, as well as the type of interactions of interest. PMID:21910884

  1. Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences

    PubMed Central

    2009-01-01

    Background Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. Results The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes including conservation of residues and arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozens of modern competing predictors show that MODAS provides the best overall accuracy that ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reduction when compared against the best performing competing method on two largest datasets. The proposed predictor provides accurate predictions at 58% accuracy for membrane proteins class, which is not considered by majority of existing methods, in spite that this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes. Conclusions The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/. PMID:20003388

  2. 3D Protein structure prediction with genetic tabu search algorithm

    PubMed Central

    2010-01-01

    Background Protein structure prediction (PSP) has important applications in different fields, such as drug design, disease prediction, and so on. In protein structure prediction, there are two important issues. The first one is the design of the structure model and the second one is the design of the optimization technology. Because of the complexity of the realistic protein structure, the structure model adopted in this paper is a simplified model, which is called off-lattice AB model. After the structure model is assumed, optimization technology is needed for searching the best conformation of a protein sequence based on the assumed structure model. However, PSP is an NP-hard problem even if the simplest model is assumed. Thus, many algorithms have been developed to solve the global optimization problem. In this paper, a hybrid algorithm, which combines genetic algorithm (GA) and tabu search (TS) algorithm, is developed to complete this task. Results In order to develop an efficient optimization algorithm, several improved strategies are developed for the proposed genetic tabu search algorithm. The combined use of these strategies can improve the efficiency of the algorithm. In these strategies, tabu search introduced into the crossover and mutation operators can improve the local search capability, the adoption of variable population size strategy can maintain the diversity of the population, and the ranking selection strategy can improve the possibility of an individual with low energy value entering into next generation. Experiments are performed with Fibonacci sequences and real protein sequences. Experimental results show that the lowest energy obtained by the proposed GATS algorithm is lower than that obtained by previous methods. Conclusions The hybrid algorithm has the advantages from both genetic algorithm and tabu search algorithm. It makes use of the advantage of multiple search points in genetic algorithm, and can overcome poor hill-climbing capability in the conventional genetic algorithm by using the flexible memory functions of TS. Compared with some previous algorithms, GATS algorithm has better performance in global optimization and can predict 3D protein structure more effectively. PMID:20522256

  3. Prediction of TF target sites based on atomistic models of protein-DNA complexes

    PubMed Central

    Angarica, Vladimir Espinosa; Pérez, Abel González; Vasconcelos, Ana T; Collado-Vides, Julio; Contreras-Moreira, Bruno

    2008-01-01

    Background The specific recognition of genomic cis-regulatory elements by transcription factors (TFs) plays an essential role in the regulation of coordinated gene expression. Studying the mechanisms determining binding specificity in protein-DNA interactions is thus an important goal. Most current approaches for modeling TF specific recognition rely on the knowledge of large sets of cognate target sites and consider only the information contained in their primary sequence. Results Here we describe a structure-based methodology for predicting sequence motifs starting from the coordinates of a TF-DNA complex. Our algorithm combines information regarding the direct and indirect readout of DNA into an atomistic statistical model, which is used to estimate the interaction potential. We first measure the ability of our method to correctly estimate the binding specificities of eight prokaryotic and eukaryotic TFs that belong to different structural superfamilies. Secondly, the method is applied to two homology models, finding that sampling of interface side-chain rotamers remarkably improves the results. Thirdly, the algorithm is compared with a reference structural method based on contact counts, obtaining comparable predictions for the experimental complexes and more accurate sequence motifs for the homology models. Conclusion Our results demonstrate that atomic-detail structural information can be feasibly used to predict TF binding sites. The computational method presented here is universal and might be applied to other systems involving protein-DNA recognition. PMID:18922190

  4. Protein disorder in the human diseasome: unfoldomics of human genetic diseases

    PubMed Central

    Midic, Uros; Oldfield, Christopher J; Dunker, A Keith; Obradovic, Zoran; Uversky, Vladimir N

    2009-01-01

    Background Intrinsically disordered proteins lack stable structure under physiological conditions, yet carry out many crucial biological functions, especially functions associated with regulation, recognition, signaling and control. Recently, human genetic diseases and related genes were organized into a bipartite graph (Goh KI, Cusick ME, Valle D, Childs B, Vidal M, et al. (2007) The human disease network. Proc Natl Acad Sci U S A 104: 8685–8690). This diseasome network revealed several significant features such as the common genetic origin of many diseases. Methods and findings We analyzed the abundance of intrinsic disorder in these diseasome network proteins by means of several prediction algorithms, and we analyzed the functional repertoires of these proteins based on prior studies relating disorder to function. Our analyses revealed that (i) Intrinsic disorder is common in proteins associated with many human genetic diseases; (ii) Different disease classes vary in the IDP contents of their associated proteins; (iii) Molecular recognition features, which are relatively short loosely structured protein regions within mostly disordered sequences and which gain structure upon binding to partners, are common in the diseasome, and their abundance correlates with the intrinsic disorder level; (iv) Some disease classes have a significant fraction of genes affected by alternative splicing, and the alternatively spliced regions in the corresponding proteins are predicted to be highly disordered; and (v) Correlations were found among the various diseasome graph-related properties and intrinsic disorder. Conclusion These observations provide the basis for the construction of the human-genetic-disease-associated unfoldome. PMID:19594871

  5. Integrating water exclusion theory into βcontacts to predict binding free energy changes and binding hot spots

    PubMed Central

    2014-01-01

    Background Binding free energy and binding hot spots at protein-protein interfaces are two important research areas for understanding protein interactions. Computational methods have been developed previously for accurate prediction of binding free energy change upon mutation for interfacial residues. However, a large number of interrupted and unimportant atomic contacts are used in the training phase which caused accuracy loss. Results This work proposes a new method, βACV ASA , to predict the change of binding free energy after alanine mutations. βACV ASA integrates accessible surface area (ASA) and our newly defined β contacts together into an atomic contact vector (ACV). A β contact between two atoms is a direct contact without being interrupted by any other atom between them. A β contact’s potential contribution to protein binding is also supposed to be inversely proportional to its ASA to follow the water exclusion hypothesis of binding hot spots. Tested on a dataset of 396 alanine mutations, our method is found to be superior in classification performance to many other methods, including Robetta, FoldX, HotPOINT, an ACV method of β contacts without ASA integration, and ACV ASA methods (similar to βACV ASA but based on distance-cutoff contacts). Based on our data analysis and results, we can draw conclusions that: (i) our method is powerful in the prediction of binding free energy change after alanine mutation; (ii) β contacts are better than distance-cutoff contacts for modeling the well-organized protein-binding interfaces; (iii) β contacts usually are only a small fraction number of the distance-based contacts; and (iv) water exclusion is a necessary condition for a residue to become a binding hot spot. Conclusions βACV ASA is designed using the advantages of both β contacts and water exclusion. It is an excellent tool to predict binding free energy changes and binding hot spots after alanine mutation. PMID:24568581

  6. Local functional descriptors for surface comparison based binding prediction

    PubMed Central

    2012-01-01

    Background Molecular recognition in proteins occurs due to appropriate arrangements of physical, chemical, and geometric properties of an atomic surface. Similar surface regions should create similar binding interfaces. Effective methods for comparing surface regions can be used in identifying similar regions, and to predict interactions without regard to the underlying structural scaffold that creates the surface. Results We present a new descriptor for protein functional surfaces and algorithms for using these descriptors to compare protein surface regions to identify ligand binding interfaces. Our approach uses descriptors of local regions of the surface, and assembles collections of matches to compare larger regions. Our approach uses a variety of physical, chemical, and geometric properties, adaptively weighting these properties as appropriate for different regions of the interface. Our approach builds a classifier based on a training corpus of examples of binding sites of the target ligand. The constructed classifiers can be applied to a query protein providing a probability for each position on the protein that the position is part of a binding interface. We demonstrate the effectiveness of the approach on a number of benchmarks, demonstrating performance that is comparable to the state-of-the-art, with an approach with more generality than these prior methods. Conclusions Local functional descriptors offer a new method for protein surface comparison that is sufficiently flexible to serve in a variety of applications. PMID:23176080

  7. Automated identification of protein-ligand interaction features using Inductive Logic Programming: a hexose binding case study

    PubMed Central

    2012-01-01

    Background There is a need for automated methods to learn general features of the interactions of a ligand class with its diverse set of protein receptors. An appropriate machine learning approach is Inductive Logic Programming (ILP), which automatically generates comprehensible rules in addition to prediction. The development of ILP systems which can learn rules of the complexity required for studies on protein structure remains a challenge. In this work we use a new ILP system, ProGolem, and demonstrate its performance on learning features of hexose-protein interactions. Results The rules induced by ProGolem detect interactions mediated by aromatics and by planar-polar residues, in addition to less common features such as the aromatic sandwich. The rules also reveal a previously unreported dependency for residues cys and leu. They also specify interactions involving aromatic and hydrogen bonding residues. This paper shows that Inductive Logic Programming implemented in ProGolem can derive rules giving structural features of protein/ligand interactions. Several of these rules are consistent with descriptions in the literature. Conclusions In addition to confirming literature results, ProGolem’s model has a 10-fold cross-validated predictive accuracy that is superior, at the 95% confidence level, to another ILP system previously used to study protein/hexose interactions and is comparable with state-of-the-art statistical learners. PMID:22783946

  8. Prediction of beta-turns at over 80% accuracy based on an ensemble of predicted secondary structures and multiple alignments

    PubMed Central

    Zheng, Ce; Kurgan, Lukasz

    2008-01-01

    Background β-turn is a secondary protein structure type that plays significant role in protein folding, stability, and molecular recognition. To date, several methods for prediction of β-turns from protein sequences were developed, but they are characterized by relatively poor prediction quality. The novelty of the proposed sequence-based β-turn predictor stems from the usage of a window based information extracted from four predicted three-state secondary structures, which together with a selected set of position specific scoring matrix (PSSM) values serve as an input to the support vector machine (SVM) predictor. Results We show that (1) all four predicted secondary structures are useful; (2) the most useful information extracted from the predicted secondary structure includes the structure of the predicted residue, secondary structure content in a window around the predicted residue, and features that indicate whether the predicted residue is inside a secondary structure segment; (3) the PSSM values of Asn, Asp, Gly, Ile, Leu, Met, Pro, and Val were among the top ranked features, which corroborates with recent studies. The Asn, Asp, Gly, and Pro indicate potential β-turns, while the remaining four amino acids are useful to predict non-β-turns. Empirical evaluation using three nonredundant datasets shows favorable Qtotal, Qpredicted and MCC values when compared with over a dozen of modern competing methods. Our method is the first to break the 80% Qtotal barrier and achieves Qtotal = 80.9%, MCC = 0.47, and Qpredicted higher by over 6% when compared with the second best method. We use feature selection to reduce the dimensionality of the feature vector used as the input for the proposed prediction method. The applied feature set is smaller by 86, 62 and 37% when compared with the second and two third-best (with respect to MCC) competing methods, respectively. Conclusion Experiments show that the proposed method constitutes an improvement over the competing prediction methods. The proposed prediction model can better discriminate between β-turns and non-β-turns due to obtaining lower numbers of false positive predictions. The prediction model and datasets are freely available at . PMID:18847492

  9. Consistency of gene starts among Burkholderia genomes

    PubMed Central

    2011-01-01

    Background Evolutionary divergence in the position of the translational start site among orthologous genes can have significant functional impacts. Divergence can alter the translation rate, degradation rate, subcellular location, and function of the encoded proteins. Results Existing Genbank gene maps for Burkholderia genomes suggest that extensive divergence has occurred--53% of ortholog sets based on Genbank gene maps had inconsistent gene start sites. However, most of these inconsistencies appear to be gene-calling errors. Evolutionary divergence was the most plausible explanation for only 17% of the ortholog sets. Correcting probable errors in the Genbank gene maps decreased the percentage of ortholog sets with inconsistent starts by 68%, increased the percentage of ortholog sets with extractable upstream intergenic regions by 32%, increased the sequence similarity of intergenic regions and predicted proteins, and increased the number of proteins with identifiable signal peptides. Conclusions Our findings highlight an emerging problem in comparative genomics: single-digit percent errors in gene predictions can lead to double-digit percentages of inconsistent ortholog sets. The work demonstrates a simple approach to evaluate and improve the quality of gene maps. PMID:21342528

  10. Color transitions in coral's fluorescent proteins by site-directed mutagenesis

    PubMed Central

    Gurskaya, Nadya G; Savitsky, Alexander P; Yanushevich, Yurii G; Lukyanov, Sergey A; Lukyanov, Konstantin A

    2001-01-01

    Background Green Fluorescent Protein (GFP) cloned from jellyfish Aequorea victoria and its homologs from corals Anthozoa have a great practical significance as in vivo markers of gene expression. Also, they are an interesting puzzle of protein science due to an unusual mechanism of chromophore formation and diversity of fluorescent colors. Fluorescent proteins can be subdivided into cyan (~ 485 nm), green (~ 505 nm), yellow (~ 540 nm), and red (>580 nm) emitters. Results Here we applied site-directed mutagenesis in order to investigate the structural background of color variety and possibility of shifting between different types of fluorescence. First, a blue-shifted mutant of cyan amFP486 was generated. Second, it was established that cyan and green emitters can be modified so as to produce an intermediate spectrum of fluorescence. Third, the relationship between green and yellow fluorescence was inspected on closely homologous green zFP506 and yellow zFP538 proteins. The following transitions of colors were performed: yellow to green; yellow to dual color (green and yellow); and green to yellow. Fourth, we generated a mutant of cyan emitter dsFP483 that demonstrated dual color (cyan and red) fluorescence. Conclusions Several amino acid substitutions were found to strongly affect fluorescence maxima. Some positions primarily found by sequence comparison were proved to be crucial for fluorescence of particular color. These results are the first step towards predicting the color of natural GFP-like proteins corresponding to newly identified cDNAs from corals. PMID:11459517

  11. Developing a scalable model of recombinant protein yield from Pichia pastoris: the influence of culture conditions, biomass and induction regime

    PubMed Central

    Holmes, William J; Darby, Richard AJ; Wilks, Martin DB; Smith, Rodney; Bill, Roslyn M

    2009-01-01

    Background The optimisation and scale-up of process conditions leading to high yields of recombinant proteins is an enduring bottleneck in the post-genomic sciences. Typical experiments rely on varying selected parameters through repeated rounds of trial-and-error optimisation. To rationalise this, several groups have recently adopted the 'design of experiments' (DoE) approach frequently used in industry. Studies have focused on parameters such as medium composition, nutrient feed rates and induction of expression in shake flasks or bioreactors, as well as oxygen transfer rates in micro-well plates. In this study we wanted to generate a predictive model that described small-scale screens and to test its scalability to bioreactors. Results Here we demonstrate how the use of a DoE approach in a multi-well mini-bioreactor permitted the rapid establishment of high yielding production phase conditions that could be transferred to a 7 L bioreactor. Using green fluorescent protein secreted from Pichia pastoris, we derived a predictive model of protein yield as a function of the three most commonly-varied process parameters: temperature, pH and the percentage of dissolved oxygen in the culture medium. Importantly, when yield was normalised to culture volume and density, the model was scalable from mL to L working volumes. By increasing pre-induction biomass accumulation, model-predicted yields were further improved. Yield improvement was most significant, however, on varying the fed-batch induction regime to minimise methanol accumulation so that the productivity of the culture increased throughout the whole induction period. These findings suggest the importance of matching the rate of protein production with the host metabolism. Conclusion We demonstrate how a rational, stepwise approach to recombinant protein production screens can reduce process development time. PMID:19570229

  12. Global analyses of Ceratocystis cacaofunesta mitochondria: from genome to proteome

    PubMed Central

    2013-01-01

    Background The ascomycete fungus Ceratocystis cacaofunesta is the causal agent of wilt disease in cacao, which results in significant economic losses in the affected producing areas. Despite the economic importance of the Ceratocystis complex of species, no genomic data are available for any of its members. Given that mitochondria play important roles in fungal virulence and the susceptibility/resistance of fungi to fungicides, we performed the first functional analysis of this organelle in Ceratocystis using integrated “omics” approaches. Results The C. cacaofunesta mitochondrial genome (mtDNA) consists of a single, 103,147-bp circular molecule, making this the second largest mtDNA among the Sordariomycetes. Bioinformatics analysis revealed the presence of 15 conserved genes and 37 intronic open reading frames in C. cacaofunesta mtDNA. Here, we predicted the mitochondrial proteome (mtProt) of C. cacaofunesta, which is comprised of 1,124 polypeptides - 52 proteins that are mitochondrially encoded and 1,072 that are nuclearly encoded. Transcriptome analysis revealed 33 probable novel genes. Comparisons among the Gene Ontology results of the predicted mtProt of C. cacaofunesta, Neurospora crassa and Saccharomyces cerevisiae revealed no significant differences. Moreover, C. cacaofunesta mitochondria were isolated, and the mtProt was subjected to mass spectrometric analysis. The experimental proteome validated 27% of the predicted mtProt. Our results confirmed the existence of 110 hypothetical proteins and 7 novel proteins of which 83 and 1, respectively, had putative mitochondrial localization. Conclusions The present study provides the first partial genomic analysis of a species of the Ceratocystis genus and the first predicted mitochondrial protein inventory of a phytopathogenic fungus. In addition to the known mitochondrial role in pathogenicity, our results demonstrated that the global function analysis of this organelle is similar in pathogenic and non-pathogenic fungi, suggesting that its relevance in the lifestyle of these organisms should be based on a small number of specific proteins and/or with respect to differential gene regulation. In this regard, particular interest should be directed towards mitochondrial proteins with unknown function and the novel protein that might be specific to this species. Further functional characterization of these proteins could enhance our understanding of the role of mitochondria in phytopathogenicity. PMID:23394930

  13. Identification and characterisation of the angiotensin converting enzyme-3 (ACE3) gene: a novel mammalian homologue of ACE

    PubMed Central

    Rella, Monika; Elliot, Joann L; Revett, Timothy J; Lanfear, Jerry; Phelan, Anne; Jackson, Richard M; Turner, Anthony J; Hooper, Nigel M

    2007-01-01

    Background Mammalian angiotensin converting enzyme (ACE) plays a key role in blood pressure regulation. Although multiple ACE-like proteins exist in non-mammalian organisms, to date only one other ACE homologue, ACE2, has been identified in mammals. Results Here we report the identification and characterisation of the gene encoding a third homologue of ACE, termed ACE3, in several mammalian genomes. The ACE3 gene is located on the same chromosome downstream of the ACE gene. Multiple sequence alignment and molecular modelling have been employed to characterise the predicted ACE3 protein. In mouse, rat, cow and dog, the predicted protein has mutations in some of the critical residues involved in catalysis, including the catalytic Glu in the HEXXH zinc binding motif which is Gln, and ESTs or reverse-transcription PCR indicate that the gene is expressed. In humans, the predicted ACE3 protein has an intact HEXXH motif, but there are other deletions and insertions in the gene and no ESTs have been identified. Conclusion In the genomes of several mammalian species there is a gene that encodes a novel, single domain ACE-like protein, ACE3. In mouse, rat, cow and dog ACE3, the catalytic Glu is replaced by Gln in the putative zinc binding motif, indicating that in these species ACE3 would lack catalytic activity as a zinc metalloprotease. In humans, no evidence was found that the ACE3 gene is expressed and the presence of deletions and insertions in the sequence indicate that ACE3 is a pseudogene. PMID:17597519

  14. A Mixed QM/MM Scoring Function to Predict Protein-Ligand Binding Affinity

    PubMed Central

    Hayik, Seth A.; Dunbrack, Roland; Merz, Kenneth M.

    2010-01-01

    Computational methods for predicting protein-ligand binding free energy continue to be popular as a potential cost-cutting method in the drug discovery process. However, accurate predictions are often difficult to make as estimates must be made for certain electronic and entropic terms in conventional force field based scoring functions. Mixed quantum mechanics/molecular mechanics (QM/MM) methods allow electronic effects for a small region of the protein to be calculated, treating the remaining atoms as a fixed charge background for the active site. Such a semi-empirical QM/MM scoring function has been implemented in AMBER using DivCon and tested on a set of 23 metalloprotein-ligand complexes, where QM/MM methods provide a particular advantage in the modeling of the metal ion. The binding affinity of this set of proteins can be calculated with an R2 of 0.64 and a standard deviation of 1.88 kcal/mol without fitting and 0.71 and a standard deviation of 1.69 kcal/mol with fitted weighting of the individual scoring terms. In this study we explore using various methods to calculate terms in the binding free energy equation, including entropy estimates and minimization standards. From these studies we found that using the rotational bond estimate to ligand entropy results in a reasonable R2 of 0.63 without fitting. We also found that using the ESCF energy of the proteins without minimization resulted in an R2 of 0.57, when using the rotatable bond entropy estimate. PMID:21221417

  15. Learning a peptide-protein binding affinity predictor with kernel ridge regression

    PubMed Central

    2013-01-01

    Background The cellular function of a vast majority of proteins is performed through physical interactions with other biomolecules, which, most of the time, are other proteins. Peptides represent templates of choice for mimicking a secondary structure in order to modulate protein-protein interaction. They are thus an interesting class of therapeutics since they also display strong activity, high selectivity, low toxicity and few drug-drug interactions. Furthermore, predicting peptides that would bind to a specific MHC alleles would be of tremendous benefit to improve vaccine based therapy and possibly generate antibodies with greater affinity. Modern computational methods have the potential to accelerate and lower the cost of drug and vaccine discovery by selecting potential compounds for testing in silico prior to biological validation. Results We propose a specialized string kernel for small bio-molecules, peptides and pseudo-sequences of binding interfaces. The kernel incorporates physico-chemical properties of amino acids and elegantly generalizes eight kernels, comprised of the Oligo, the Weighted Degree, the Blended Spectrum, and the Radial Basis Function. We provide a low complexity dynamic programming algorithm for the exact computation of the kernel and a linear time algorithm for it’s approximation. Combined with kernel ridge regression and SupCK, a novel binding pocket kernel, the proposed kernel yields biologically relevant and good prediction accuracy on the PepX database. For the first time, a machine learning predictor is capable of predicting the binding affinity of any peptide to any protein with reasonable accuracy. The method was also applied to both single-target and pan-specific Major Histocompatibility Complex class II benchmark datasets and three Quantitative Structure Affinity Model benchmark datasets. Conclusion On all benchmarks, our method significantly (p-value ≤ 0.057) outperforms the current state-of-the-art methods at predicting peptide-protein binding affinities. The proposed approach is flexible and can be applied to predict any quantitative biological activity. Moreover, generating reliable peptide-protein binding affinities will also improve system biology modelling of interaction pathways. Lastly, the method should be of value to a large segment of the research community with the potential to accelerate the discovery of peptide-based drugs and facilitate vaccine development. The proposed kernel is freely available at http://graal.ift.ulaval.ca/downloads/gs-kernel/. PMID:23497081

  16. Pooled Protein Immunization for Identification of Cell Surface Antigens in Streptococcus sanguinis

    PubMed Central

    Ge, Xiuchun; Kitten, Todd; Munro, Cindy L.; Conrad, Daniel H.; Xu, Ping

    2010-01-01

    Background Available bacterial genomes provide opportunities for screening vaccines by reverse vaccinology. Efficient identification of surface antigens is required to reduce time and animal cost in this technology. We developed an approach to identify surface antigens rapidly in Streptococcus sanguinis, a common infective endocarditis causative species. Methods and Findings We applied bioinformatics for antigen prediction and pooled antigens for immunization. Forty-seven surface-exposed proteins including 28 lipoproteins and 19 cell wall-anchored proteins were chosen based on computer algorithms and comparative genomic analyses. Eight proteins among these candidates and 2 other proteins were pooled together to immunize rabbits. The antiserum reacted strongly with each protein and with S. sanguinis whole cells. Affinity chromatography was used to purify the antibodies to 9 of the antigen pool components. Competitive ELISA and FACS results indicated that these 9 proteins were exposed on S. sanguinis cell surfaces. The purified antibodies had demonstrable opsonic activity. Conclusions The results indicate that immunization with pooled proteins, in combination with affinity purification, and comprehensive immunological assays may facilitate cell surface antigen identification to combat infectious diseases. PMID:20668678

  17. Functional region prediction with a set of appropriate homologous sequences-an index for sequence selection by integrating structure and sequence information with spatial statistics

    PubMed Central

    2012-01-01

    Background The detection of conserved residue clusters on a protein structure is one of the effective strategies for the prediction of functional protein regions. Various methods, such as Evolutionary Trace, have been developed based on this strategy. In such approaches, the conserved residues are identified through comparisons of homologous amino acid sequences. Therefore, the selection of homologous sequences is a critical step. It is empirically known that a certain degree of sequence divergence in the set of homologous sequences is required for the identification of conserved residues. However, the development of a method to select homologous sequences appropriate for the identification of conserved residues has not been sufficiently addressed. An objective and general method to select appropriate homologous sequences is desired for the efficient prediction of functional regions. Results We have developed a novel index to select the sequences appropriate for the identification of conserved residues, and implemented the index within our method to predict the functional regions of a protein. The implementation of the index improved the performance of the functional region prediction. The index represents the degree of conserved residue clustering on the tertiary structure of the protein. For this purpose, the structure and sequence information were integrated within the index by the application of spatial statistics. Spatial statistics is a field of statistics in which not only the attributes but also the geometrical coordinates of the data are considered simultaneously. Higher degrees of clustering generate larger index scores. We adopted the set of homologous sequences with the highest index score, under the assumption that the best prediction accuracy is obtained when the degree of clustering is the maximum. The set of sequences selected by the index led to higher functional region prediction performance than the sets of sequences selected by other sequence-based methods. Conclusions Appropriate homologous sequences are selected automatically and objectively by the index. Such sequence selection improved the performance of functional region prediction. As far as we know, this is the first approach in which spatial statistics have been applied to protein analyses. Such integration of structure and sequence information would be useful for other bioinformatics problems. PMID:22643026

  18. Predicting PDZ domain mediated protein interactions from structure

    PubMed Central

    2013-01-01

    Background PDZ domains are structural protein domains that recognize simple linear amino acid motifs, often at protein C-termini, and mediate protein-protein interactions (PPIs) in important biological processes, such as ion channel regulation, cell polarity and neural development. PDZ domain-peptide interaction predictors have been developed based on domain and peptide sequence information. Since domain structure is known to influence binding specificity, we hypothesized that structural information could be used to predict new interactions compared to sequence-based predictors. Results We developed a novel computational predictor of PDZ domain and C-terminal peptide interactions using a support vector machine trained with PDZ domain structure and peptide sequence information. Performance was estimated using extensive cross validation testing. We used the structure-based predictor to scan the human proteome for ligands of 218 PDZ domains and show that the predictions correspond to known PDZ domain-peptide interactions and PPIs in curated databases. The structure-based predictor is complementary to the sequence-based predictor, finding unique known and novel PPIs, and is less dependent on training–testing domain sequence similarity. We used a functional enrichment analysis of our hits to create a predicted map of PDZ domain biology. This map highlights PDZ domain involvement in diverse biological processes, some only found by the structure-based predictor. Based on this analysis, we predict novel PDZ domain involvement in xenobiotic metabolism and suggest new interactions for other processes including wound healing and Wnt signalling. Conclusions We built a structure-based predictor of PDZ domain-peptide interactions, which can be used to scan C-terminal proteomes for PDZ interactions. We also show that the structure-based predictor finds many known PDZ mediated PPIs in human that were not found by our previous sequence-based predictor and is less dependent on training–testing domain sequence similarity. Using both predictors, we defined a functional map of human PDZ domain biology and predict novel PDZ domain function. Users may access our structure-based and previous sequence-based predictors at http://webservice.baderlab.org/domains/POW. PMID:23336252

  19. Recombinant Expression Screening of P. aeruginosa Bacterial Inner Membrane Proteins

    PubMed Central

    2010-01-01

    Background Transmembrane proteins (TM proteins) make up 25% of all proteins and play key roles in many diseases and normal physiological processes. However, much less is known about their structures and molecular mechanisms than for soluble proteins. Problems in expression, solubilization, purification, and crystallization cause bottlenecks in the characterization of TM proteins. This project addressed the need for improved methods for obtaining sufficient amounts of TM proteins for determining their structures and molecular mechanisms. Results Plasmid clones were obtained that encode eighty-seven transmembrane proteins with varying physical characteristics, for example, the number of predicted transmembrane helices, molecular weight, and grand average hydrophobicity (GRAVY). All the target proteins were from P. aeruginosa, a gram negative bacterial opportunistic pathogen that causes serious lung infections in people with cystic fibrosis. The relative expression levels of the transmembrane proteins were measured under several culture growth conditions. The use of E. coli strains, a T7 promoter, and a 6-histidine C-terminal affinity tag resulted in the expression of 61 out of 87 test proteins (70%). In this study, proteins with a higher grand average hydrophobicity and more transmembrane helices were expressed less well than less hydrophobic proteins with fewer transmembrane helices. Conclusions In this study, factors related to overall hydrophobicity and the number of predicted transmembrane helices correlated with the relative expression levels of the target proteins. Identifying physical characteristics that correlate with protein expression might aid in selecting the "low hanging fruit", or proteins that can be expressed to sufficient levels using an E. coli expression system. The use of other expression strategies or host species might be needed for sufficient levels of expression of transmembrane proteins with other physical characteristics. Surveys like this one could aid in overcoming the technical bottlenecks in working with TM proteins and could potentially aid in increasing the rate of structure determination. PMID:21114855

  20. WS-SNPs&GO: a web server for predicting the deleterious effect of human protein variants using functional annotation

    PubMed Central

    2013-01-01

    Background SNPs&GO is a method for the prediction of deleterious Single Amino acid Polymorphisms (SAPs) using protein functional annotation. In this work, we present the web server implementation of SNPs&GO (WS-SNPs&GO). The server is based on Support Vector Machines (SVM) and for a given protein, its input comprises: the sequence and/or its three-dimensional structure (when available), a set of target variations and its functional Gene Ontology (GO) terms. The output of the server provides, for each protein variation, the probabilities to be associated to human diseases. Results The server consists of two main components, including updated versions of the sequence-based SNPs&GO (recently scored as one of the best algorithms for predicting deleterious SAPs) and of the structure-based SNPs&GO3d programs. Sequence and structure based algorithms are extensively tested on a large set of annotated variations extracted from the SwissVar database. Selecting a balanced dataset with more than 38,000 SAPs, the sequence-based approach achieves 81% overall accuracy, 0.61 correlation coefficient and an Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve of 0.88. For the subset of ~6,600 variations mapped on protein structures available at the Protein Data Bank (PDB), the structure-based method scores with 84% overall accuracy, 0.68 correlation coefficient, and 0.91 AUC. When tested on a new blind set of variations, the results of the server are 79% and 83% overall accuracy for the sequence-based and structure-based inputs, respectively. Conclusions WS-SNPs&GO is a valuable tool that includes in a unique framework information derived from protein sequence, structure, evolutionary profile, and protein function. WS-SNPs&GO is freely available at http://snps.biofold.org/snps-and-go. PMID:23819482

  1. Minimalist ensemble algorithms for genome-wide protein localization prediction

    PubMed Central

    2012-01-01

    Background Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. Results This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. Conclusions We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi. PMID:22759391

  2. Protein intakes are associated with reduced length of stay: a comparison between Enhanced Recovery After Surgery (ERAS) and conventional care after elective colorectal surgery.

    PubMed

    Yeung, Sophia E; Hilkewich, Leslee; Gillis, Chelsia; Heine, John A; Fenton, Tanis R

    2017-07-01

    Background: Protein can modulate the surgical stress response and postoperative catabolism. Enhanced Recovery After Surgery (ERAS) protocols are evidence-based care bundles that reduce morbidity. Objective: In this study, we compared protein adequacy as well as energy intakes, gut function, clinical outcomes, and how well nutritional variables predict length of hospital stay (LOS) in patients receiving ERAS protocols and conventional care. Design: We conducted a prospective cohort study in adult elective colorectal resection patients after conventional ( n = 46) and ERAS ( n = 69) care. Data collected included preoperative Malnutrition Screening Tool (MST) score, 3-d food records, postoperative nausea, LOS, and complications. Multivariable regression analysis assessed whether low protein intakes and the MST score were predictive of LOS. Results: Total protein intakes were significantly higher in the ERAS group due to the inclusion of oral nutrition supplements (conventional group: 0.33 g · kg -1 · d -1 ; ERAS group: 0.54 g · kg -1 · d -1 ; P < 0.02). This group difference in protein intake was maintained in a multivariable model that controlled for differences between baseline and surgical variables ( P = 0.001). Oral food intake did not differ between the 2 groups. The ERAS group had shorter LOS ( P = 0.049) and fewer total infectious complications ( P = 0.01). Nausea was a predictor of protein intake. Nutrition variables were independent predictors of earlier discharge after potential confounders were controlled for. Each unit increase in preoperative MST score predicted longer LOSs of 2.5 d (95% CI: 1.5, 3.5 d; P < 0.001), and the consumption of ≥60% of protein requirements during the first 3 d of hospitalization was associated with a shorter LOS of 4.4 d (95% CI: -6.8, -2.0 d; P < 0.001). Conclusions: ERAS patients consumed more protein due to the inclusion of oral nutrition supplements. However, total protein intake remained inadequate to meet recommendations. Consumption of ≥60% protein needs after surgery and MST scores were independent predictors of LOS. This trial was registered at clinicaltrials.gov as NCT02940665. © 2017 American Society for Nutrition.

  3. Mapping proteins in the presence of paralogs using units of coevolution

    PubMed Central

    2013-01-01

    Background We study the problem of mapping proteins between two protein families in the presence of paralogs. This problem occurs as a difficult subproblem in coevolution-based computational approaches for protein-protein interaction prediction. Results Similar to prior approaches, our method is based on the idea that coevolution implies equal rates of sequence evolution among the interacting proteins, and we provide a first attempt to quantify this notion in a formal statistical manner. We call the units that are central to this quantification scheme the units of coevolution. A unit consists of two mapped protein pairs and its score quantifies the coevolution of the pairs. This quantification allows us to provide a maximum likelihood formulation of the paralog mapping problem and to cast it into a binary quadratic programming formulation. Conclusion CUPID, our software tool based on a Lagrangian relaxation of this formulation, makes it, for the first time, possible to compute state-of-the-art quality pairings in a few minutes of runtime. In summary, we suggest a novel alternative to the earlier available approaches, which is statistically sound and computationally feasible. PMID:24564758

  4. Stress induction in the bacteria Shewanella oneidensis and Deinococcus radiodurans in response to below-background ionizing radiation.

    PubMed

    Castillo, Hugo; Schoderbek, Donald; Dulal, Santosh; Escobar, Gabriela; Wood, Jeffrey; Nelson, Roger; Smith, Geoffrey

    2015-01-01

    The 'Linear no-threshold' (LNT) model predicts that any amount of radiation increases the risk of organisms to accumulate negative effects. Several studies at below background radiation levels (4.5-11.4 nGy h(-1)) show decreased growth rates and an increased susceptibility to oxidative stress. The purpose of our study is to obtain molecular evidence of a stress response in Shewanella oneidensis and Deinococcus radiodurans grown at a gamma dose rate of 0.16 nGy h(-1), about 400 times less than normal background radiation. Bacteria cultures were grown at a dose rate of 0.16 or 71.3 nGy h(-1) gamma irradiation. Total RNA was extracted from samples at early-exponential and stationary phases for the rt-PCR relative quantification (radiation-deprived treatment/background radiation control) of the stress-related genes katB (catalase), recA (recombinase), oxyR (oxidative stress transcriptional regulator), lexA (SOS regulon transcriptional repressor), dnaK (heat shock protein 70) and SOA0154 (putative heavy metal efflux pump). Deprivation of normal levels of radiation caused a reduction in growth of both bacterial species, accompanied by the upregulation of katB, recA, SOA0154 genes in S. oneidensis and the upregulation of dnaK in D. radiodurans. When cells were returned to background radiation levels, growth rates recovered and the stress response dissipated. Our results indicate that below-background levels of radiation inhibited growth and elicited a stress response in two species of bacteria, contrary to the LNT model prediction.

  5. Designing proteins to combat disease: Cardiac troponin C as an example.

    PubMed

    Davis, Jonathan P; Shettigar, Vikram; Tikunova, Svetlana B; Little, Sean C; Liu, Bin; Siddiqui, Jalal K; Janssen, Paul M L; Ziolo, Mark T; Walton, Shane D

    2016-07-01

    Throughout history, muscle research has led to numerous scientific breakthroughs that have brought insight to a more general understanding of all biological processes. Potentially one of the most influential discoveries was the role of the second messenger calcium and its myriad of handling and sensing systems that mechanistically control muscle contraction. In this review we will briefly discuss the significance of calcium as a universal second messenger along with some of the most common calcium binding motifs in proteins, focusing on the EF-hand. We will also describe some of our approaches to rationally design calcium binding proteins to palliate, or potentially even cure cardiovascular disease. Considering not all failing hearts have the same etiology, genetic background and co-morbidities, personalized therapies will need to be developed. We predict designer proteins will open doors for unprecedented personalized, and potentially, even generalized medicines as gene therapy or protein delivery techniques come to fruition. Copyright © 2016 Elsevier Inc. All rights reserved.

  6. ArfB links protein lipidation and endocytosis to polarized growth of Aspergillus nidulans

    PubMed Central

    Lee, Soo Chan

    2008-01-01

    Aspergillus nidulans undergoes polarized hyphal growth during the majority of its life cycle. Regulatory mechanisms for hyphal polarity have been intensively investigated in a variety of filamentous fungi. Two important cellular processes, which have received recent attention, include protein myristoylation and endocytosis. It is clear that protein myristoylation is essential for polarity establishment because germinating A. nidulans conidia lost polarity in the presence of cerulenin, a lipid metabolism inhibitor and in an N-myristoyl transferase mutant background. Only 41 predicted proteins encoded by A. nidulans posses an N-myristoylation motif, one of which is ADP ribosylation factor B (ArfB). Disruption of ArfB leads to failure of polarity establishment and maintenance during early morphogenesis and in a delay in endocytosis. Therefore, ArfB connects N-myristoylation and endocytosis to polarized growth. Exocytotic vesicle trafficking through the Spitzenkörper may also require Arf proteins in their role in vesicle formation. Taken together, ArfB is one of the important key components for the fungal hyphal growth. PMID:19704790

  7. GM organisms and the EU regulatory environment: allergenicity as a risk component.

    PubMed

    Davies, Howard V

    2005-11-01

    The European Food Safety Authority, following a request from the European Commission, has published a guidance document for the risk assessment of GM plants and derived food and feed to assist in the implementation of provisions of Regulation (EC) 1829/2003 of the European Parliament and Council on GM food and feed. This regulation has applied since 18 April 2004. In principle, hazard identification and characterisation of GM crops is conducted in four steps: characterisation of the parent crop and any hazards associated with it; characterisation of the transformation process and of the inserted recombinant DNA, including an assessment of the possible production of new fusion proteins or allergens; assessment of the introduced proteins (toxicity, allergenicity) and metabolites; identification of any other targetted and unexpected alterations in the GM crop, including changes in the plant metabolism resulting in compositional changes and assessment of their toxicological, allergenic or nutritional impact. In relation to allergenicity specifically, it is clear that this property of a given protein is not intrinsic and fully predictable but is a biological activity requiring an interaction with individuals with a predisposed genetic background. Allergenicity, therefore, depends on the genetic diversity and variability in atopic human subjects. Given this lack of complete predictability it is necessary to obtain, from several steps in the risk-assessment process, a cumulative body of evidence that minimises any uncertainty about the protein(s) in question.

  8. Genetic analyses of bone morphogenetic protein 2, 4 and 7 in congenital combined pituitary hormone deficiency

    PubMed Central

    2013-01-01

    Background The complex process of development of the pituitary gland is regulated by a number of signalling molecules and transcription factors. Mutations in these factors have been identified in rare cases of congenital hypopituitarism but for most subjects with combined pituitary hormone deficiency (CPHD) genetic causes are unknown. Bone morphogenetic proteins (BMPs) affect induction and growth of the pituitary primordium and thus represent plausible candidates for mutational screening of patients with CPHD. Methods We sequenced BMP2, 4 and 7 in 19 subjects with CPHD. For validation purposes, novel genetic variants were genotyped in 1046 healthy subjects. Additionally, potential functional relevance for most promising variants has been assessed by phylogenetic analyses and prediction of effects on protein structure. Results Sequencing revealed two novel variants and confirmed 30 previously known polymorphisms and mutations in BMP2, 4 and 7. Although phylogenetic analyses indicated that these variants map within strongly conserved gene regions, there was no direct support for their impact on protein structure when applying predictive bioinformatics tools. Conclusions A mutation in the BMP4 coding region resulting in an amino acid exchange (p.Arg300Pro) appeared most interesting among the identified variants. Further functional analyses are required to ultimately map the relevance of these novel variants in CPHD. PMID:24289245

  9. Development of background-free tame fluorescent probes for intracellular live cell imaging

    PubMed Central

    Alamudi, Samira Husen; Satapathy, Rudrakanta; Kim, Jihyo; Su, Dongdong; Ren, Haiyan; Das, Rajkumar; Hu, Lingna; Alvarado-Martínez, Enrique; Lee, Jung Yeol; Hoppmann, Christian; Peña-Cabrera, Eduardo; Ha, Hyung-Ho; Park, Hee-Sung; Wang, Lei; Chang, Young-Tae

    2016-01-01

    Fluorescence labelling of an intracellular biomolecule in native living cells is a powerful strategy to achieve in-depth understanding of the biomolecule's roles and functions. Besides being nontoxic and specific, desirable labelling probes should be highly cell permeable without nonspecific interactions with other cellular components to warrant high signal-to-noise ratio. While it is critical, rational design for such probes is tricky. Here we report the first predictive model for cell permeable background-free probe development through optimized lipophilicity, water solubility and charged van der Waals surface area. The model was developed by utilizing high-throughput screening in combination with cheminformatics. We demonstrate its reliability by developing CO-1 and AzG-1, a cyclooctyne- and azide-containing BODIPY probe, respectively, which specifically label intracellular target organelles and engineered proteins with minimum background. The results provide an efficient strategy for development of background-free probes, referred to as ‘tame' probes, and novel tools for live cell intracellular imaging. PMID:27321135

  10. KEGG orthology-based annotation of the predicted proteome of Acropora digitifera: ZoophyteBase - an open access and searchable database of a coral genome

    PubMed Central

    2013-01-01

    Background Contemporary coral reef research has firmly established that a genomic approach is urgently needed to better understand the effects of anthropogenic environmental stress and global climate change on coral holobiont interactions. Here we present KEGG orthology-based annotation of the complete genome sequence of the scleractinian coral Acropora digitifera and provide the first comprehensive view of the genome of a reef-building coral by applying advanced bioinformatics. Description Sequences from the KEGG database of protein function were used to construct hidden Markov models. These models were used to search the predicted proteome of A. digitifera to establish complete genomic annotation. The annotated dataset is published in ZoophyteBase, an open access format with different options for searching the data. A particularly useful feature is the ability to use a Google-like search engine that links query words to protein attributes. We present features of the annotation that underpin the molecular structure of key processes of coral physiology that include (1) regulatory proteins of symbiosis, (2) planula and early developmental proteins, (3) neural messengers, receptors and sensory proteins, (4) calcification and Ca2+-signalling proteins, (5) plant-derived proteins, (6) proteins of nitrogen metabolism, (7) DNA repair proteins, (8) stress response proteins, (9) antioxidant and redox-protective proteins, (10) proteins of cellular apoptosis, (11) microbial symbioses and pathogenicity proteins, (12) proteins of viral pathogenicity, (13) toxins and venom, (14) proteins of the chemical defensome and (15) coral epigenetics. Conclusions We advocate that providing annotation in an open-access searchable database available to the public domain will give an unprecedented foundation to interrogate the fundamental molecular structure and interactions of coral symbiosis and allow critical questions to be addressed at the genomic level based on combined aspects of evolutionary, developmental, metabolic, and environmental perspectives. PMID:23889801

  11. HMPAS: Human Membrane Protein Analysis System

    PubMed Central

    2013-01-01

    Background Membrane proteins perform essential roles in diverse cellular functions and are regarded as major pharmaceutical targets. The significance of membrane proteins has led to the developing dozens of resources related with membrane proteins. However, most of these resources are built for specific well-known membrane protein groups, making it difficult to find common and specific features of various membrane protein groups. Methods We collected human membrane proteins from the dispersed resources and predicted novel membrane protein candidates by using ortholog information and our membrane protein classifiers. The membrane proteins were classified according to the type of interaction with the membrane, subcellular localization, and molecular function. We also made new feature dataset to characterize the membrane proteins in various aspects including membrane protein topology, domain, biological process, disease, and drug. Moreover, protein structure and ICD-10-CM based integrated disease and drug information was newly included. To analyze the comprehensive information of membrane proteins, we implemented analysis tools to identify novel sequence and functional features of the classified membrane protein groups and to extract features from protein sequences. Results We constructed HMPAS with 28,509 collected known membrane proteins and 8,076 newly predicted candidates. This system provides integrated information of human membrane proteins individually and in groups organized by 45 subcellular locations and 1,401 molecular functions. As a case study, we identified associations between the membrane proteins and diseases and present that membrane proteins are promising targets for diseases related with nervous system and circulatory system. A web-based interface of this system was constructed to facilitate researchers not only to retrieve organized information of individual proteins but also to use the tools to analyze the membrane proteins. Conclusions HMPAS provides comprehensive information about human membrane proteins including specific features of certain membrane protein groups. In this system, user can acquire the information of individual proteins and specified groups focused on their conserved sequence features, involved cellular processes, and diseases. HMPAS may contribute as a valuable resource for the inference of novel cellular mechanisms and pharmaceutical targets associated with the human membrane proteins. HMPAS is freely available at http://fcode.kaist.ac.kr/hmpas. PMID:24564858

  12. Rigorous assessment and integration of the sequence and structure based features to predict hot spots

    PubMed Central

    2011-01-01

    Background Systematic mutagenesis studies have shown that only a few interface residues termed hot spots contribute significantly to the binding free energy of protein-protein interactions. Therefore, hot spots prediction becomes increasingly important for well understanding the essence of proteins interactions and helping narrow down the search space for drug design. Currently many computational methods have been developed by proposing different features. However comparative assessment of these features and furthermore effective and accurate methods are still in pressing need. Results In this study, we first comprehensively collect the features to discriminate hot spots and non-hot spots and analyze their distributions. We find that hot spots have lower relASA and larger relative change in ASA, suggesting hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts including hydrogen bonds, salt bridges, and atomic contacts, which favor complexes formation. Interestingly, we find that conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in Ab+ dataset (all complexes). While in Ab- dataset (antigen-antibody complexes are excluded), there are significant differences in two features between hot pots and non-hot spots. Secondly, we explore the predictive ability for each feature and the combinations of features by support vector machines (SVMs). The results indicate that sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method to predict hot spots of two protein complexes. Conclusion Experimental results show that support vector machine classifiers are quite effective in predicting hot spots based on sequence features. Hot spots cannot be fully predicted through simple analysis based on physicochemical characteristics, but there is reason to believe that integration of features and machine learning methods can remarkably improve the predictive performance for hot spots. PMID:21798070

  13. Predicting and analyzing DNA-binding domains using a systematic approach to identifying a set of informative physicochemical and biochemical properties

    PubMed Central

    2011-01-01

    Background Existing methods of predicting DNA-binding proteins used valuable features of physicochemical properties to design support vector machine (SVM) based classifiers. Generally, selection of physicochemical properties and determination of their corresponding feature vectors rely mainly on known properties of binding mechanism and experience of designers. However, there exists a troublesome problem for designers that some different physicochemical properties have similar vectors of representing 20 amino acids and some closely related physicochemical properties have dissimilar vectors. Results This study proposes a systematic approach (named Auto-IDPCPs) to automatically identify a set of physicochemical and biochemical properties in the AAindex database to design SVM-based classifiers for predicting and analyzing DNA-binding domains/proteins. Auto-IDPCPs consists of 1) clustering 531 amino acid indices in AAindex into 20 clusters using a fuzzy c-means algorithm, 2) utilizing an efficient genetic algorithm based optimization method IBCGA to select an informative feature set of size m to represent sequences, and 3) analyzing the selected features to identify related physicochemical properties which may affect the binding mechanism of DNA-binding domains/proteins. The proposed Auto-IDPCPs identified m=22 features of properties belonging to five clusters for predicting DNA-binding domains with a five-fold cross-validation accuracy of 87.12%, which is promising compared with the accuracy of 86.62% of the existing method PSSM-400. For predicting DNA-binding sequences, the accuracy of 75.50% was obtained using m=28 features, where PSSM-400 has an accuracy of 74.22%. Auto-IDPCPs and PSSM-400 have accuracies of 80.73% and 82.81%, respectively, applied to an independent test data set of DNA-binding domains. Some typical physicochemical properties discovered are hydrophobicity, secondary structure, charge, solvent accessibility, polarity, flexibility, normalized Van Der Waals volume, pK (pK-C, pK-N, pK-COOH and pK-a(RCOOH)), etc. Conclusions The proposed approach Auto-IDPCPs would help designers to investigate informative physicochemical and biochemical properties by considering both prediction accuracy and analysis of binding mechanism simultaneously. The approach Auto-IDPCPs can be also applicable to predict and analyze other protein functions from sequences. PMID:21342579

  14. Protein structure prediction with local adjust tabu search algorithm

    PubMed Central

    2014-01-01

    Background Protein folding structure prediction is one of the most challenging problems in the bioinformatics domain. Because of the complexity of the realistic protein structure, the simplified structure model and the computational method should be adopted in the research. The AB off-lattice model is one of the simplification models, which only considers two classes of amino acids, hydrophobic (A) residues and hydrophilic (B) residues. Results The main work of this paper is to discuss how to optimize the lowest energy configurations in 2D off-lattice model and 3D off-lattice model by using Fibonacci sequences and real protein sequences. In order to avoid falling into local minimum and faster convergence to the global minimum, we introduce a novel method (SATS) to the protein structure problem, which combines simulated annealing algorithm and tabu search algorithm. Various strategies, such as the new encoding strategy, the adaptive neighborhood generation strategy and the local adjustment strategy, are adopted successfully for high-speed searching the optimal conformation corresponds to the lowest energy of the protein sequences. Experimental results show that some of the results obtained by the improved SATS are better than those reported in previous literatures, and we can sure that the lowest energy folding state for short Fibonacci sequences have been found. Conclusions Although the off-lattice models is not very realistic, they can reflect some important characteristics of the realistic protein. It can be found that 3D off-lattice model is more like native folding structure of the realistic protein than 2D off-lattice model. In addition, compared with some previous researches, the proposed hybrid algorithm can more effectively and more quickly search the spatial folding structure of a protein chain. PMID:25474708

  15. Protein instability, haploinsufficiency, and cortical hyper-excitability underlie STXBP1 encephalopathy

    PubMed Central

    Kovačević, Jovana; Maroteaux, Gregoire; Schut, Desiree; Loos, Maarten; Dubey, Mohit; Pitsch, Julika; Remmelink, Esther; Koopmans, Bastijn; Crowley, James; Cornelisse, L Niels; Sullivan, Patrick F; Schoch, Susanne; Toonen, Ruud F; Stiedl, Oliver; Verhage, Matthijs

    2018-01-01

    Abstract De novo heterozygous mutations in STXBP1/Munc18-1 cause early infantile epileptic encephalopathies (EIEE4, OMIM #612164) characterized by infantile epilepsy, developmental delay, intellectual disability, and can include autistic features. We characterized the cellular deficits for an allelic series of seven STXBP1 mutations and developed four mouse models that recapitulate the abnormal EEG activity and cognitive aspects of human STXBP1-encephalopathy. Disease-causing STXBP1 variants supported synaptic transmission to a variable extent on a null background, but had no effect when overexpressed on a heterozygous background. All disease variants had severely decreased protein levels. Together, these cellular studies suggest that impaired protein stability and STXBP1 haploinsufficiency explain STXBP1-encephalopathy and that, therefore, Stxbp1+/− mice provide a valid mouse model. Simultaneous video and EEG recordings revealed that Stxbp1+/− mice with different genomic backgrounds recapitulate the seizure/spasm phenotype observed in humans, characterized by myoclonic jerks and spike-wave discharges that were suppressed by the antiepileptic drug levetiracetam. Mice heterozygous for Stxbp1 in GABAergic neurons only, showed impaired viability, 50% died within 2–3 weeks, and the rest showed stronger epileptic activity. c-Fos staining implicated neocortical areas, but not other brain regions, as the seizure foci. Stxbp1+/− mice showed impaired cognitive performance, hyperactivity and anxiety-like behaviour, without altered social behaviour. Taken together, these data demonstrate the construct, face and predictive validity of Stxbp1+/− mice and point to protein instability, haploinsufficiency and imbalanced excitation in neocortex, as the underlying mechanism of STXBP1-encephalopathy. The mouse models reported here are valid models for development of therapeutic interventions targeting STXBP1-encephalopathy. PMID:29538625

  16. Comparative genomic analysis reveals a novel mitochondrial isoform of human rTS protein and unusual phylogenetic distribution of the rTS gene

    PubMed Central

    Liang, Ping; Nair, Jayakumar R; Song, Lei; McGuire, John J; Dolnick, Bruce J

    2005-01-01

    Background The rTS gene (ENOSF1), first identified in Homo sapiens as a gene complementary to the thymidylate synthase (TYMS) mRNA, is known to encode two protein isoforms, rTSα and rTSβ. The rTSβ isoform appears to be an enzyme responsible for the synthesis of signaling molecules involved in the down-regulation of thymidylate synthase, but the exact cellular functions of rTS genes are largely unknown. Results Through comparative genomic sequence analysis, we predicted the existence of a novel protein isoform, rTS, which has a 27 residue longer N-terminus by virtue of utilizing an alternative start codon located upstream of the start codon in rTSβ. We observed that a similar extended N-terminus could be predicted in all rTS genes for which genomic sequences are available and the extended regions are conserved from bacteria to human. Therefore, we reasoned that the protein with the extended N-terminus might represent an ancestral form of the rTS protein. Sequence analysis strongly predicts a mitochondrial signal sequence in the extended N-terminal of human rTSγ, which is absent in rTSβ. We confirmed the existence of rTS in human mitochondria experimentally by demonstrating the presence of both rTSγ and rTSβ proteins in mitochondria isolated by subcellular fractionation. In addition, our comprehensive analysis of rTS orthologous sequences reveals an unusual phylogenetic distribution of this gene, which suggests the occurrence of one or more horizontal gene transfer events. Conclusion The presence of two rTS isoforms in mitochondria suggests that the rTS signaling pathway may be active within mitochondria. Our report also presents an example of identifying novel protein isoforms and for improving gene annotation through comparative genomic analysis. PMID:16162288

  17. A Library of Plasmodium vivax Recombinant Merozoite Proteins Reveals New Vaccine Candidates and Protein-Protein Interactions

    PubMed Central

    Hostetler, Jessica B.; Sharma, Sumana; Bartholdson, S. Josefin; Wright, Gavin J.; Fairhurst, Rick M.; Rayner, Julian C.

    2015-01-01

    Background A vaccine targeting Plasmodium vivax will be an essential component of any comprehensive malaria elimination program, but major gaps in our understanding of P. vivax biology, including the protein-protein interactions that mediate merozoite invasion of reticulocytes, hinder the search for candidate antigens. Only one ligand-receptor interaction has been identified, that between P. vivax Duffy Binding Protein (PvDBP) and the erythrocyte Duffy Antigen Receptor for Chemokines (DARC), and strain-specific immune responses to PvDBP make it a complex vaccine target. To broaden the repertoire of potential P. vivax merozoite-stage vaccine targets, we exploited a recent breakthrough in expressing full-length ectodomains of Plasmodium proteins in a functionally-active form in mammalian cells and initiated a large-scale study of P. vivax merozoite proteins that are potentially involved in reticulocyte binding and invasion. Methodology/Principal Findings We selected 39 P. vivax proteins that are predicted to localize to the merozoite surface or invasive secretory organelles, some of which show homology to P. falciparum vaccine candidates. Of these, we were able to express 37 full-length protein ectodomains in a mammalian expression system, which has been previously used to express P. falciparum invasion ligands such as PfRH5. To establish whether the expressed proteins were correctly folded, we assessed whether they were recognized by antibodies from Cambodian patients with acute vivax malaria. IgG from these samples showed at least a two-fold change in reactivity over naïve controls in 27 of 34 antigens tested, and the majority showed heat-labile IgG immunoreactivity, suggesting the presence of conformation-sensitive epitopes and native tertiary protein structures. Using a method specifically designed to detect low-affinity, extracellular protein-protein interactions, we confirmed a predicted interaction between P. vivax 6-cysteine proteins P12 and P41, further suggesting that the proteins are natively folded and functional. This screen also identified two novel protein-protein interactions, between P12 and PVX_110945, and between MSP3.10 and MSP7.1, the latter of which was confirmed by surface plasmon resonance. Conclusions/Significance We produced a new library of recombinant full-length P. vivax ectodomains, established that the majority of them contain tertiary structure, and used them to identify predicted and novel protein-protein interactions. As well as identifying new interactions for further biological studies, this library will be useful in identifying P. vivax proteins with vaccine potential, and studying P. vivax malaria pathogenesis and immunity. Trial Registration ClinicalTrials.gov NCT00663546 PMID:26701602

  18. Intercalating dyes for enhanced contrast in second-harmonic generation imaging of protein crystals

    PubMed Central

    Newman, Justin A.; Scarborough, Nicole M.; Pogranichniy, Nicholas R.; Shrestha, Rashmi K.; Closser, Richard G.; Das, Chittaranjan; Simpson, Garth J.

    2015-01-01

    The second-harmonic generation (SHG) activity of protein crystals was found to be enhanced by up to ∼1000-fold by the intercalation of SHG phores within the crystal lattice. Unlike the intercalation of fluorophores, the SHG phores produced no significant background SHG from solvated dye or from dye intercalated into amorphous aggregates. The polarization-dependent SHG is consistent with the chromophores adopting the symmetry of the crystal lattice. In addition, the degree of enhancement for different symmetries of dyes is consistent with theoretical predictions based on the molecular nonlinear optical response. Kinetics studies indicate that intercalation arises over a timeframe of several minutes in lysozyme, with detectable enhancements within seconds. These results provide a potential means to increase the overall diversity of protein crystals and crystal sizes amenable to characterization by SHG microscopy. PMID:26143918

  19. Exploration of structural stability in deleterious nsSNPs of the XPA gene: A molecular dynamics approach

    PubMed Central

    NagaSundaram, N; Priya Doss, C George

    2011-01-01

    Background: Distinguishing the deleterious from the massive number of non-functional nsSNPs that occur within a single genome is a considerable challenge in mutation research. In this approach, we have used the existing in silico methods to explore the mutation-structure-function relationship in the XPAgene. Materials and Methods: We used the Sorting Intolerant From Tolerant (SIFT), Polymorphism Phenotyping (PolyPhen), I-Mutant 2.0, and the Protein Analysis THrough Evolutionary Relationships methods to predict the effects of deleterious nsSNPs on protein function and evaluated the impact of mutation on protein stability by Molecular Dynamics simulations. Results: By comparing the scores of all the four in silico methods, nsSNP with an ID rs104894131 at position C108F was predicted to be highly deleterious. We extended our Molecular dynamics approach to gain insight into the impact of this non-synonymous polymorphism on structural changes that may affect the activity of the XPAgene. Conclusion: Based on the in silico methods score, potential energy, root-mean-square deviation, and root-mean-square fluctuation, we predict that deleterious nsSNP at position C108F would play a significant role in causing disease by the XPA gene. Our approach would present the application of in silicotools in understanding the functional variation from the perspective of structure, evolution, and phenotype. PMID:22190868

  20. A Recessive Mutation Resulting in a Disabling Amino Acid Substitution (T194R) in the LHX3 Homeodomain Causes Combined Pituitary Hormone Deficiency

    PubMed Central

    Bechtold-Dalla Pozza, Susanne; Hiedl, Stefan; Roeb, Julia; Lohse, Peter; Malik, Raleigh E.; Park, Soyoung; Durán-Prado, Mario; Rhodes, Simon J.

    2012-01-01

    Background/Aims Recessive mutations in the LHX3 ho-meodomain transcription factor gene are associated with developmental disorders affecting the pituitary and nervous system. We describe pediatric patients with combined pituitary hormone deficiency (CPHD) who harbor a novel mutation in LHX3. Methods Two female siblings from related parents were examined. Both patients had neonatal complications. The index patient had CPHD featuring deficiencies of GH, LH, FSH, PRL, and TSH, with later onset of ACTH deficiency. She also had a hypoplastic anterior pituitary, respiratory distress, hearing impairment, and limited neck rotation. The LHX3 gene was sequenced and the biochemical properties of the predicted altered proteins were characterized. Results A novel homozygous mutation predicted to change amino acid 194 from threonine to arginine (T194R) was detected in both patients. This amino acid is conserved in the DNA-binding homeodomain. Computer modeling predicted that the T194R change would alter the homeodomain structure. The T194R protein did not bind tested LHX3 DNA recognition sites and did not activate the α-glycoprotein and PRL target genes. Conclusion The T194R mutation affects a critical residue in the LHX3 protein. This study extends our understanding of the phenotypic features, molecular mechanism, and developmental course associated with mutations in the LHX3 gene. PMID:22286346

  1. Classification of protein quaternary structure by functional domain composition

    PubMed Central

    Yu, Xiaojing; Wang, Chuan; Li, Yixue

    2006-01-01

    Background The number and the arrangement of subunits that form a protein are referred to as quaternary structure. Quaternary structure is an important protein attribute that is closely related to its function. Proteins with quaternary structure are called oligomeric proteins. Oligomeric proteins are involved in various biological processes, such as metabolism, signal transduction, and chromosome replication. Thus, it is highly desirable to develop some computational methods to automatically classify the quaternary structure of proteins from their sequences. Results To explore this problem, we adopted an approach based on the functional domain composition of proteins. Every protein was represented by a vector calculated from the domains in the PFAM database. The nearest neighbor algorithm (NNA) was used for classifying the quaternary structure of proteins from this information. The jackknife cross-validation test was performed on the non-redundant protein dataset in which the sequence identity was less than 25%. The overall success rate obtained is 75.17%. Additionally, to demonstrate the effectiveness of this method, we predicted the proteins in an independent dataset and achieved an overall success rate of 84.11% Conclusion Compared with the amino acid composition method and Blast, the results indicate that the domain composition approach may be a more effective and promising high-throughput method in dealing with this complicated problem in bioinformatics. PMID:16584572

  2. Mining protein function from text using term-based support vector machines

    PubMed Central

    Rice, Simon B; Nenadic, Goran; Stapley, Benjamin J

    2005-01-01

    Background Text mining has spurred huge interest in the domain of biology. The goal of the BioCreAtIvE exercise was to evaluate the performance of current text mining systems. We participated in Task 2, which addressed assigning Gene Ontology terms to human proteins and selecting relevant evidence from full-text documents. We approached it as a modified form of the document classification task. We used a supervised machine-learning approach (based on support vector machines) to assign protein function and select passages that support the assignments. As classification features, we used a protein's co-occurring terms that were automatically extracted from documents. Results The results evaluated by curators were modest, and quite variable for different problems: in many cases we have relatively good assignment of GO terms to proteins, but the selected supporting text was typically non-relevant (precision spanning from 3% to 50%). The method appears to work best when a substantial set of relevant documents is obtained, while it works poorly on single documents and/or short passages. The initial results suggest that our approach can also mine annotations from text even when an explicit statement relating a protein to a GO term is absent. Conclusion A machine learning approach to mining protein function predictions from text can yield good performance only if sufficient training data is available, and significant amount of supporting data is used for prediction. The most promising results are for combined document retrieval and GO term assignment, which calls for the integration of methods developed in BioCreAtIvE Task 1 and Task 2. PMID:15960835

  3. Dynamics of Nanoparticle-Protein Corona Complex Formation: Analytical Results from Population Balance Equations

    PubMed Central

    Darabi Sahneh, Faryad; Scoglio, Caterina; Riviere, Jim

    2013-01-01

    Background Nanoparticle-protein corona complex formation involves absorption of protein molecules onto nanoparticle surfaces in a physiological environment. Understanding the corona formation process is crucial in predicting nanoparticle behavior in biological systems, including applications of nanotoxicology and development of nano drug delivery platforms. Method This paper extends the modeling work in to derive a mathematical model describing the dynamics of nanoparticle corona complex formation from population balance equations. We apply nonlinear dynamics techniques to derive analytical results for the composition of nanoparticle-protein corona complex, and validate our results through numerical simulations. Results The model presented in this paper exhibits two phases of corona complex dynamics. In the first phase, proteins rapidly bind to the free surface of nanoparticles, leading to a metastable composition. During the second phase, continuous association and dissociation of protein molecules with nanoparticles slowly changes the composition of the corona complex. Given sufficient time, composition of the corona complex reaches an equilibrium state of stable composition. We find analytical approximate formulae for metastable and stable compositions of corona complex. Our formulae are very well-structured to clearly identify important parameters determining corona composition. Conclusion The dynamics of biocorona formation constitute vital aspect of interactions between nanoparticles and living organisms. Our results further understanding of these dynamics through quantitation of experimental conditions, modeling results for in vitro systems to better predict behavior for in vivo systems. One potential application would involve a single cell culture medium related to a complex protein medium, such as blood or tissue fluid. PMID:23741371

  4. Serum transferrin as a prognostic indicator of spontaneous closure and mortality in gastrointestinal cutaneous fistulas.

    PubMed Central

    Kuvshinoff, B W; Brodish, R J; McFadden, D W; Fischer, J E

    1993-01-01

    OBJECTIVE: This study determined whether there are any laboratory or other features that will enable prediction of spontaneous closure in patients with gastrointestinal cutaneous fistulas. SUMMARY BACKGROUND DATA: Although the anatomic criteria for spontaneous closure of gastrointestinal cutaneous fistulas have been presented by several authors, less than 50% of such fistulas tend to close, even in the most recent series. METHODS: A group of patients with gastrointestinal cutaneous fistulas with anatomical features favorable to study were investigated with respect to a series of parameters including the usual demographic parameters, plus fistula output, number of blood transfusions, presence of sepsis, as well as metabolic parameters including serum transferrin, retinol-binding protein, thyroxin-binding prealbumin, and serum albumin. RESULTS: Of 79 patients with 116 fistulas, 16 (20.3%) died. Causes of death were uncontrolled sepsis in eight patients and cancer in five patients. Postoperative fistulas constituted 80% of the group. The presence of local sepsis, systemic sepsis, remote sepsis (such as pneumonia or line sepsis), the number of fistulas, fistula output, and the number of blood transfusions were not predictive of spontaneous closure, whereas serum transferrin was predictive of spontaneous closure. Serum transferrin, retinol-binding protein, and thyroxin-binding prealbumin were predictive of mortality. CONCLUSIONS: Serum transferrin does not appear to be an entirely independent variable, but seems to identify those patients with significant remote sepsis, systemic sepsis, and neoplasia in whom these processes are clinically significant. The results, if confirmed, and provided that nutritional needs are met, suggest that short-turnover proteins, particularly serum transferrin, might be useful in predicting which patients with gastrointestinal cutaneous fistulas should undergo surgery despite anatomic criteria favorable for spontaneous closure. PMID:8507110

  5. Prediction of vitamin interacting residues in a vitamin binding protein using evolutionary information

    PubMed Central

    2013-01-01

    Background The vitamins are important cofactors in various enzymatic-reactions. In past, many inhibitors have been designed against vitamin binding pockets in order to inhibit vitamin-protein interactions. Thus, it is important to identify vitamin interacting residues in a protein. It is possible to detect vitamin-binding pockets on a protein, if its tertiary structure is known. Unfortunately tertiary structures of limited proteins are available. Therefore, it is important to develop in-silico models for predicting vitamin interacting residues in protein from its primary structure. Results In this study, first we compared protein-interacting residues of vitamins with other ligands using Two Sample Logo (TSL). It was observed that ATP, GTP, NAD, FAD and mannose preferred {G,R,K,S,H}, {G,K,T,S,D,N}, {T,G,Y}, {G,Y,W} and {Y,D,W,N,E} residues respectively, whereas vitamins preferred {Y,F,S,W,T,G,H} residues for the interaction with proteins. Furthermore, compositional information of preferred and non-preferred residues along with patterns-specificity was also observed within different vitamin-classes. Vitamins A, B and B6 preferred {F,I,W,Y,L,V}, {S,Y,G,T,H,W,N,E} and {S,T,G,H,Y,N} interacting residues respectively. It suggested that protein-binding patterns of vitamins are different from other ligands, and motivated us to develop separate predictor for vitamins and their sub-classes. The four different prediction modules, (i) vitamin interacting residues (VIRs), (ii) vitamin-A interacting residues (VAIRs), (iii) vitamin-B interacting residues (VBIRs) and (iv) pyridoxal-5-phosphate (vitamin B6) interacting residues (PLPIRs) have been developed. We applied various classifiers of SVM, BayesNet, NaiveBayes, ComplementNaiveBayes, NaiveBayesMultinomial, RandomForest and IBk etc., as machine learning techniques, using binary and Position-Specific Scoring Matrix (PSSM) features of protein sequences. Finally, we selected best performing SVM modules and obtained highest MCC of 0.53, 0.48, 0.61, 0.81 for VIRs, VAIRs, VBIRs, PLPIRs respectively, using PSSM-based evolutionary information. All the modules developed in this study have been trained and tested on non-redundant datasets and evaluated using five-fold cross-validation technique. The performances were also evaluated on the balanced and different independent datasets. Conclusions This study demonstrates that it is possible to predict VIRs, VAIRs, VBIRs and PLPIRs from evolutionary information of protein sequence. In order to provide service to the scientific community, we have developed web-server and standalone software VitaPred (http://crdd.osdd.net/raghava/vitapred/). PMID:23387468

  6. Pre-treatment plasma proteomic markers associated with survival in oesophageal cancer

    PubMed Central

    Kelly, P; Paulin, F; Lamont, D; Baker, L; Clearly, S; Exon, D; Thompson, A

    2012-01-01

    Background: The incidence of oesophageal adenocarcinoma is increasing worldwide but survival remains poor. Neoadjuvant chemotherapy can improve survival, but prognostic and predictive biomarkers are required. This study built upon preclinical approaches to identify prognostic plasma proteomic markers in oesophageal cancer. Methods: Plasma samples collected before and during the treatment of oesophageal cancer and non-cancer controls were analysed by surface-enhanced laser desorption/ionisation time-of-flight (SELDI-TOF) mass spectroscopy (MS). Protein peaks were identified by MS in tryptic digests of purified fractions. Associations between peak intensities obtained in the spectra and clinical endpoints (survival, disease-free survival) were tested by univariate (Fisher's exact test) and multivariate analysis (binary logistic regression). Results: Plasma protein peaks were identified that differed significantly (P<0.05, ANOVA) between the oesophageal cancer and control groups at baseline. Three peaks, confirmed as apolipoprotein A-I, serum amyloid A and transthyretin, in baseline (pre-treatment) samples were associated by univariate and multivariate analysis with disease-free survival and overall survival. Conclusion: Plasma proteins can be detected prior to treatment for oesophageal cancer that are associated with outcome and merit testing as prognostic and predictive markers of response to guide chemotherapy in oesophageal cancer. PMID:22294182

  7. A generalized analysis of hydrophobic and loop clusters within globular protein sequences

    PubMed Central

    Eudes, Richard; Le Tuan, Khanh; Delettré, Jean; Mornon, Jean-Paul; Callebaut, Isabelle

    2007-01-01

    Background Hydrophobic Cluster Analysis (HCA) is an efficient way to compare highly divergent sequences through the implicit secondary structure information directly derived from hydrophobic clusters. However, its efficiency and application are currently limited by the need of user expertise. In order to help the analysis of HCA plots, we report here the structural preferences of hydrophobic cluster species, which are frequently encountered in globular domains of proteins. These species are characterized only by their hydrophobic/non-hydrophobic dichotomy. This analysis has been extended to loop-forming clusters, using an appropriate loop alphabet. Results The structural behavior of hydrophobic cluster species, which are typical of protein globular domains, was investigated within banks of experimental structures, considered at different levels of sequence redundancy. The 294 more frequent hydrophobic cluster species were analyzed with regard to their association with the different secondary structures (frequencies of association with secondary structures and secondary structure propensities). Hydrophobic cluster species are predominantly associated with regular secondary structures, and a large part (60 %) reveals preferences for α-helices or β-strands. Moreover, the analysis of the hydrophobic cluster amino acid composition generally allows for finer prediction of the regular secondary structure associated with the considered cluster within a cluster species. We also investigated the behavior of loop forming clusters, using a "PGDNS" alphabet. These loop clusters do not overlap with hydrophobic clusters and are highly associated with coils. Finally, the structural information contained in the hydrophobic structural words, as deduced from experimental structures, was compared to the PSI-PRED predictions, revealing that β-strands and especially α-helices are generally over-predicted within the limits of typical β and α hydrophobic clusters. Conclusion The dictionary of hydrophobic clusters described here can help the HCA user to interpret and compare the HCA plots of globular protein sequences, as well as provides an original fundamental insight into the structural bricks of protein folds. Moreover, the novel loop cluster analysis brings additional information for secondary structure prediction on the whole sequence through a generalized cluster analysis (GCA), and not only on regular secondary structures. Such information lays the foundations for developing a new and original tool for secondary structure prediction. PMID:17210072

  8. Eukaryotic Protein Kinases (ePKs) of the Helminth Parasite Schistosoma mansoni

    PubMed Central

    2011-01-01

    Background Schistosomiasis remains an important parasitic disease and a major economic problem in many countries. The Schistosoma mansoni genome and predicted proteome sequences were recently published providing the opportunity to identify new drug candidates. Eukaryotic protein kinases (ePKs) play a central role in mediating signal transduction through complex networks and are considered druggable targets from the medical and chemical viewpoints. Our work aimed at analyzing the S. mansoni predicted proteome in order to identify and classify all ePKs of this parasite through combined computational approaches. Functional annotation was performed mainly to yield insights into the parasite signaling processes relevant to its complex lifestyle and to select some ePKs as potential drug targets. Results We have identified 252 ePKs, which corresponds to 1.9% of the S. mansoni predicted proteome, through sequence similarity searches using HMMs (Hidden Markov Models). Amino acid sequences corresponding to the conserved catalytic domain of ePKs were aligned by MAFFT and further used in distance-based phylogenetic analysis as implemented in PHYLIP. Our analysis also included the ePK homologs from six other eukaryotes. The results show that S. mansoni has proteins in all ePK groups. Most of them are clearly clustered with known ePKs in other eukaryotes according to the phylogenetic analysis. None of the ePKs are exclusively found in S. mansoni or belong to an expanded family in this parasite. Only 16 S. mansoni ePKs were experimentally studied, 12 proteins are predicted to be catalytically inactive and approximately 2% of the parasite ePKs remain unclassified. Some proteins were mentioned as good target for drug development since they have a predicted essential function for the parasite. Conclusions Our approach has improved the functional annotation of 40% of S. mansoni ePKs through combined similarity and phylogenetic-based approaches. As we continue this work, we will highlight the biochemical and physiological adaptations of S. mansoni in response to diverse environments during the parasite development, vector interaction, and host infection. PMID:21548963

  9. In silico prediction of protein-protein interactions in human macrophages

    PubMed Central

    2014-01-01

    Background Protein-protein interaction (PPI) network analyses are highly valuable in deciphering and understanding the intricate organisation of cellular functions. Nevertheless, the majority of available protein-protein interaction networks are context-less, i.e. without any reference to the spatial, temporal or physiological conditions in which the interactions may occur. In this work, we are proposing a protocol to infer the most likely protein-protein interaction (PPI) network in human macrophages. Results We integrated the PPI dataset from the Agile Protein Interaction DataAnalyzer (APID) with different meta-data to infer a contextualized macrophage-specific interactome using a combination of statistical methods. The obtained interactome is enriched in experimentally verified interactions and in proteins involved in macrophage-related biological processes (i.e. immune response activation, regulation of apoptosis). As a case study, we used the contextualized interactome to highlight the cellular processes induced upon Mycobacterium tuberculosis infection. Conclusion Our work confirms that contextualizing interactomes improves the biological significance of bioinformatic analyses. More specifically, studying such inferred network rather than focusing at the gene expression level only, is informative on the processes involved in the host response. Indeed, important immune features such as apoptosis are solely highlighted when the spotlight is on the protein interaction level. PMID:24636261

  10. Proposal for a recovery prediction method for patients affected by acute mediastinitis

    PubMed Central

    2012-01-01

    Background An attempt to find a prediction method of death risk in patients affected by acute mediastinitis. There is not such a tool described in available literature for that serious disease. Methods The study comprised 44 consecutive cases of acute mediastinitis. General anamnesis and biochemical data were included. Factor analysis was used to extract the risk characteristic for the patients. The most valuable results were obtained for 8 parameters which were selected for further statistical analysis (all collected during few hours after admission). Three factors reached Eigenvalue >1. Clinical explanations of these combined statistical factors are: Factor1 - proteinic status (serum total protein, albumin, and hemoglobin level), Factor2 - inflammatory status (white blood cells, CRP, procalcitonin), and Factor3 - general risk (age, number of coexisting diseases). Threshold values of prediction factors were estimated by means of statistical analysis (factor analysis, Statgraphics Centurion XVI). Results The final prediction result for the patients is constructed as simultaneous evaluation of all factor scores. High probability of death should be predicted if factor 1 value decreases with simultaneous increase of factors 2 and 3. The diagnostic power of the proposed method was revealed to be high [sensitivity =90%, specificity =64%], for Factor1 [SNC = 87%, SPC = 79%]; for Factor2 [SNC = 87%, SPC = 50%] and for Factor3 [SNC = 73%, SPC = 71%]. Conclusion The proposed prediction method seems a useful emergency signal during acute mediastinitis control in affected patients. PMID:22574625

  11. Accounting for epistatic interactions improves the functional analysis of protein structures.

    PubMed

    Wilkins, Angela D; Venner, Eric; Marciano, David C; Erdin, Serkan; Atri, Benu; Lua, Rhonald C; Lichtarge, Olivier

    2013-11-01

    The constraints under which sequence, structure and function coevolve are not fully understood. Bringing this mutual relationship to light can reveal the molecular basis of binding, catalysis and allostery, thereby identifying function and rationally guiding protein redesign. Underlying these relationships are the epistatic interactions that occur when the consequences of a mutation to a protein are determined by the genetic background in which it occurs. Based on prior data, we hypothesize that epistatic forces operate most strongly between residues nearby in the structure, resulting in smooth evolutionary importance across the structure. We find that when residue scores of evolutionary importance are distributed smoothly between nearby residues, functional site prediction accuracy improves. Accordingly, we designed a novel measure of evolutionary importance that focuses on the interaction between pairs of structurally neighboring residues. This measure that we term pair-interaction Evolutionary Trace yields greater functional site overlap and better structure-based proteome-wide functional predictions. Our data show that the structural smoothness of evolutionary importance is a fundamental feature of the coevolution of sequence, structure and function. Mutations operate on individual residues, but selective pressure depends in part on the extent to which a mutation perturbs interactions with neighboring residues. In practice, this principle led us to redefine the importance of a residue in terms of the importance of its epistatic interactions with neighbors, yielding better annotation of functional residues, motivating experimental validation of a novel functional site in LexA and refining protein function prediction. lichtarge@bcm.edu. Supplementary data are available at Bioinformatics online.

  12. Accounting for epistatic interactions improves the functional analysis of protein structures

    PubMed Central

    Wilkins, Angela D.; Venner, Eric; Marciano, David C.; Erdin, Serkan; Atri, Benu; Lua, Rhonald C.; Lichtarge, Olivier

    2013-01-01

    Motivation: The constraints under which sequence, structure and function coevolve are not fully understood. Bringing this mutual relationship to light can reveal the molecular basis of binding, catalysis and allostery, thereby identifying function and rationally guiding protein redesign. Underlying these relationships are the epistatic interactions that occur when the consequences of a mutation to a protein are determined by the genetic background in which it occurs. Based on prior data, we hypothesize that epistatic forces operate most strongly between residues nearby in the structure, resulting in smooth evolutionary importance across the structure. Methods and Results: We find that when residue scores of evolutionary importance are distributed smoothly between nearby residues, functional site prediction accuracy improves. Accordingly, we designed a novel measure of evolutionary importance that focuses on the interaction between pairs of structurally neighboring residues. This measure that we term pair-interaction Evolutionary Trace yields greater functional site overlap and better structure-based proteome-wide functional predictions. Conclusions: Our data show that the structural smoothness of evolutionary importance is a fundamental feature of the coevolution of sequence, structure and function. Mutations operate on individual residues, but selective pressure depends in part on the extent to which a mutation perturbs interactions with neighboring residues. In practice, this principle led us to redefine the importance of a residue in terms of the importance of its epistatic interactions with neighbors, yielding better annotation of functional residues, motivating experimental validation of a novel functional site in LexA and refining protein function prediction. Contact: lichtarge@bcm.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24021383

  13. An improved method to detect correct protein folds using partial clustering

    PubMed Central

    2013-01-01

    Background Structure-based clustering is commonly used to identify correct protein folds among candidate folds (also called decoys) generated by protein structure prediction programs. However, traditional clustering methods exhibit a poor runtime performance on large decoy sets. We hypothesized that a more efficient “partial“ clustering approach in combination with an improved scoring scheme could significantly improve both the speed and performance of existing candidate selection methods. Results We propose a new scheme that performs rapid but incomplete clustering on protein decoys. Our method detects structurally similar decoys (measured using either Cα RMSD or GDT-TS score) and extracts representatives from them without assigning every decoy to a cluster. We integrated our new clustering strategy with several different scoring functions to assess both the performance and speed in identifying correct or near-correct folds. Experimental results on 35 Rosetta decoy sets and 40 I-TASSER decoy sets show that our method can improve the correct fold detection rate as assessed by two different quality criteria. This improvement is significantly better than two recently published clustering methods, Durandal and Calibur-lite. Speed and efficiency testing shows that our method can handle much larger decoy sets and is up to 22 times faster than Durandal and Calibur-lite. Conclusions The new method, named HS-Forest, avoids the computationally expensive task of clustering every decoy, yet still allows superior correct-fold selection. Its improved speed, efficiency and decoy-selection performance should enable structure prediction researchers to work with larger decoy sets and significantly improve their ab initio structure prediction performance. PMID:23323835

  14. Miscellaneous Topics in Computer-Aided Drug Design: Synthetic Accessibility and GPU Computing, and Other Topics

    PubMed Central

    Fukunishi, Yoshifumi; Mashimo, Tadaaki; Misoo, Kiyotaka; Wakabayashi, Yoshinori; Miyaki, Toshiaki; Ohta, Seiji; Nakamura, Mayu; Ikeda, Kazuyoshi

    2016-01-01

    Abstract: Background Computer-aided drug design is still a state-of-the-art process in medicinal chemistry, and the main topics in this field have been extensively studied and well reviewed. These topics include compound databases, ligand-binding pocket prediction, protein-compound docking, virtual screening, target/off-target prediction, physical property prediction, molecular simulation and pharmacokinetics/pharmacodynamics (PK/PD) prediction. Message and Conclusion: However, there are also a number of secondary or miscellaneous topics that have been less well covered. For example, methods for synthesizing and predicting the synthetic accessibility (SA) of designed compounds are important in practical drug development, and hardware/software resources for performing the computations in computer-aided drug design are crucial. Cloud computing and general purpose graphics processing unit (GPGPU) computing have been used in virtual screening and molecular dynamics simulations. Not surprisingly, there is a growing demand for computer systems which combine these resources. In the present review, we summarize and discuss these various topics of drug design. PMID:27075578

  15. Protein docking prediction using predicted protein-protein interface.

    PubMed

    Li, Bin; Kihara, Daisuke

    2012-01-10

    Many important cellular processes are carried out by protein complexes. To provide physical pictures of interacting proteins, many computational protein-protein prediction methods have been developed in the past. However, it is still difficult to identify the correct docking complex structure within top ranks among alternative conformations. We present a novel protein docking algorithm that utilizes imperfect protein-protein binding interface prediction for guiding protein docking. Since the accuracy of protein binding site prediction varies depending on cases, the challenge is to develop a method which does not deteriorate but improves docking results by using a binding site prediction which may not be 100% accurate. The algorithm, named PI-LZerD (using Predicted Interface with Local 3D Zernike descriptor-based Docking algorithm), is based on a pair wise protein docking prediction algorithm, LZerD, which we have developed earlier. PI-LZerD starts from performing docking prediction using the provided protein-protein binding interface prediction as constraints, which is followed by the second round of docking with updated docking interface information to further improve docking conformation. Benchmark results on bound and unbound cases show that PI-LZerD consistently improves the docking prediction accuracy as compared with docking without using binding site prediction or using the binding site prediction as post-filtering. We have developed PI-LZerD, a pairwise docking algorithm, which uses imperfect protein-protein binding interface prediction to improve docking accuracy. PI-LZerD consistently showed better prediction accuracy over alternative methods in the series of benchmark experiments including docking using actual docking interface site predictions as well as unbound docking cases.

  16. Inferring protein domains associated with drug side effects based on drug-target interaction network

    PubMed Central

    2013-01-01

    Background Most phenotypic effects of drugs are involved in the interactions between drugs and their target proteins, however, our knowledge about the molecular mechanism of the drug-target interactions is very limited. One of challenging issues in recent pharmaceutical science is to identify the underlying molecular features which govern drug-target interactions. Results In this paper, we make a systematic analysis of the correlation between drug side effects and protein domains, which we call "pharmacogenomic features," based on the drug-target interaction network. We detect drug side effects and protein domains that appear jointly in known drug-target interactions, which is made possible by using classifiers with sparse models. It is shown that the inferred pharmacogenomic features can be used for predicting potential drug-target interactions. We also discuss advantages and limitations of the pharmacogenomic features, compared with the chemogenomic features that are the associations between drug chemical substructures and protein domains. Conclusion The inferred side effect-domain association network is expected to be useful for estimating common drug side effects for different protein families and characteristic drug side effects for specific protein domains. PMID:24565527

  17. Transcriptomic Analysis of the Salivary Glands of an Invasive Whitefly

    PubMed Central

    Su, Yun-Lin; Li, Jun-Min; Li, Meng; Luan, Jun-Bo; Ye, Xiao-Dong; Wang, Xiao-Wei; Liu, Shu-Sheng

    2012-01-01

    Background Some species of the whitefly Bemisia tabaci complex cause tremendous losses to crops worldwide through feeding directly and virus transmission indirectly. The primary salivary glands of whiteflies are critical for their feeding and virus transmission. However, partly due to their tiny size, research on whitefly salivary glands is limited and our knowledge on these glands is scarce. Methodology/Principal Findings We sequenced the transcriptome of the primary salivary glands of the Mediterranean species of B. tabaci complex using an effective cDNA amplification method in combination with short read sequencing (Illumina). In a single run, we obtained 13,615 unigenes. The quantity of the unigenes obtained from the salivary glands of the whitefly is at least four folds of the salivary gland genes from other plant-sucking insects. To reveal the functions of the primary glands, sequence similarity search and comparisons with the whole transcriptome of the whitefly were performed. The results demonstrated that the genes related to metabolism and transport were significantly enriched in the primary salivary glands. Furthermore, we found that a number of highly expressed genes in the salivary glands might be involved in secretory protein processing, secretion and virus transmission. To identify potential proteins of whitefly saliva, the translated unigenes were put into secretory protein prediction. Finally, 295 genes were predicted to encode secretory proteins and some of them might play important roles in whitefly feeding. Conclusions/Significance: The combined method of cDNA amplification, Illumina sequencing and de novo assembly is suitable for transcriptomic analysis of tiny organs in insects. Through analysis of the transcriptome, genomic features of the primary salivary glands were dissected and biologically important proteins, especially secreted proteins, were predicted. Our findings provide substantial sequence information for the primary salivary glands of whiteflies and will be the basis for future studies on whitefly-plant interactions and virus transmission. PMID:22745728

  18. LXtoo: an integrated live Linux distribution for the bioinformatics community

    PubMed Central

    2012-01-01

    Background Recent advances in high-throughput technologies dramatically increase biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis. Findings Unlike most of the existing live Linux distributions for bioinformatics limiting their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high performance computing. Conclusions LXtoo aims to provide well-supported computing environment tailored for bioinformatics research, reducing duplication of efforts in building computing infrastructure. LXtoo is distributed as a Live DVD and freely available at http://bioinformatics.jnu.edu.cn/LXtoo. PMID:22813356

  19. Is Phosphoproteomics Ready for Clinical Research?

    PubMed Central

    Iliuk, Anton B.; Tao, W. Andy

    2012-01-01

    Background For many diseases such as cancer where phosphorylation-dependent signaling is the foundation of disease onset and progression, single-gene testing and genomic profiling alone are not sufficient in providing most critical information. The reason for this is that in these activated pathways the signaling changes and drug resistance are often not directly correlated with changes in protein expression levels. In order to obtain the essential information needed to evaluate pathway activation or the effects of certain drugs and therapies on the molecular level, the analysis of changes in protein phosphorylation is critical. Methods Existing approaches do not differentiate clinical disease subtypes on the protein and signaling pathway level, and therefore hamper the predictive management of the disease and the selection of therapeutic targets. Conclusions The mini-review examines the impact of emerging systems biology tools and the possibility of applying phosphoproteomics to clinical research. PMID:23159844

  20. Discovering protein complexes in protein interaction networks via exploring the weak ties effect

    PubMed Central

    2012-01-01

    Background Studying protein complexes is very important in biological processes since it helps reveal the structure-functionality relationships in biological networks and much attention has been paid to accurately predict protein complexes from the increasing amount of protein-protein interaction (PPI) data. Most of the available algorithms are based on the assumption that dense subgraphs correspond to complexes, failing to take into account the inherence organization within protein complex and the roles of edges. Thus, there is a critical need to investigate the possibility of discovering protein complexes using the topological information hidden in edges. Results To provide an investigation of the roles of edges in PPI networks, we show that the edges connecting less similar vertices in topology are more significant in maintaining the global connectivity, indicating the weak ties phenomenon in PPI networks. We further demonstrate that there is a negative relation between the weak tie strength and the topological similarity. By using the bridges, a reliable virtual network is constructed, in which each maximal clique corresponds to the core of a complex. By this notion, the detection of the protein complexes is transformed into a classic all-clique problem. A novel core-attachment based method is developed, which detects the cores and attachments, respectively. A comprehensive comparison among the existing algorithms and our algorithm has been made by comparing the predicted complexes against benchmark complexes. Conclusions We proved that the weak tie effect exists in the PPI network and demonstrated that the density is insufficient to characterize the topological structure of protein complexes. Furthermore, the experimental results on the yeast PPI network show that the proposed method outperforms the state-of-the-art algorithms. The analysis of detected modules by the present algorithm suggests that most of these modules have well biological significance in context of complexes, suggesting that the roles of edges are critical in discovering protein complexes. PMID:23046740

  1. Triangle network motifs predict complexes by complementing high-error interactomes with structural information

    PubMed Central

    Andreopoulos, Bill; Winter, Christof; Labudde, Dirk; Schroeder, Michael

    2009-01-01

    Background A lot of high-throughput studies produce protein-protein interaction networks (PPINs) with many errors and missing information. Even for genome-wide approaches, there is often a low overlap between PPINs produced by different studies. Second-level neighbors separated by two protein-protein interactions (PPIs) were previously used for predicting protein function and finding complexes in high-error PPINs. We retrieve second level neighbors in PPINs, and complement these with structural domain-domain interactions (SDDIs) representing binding evidence on proteins, forming PPI-SDDI-PPI triangles. Results We find low overlap between PPINs, SDDIs and known complexes, all well below 10%. We evaluate the overlap of PPI-SDDI-PPI triangles with known complexes from Munich Information center for Protein Sequences (MIPS). PPI-SDDI-PPI triangles have ~20 times higher overlap with MIPS complexes than using second-level neighbors in PPINs without SDDIs. The biological interpretation for triangles is that a SDDI causes two proteins to be observed with common interaction partners in high-throughput experiments. The relatively few SDDIs overlapping with PPINs are part of highly connected SDDI components, and are more likely to be detected in experimental studies. We demonstrate the utility of PPI-SDDI-PPI triangles by reconstructing myosin-actin processes in the nucleus, cytoplasm, and cytoskeleton, which were not obvious in the original PPIN. Using other complementary datatypes in place of SDDIs to form triangles, such as PubMed co-occurrences or threading information, results in a similar ability to find protein complexes. Conclusion Given high-error PPINs with missing information, triangles of mixed datatypes are a promising direction for finding protein complexes. Integrating PPINs with SDDIs improves finding complexes. Structural SDDIs partially explain the high functional similarity of second-level neighbors in PPINs. We estimate that relatively little structural information would be sufficient for finding complexes involving most of the proteins and interactions in a typical PPIN. PMID:19558694

  2. A new picture of cell wall protein dynamics in elongating cells of Arabidopsis thaliana: Confirmed actors and newcomers

    PubMed Central

    Irshad, Muhammad; Canut, Hervé; Borderies, Gisèle; Pont-Lezica, Rafael; Jamet, Elisabeth

    2008-01-01

    Background Cell elongation in plants requires addition and re-arrangements of cell wall components. Even if some protein families have been shown to play roles in these events, a global picture of proteins present in cell walls of elongating cells is still missing. A proteomic study was performed on etiolated hypocotyls of Arabidopsis used as model of cells undergoing elongation followed by growth arrest within a short time. Results Two developmental stages (active growth and after growth arrest) were compared. A new strategy consisting of high performance cation exchange chromatography and mono-dimensional electrophoresis was established for separation of cell wall proteins. This work allowed identification of 137 predicted secreted proteins, among which 51 had not been identified previously. Apart from expected proteins known to be involved in cell wall extension such as xyloglucan endotransglucosylase-hydrolases, expansins, polygalacturonases, pectin methylesterases and peroxidases, new proteins were identified such as proteases, proteins related to lipid metabolism and proteins of unknown function. Conclusion This work highlights the CWP dynamics that takes place between the two developmental stages. The presence of proteins known to be related to cell wall extension after growth arrest showed that these proteins may play other roles in cell walls. Finally, putative regulatory mechanisms of protein biological activity are discussed from this global view of cell wall proteins. PMID:18796151

  3. Background-Modeling-Based Adaptive Prediction for Surveillance Video Coding.

    PubMed

    Zhang, Xianguo; Huang, Tiejun; Tian, Yonghong; Gao, Wen

    2014-02-01

    The exponential growth of surveillance videos presents an unprecedented challenge for high-efficiency surveillance video coding technology. Compared with the existing coding standards that were basically developed for generic videos, surveillance video coding should be designed to make the best use of the special characteristics of surveillance videos (e.g., relative static background). To do so, this paper first conducts two analyses on how to improve the background and foreground prediction efficiencies in surveillance video coding. Following the analysis results, we propose a background-modeling-based adaptive prediction (BMAP) method. In this method, all blocks to be encoded are firstly classified into three categories. Then, according to the category of each block, two novel inter predictions are selectively utilized, namely, the background reference prediction (BRP) that uses the background modeled from the original input frames as the long-term reference and the background difference prediction (BDP) that predicts the current data in the background difference domain. For background blocks, the BRP can effectively improve the prediction efficiency using the higher quality background as the reference; whereas for foreground-background-hybrid blocks, the BDP can provide a better reference after subtracting its background pixels. Experimental results show that the BMAP can achieve at least twice the compression ratio on surveillance videos as AVC (MPEG-4 Advanced Video Coding) high profile, yet with a slightly additional encoding complexity. Moreover, for the foreground coding performance, which is crucial to the subjective quality of moving objects in surveillance videos, BMAP also obtains remarkable gains over several state-of-the-art methods.

  4. A domain-centric solution to functional genomics via dcGO Predictor

    PubMed Central

    2013-01-01

    Background Computational/manual annotations of protein functions are one of the first routes to making sense of a newly sequenced genome. Protein domain predictions form an essential part of this annotation process. This is due to the natural modularity of proteins with domains as structural, evolutionary and functional units. Sometimes two, three, or more adjacent domains (called supra-domains) are the operational unit responsible for a function, e.g. via a binding site at the interface. These supra-domains have contributed to functional diversification in higher organisms. Traditionally functional ontologies have been applied to individual proteins, rather than families of related domains and supra-domains. We expect, however, to some extent functional signals can be carried by protein domains and supra-domains, and consequently used in function prediction and functional genomics. Results Here we present a domain-centric Gene Ontology (dcGO) perspective. We generalize a framework for automatically inferring ontological terms associated with domains and supra-domains from full-length sequence annotations. This general framework has been applied specifically to primary protein-level annotations from UniProtKB-GOA, generating GO term associations with SCOP domains and supra-domains. The resulting 'dcGO Predictor', can be used to provide functional annotation to protein sequences. The functional annotation of sequences in the Critical Assessment of Function Annotation (CAFA) has been used as a valuable opportunity to validate our method and to be assessed by the community. The functional annotation of all completely sequenced genomes has demonstrated the potential for domain-centric GO enrichment analysis to yield functional insights into newly sequenced or yet-to-be-annotated genomes. This generalized framework we have presented has also been applied to other domain classifications such as InterPro and Pfam, and other ontologies such as mammalian phenotype and disease ontology. The dcGO and its predictor are available at http://supfam.org/SUPERFAMILY/dcGO including an enrichment analysis tool. Conclusions As functional units, domains offer a unique perspective on function prediction regardless of whether proteins are multi-domain or single-domain. The 'dcGO Predictor' holds great promise for contributing to a domain-centric functional understanding of genomes in the next generation sequencing era. PMID:23514627

  5. The prediction of late-onset preeclampsia: Results from a longitudinal proteomics study

    PubMed Central

    Erez, Offer; Romero, Roberto; Maymon, Eli; Chaemsaithong, Piya; Done, Bogdan; Pacora, Percy; Panaitescu, Bogdan; Chaiworapongsa, Tinnakorn; Hassan, Sonia S.

    2017-01-01

    Background Late-onset preeclampsia is the most prevalent phenotype of this syndrome; nevertheless, only a few biomarkers for its early diagnosis have been reported. We sought to correct this deficiency using a high through-put proteomic platform. Methods A case-control longitudinal study was conducted, including 90 patients with normal pregnancies and 76 patients with late-onset preeclampsia (diagnosed at ≥34 weeks of gestation). Maternal plasma samples were collected throughout gestation (normal pregnancy: 2–6 samples per patient, median of 2; late-onset preeclampsia: 2–6, median of 5). The abundance of 1,125 proteins was measured using an aptamers-based proteomics technique. Protein abundance in normal pregnancies was modeled using linear mixed-effects models to estimate mean abundance as a function of gestational age. Data was then expressed as multiples of-the-mean (MoM) values in normal pregnancies. Multi-marker prediction models were built using data from one of five gestational age intervals (8–16, 16.1–22, 22.1–28, 28.1–32, 32.1–36 weeks of gestation). The predictive performance of the best combination of proteins was compared to placental growth factor (PIGF) using bootstrap. Results 1) At 8–16 weeks of gestation, the best prediction model included only one protein, matrix metalloproteinase 7 (MMP-7), that had a sensitivity of 69% at a false positive rate (FPR) of 20% (AUC = 0.76); 2) at 16.1–22 weeks of gestation, MMP-7 was the single best predictor of late-onset preeclampsia with a sensitivity of 70% at a FPR of 20% (AUC = 0.82); 3) after 22 weeks of gestation, PlGF was the best predictor of late-onset preeclampsia, identifying 1/3 to 1/2 of the patients destined to develop this syndrome (FPR = 20%); 4) 36 proteins were associated with late-onset preeclampsia in at least one interval of gestation (after adjustment for covariates); 5) several biological processes, such as positive regulation of vascular endothelial growth factor receptor signaling pathway, were perturbed; and 6) from 22.1 weeks of gestation onward, the set of proteins most predictive of severe preeclampsia was different from the set most predictive of the mild form of this syndrome. Conclusions Elevated MMP-7 early in gestation (8–22 weeks) and low PlGF later in gestation (after 22 weeks) are the strongest predictors for the subsequent development of late-onset preeclampsia, suggesting that the optimal identification of patients at risk may involve a two-step diagnostic process. PMID:28738067

  6. Efficient use of unlabeled data for protein sequence classification: a comparative study

    PubMed Central

    Kuksa, Pavel; Huang, Pai-Hsi; Pavlovic, Vladimir

    2009-01-01

    Background Recent studies in computational primary protein sequence analysis have leveraged the power of unlabeled data. For example, predictive models based on string kernels trained on sequences known to belong to particular folds or superfamilies, the so-called labeled data set, can attain significantly improved accuracy if this data is supplemented with protein sequences that lack any class tags–the unlabeled data. In this study, we present a principled and biologically motivated computational framework that more effectively exploits the unlabeled data by only using the sequence regions that are more likely to be biologically relevant for better prediction accuracy. As overly-represented sequences in large uncurated databases may bias the estimation of computational models that rely on unlabeled data, we also propose a method to remove this bias and improve performance of the resulting classifiers. Results Combined with state-of-the-art string kernels, our proposed computational framework achieves very accurate semi-supervised protein remote fold and homology detection on three large unlabeled databases. It outperforms current state-of-the-art methods and exhibits significant reduction in running time. Conclusion The unlabeled sequences used under the semi-supervised setting resemble the unpolished gemstones; when used as-is, they may carry unnecessary features and hence compromise the classification accuracy but once cut and polished, they improve the accuracy of the classifiers considerably. PMID:19426450

  7. Enhancing interacting residue prediction with integrated contact matrix prediction in protein-protein interaction.

    PubMed

    Du, Tianchuan; Liao, Li; Wu, Cathy H

    2016-12-01

    Identifying the residues in a protein that are involved in protein-protein interaction and identifying the contact matrix for a pair of interacting proteins are two computational tasks at different levels of an in-depth analysis of protein-protein interaction. Various methods for solving these two problems have been reported in the literature. However, the interacting residue prediction and contact matrix prediction were handled by and large independently in those existing methods, though intuitively good prediction of interacting residues will help with predicting the contact matrix. In this work, we developed a novel protein interacting residue prediction system, contact matrix-interaction profile hidden Markov model (CM-ipHMM), with the integration of contact matrix prediction and the ipHMM interaction residue prediction. We propose to leverage what is learned from the contact matrix prediction and utilize the predicted contact matrix as "feedback" to enhance the interaction residue prediction. The CM-ipHMM model showed significant improvement over the previous method that uses the ipHMM for predicting interaction residues only. It indicates that the downstream contact matrix prediction could help the interaction site prediction.

  8. SnowyOwl: accurate prediction of fungal genes by using RNA-Seq and homology information to select among ab initio models

    PubMed Central

    2014-01-01

    Background Locating the protein-coding genes in novel genomes is essential to understanding and exploiting the genomic information but it is still difficult to accurately predict all the genes. The recent availability of detailed information about transcript structure from high-throughput sequencing of messenger RNA (RNA-Seq) delineates many expressed genes and promises increased accuracy in gene prediction. Computational gene predictors have been intensively developed for and tested in well-studied animal genomes. Hundreds of fungal genomes are now or will soon be sequenced. The differences of fungal genomes from animal genomes and the phylogenetic sparsity of well-studied fungi call for gene-prediction tools tailored to them. Results SnowyOwl is a new gene prediction pipeline that uses RNA-Seq data to train and provide hints for the generation of Hidden Markov Model (HMM)-based gene predictions and to evaluate the resulting models. The pipeline has been developed and streamlined by comparing its predictions to manually curated gene models in three fungal genomes and validated against the high-quality gene annotation of Neurospora crassa; SnowyOwl predicted N. crassa genes with 83% sensitivity and 65% specificity. SnowyOwl gains sensitivity by repeatedly running the HMM gene predictor Augustus with varied input parameters and selectivity by choosing the models with best homology to known proteins and best agreement with the RNA-Seq data. Conclusions SnowyOwl efficiently uses RNA-Seq data to produce accurate gene models in both well-studied and novel fungal genomes. The source code for the SnowyOwl pipeline (in Python) and a web interface (in PHP) is freely available from http://sourceforge.net/projects/snowyowl/. PMID:24980894

  9. Experimental identification of specificity determinants in the domain linker of a LacI/GalR protein: bioinformatics-based predictions generate true positives and false negatives.

    PubMed

    Meinhardt, Sarah; Swint-Kruse, Liskin

    2008-12-01

    In protein families, conserved residues often contribute to a common general function, such as DNA-binding. However, unique attributes for each homolog (e.g. recognition of alternative DNA sequences) must arise from variation in other functionally-important positions. The locations of these "specificity determinant" positions are obscured amongst the background of varied residues that do not make significant contributions to either structure or function. To isolate specificity determinants, a number of bioinformatics algorithms have been developed. When applied to the LacI/GalR family of transcription regulators, several specificity determinants are predicted in the 18 amino acids that link the DNA-binding and regulatory domains. However, results from alternative algorithms are only in partial agreement with each other. Here, we experimentally evaluate these predictions using an engineered repressor comprising the LacI DNA-binding domain, the LacI linker, and the GalR regulatory domain (LLhG). "Wild-type" LLhG has altered DNA specificity and weaker lacO(1) repression compared to LacI or a similar LacI:PurR chimera. Next, predictions of linker specificity determinants were tested, using amino acid substitution and in vivo repression assays to assess functional change. In LLhG, all predicted sites are specificity determinants, as well as three sites not predicted by any algorithm. Strategies are suggested for diminishing the number of false negative predictions. Finally, individual substitutions at LLhG specificity determinants exhibited a broad range of functional changes that are not predicted by bioinformatics algorithms. Results suggest that some variants have altered affinity for DNA, some have altered allosteric response, and some appear to have changed specificity for alternative DNA ligands.

  10. The protein-protein interaction network of eyestalk, Y-organ and hepatopancreas in Chinese mitten crab Eriocheir sinensis.

    PubMed

    Hao, Tong; Zeng, Zheng; Wang, Bin; Zhang, Yichen; Liu, Yichen; Geng, Xuyun; Sun, Jinsheng

    2014-03-27

    The protein-protein interaction network (PIN) is an effective information tool for understanding the complex biological processes inside the cell and solving many biological problems such as signaling pathway identification and prediction of protein functions. Eriocheir sinensis is a highly-commercial aquaculture species with an unclear proteome background which hinders the construction and development of PIN for E. sinensis. However, in recent years, the development of next-generation deep-sequencing techniques makes it possible to get high throughput data of E. sinensis tanscriptome and subsequently obtain a systematic overview of the protein-protein interaction system. In this work we sequenced the transcriptional RNA of eyestalk, Y-organ and hepatopancreas in E. sinensis and generated a PIN of E. sinensis which included 3,223 proteins and 35,787 interactions. Each protein-protein interaction in the network was scored according to the homology and genetic relationship. The signaling sub-network, representing the signal transduction pathways in E. sinensis, was extracted from the global network, which depicted a global view of the signaling systems in E. sinensis. Seven basic signal transduction pathways were identified in E. sinensis. By investigating the evolution paths of the seven pathways, we found that these pathways got mature in different evolutionary stages. Moreover, the functions of unclassified proteins and unigenes in the PIN of E. sinensis were predicted. Specifically, the functions of 549 unclassified proteins related to 864 unclassified unigenes were assigned, which respectively covered 76% and 73% of all the unclassified proteins and unigenes in the network. The PIN generated in this work is the first large-scale PIN of aquatic crustacean, thereby providing a paradigmatic blueprint of the aquatic crustacean interactome. Signaling sub-network extracted from the global PIN depicts the interaction of different signaling proteins and the evolutionary paths of the identified signal transduction pathways. Furthermore, the function assignment of unclassified proteins based on the PIN offers a new reference in protein function exploration. More importantly, the construction of the E. sinensis PIN provides necessary experience for the exploration of PINs in other aquatic crustacean species.

  11. Identification and characterization of plastid-type proteins from sequence-attributed features using machine learning

    PubMed Central

    2013-01-01

    Background Plastids are an important component of plant cells, being the site of manufacture and storage of chemical compounds used by the cell, and contain pigments such as those used in photosynthesis, starch synthesis/storage, cell color etc. They are essential organelles of the plant cell, also present in algae. Recent advances in genomic technology and sequencing efforts is generating a huge amount of DNA sequence data every day. The predicted proteome of these genomes needs annotation at a faster pace. In view of this, one such annotation need is to develop an automated system that can distinguish between plastid and non-plastid proteins accurately, and further classify plastid-types based on their functionality. We compared the amino acid compositions of plastid proteins with those of non-plastid ones and found significant differences, which were used as a basis to develop various feature-based prediction models using similarity-search and machine learning. Results In this study, we developed separate Support Vector Machine (SVM) trained classifiers for characterizing the plastids in two steps: first distinguishing the plastid vs. non-plastid proteins, and then classifying the identified plastids into their various types based on their function (chloroplast, chromoplast, etioplast, and amyloplast). Five diverse protein features: amino acid composition, dipeptide composition, the pseudo amino acid composition, Nterminal-Center-Cterminal composition and the protein physicochemical properties are used to develop SVM models. Overall, the dipeptide composition-based module shows the best performance with an accuracy of 86.80% and Matthews Correlation Coefficient (MCC) of 0.74 in phase-I and 78.60% with a MCC of 0.44 in phase-II. On independent test data, this model also performs better with an overall accuracy of 76.58% and 74.97% in phase-I and phase-II, respectively. The similarity-based PSI-BLAST module shows very low performance with about 50% prediction accuracy for distinguishing plastid vs. non-plastids and only 20% in classifying various plastid-types, indicating the need and importance of machine learning algorithms. Conclusion The current work is a first attempt to develop a methodology for classifying various plastid-type proteins. The prediction modules have also been made available as a web tool, PLpred available at http://bioinfo.okstate.edu/PLpred/ for real time identification/characterization. We believe this tool will be very useful in the functional annotation of various genomes. PMID:24266945

  12. Predicting a small molecule-kinase interaction map: A machine learning approach

    PubMed Central

    2011-01-01

    Background We present a machine learning approach to the problem of protein ligand interaction prediction. We focus on a set of binding data obtained from 113 different protein kinases and 20 inhibitors. It was attained through ATP site-dependent binding competition assays and constitutes the first available dataset of this kind. We extract information about the investigated molecules from various data sources to obtain an informative set of features. Results A Support Vector Machine (SVM) as well as a decision tree algorithm (C5/See5) is used to learn models based on the available features which in turn can be used for the classification of new kinase-inhibitor pair test instances. We evaluate our approach using different feature sets and parameter settings for the employed classifiers. Moreover, the paper introduces a new way of evaluating predictions in such a setting, where different amounts of information about the binding partners can be assumed to be available for training. Results on an external test set are also provided. Conclusions In most of the cases, the presented approach clearly outperforms the baseline methods used for comparison. Experimental results indicate that the applied machine learning methods are able to detect a signal in the data and predict binding affinity to some extent. For SVMs, the binding prediction can be improved significantly by using features that describe the active site of a kinase. For C5, besides diversity in the feature set, alignment scores of conserved regions turned out to be very useful. PMID:21708012

  13. Comparative proteomics analysis of oral cancer cell lines: identification of cancer associated proteins

    PubMed Central

    2014-01-01

    Background A limiting factor in performing proteomics analysis on cancerous cells is the difficulty in obtaining sufficient amounts of starting material. Cell lines can be used as a simplified model system for studying changes that accompany tumorigenesis. This study used two-dimensional gel electrophoresis (2DE) to compare the whole cell proteome of oral cancer cell lines vs normal cells in an attempt to identify cancer associated proteins. Results Three primary cell cultures of normal cells with a limited lifespan without hTERT immortalization have been successfully established. 2DE was used to compare the whole cell proteome of these cells with that of three oral cancer cell lines. Twenty four protein spots were found to have changed in abundance. MALDI TOF/TOF was then used to determine the identity of these proteins. Identified proteins were classified into seven functional categories – structural proteins, enzymes, regulatory proteins, chaperones and others. IPA core analysis predicted that 18 proteins were related to cancer with involvements in hyperplasia, metastasis, invasion, growth and tumorigenesis. The mRNA expressions of two proteins – 14-3-3 protein sigma and Stress-induced-phosphoprotein 1 – were found to correlate with the corresponding proteins’ abundance. Conclusions The outcome of this analysis demonstrated that a comparative study of whole cell proteome of cancer versus normal cell lines can be used to identify cancer associated proteins. PMID:24422745

  14. Molecular dynamics simulation studies and in vitro site directed mutagenesis of avian beta-defensin Apl_AvBD2

    PubMed Central

    2010-01-01

    Background Defensins comprise a group of antimicrobial peptides, widely recognized as important elements of the innate immune system in both animals and plants. Cationicity, rather than the secondary structure, is believed to be the major factor defining the antimicrobial activity of defensins. To test this hypothesis and to improve the activity of the newly identified avian β-defensin Apl_AvBD2 by enhancing the cationicity, we performed in silico site directed mutagenesis, keeping the predicted secondary structure intact. Molecular dynamics (MD) simulation studies were done to predict the activity. Mutant proteins were made by in vitro site directed mutagenesis and recombinant protein expression, and tested for antimicrobial activity to confirm the results obtained in MD simulation analysis. Results MD simulation revealed subtle, but critical, structural variations between the wild type Apl_AvBD2 and the more cationic in silico mutants, which were not detected in the initial structural prediction by homology modelling. The C-terminal cationic 'claw' region, important in antimicrobial activity, which was intact in the wild type, showed changes in shape and orientation in all the mutant peptides. Mutant peptides also showed increased solvent accessible surface area and more number of hydrogen bonds with the surrounding water molecules. In functional studies, the Escherichia coli expressed, purified recombinant mutant proteins showed total loss of antimicrobial activity compared to the wild type protein. Conclusion The study revealed that cationicity alone is not the determining factor in the microbicidal activity of antimicrobial peptides. Factors affecting the molecular dynamics such as hydrophobicity, electrostatic interactions and the potential for oligomerization may also play fundamental roles. It points to the usefulness of MD simulation studies in successful engineering of antimicrobial peptides for improved activity and other desirable functions. PMID:20122244

  15. Whole genome duplication events in plant evolution reconstructed and predicted using myosin motor proteins

    PubMed Central

    2013-01-01

    Background The evolution of land plants is characterized by whole genome duplications (WGD), which drove species diversification and evolutionary novelties. Detecting these events is especially difficult if they date back to the origin of the plant kingdom. Established methods for reconstructing WGDs include intra- and inter-genome comparisons, KS age distribution analyses, and phylogenetic tree constructions. Results By analysing 67 completely sequenced plant genomes 775 myosins were identified and manually assembled. Phylogenetic trees of the myosin motor domains revealed orthologous and paralogous relationships and were consistent with recent species trees. Based on the myosin inventories and the phylogenetic trees, we have identified duplications of the entire myosin motor protein family at timings consistent with 23 WGDs, that had been reported before. We also predict 6 WGDs based on further protein family duplications. Notably, the myosin data support the two recently reported WGDs in the common ancestor of all extant angiosperms. We predict single WGDs in the Manihot esculenta and Nicotiana benthamiana lineages, two WGDs for Linum usitatissimum and Phoenix dactylifera, and a triplication or two WGDs for Gossypium raimondii. Our data show another myosin duplication in the ancestor of the angiosperms that could be either the result of a single gene duplication or a remnant of a WGD. Conclusions We have shown that the myosin inventories in angiosperms retain evidence of numerous WGDs that happened throughout plant evolution. In contrast to other protein families, many myosins are still present in extant species. They are closely related and have similar domain architectures, and their phylogenetic grouping follows the genome duplications. Because of its broad taxonomic sampling the dataset provides the basis for reliable future identification of further whole genome duplications. PMID:24053117

  16. A comprehensive study of small non-frameshift insertions/deletions in proteins and prediction of their phenotypic effects by a machine learning method (KD4i)

    PubMed Central

    2014-01-01

    Background Small insertion and deletion polymorphisms (Indels) are the second most common mutations in the human genome, after Single Nucleotide Polymorphisms (SNPs). Recent studies have shown that they have significant influence on genetic variation by altering human traits and can cause multiple human diseases. In particular, many Indels that occur in protein coding regions are known to impact the structure or function of the protein. A major challenge is to predict the effects of these Indels and to distinguish between deleterious and neutral variants. When an Indel occurs within a coding region, it can be either frameshifting (FS) or non-frameshifting (NFS). FS-Indels either modify the complete C-terminal region of the protein or result in premature termination of translation. NFS-Indels insert/delete multiples of three nucleotides leading to the insertion/deletion of one or more amino acids. Results In order to study the relationships between NFS-Indels and Mendelian diseases, we characterized NFS-Indels according to numerous structural, functional and evolutionary parameters. We then used these parameters to identify specific characteristics of disease-causing and neutral NFS-Indels. Finally, we developed a new machine learning approach, KD4i, that can be used to predict the phenotypic effects of NFS-Indels. Conclusions We demonstrate in a large-scale evaluation that the accuracy of KD4i is comparable to existing state-of-the-art methods. However, a major advantage of our approach is that we also provide the reasons for the predictions, in the form of a set of rules. The rules are interpretable by non-expert humans and they thus represent new knowledge about the relationships between the genotype and phenotypes of NFS-Indels and the causative molecular perturbations that result in the disease. PMID:24742296

  17. SECRET domain of variola virus CrmB protein can be a member of poxviral type II chemokine-binding proteins family

    PubMed Central

    2010-01-01

    Background Variola virus (VARV) the causative agent of smallpox, eradicated in 1980, have wide spectrum of immunomodulatory proteins to evade host immunity. Recently additional biological activity was discovered for VARV CrmB protein, known to bind and inhibit tumour necrosis factor (TNF) through its N-terminal domain homologous to cellular TNF receptors. Besides binding TNF, this protein was also shown to bind with high affinity several chemokines which recruit B- and T-lymphocytes and dendritic cells to sites of viral entry and replication. Ability to bind chemokines was shown to be associated with unique C-terminal domain of CrmB protein. This domain named SECRET (Smallpox virus-Encoded Chemokine Receptor) is unrelated to the host proteins and lacks significant homology with other known viral chemokine-binding proteins or any other known protein. Findings De novo modelling of VARV-CrmB SECRET domain spatial structure revealed its apparent structural homology with cowpox virus CC-chemokine binding protein (vCCI) and vaccinia virus A41 protein, despite low sequence identity between these three proteins. Potential ligand-binding surface of modelled VARV-CrmB SECRET domain was also predicted to bear prominent electronegative charge which is characteristic to known orthopoxviral chemokine-binding proteins. Conclusions Our results suggest that SECRET should be included into the family of poxviral type II chemokine-binding proteins and that it might have been evolved from the vCCI-like predecessor protein. PMID:20979600

  18. IFACEwat: the interfacial water-implemented re-ranking algorithm to improve the discrimination of near native structures for protein rigid docking

    PubMed Central

    2014-01-01

    Background Protein-protein docking is an in silico method to predict the formation of protein complexes. Due to limited computational resources, the protein-protein docking approach has been developed under the assumption of rigid docking, in which one of the two protein partners remains rigid during the protein associations and water contribution is ignored or implicitly presented. Despite obtaining a number of acceptable complex predictions, it seems to-date that most initial rigid docking algorithms still find it difficult or even fail to discriminate successfully the correct predictions from the other incorrect or false positive ones. To improve the rigid docking results, re-ranking is one of the effective methods that help re-locate the correct predictions in top high ranks, discriminating them from the other incorrect ones. In this paper, we propose a new re-ranking technique using a new energy-based scoring function, namely IFACEwat - a combined Interface Atomic Contact Energy (IFACE) and water effect. The IFACEwat aims to further improve the discrimination of the near-native structures of the initial rigid docking algorithm ZDOCK3.0.2. Unlike other re-ranking techniques, the IFACEwat explicitly implements interfacial water into the protein interfaces to account for the water-mediated contacts during the protein interactions. Results Our results showed that the IFACEwat increased both the numbers of the near-native structures and improved their ranks as compared to the initial rigid docking ZDOCK3.0.2. In fact, the IFACEwat achieved a success rate of 83.8% for Antigen/Antibody complexes, which is 10% better than ZDOCK3.0.2. As compared to another re-ranking technique ZRANK, the IFACEwat obtains success rates of 92.3% (8% better) and 90% (5% better) respectively for medium and difficult cases. When comparing with the latest published re-ranking method F2Dock, the IFACEwat performed equivalently well or even better for several Antigen/Antibody complexes. Conclusions With the inclusion of interfacial water, the IFACEwat improves mostly results of the initial rigid docking, especially for Antigen/Antibody complexes. The improvement is achieved by explicitly taking into account the contribution of water during the protein interactions, which was ignored or not fully presented by the initial rigid docking and other re-ranking techniques. In addition, the IFACEwat maintains sufficient computational efficiency of the initial docking algorithm, yet improves the ranks as well as the number of the near native structures found. As our implementation so far targeted to improve the results of ZDOCK3.0.2, and particularly for the Antigen/Antibody complexes, it is expected in the near future that more implementations will be conducted to be applicable for other initial rigid docking algorithms. PMID:25521441

  19. Computationally predicted IgE epitopes of walnut allergens contribute to cross-reactivity with peanuts

    PubMed Central

    Maleki, Soheila J.; Teuber, Suzanne S.; Cheng, Hsiaopo; Chen, Deliang; Comstock, Sarah S.; Ruan, Sanbao; Schein, Catherine H.

    2011-01-01

    Background Cross reactivity between peanuts and tree nuts implies that similar IgE epitopes are present in their proteins. Objective To determine whether walnut sequences similar to known peanut IgE binding sequences, according to the property distance (PD) scale implemented in the Structural Database of Allergenic Proteins (SDAP), react with IgE from sera of patients with allergy to walnut and/or peanut. Methods Patient sera were characterized by Western blotting for IgE-binding to nut protein extracts, and to peptides from walnut and peanut allergens, similar to known peanut epitopes as defined by low PD values, synthesized on membranes. Competitive ELISA was used to show that peanut and predicted walnut epitope sequences compete with purified Ara h 2 for binding to IgE in serum from a cross-reactive patient. Results Sequences from the vicilin walnut allergen Jug r 2 which had low PD values to epitopes of the peanut allergen Ara h 2, a 2s-albumin, bound IgE in sera from five patients who reacted to either walnut, peanut or both. A walnut epitope recognized by 6 patients mapped to a surface-exposed region on a model of the N-terminal pro-region of Jug r 2. A predicted walnut epitope competed for IgE binding to Ara h 2 in serum as well as the known IgE epitope from Ara h 2. Conclusions Sequences with low PD value (<8.5) to known IgE epitopes could contribute to cross-reactivity between allergens. This further validates the PD scoring method for predicting cross-reactive epitopes in allergens. PMID:21883278

  20. Computational Identification and Functional Predictions of Long Noncoding RNA in Zea mays

    PubMed Central

    Boerner, Susan; McGinnis, Karen M.

    2012-01-01

    Background Computational analysis of cDNA sequences from multiple organisms suggests that a large portion of transcribed DNA does not code for a functional protein. In mammals, noncoding transcription is abundant, and often results in functional RNA molecules that do not appear to encode proteins. Many long noncoding RNAs (lncRNAs) appear to have epigenetic regulatory function in humans, including HOTAIR and XIST. While epigenetic gene regulation is clearly an essential mechanism in plants, relatively little is known about the presence or function of lncRNAs in plants. Methodology/Principal Findings To explore the connection between lncRNA and epigenetic regulation of gene expression in plants, a computational pipeline using the programming language Python has been developed and applied to maize full length cDNA sequences to identify, classify, and localize potential lncRNAs. The pipeline was used in parallel with an SVM tool for identifying ncRNAs to identify the maximal number of ncRNAs in the dataset. Although the available library of sequences was small and potentially biased toward protein coding transcripts, 15% of the sequences were predicted to be noncoding. Approximately 60% of these sequences appear to act as precursors for small RNA molecules and may function to regulate gene expression via a small RNA dependent mechanism. ncRNAs were predicted to originate from both genic and intergenic loci. Of the lncRNAs that originated from genic loci, ∼20% were antisense to the host gene loci. Conclusions/Significance Consistent with similar studies in other organisms, noncoding transcription appears to be widespread in the maize genome. Computational predictions indicate that maize lncRNAs may function to regulate expression of other genes through multiple RNA mediated mechanisms. PMID:22916204

  1. Characterizing and modeling protein-surface interactions in lab-on-chip devices

    NASA Astrophysics Data System (ADS)

    Katira, Parag

    Protein adsorption on surfaces determines the response of other biological species present in the surrounding solution. This phenomenon plays a major role in the design of biomedical and biotechnological devices. While specific protein adsorption is essential for device function, non-specific protein adsorption leads to the loss of device function. For example, non-specific protein adsorption on bioimplants triggers foreign body response, in biosensors it leads to reduced signal to noise ratios, and in hybrid bionanodevices it results in the loss of confinement and directionality of molecular shuttles. Novel surface coatings are being developed to reduce or completely prevent the non-specific adsorption of proteins to surfaces. A novel quantification technique for extremely low protein coverage on surfaces has been developed. This technique utilizes measurement of the landing rate of microtubule filaments on kinesin proteins adsorbed on a surface to determine the kinesin density. Ultra-low limits of detection, dynamic range, ease of detection and availability of a ready-made kinesin-microtubule kit makes this technique highly suitable for detecting protein adsorption below the detection limits of standard techniques. Secondly, a random sequential adsorption model is presented for protein adsorption to PEO-coated surfaces. The derived analytical expressions accurately predict the observed experimental results from various research groups, suggesting that PEO chains act as almost perfect steric barriers to protein adsorption. These expressions can be used to predict the performance of a variety of systems towards resisting protein adsorption and can help in the design of better non-fouling surface coatings. Finally, in biosensing systems, target analytes are captured and concentrated on specifically adsorbed proteins for detection. Non-specific adsorption of proteins results in the loss of signal, and an increase in the background. The use of nanoscale transducers as receptors is beneficial from the point of view of signal enhancement, but has a strong mass transport limitation. To overcome this, the use of molecular shuttles has been proposed that can selectively capture analytes and actively transport them to the nanoreceptors. The effect of employing such a two-stage capture process on biosensor sensitivity is theoretically investigated and an optimum design for a kinesin-microtubule-driven hybrid biosensor is proposed.

  2. The Salivary Secretome of the Tsetse Fly Glossina pallidipes (Diptera: Glossinidae) Infected by Salivary Gland Hypertrophy Virus

    PubMed Central

    Kariithi, Henry M.; Ince, Ikbal A.; Boeren, Sjef; Abd-Alla, Adly M. M.; Parker, Andrew G.; Aksoy, Serap; Vlak, Just M.; van Oers, Monique M.

    2011-01-01

    Background The competence of the tsetse fly Glossina pallidipes (Diptera; Glossinidae) to acquire salivary gland hypertrophy virus (SGHV), to support virus replication and successfully transmit the virus depends on complex interactions between Glossina and SGHV macromolecules. Critical requisites to SGHV transmission are its replication and secretion of mature virions into the fly's salivary gland (SG) lumen. However, secretion of host proteins is of equal importance for successful transmission and requires cataloging of G. pallidipes secretome proteins from hypertrophied and non-hypertrophied SGs. Methodology/Principal Findings After electrophoretic profiling and in-gel trypsin digestion, saliva proteins were analyzed by nano-LC-MS/MS. MaxQuant/Andromeda search of the MS data against the non-redundant (nr) GenBank database and a G. morsitans morsitans SG EST database, yielded a total of 521 hits, 31 of which were SGHV-encoded. On a false discovery rate limit of 1% and detection threshold of least 2 unique peptides per protein, the analysis resulted in 292 Glossina and 25 SGHV MS-supported proteins. When annotated by the Blast2GO suite, at least one gene ontology (GO) term could be assigned to 89.9% (285/317) of the detected proteins. Five (∼1.8%) Glossina and three (∼12%) SGHV proteins remained without a predicted function after blast searches against the nr database. Sixty-five of the 292 detected Glossina proteins contained an N-terminal signal/secretion peptide sequence. Eight of the SGHV proteins were predicted to be non-structural (NS), and fourteen are known structural (VP) proteins. Conclusions/Significance SGHV alters the protein expression pattern in Glossina. The G. pallidipes SG secretome encompasses a spectrum of proteins that may be required during the SGHV infection cycle. These detected proteins have putative interactions with at least 21 of the 25 SGHV-encoded proteins. Our findings opens venues for developing novel SGHV mitigation strategies to block SGHV infections in tsetse production facilities such as using SGHV-specific antibodies and phage display-selected gut epithelia-binding peptides. PMID:22132244

  3. The predicted secretome and transmembranome of the poultry red mite Dermanyssus gallinae

    PubMed Central

    2013-01-01

    Background The worldwide distributed hematophagous poultry red mite Dermanyssus gallinae (De Geer, 1778) is one of the most important pests of poultry. Even though 35 acaricide compounds are available, control of D. gallinae remains difficult due to acaricide resistances as well as food safety regulations. The current study was carried out to identify putative excretory/secretory (pES) proteins of D. gallinae since these proteins play an important role in the host-parasite interaction and therefore represent potential targets for the development of novel intervention strategies. Additionally, putative transmembrane proteins (pTM) of D. gallinae were analyzed as representatives of this protein group also serve as promising targets for new control strategies. Methods D. gallinae pES and pTM protein prediction was based on putative protein sequences of whole transcriptome data which was parsed to different bioinformatical servers (SignalP, SecretomeP, TMHMM and TargetP). Subsequently, pES and pTM protein sequences were functionally annotated by different computational tools. Results Computational analysis of the D. gallinae proteins identified 3,091 pES (5.6%) and 7,361 pTM proteins (13.4%). A significant proportion of pES proteins are considered to be involved in blood feeding and digestion such as salivary proteins, proteases, lipases and carbohydrases. The cysteine proteases cathepsin D and L as well as legumain, enzymes that cleave hemoglobin during blood digestion of the near related ticks, represented 6 of the top-30 BLASTP matches of the poultry red mite’s secretome. Identified pTM proteins may be involved in many important biological processes including cell signaling, transport of membrane-impermeable molecules and cell recognition. Ninjurin-like proteins, whose functions in mites are still unknown, represent the most frequently occurring pTM. Conclusion The current study is the first providing a mite’s secretome as well as transmembranome and provides valuable insights into D. gallinae pES and pTM proteins operating in different metabolic pathways. Identifying a variety of molecules putatively involved in blood feeding may significantly contribute to the development of new therapeutic targets or vaccines against this poultry pest. PMID:24020355

  4. Predictive and Experimental Approaches for Elucidating Protein–Protein Interactions and Quaternary Structures

    PubMed Central

    Nealon, John Oliver; Philomina, Limcy Seby

    2017-01-01

    The elucidation of protein–protein interactions is vital for determining the function and action of quaternary protein structures. Here, we discuss the difficulty and importance of establishing protein quaternary structure and review in vitro and in silico methods for doing so. Determining the interacting partner proteins of predicted protein structures is very time-consuming when using in vitro methods, this can be somewhat alleviated by use of predictive methods. However, developing reliably accurate predictive tools has proved to be difficult. We review the current state of the art in predictive protein interaction software and discuss the problem of scoring and therefore ranking predictions. Current community-based predictive exercises are discussed in relation to the growth of protein interaction prediction as an area within these exercises. We suggest a fusion of experimental and predictive methods that make use of sparse experimental data to determine higher resolution predicted protein interactions as being necessary to drive forward development. PMID:29206185

  5. General overview on structure prediction of twilight-zone proteins.

    PubMed

    Khor, Bee Yin; Tye, Gee Jun; Lim, Theam Soon; Choong, Yee Siew

    2015-09-04

    Protein structure prediction from amino acid sequence has been one of the most challenging aspects in computational structural biology despite significant progress in recent years showed by critical assessment of protein structure prediction (CASP) experiments. When experimentally determined structures are unavailable, the predictive structures may serve as starting points to study a protein. If the target protein consists of homologous region, high-resolution (typically <1.5 Å) model can be built via comparative modelling. However, when confronted with low sequence similarity of the target protein (also known as twilight-zone protein, sequence identity with available templates is less than 30%), the protein structure prediction has to be initiated from scratch. Traditionally, twilight-zone proteins can be predicted via threading or ab initio method. Based on the current trend, combination of different methods brings an improved success in the prediction of twilight-zone proteins. In this mini review, the methods, progresses and challenges for the prediction of twilight-zone proteins were discussed.

  6. EGASP: the human ENCODE Genome Annotation Assessment Project

    PubMed Central

    Guigó, Roderic; Flicek, Paul; Abril, Josep F; Reymond, Alexandre; Lagarde, Julien; Denoeud, France; Antonarakis, Stylianos; Ashburner, Michael; Bajic, Vladimir B; Birney, Ewan; Castelo, Robert; Eyras, Eduardo; Ucla, Catherine; Gingeras, Thomas R; Harrow, Jennifer; Hubbard, Tim; Lewis, Suzanna E; Reese, Martin G

    2006-01-01

    Background We present the results of EGASP, a community experiment to assess the state-of-the-art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein coding genes; and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a 'reference set' of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment. Results The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified. Conclusion This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence. PMID:16925836

  7. Population-based study of blood biomarkers in prediction of sub-acute recurrent stroke

    PubMed Central

    Segal, Helen C; Burgess, Annette I; Poole, Debbie L; Mehta, Ziyah; Silver, Louise E; Rothwell, Peter M

    2017-01-01

    Background and purpose Risk of recurrent stroke is high in the first few weeks after TIA or stroke and clinic risk prediction tools have only limited accuracy, particularly after the hyper-acute phase. Previous studies of the predictive value of biomarkers have been small, been done in selected populations and have not concentrated on the acute phase or on intensively treated populations. We aimed to determine the predictive value of a panel of blood biomarkers in intensively treated patients early after TIA and stroke. Methods We studied 14 blood biomarkers related to inflammation, thrombosis, atherogenesis and cardiac or neuronal cell damage in early TIA or ischaemic stroke in a population-based study (Oxford Vascular Study). Biomarker levels were related to 90-day risk of recurrent stroke as Hazard Ratio (95%CI) per decile increase, adjusted for age and sex. Results Among 1292 eligible patients there were 53 recurrent ischaemic strokes within 90 days. There were moderate correlations (r>0.40; p<0001) between the inflammatory biomarkers and between the cell damage and thrombotic subsets. However, associations with risk of early recurrent stroke were weak, with significant associations limited to Interleukin-6 (HR=1.12, 1.01-1.24; p=0.035) and C-reactive protein (1.16, 1.02-1.30; p=0.019). When stratified by type of presenting event, P-selectin predicted stroke after TIA (1.31, 1.03-1.66; p=0.028) and C-reactive protein predicted stroke after stroke (1.16, 1.01-1.34; p=0.042). These associations remained after fully adjusting for other vascular risk factors. Conclusion In the largest study to date, we found very limited predictive utility for early recurrent stroke for a panel of inflammatory, thrombotic and cell damage biomarkers. PMID:25158774

  8. A traveling salesman approach for predicting protein functions.

    PubMed

    Johnson, Olin; Liu, Jing

    2006-10-12

    Protein-protein interaction information can be used to predict unknown protein functions and to help study biological pathways. Here we present a new approach utilizing the classic Traveling Salesman Problem to study the protein-protein interactions and to predict protein functions in budding yeast Saccharomyces cerevisiae. We apply the global optimization tool from combinatorial optimization algorithms to cluster the yeast proteins based on the global protein interaction information. We then use this clustering information to help us predict protein functions. We use our algorithm together with the direct neighbor algorithm 1 on characterized proteins and compare the prediction accuracy of the two methods. We show our algorithm can produce better predictions than the direct neighbor algorithm, which only considers the immediate neighbors of the query protein. Our method is a promising one to be used as a general tool to predict functions of uncharacterized proteins and a successful sample of using computer science knowledge and algorithms to study biological problems.

  9. PNAC: a protein nucleolar association classifier

    PubMed Central

    2011-01-01

    Background Although primarily known as the site of ribosome subunit production, the nucleolus is involved in numerous and diverse cellular processes. Recent large-scale proteomics projects have identified thousands of human proteins that associate with the nucleolus. However, in most cases, we know neither the fraction of each protein pool that is nucleolus-associated nor whether their association is permanent or conditional. Results To describe the dynamic localisation of proteins in the nucleolus, we investigated the extent of nucleolar association of proteins by first collating an extensively curated literature-derived dataset. This dataset then served to train a probabilistic predictor which integrates gene and protein characteristics. Unlike most previous experimental and computational studies of the nucleolar proteome that produce large static lists of nucleolar proteins regardless of their extent of nucleolar association, our predictor models the fluidity of the nucleolus by considering different classes of nucleolar-associated proteins. The new method predicts all human proteins as either nucleolar-enriched, nucleolar-nucleoplasmic, nucleolar-cytoplasmic or non-nucleolar. Leave-one-out cross validation tests reveal sensitivity values for these four classes ranging from 0.72 to 0.90 and positive predictive values ranging from 0.63 to 0.94. The overall accuracy of the classifier was measured to be 0.85 on an independent literature-based test set and 0.74 using a large independent quantitative proteomics dataset. While the three nucleolar-association groups display vastly different Gene Ontology biological process signatures and evolutionary characteristics, they collectively represent the most well characterised nucleolar functions. Conclusions Our proteome-wide classification of nucleolar association provides a novel representation of the dynamic content of the nucleolus. This model of nucleolar localisation thus increases the coverage while providing accurate and specific annotations of the nucleolar proteome. It will be instrumental in better understanding the central role of the nucleolus in the cell and its interaction with other subcellular compartments. PMID:21272300

  10. The effects of increased dietary protein yogurt snack in the afternoon on appetite control and eating initiation in healthy women

    PubMed Central

    2013-01-01

    Background A large portion of daily intake comes from snacking. One of the increasingly common, healthier snacks includes Greek-style yogurt, which is typically higher in protein than regular yogurt. This study evaluated whether a 160 kcal higher-protein (HP) Greek-style yogurt snack improves appetite control, satiety, and delays subsequent eating compared to an isocaloric normal protein (NP) regular yogurt in healthy women. This study also identified the factors that predict the onset of eating. Findings Thirty-two healthy women (age: 27 ± 2y; BMI: 23.0 ± 0.4 kg/m2) completed the acute, randomized crossover-design study. On separate days, participants came to our facility to consume a standardized lunch followed by the consumption of the NP (5.0 g protein) or HP (14.0 g protein) yogurt at 3 h post-lunch. Perceived hunger and fullness were assessed throughout the afternoon until dinner was voluntarily requested; ad libitum dinner was then provided. Snacking led to reductions in hunger and increases in fullness. No differences in post-snack perceived hunger or fullness were observed between the NP and HP yogurt snacks. Dinner was voluntarily requested at approximately 2:40 ± 0:05 h post-snack with no differences between the HP vs. NP yogurts. Ad libitum dinner intake was not different between the snacks (NP: 686 ± 33 kcal vs. HP: 709 ± 34 kcal; p = 0.324). In identifying key factors that predict eating initiation, perceived hunger, fullness, and habitual dinner time accounted for 30% of the variability of time to dinner request (r = 0.55; p < 0.001). Conclusions The additional 9 g of protein contained in the high protein Greek yogurt was insufficient to elicit protein-related improvements in markers of energy intake regulation. PMID:23742659

  11. Protein expression, characterization and activity comparisons of wild type and mutant DUSP5 proteins

    DOE PAGES

    Nayak, Jaladhi; Gastonguay, Adam J.; Talipov, Marat R.; ...

    2014-12-18

    Background: The mitogen-activated protein kinases (MAPKs) pathway is critical for cellular signaling, and proteins such as phosphatases that regulate this pathway are important for normal tissue development. Based on our previous work on dual specificity phosphatase-5 (DUSP5), and its role in embryonic vascular development and disease, we hypothesized that mutations in DUSP5 will affect its function. Results: In this study, we tested this hypothesis by generating full-length glutathione-S-transferase-tagged DUSP5 and serine 147 proline mutant (S147P) proteins from bacteria. Light scattering analysis, circular dichroism, enzymatic assays and molecular modeling approaches have been performed to extensively characterize the protein form and function.more » We demonstrate that both proteins are active and, interestingly, the S147P protein is hypoactive as compared to the DUSP5 WT protein in two distinct biochemical substrate assays. Furthermore, due to the novel positioning of the S147P mutation, we utilize computational modeling to reconstruct full-length DUSP5 and S147P to predict a possible mechanism for the reduced activity of S147P. Conclusion: Taken together, this is the first evidence of the generation and characterization of an active, full-length, mutant DUSP5 protein which will facilitate future structure-function and drug development-based studies.« less

  12. Quantifying the relationship between sequence and three-dimensional structure conservation in RNA

    PubMed Central

    2010-01-01

    Background In recent years, the number of available RNA structures has rapidly grown reflecting the increased interest on RNA biology. Similarly to the studies carried out two decades ago for proteins, which gave the fundamental grounds for developing comparative protein structure prediction methods, we are now able to quantify the relationship between sequence and structure conservation in RNA. Results Here we introduce an all-against-all sequence- and three-dimensional (3D) structure-based comparison of a representative set of RNA structures, which have allowed us to quantitatively confirm that: (i) there is a measurable relationship between sequence and structure conservation that weakens for alignments resulting in below 60% sequence identity, (ii) evolution tends to conserve more RNA structure than sequence, and (iii) there is a twilight zone for RNA homology detection. Discussion The computational analysis here presented quantitatively describes the relationship between sequence and structure for RNA molecules and defines a twilight zone region for detecting RNA homology. Our work could represent the theoretical basis and limitations for future developments in comparative RNA 3D structure prediction. PMID:20550657

  13. Identification of potential serum peptide biomarkers of biliary tract cancer using MALDI MS profiling

    PubMed Central

    2014-01-01

    Background The aim of this discovery study was the identification of peptide serum biomarkers for detecting biliary tract cancer (BTC) using samples from healthy volunteers and benign cases of biliary disease as control groups. This work was based on the hypothesis that cancer-specific exopeptidases exist and that their activities in serum can generate cancer-predictive peptide fragments from circulating proteins during coagulation. Methods This case control study used a semi-automated platform incorporating polypeptide extraction linked to matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry (MALDI-TOF MS) to profile 92 patient serum samples. Predictive models were generated to test a validation serum set from BTC cases and healthy volunteers. Results Several peptide peaks were found that could significantly differentiate BTC patients from healthy controls and benign biliary disease. A predictive model resulted in a sensitivity of 100% and a specificity of 93.8% in detecting BTC in the validation set, whilst another model gave a sensitivity of 79.5% and a specificity of 83.9% in discriminating BTC from benign biliary disease samples in the training set. Discriminatory peaks were identified by tandem MS as fragments of abundant clotting proteins. Conclusions Serum MALDI MS peptide signatures can accurately discriminate patients with BTC from healthy volunteers. PMID:24495412

  14. Predictive factors for intrauterine growth restriction

    PubMed Central

    Albu, AR; Anca, AF; Horhoianu, VV; Horhoianu, IA

    2014-01-01

    Abstract Reduced fetal growth is seen in about 10% of the pregnancies but only a minority has a pathological background and is known as intrauterine growth restriction or fetal growth restriction (IUGR / FGR). Increased fetal and neonatal mortality and morbidity as well as adult pathologic conditions are often associated to IUGR. Risk factors for IUGR are easy to assess but have poor predictive value. For the diagnostic purpose, biochemical serum markers, ultrasound and Doppler study of uterine and spiral arteries, placental volume and vascularization, first trimester growth pattern are object of assessment today. Modern evaluations propose combined algorithms using these strategies, all with the goal of a better prediction of risk pregnancies. Abbreviations: SGA = small for gestational age; IUGR = intrauterine growth restriction; FGR = fetal growth restriction; IUFD = intrauterine fetal demise; HIV = human immunodeficiency virus; PAPP-A = pregnancy associated plasmatic protein A; β-hCG = beta human chorionic gonadotropin; MoM = multiple of median; ADAM-12 = A-disintegrin and metalloprotease 12; PP-13 = placental protein 13; VEGF = vascular endothelial growth factor; PlGF = placental growth factor; sFlt-1 = soluble fms-like tyrosine kinase-1; UAD = uterine arteries Doppler ultrasound; RI = resistence index; PI = pulsatility index; VOCAL = Virtual Organ Computer–Aided Analysis software; VI = vascularization index; FI = flow index; VFI = vascularization flow index; PQ = placental quotient PMID:25408721

  15. Predicting motion sickness during parabolic flight

    NASA Technical Reports Server (NTRS)

    Harm, Deborah L.; Schlegel, Todd T.

    2002-01-01

    BACKGROUND: There are large individual differences in susceptibility to motion sickness. Attempts to predict who will become motion sick have had limited success. In the present study, we examined gender differences in resting levels of salivary amylase and total protein, cardiac interbeat intervals (R-R intervals), and a sympathovagal index and evaluated their potential to correctly classify individuals into two motion sickness severity groups. METHODS: Sixteen subjects (10 men and 6 women) flew four sets of 10 parabolas aboard NASA's KC-135 aircraft. Saliva samples for amylase and total protein were collected preflight on the day of the flight and motion sickness symptoms were recorded during each parabola. Cardiovascular parameters were collected in the supine position 1-5 days before the flight. RESULTS: There were no significant gender differences in sickness severity or any of the other variables mentioned above. Discriminant analysis using salivary amylase, R-R intervals and the sympathovagal index produced a significant Wilks' lambda coefficient of 0.36, p=0.006. The analysis correctly classified 87% of the subjects into the none-mild sickness or the moderate-severe sickness group. CONCLUSIONS: The linear combination of resting levels of salivary amylase, high-frequency R-R interval levels, and a sympathovagal index may be useful in predicting motion sickness severity.

  16. Predicting Motion Sickness During Parabolic Flight

    NASA Technical Reports Server (NTRS)

    Harm, Deborah L.; Schlegel, Todd T.

    2002-01-01

    Background: There are large individual differences in susceptibility to motion sickness. Attempts to predict who will become motion sick have had limited success. In the present study we examined gender differences in resting levels of salivary amylase and total protein, cardiac interbeat intervals (R-R intervals), and a sympathovagal index and evaluated their potential to correctly classify individuals into two motion sickness severity groups. Methods: Sixteen subjects (10 men and 6 women) flew 4 sets of 10 parabolas aboard NASA's KC-135 aircraft. Saliva samples for amylase and total protein were collected preflight on the day of the flight and motion sickness symptoms were recorded during each parabola. Cardiovascular parameters were collected in the supine position 1-5 days prior to the flight. Results: There were no significant gender differences in sickness severity or any of the other variables mentioned above. Discriminant analysis using salivary amylase, R-R intervals and the sympathovagal index produced a significant Wilks' lambda coefficient of 0.36, p= 0.006. The analysis correctly classified 87% of the subjects into the none-mild sickness or the moderate-severe sickness group. Conclusions: The linear combination of resting levels of salivary amylase, high frequency R-R interval levels, and a sympathovagal index may be useful in predicting motion sickness severity.

  17. Needles in the EST Haystack: Large-Scale Identification and Analysis of Excretory-Secretory (ES) Proteins in Parasitic Nematodes Using Expressed Sequence Tags (ESTs)

    PubMed Central

    Nagaraj, Shivashankar H.; Gasser, Robin B.; Ranganathan, Shoba

    2008-01-01

    Background Parasitic nematodes of humans, other animals and plants continue to impose a significant public health and economic burden worldwide, due to the diseases they cause. Promising antiparasitic drug and vaccine candidates have been discovered from excreted or secreted (ES) proteins released from the parasite and exposed to the immune system of the host. Mining the entire expressed sequence tag (EST) data available from parasitic nematodes represents an approach to discover such ES targets. Methods and Findings In this study, we predicted, using EST2Secretome, a novel, high-throughput, computational workflow system, 4,710 ES proteins from 452,134 ESTs derived from 39 different species of nematodes, parasitic in animals (including humans) or plants. In total, 2,632, 786, and 1,292 ES proteins were predicted for animal-, human-, and plant-parasitic nematodes. Subsequently, we systematically analysed ES proteins using computational methods. Of these 4,710 proteins, 2,490 (52.8%) had orthologues in Caenorhabditis elegans, whereas 621 (13.8%) appeared to be novel, currently having no significant match to any molecule available in public databases. Of the C. elegans homologues, 267 had strong “loss-of-function” phenotypes by RNA interference (RNAi) in this nematode. We could functionally classify 1,948 (41.3%) sequences using the Gene Ontology (GO) terms, establish pathway associations for 573 (12.2%) sequences using Kyoto Encyclopaedia of Genes and Genomes (KEGG), and identify protein interaction partners for 1,774 (37.6%) molecules. We also mapped 758 (16.1%) proteins to protein domains including the nematode-specific protein family “transthyretin-like” and “chromadorea ALT,” considered as vaccine candidates against filariasis in humans. Conclusions We report the large-scale analysis of ES proteins inferred from EST data for a range of parasitic nematodes. This set of ES proteins provides an inventory of known and novel members of ES proteins as a foundation for studies focused on understanding the biology of parasitic nematodes and their interactions with their hosts, as well as for the development of novel drugs or vaccines for parasite intervention and control. PMID:18820748

  18. MEGADOCK-Web: an integrated database of high-throughput structure-based protein-protein interaction predictions.

    PubMed

    Hayashi, Takanori; Matsuzaki, Yuri; Yanagisawa, Keisuke; Ohue, Masahito; Akiyama, Yutaka

    2018-05-08

    Protein-protein interactions (PPIs) play several roles in living cells, and computational PPI prediction is a major focus of many researchers. The three-dimensional (3D) structure and binding surface are important for the design of PPI inhibitors. Therefore, rigid body protein-protein docking calculations for two protein structures are expected to allow elucidation of PPIs different from known complexes in terms of 3D structures because known PPI information is not explicitly required. We have developed rapid PPI prediction software based on protein-protein docking, called MEGADOCK. In order to fully utilize the benefits of computational PPI predictions, it is necessary to construct a comprehensive database to gather prediction results and their predicted 3D complex structures and to make them easily accessible. Although several databases exist that provide predicted PPIs, the previous databases do not contain a sufficient number of entries for the purpose of discovering novel PPIs. In this study, we constructed an integrated database of MEGADOCK PPI predictions, named MEGADOCK-Web. MEGADOCK-Web provides more than 10 times the number of PPI predictions than previous databases and enables users to conduct PPI predictions that cannot be found in conventional PPI prediction databases. In MEGADOCK-Web, there are 7528 protein chains and 28,331,628 predicted PPIs from all possible combinations of those proteins. Each protein structure is annotated with PDB ID, chain ID, UniProt AC, related KEGG pathway IDs, and known PPI pairs. Additionally, MEGADOCK-Web provides four powerful functions: 1) searching precalculated PPI predictions, 2) providing annotations for each predicted protein pair with an experimentally known PPI, 3) visualizing candidates that may interact with the query protein on biochemical pathways, and 4) visualizing predicted complex structures through a 3D molecular viewer. MEGADOCK-Web provides a huge amount of comprehensive PPI predictions based on docking calculations with biochemical pathways and enables users to easily and quickly assess PPI feasibilities by archiving PPI predictions. MEGADOCK-Web also promotes the discovery of new PPIs and protein functions and is freely available for use at http://www.bi.cs.titech.ac.jp/megadock-web/ .

  19. In silico characterization of a novel pathogenic deletion mutation identified in XPA gene in a Pakistani family with severe xeroderma pigmentosum

    PubMed Central

    2013-01-01

    Background Xeroderma Pigmentosum (XP) is a rare skin disorder characterized by skin hypersensitivity to sunlight and abnormal pigmentation. The aim of this study was to investigate the genetic cause of a severe XP phenotype in a consanguineous Pakistani family and in silico characterization of any identified disease-associated mutation. Results The XP complementation group was assigned by genotyping of family for known XP loci. Genotyping data mapped the family to complementation group A locus, involving XPA gene. Mutation analysis of the candidate XP gene by DNA sequencing revealed a novel deletion mutation (c.654del A) in exon 5 of XPA gene. The c.654del A, causes frameshift, which pre-maturely terminates protein and result into a truncated product of 222 amino acid (aa) residues instead of 273 (p.Lys218AsnfsX5). In silico tools were applied to study the likelihood of changes in structural motifs and thus interaction of mutated protein with binding partners. In silico analysis of mutant protein sequence, predicted to affect the aa residue which attains coiled coil structure. The coiled coil structure has an important role in key cellular interactions, especially with DNA damage-binding protein 2 (DDB2), which has important role in DDB-mediated nucleotide excision repair (NER) system. Conclusions Our findings support the fact of genetic and clinical heterogeneity in XP. The study also predicts the critical role of DDB2 binding region of XPA protein in NER pathway and opens an avenue for further research to study the functional role of the mutated protein domain. PMID:24063568

  20. Optimal contact definition for reconstruction of Contact Maps

    PubMed Central

    2010-01-01

    Background Contact maps have been extensively used as a simplified representation of protein structures. They capture most important features of a protein's fold, being preferred by a number of researchers for the description and study of protein structures. Inspired by the model's simplicity many groups have dedicated a considerable amount of effort towards contact prediction as a proxy for protein structure prediction. However a contact map's biological interest is subject to the availability of reliable methods for the 3-dimensional reconstruction of the structure. Results We use an implementation of the well-known distance geometry protocol to build realistic protein 3-dimensional models from contact maps, performing an extensive exploration of many of the parameters involved in the reconstruction process. We try to address the questions: a) to what accuracy does a contact map represent its corresponding 3D structure, b) what is the best contact map representation with regard to reconstructability and c) what is the effect of partial or inaccurate contact information on the 3D structure recovery. Our results suggest that contact maps derived from the application of a distance cutoff of 9 to 11Å around the Cβ atoms constitute the most accurate representation of the 3D structure. The reconstruction process does not provide a single solution to the problem but rather an ensemble of conformations that are within 2Å RMSD of the crystal structure and with lower values for the pairwise average ensemble RMSD. Interestingly it is still possible to recover a structure with partial contact information, although wrong contacts can lead to dramatic loss in reconstruction fidelity. Conclusions Thus contact maps represent a valid approximation to the structures with an accuracy comparable to that of experimental methods. The optimal contact definitions constitute key guidelines for methods based on contact maps such as structure prediction through contacts and structural alignments based on maximum contact map overlap. PMID:20507547

  1. DNA Sequence Analysis of Sry Alleles (Subgenus Mus) Implicates Misregulation as the Cause of C57bl/6j-Y(pos) Sex Reversal and Defines the Sry Functional Unit

    PubMed Central

    Albrecht, K. H.; Eicher, E. M.

    1997-01-01

    The Sry (sex determining region, Y chromosome) open reading frame from mice representing four species of the genus Mus was sequenced in an effort to understand the conditional dysfunction of some M. domesticus Sry alleles when present on the C57BL/6J inbred strain genetic background and to delimit the functionally important protein regions. Twenty-two Sry alleles were sequenced, most from wild-derived Y chromosomes, including 11 M. domesticus alleles, seven M. musculus alleles and two alleles each from the related species M. spicilegus and M. spretus. We found that the HMG domain (high mobility group DNA binding domain) and the unique regions are well conserved, while the glutamine repeat cluster (GRC) region is quite variable. No correlation was found between the predicted protein isoforms and the ability of a Sry allele to allow differentiation of ovarian tissue when on the C57BL/6J genetic background, strongly suggesting that the cause of this sex reversal is not the Sry protein itself, but rather the regulation of SRY expression. Furthermore, our interspecies sequence analysis provides compelling evidence that the M. musculus and M. domesticus SRY functional domain is contained in the first 143 amino acids, which includes the HMG domain and adjacent unique region (UR-2). PMID:9383069

  2. Predicting Droplet Formation on Centrifugal Microfluidic Platforms

    NASA Astrophysics Data System (ADS)

    Moebius, Jacob Alfred

    Centrifugal microfluidics is a widely known research tool for biological sample and water quality analysis. Currently, the standard equipment used for such diagnostic applications include slow, bulky machines controlled by multiple operators. These machines can be condensed into a smaller, faster benchtop sample-to-answer system. Sample processing is an important step taken to extract, isolate, and convert biological factors, such as nucleic acids or proteins, from a raw sample to an analyzable solution. Volume definition is one such step. The focus of this thesis is the development of a model predicting monodispersed droplet formation and the application of droplets as a technique for volume definition. First, a background of droplet microfluidic platforms is presented, along with current biological analysis technologies and the advantages of integrating such technologies onto microfluidic platforms. Second, background and theories of centrifugal microfluidics is given, followed by theories relevant to droplet emulsions. Third, fabrication techniques for centrifugal microfluidic designs are discussed. Finally, the development of a model for predicting droplet formation on the centrifugal microfluidic platform are presented for the rest of the thesis. Predicting droplet formation analytically based on the volumetric flow rates of the continuous and dispersed phases, the ratios of these two flow rates, and the interfacial tension between the continuous and dispersed phases presented many challenges, which will be discussed in this work. Experimental validation was completed using continuous phase solutions of different interfacial tensions. To conclude, prospective applications are discussed with expected challenges.

  3. Prediction of Carbohydrate Binding Sites on Protein Surfaces with 3-Dimensional Probability Density Distributions of Interacting Atoms

    PubMed Central

    Tsai, Keng-Chang; Jian, Jhih-Wei; Yang, Ei-Wen; Hsu, Po-Chiang; Peng, Hung-Pin; Chen, Ching-Tai; Chen, Jun-Bo; Chang, Jeng-Yih; Hsu, Wen-Lian; Yang, An-Suei

    2012-01-01

    Non-covalent protein-carbohydrate interactions mediate molecular targeting in many biological processes. Prediction of non-covalent carbohydrate binding sites on protein surfaces not only provides insights into the functions of the query proteins; information on key carbohydrate-binding residues could suggest site-directed mutagenesis experiments, design therapeutics targeting carbohydrate-binding proteins, and provide guidance in engineering protein-carbohydrate interactions. In this work, we show that non-covalent carbohydrate binding sites on protein surfaces can be predicted with relatively high accuracy when the query protein structures are known. The prediction capabilities were based on a novel encoding scheme of the three-dimensional probability density maps describing the distributions of 36 non-covalent interacting atom types around protein surfaces. One machine learning model was trained for each of the 30 protein atom types. The machine learning algorithms predicted tentative carbohydrate binding sites on query proteins by recognizing the characteristic interacting atom distribution patterns specific for carbohydrate binding sites from known protein structures. The prediction results for all protein atom types were integrated into surface patches as tentative carbohydrate binding sites based on normalized prediction confidence level. The prediction capabilities of the predictors were benchmarked by a 10-fold cross validation on 497 non-redundant proteins with known carbohydrate binding sites. The predictors were further tested on an independent test set with 108 proteins. The residue-based Matthews correlation coefficient (MCC) for the independent test was 0.45, with prediction precision and sensitivity (or recall) of 0.45 and 0.49 respectively. In addition, 111 unbound carbohydrate-binding protein structures for which the structures were determined in the absence of the carbohydrate ligands were predicted with the trained predictors. The overall prediction MCC was 0.49. Independent tests on anti-carbohydrate antibodies showed that the carbohydrate antigen binding sites were predicted with comparable accuracy. These results demonstrate that the predictors are among the best in carbohydrate binding site predictions to date. PMID:22848404

  4. Chicken genome analysis reveals novel genes encoding biotin-binding proteins related to avidin family

    PubMed Central

    Niskanen, Einari A; Hytönen, Vesa P; Grapputo, Alessandro; Nordlund, Henri R; Kulomaa, Markku S; Laitinen, Olli H

    2005-01-01

    Background A chicken egg contains several biotin-binding proteins (BBPs), whose complete DNA and amino acid sequences are not known. In order to identify and characterise these genes and proteins we studied chicken cDNAs and genes available in the NCBI database and chicken genome database using the reported N-terminal amino acid sequences of chicken egg-yolk BBPs as search strings. Results Two separate hits showing significant homology for these N-terminal sequences were discovered. For one of these hits, the chromosomal location in the immediate proximity of the avidin gene family was found. Both of these hits encode proteins having high sequence similarity with avidin suggesting that chicken BBPs are paralogous to avidin family. In particular, almost all residues corresponding to biotin binding in avidin are conserved in these putative BBP proteins. One of the found DNA sequences, however, seems to encode a carboxy-terminal extension not present in avidin. Conclusion We describe here the predicted properties of the putative BBP genes and proteins. Our present observations link BBP genes together with avidin gene family and shed more light on the genetic arrangement and variability of this family. In addition, comparative modelling revealed the potential structural elements important for the functional and structural properties of the putative BBP proteins. PMID:15777476

  5. Demonstration of protein-fragment complementation assay using purified firefly luciferase fragments

    PubMed Central

    2013-01-01

    Background Human interactome is predicted to contain 150,000 to 300,000 protein-protein interactions, (PPIs). Protein-fragment complementation assay (PCA) is one of the most widely used methods to detect PPI, as well as Förster resonance energy transfer (FRET). To date, successful applications of firefly luciferase (Fluc)-based PCA have been reported in vivo, in cultured cells and in cell-free lysate, owing to its high sensitivity, high signal-to-background (S/B) ratio, and reversible response. Here we show the assay also works with purified proteins with unexpectedly rapid kinetics. Results Split Fluc fragments both fused with a rapamycin-dependently interacting protein pair were made and expressed in E. coli system, and purified to homogeneity. When the proteins were used for PCA to detect rapamycin-dependent PPI, they enabled a rapid detection (~1 s) of PPI with high S/B ratio. When Fn7-8 domains (7 nm in length) that was shown to abrogate GFP mutant-based FRET was inserted between split Fluc and FKBP12 as a rigid linker, it still showed some response, suggesting less limitation in interacting partner’s size. Finally, the stability of the probe was investigated. Preincubation of the probes at 37 degreeC up to 1 h showed marked decrease of the luminescent signal to 1.5%, showing the limited stability of this system. Conclusion Fluc PCA using purified components will enable a rapid and handy detection of PPIs with high S/B ratio, avoiding the effects of concomitant components. Although the system might not be suitable for large-scale screening due to its limited stability, it can detect an interaction over larger distance than by FRET. This would be the first demonstration of Fluc PCA in vitro, which has a distinct advantage over other PPI assays. Our system enables detection of direct PPIs without risk of perturbation by PPI mediators in the complex cellular milieu. PMID:23536995

  6. The Puf family of RNA-binding proteins in plants: phylogeny, structural modeling, activity and subcellular localization

    PubMed Central

    2010-01-01

    Background Puf proteins have important roles in controlling gene expression at the post-transcriptional level by promoting RNA decay and repressing translation. The Pumilio homology domain (PUM-HD) is a conserved region within Puf proteins that binds to RNA with sequence specificity. Although Puf proteins have been well characterized in animal and fungal systems, little is known about the structural and functional characteristics of Puf-like proteins in plants. Results The Arabidopsis and rice genomes code for 26 and 19 Puf-like proteins, respectively, each possessing eight or fewer Puf repeats in their PUM-HD. Key amino acids in the PUM-HD of several of these proteins are conserved with those of animal and fungal homologs, whereas other plant Puf proteins demonstrate extensive variability in these amino acids. Three-dimensional modeling revealed that the predicted structure of this domain in plant Puf proteins provides a suitable surface for binding RNA. Electrophoretic gel mobility shift experiments showed that the Arabidopsis AtPum2 PUM-HD binds with high affinity to BoxB of the Drosophila Nanos Response Element I (NRE1) RNA, whereas a point mutation in the core of the NRE1 resulted in a significant reduction in binding affinity. Transient expression of several of the Arabidopsis Puf proteins as fluorescent protein fusions revealed a dynamic, punctate cytoplasmic pattern of localization for most of these proteins. The presence of predicted nuclear export signals and accumulation of AtPuf proteins in the nucleus after treatment of cells with leptomycin B demonstrated that shuttling of these proteins between the cytosol and nucleus is common among these proteins. In addition to the cytoplasmically enriched AtPum proteins, two AtPum proteins showed nuclear targeting with enrichment in the nucleolus. Conclusions The Puf family of RNA-binding proteins in plants consists of a greater number of members than any other model species studied to date. This, along with the amino acid variability observed within their PUM-HDs, suggests that these proteins may be involved in a wide range of post-transcriptional regulatory events that are important in providing plants with the ability to respond rapidly to changes in environmental conditions and throughout development. PMID:20214804

  7. Comparative analysis and assessment of M. tuberculosis H37Rv protein-protein interaction datasets

    PubMed Central

    2011-01-01

    Background M. tuberculosis is a formidable bacterial pathogen. There is thus an increasing demand on understanding the function and relationship of proteins in various strains of M. tuberculosis. Protein-protein interactions (PPIs) data are crucial for this kind of knowledge. However, the quality of the main available M. tuberculosis PPI datasets is unclear. This hampers the effectiveness of research works that rely on these PPI datasets. Here, we analyze the two main available M. tuberculosis H37Rv PPI datasets. The first dataset is the high-throughput B2H PPI dataset from Wang et al’s recent paper in Journal of Proteome Research. The second dataset is from STRING database, version 8.3, comprising entirely of H37Rv PPIs predicted using various methods. We find that these two datasets have a surprisingly low level of agreement. We postulate the following causes for this low level of agreement: (i) the H37Rv B2H PPI dataset is of low quality; (ii) the H37Rv STRING PPI dataset is of low quality; and/or (iii) the H37Rv STRING PPIs are predictions of other forms of functional associations rather than direct physical interactions. Results To test the quality of these two datasets, we evaluate them based on correlated gene expression profiles, coherent informative GO term annotations, and conservation in other organisms. We observe a significantly greater portion of PPIs in the H37Rv STRING PPI dataset (with score ≥ 770) having correlated gene expression profiles and coherent informative GO term annotations in both interaction partners than that in the H37Rv B2H PPI dataset. Predicted H37Rv interologs derived from non-M. tuberculosis experimental PPIs are much more similar to the H37Rv STRING functional associations dataset (with score ≥ 770) than the H37Rv B2H PPI dataset. H37Rv predicted physical interologs from IntAct also show extremely low similarity with the H37Rv B2H PPI dataset; and this similarity level is much lower than that between the S. aureus MRSA252 predicted physical interologs from IntAct and S. aureus MRSA252 pull-down PPIs. Comparative analysis with several representative two-hybrid PPI datasets in other species further confirms that the H37Rv B2H PPI dataset is of low quality. Next, to test the possibility that the H37Rv STRING PPIs are not purely direct physical interactions, we compare M. tuberculosis H37Rv protein pairs that catalyze adjacent steps in enzymatic reactions to B2H PPIs and predicted PPIs in STRING, which shows it has much lower similarities with the B2H PPIs than with STRING PPIs. This result strongly suggests that the H37Rv STRING PPIs more likely correspond to indirect relationships between protein pairs than to B2H PPIs. For more precise support, we turn to S. cerevisiae for its comprehensively studied interactome. We compare S. cerevisiae predicted PPIs in STRING to three independent protein relationship datasets which respectively comprise PPIs reported in Y2H assays, protein pairs reported to be in the same protein complexes, and protein pairs that catalyze successive reaction steps in enzymatic reactions. Our analysis reveals that S. cerevisiae predicted STRING PPIs have much higher similarity to the latter two types of protein pairs than to two-hybrid PPIs. As H37Rv STRING PPIs are predicted using similar methods as S. cerevisiae predicted STRING PPIs, this suggests that these H37Rv STRING PPIs are more likely to correspond to the latter two types of protein pairs rather than to two-hybrid PPIs as well. Conclusions The H37Rv B2H PPI dataset has low quality. It should not be used as the gold standard to assess the quality of other (possibly predicted) H37Rv PPI datasets. The H37Rv STRING PPI dataset also has low quality; nevertheless, a subset consisting of STRING PPIs with score ≥770 has satisfactory quality. However, these STRING “PPIs” should be interpreted as functional associations, which include a substantial portion of indirect protein interactions, rather than direct physical interactions. These two factors cause the strikingly low similarity between these two main H37Rv PPI datasets. The results and conclusions from this comparative analysis provide valuable guidance in using these M. tuberculosis H37Rv PPI datasets in subsequent studies for a wide range of purposes. PMID:22369691

  8. Discovery: an interactive resource for the rational selection and comparison of putative drug target proteins in malaria

    PubMed Central

    Joubert, Fourie; Harrison, Claudia M; Koegelenberg, Riaan J; Odendaal, Christiaan J; de Beer, Tjaart AP

    2009-01-01

    Background Up to half a billion human clinical cases of malaria are reported each year, resulting in about 2.7 million deaths, most of which occur in sub-Saharan Africa. Due to the over-and misuse of anti-malarials, widespread resistance to all the known drugs is increasing at an alarming rate. Rational methods to select new drug target proteins and lead compounds are urgently needed. The Discovery system provides data mining functionality on extensive annotations of five malaria species together with the human and mosquito hosts, enabling the selection of new targets based on multiple protein and ligand properties. Methods A web-based system was developed where researchers are able to mine information on malaria proteins and predicted ligands, as well as perform comparisons to the human and mosquito host characteristics. Protein features used include: domains, motifs, EC numbers, GO terms, orthologs, protein-protein interactions, protein-ligand interactions and host-pathogen interactions among others. Searching by chemical structure is also available. Results An in silico system for the selection of putative drug targets and lead compounds is presented, together with an example study on the bifunctional DHFR-TS from Plasmodium falciparum. Conclusion The Discovery system allows for the identification of putative drug targets and lead compounds in Plasmodium species based on the filtering of protein and chemical properties. PMID:19642978

  9. Highly Resolved Sub-Terahertz Vibrational Spectroscopy of Biological Macromolecules and Bacteria Cells

    DTIC Science & Technology

    2016-07-01

    between average background spectrum and chicken egg - white lysozyme protein spectrum...spectroscopic signatures were conducted using human insulin protein and chicken egg -white lysozyme protein. Proteins with different structures...the comparison between the average background THz spectrum (black line in Figure 13) and the chicken egg -white lysozyme THz spectrum (blue line

  10. Inter-kingdom prediction certainty evaluation of protein subcellular localization tools: microbial pathogenesis approach for deciphering host microbe interaction.

    PubMed

    Khan, Abdul Arif; Khan, Zakir; Kalam, Mohd Abul; Khan, Azmat Ali

    2018-01-01

    Microbial pathogenesis involves several aspects of host-pathogen interactions, including microbial proteins targeting host subcellular compartments and subsequent effects on host physiology. Such studies are supported by experimental data, but recent detection of bacterial proteins localization through computational eukaryotic subcellular protein targeting prediction tools has also come into practice. We evaluated inter-kingdom prediction certainty of these tools. The bacterial proteins experimentally known to target host subcellular compartments were predicted with eukaryotic subcellular targeting prediction tools, and prediction certainty was assessed. The results indicate that these tools alone are not sufficient for inter-kingdom protein targeting prediction. The correct prediction of pathogen's protein subcellular targeting depends on several factors, including presence of localization signal, transmembrane domain and molecular weight, etc., in addition to approach for subcellular targeting prediction. The detection of protein targeting in endomembrane system is comparatively difficult, as the proteins in this location are channelized to different compartments. In addition, the high specificity of training data set also creates low inter-kingdom prediction accuracy. Current data can help to suggest strategy for correct prediction of bacterial protein's subcellular localization in host cell. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  11. Incorporating information on predicted solvent accessibility to the co-evolution-based study of protein interactions.

    PubMed

    Ochoa, David; García-Gutiérrez, Ponciano; Juan, David; Valencia, Alfonso; Pazos, Florencio

    2013-01-27

    A widespread family of methods for studying and predicting protein interactions using sequence information is based on co-evolution, quantified as similarity of phylogenetic trees. Part of the co-evolution observed between interacting proteins could be due to co-adaptation caused by inter-protein contacts. In this case, the co-evolution is expected to be more evident when evaluated on the surface of the proteins or the internal layers close to it. In this work we study the effect of incorporating information on predicted solvent accessibility to three methods for predicting protein interactions based on similarity of phylogenetic trees. We evaluate the performance of these methods in predicting different types of protein associations when trees based on positions with different characteristics of predicted accessibility are used as input. We found that predicted accessibility improves the results of two recent versions of the mirrortree methodology in predicting direct binary physical interactions, while it neither improves these methods, nor the original mirrortree method, in predicting other types of interactions. That improvement comes at no cost in terms of applicability since accessibility can be predicted for any sequence. We also found that predictions of protein-protein interactions are improved when multiple sequence alignments with a richer representation of sequences (including paralogs) are incorporated in the accessibility prediction.

  12. DBAC: A simple prediction method for protein binding hot spots based on burial levels and deeply buried atomic contacts

    PubMed Central

    2011-01-01

    Background A protein binding hot spot is a cluster of residues in the interface that are energetically important for the binding of the protein with its interaction partner. Identifying protein binding hot spots can give useful information to protein engineering and drug design, and can also deepen our understanding of protein-protein interaction. These residues are usually buried inside the interface with very low solvent accessible surface area (SASA). Thus SASA is widely used as an outstanding feature in hot spot prediction by many computational methods. However, SASA is not capable of distinguishing slightly buried residues, of which most are non hot spots, and deeply buried ones that are usually inside a hot spot. Results We propose a new descriptor called “burial level” for characterizing residues, atoms and atomic contacts. Specifically, burial level captures the depth the residues are buried. We identify different kinds of deeply buried atomic contacts (DBAC) at different burial levels that are directly broken in alanine substitution. We use their numbers as input for SVM to classify between hot spot or non hot spot residues. We achieve F measure of 0.6237 under the leave-one-out cross-validation on a data set containing 258 mutations. This performance is better than other computational methods. Conclusions Our results show that hot spot residues tend to be deeply buried in the interface, not just having a low SASA value. This indicates that a high burial level is not only a necessary but also a more sufficient condition than a low SASA for a residue to be a hot spot residue. We find that those deeply buried atoms become increasingly more important when their burial levels rise up. This work also confirms the contribution of deeply buried interfacial atomic contacts to the energy of protein binding hot spot. PMID:21689480

  13. Cloud prediction of protein structure and function with PredictProtein for Debian.

    PubMed

    Kaján, László; Yachdav, Guy; Vicedo, Esmeralda; Steinegger, Martin; Mirdita, Milot; Angermüller, Christof; Böhm, Ariane; Domke, Simon; Ertl, Julia; Mertes, Christian; Reisinger, Eva; Staniewski, Cedric; Rost, Burkhard

    2013-01-01

    We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome.

  14. Cloud Prediction of Protein Structure and Function with PredictProtein for Debian

    PubMed Central

    Kaján, László; Yachdav, Guy; Vicedo, Esmeralda; Steinegger, Martin; Mirdita, Milot; Angermüller, Christof; Böhm, Ariane; Domke, Simon; Ertl, Julia; Mertes, Christian; Reisinger, Eva; Rost, Burkhard

    2013-01-01

    We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome. PMID:23971032

  15. Measures of Urinary Protein and Albumin in the Prediction of Progression of IgA Nephropathy

    PubMed Central

    Zhao, Yan-feng; Liu, Li-jun; Shi, Su-fang; Lv, Ji-cheng; Zhang, Hong

    2016-01-01

    Background and objectives Proteinuria is an independent predictor for IgA nephropathy (IgAN) progression. Urine albumin-to-creatinine ratio (ACR), protein-to-creatinine ratio, and 24-hour urine protein excretion (UPE) are widely used for proteinuria evaluation in clinical practice. Here, we evaluated the association of these measurements with clinical and histologic findings of IgAN and explored which was the best predictor of IgAN prognosis. Design, setting, participants, & measurements Patients with IgAN were followed up for ≥12 months, were diagnosed between 2003 and 2012, and had urine samples available (438 patients). Spot urine ACR, protein-to-creatinine ratio, and 24-hour UPE at the time of renal biopsy were measured on a Hitachi Automatic Biochemical Analyzer 7180 (Hitachi, Yokohama, Japan). Results In our patients, ACR, protein-to-creatinine ratio, and 24-hour UPE were highly correlated (correlation coefficients: 0.71–0.87). They showed good relationships with acknowledged markers reflecting IgAN severity, including eGFR, hypertension, and the biopsy parameter (Oxford severity of tubular atrophy/interstitial fibrosis parameter). However, only ACR presented with positive association with the Oxford segmental glomerulosclerosis/adhesion parameter and extracapillary proliferation lesions. The follow-up time was 37.0 (22.0–58.0) months, with the last follow-up on April 18, 2014. In total, 124 patients reached the composite end point (30% eGFR decline, ESRD, or death). In univariate survival analysis, ACR consistently had better performance than protein-to-creatinine ratio and 24-hour UPE as represented by higher area under the curve using time–dependent survival analysis. When adjusted for well known risk factors for IgAN progression, ACR was most significantly associated with the composite end point (hazard ratio, 1.56 per 1-SD change of standard normalized square root–transformed ACR; 95% confidence interval, 1.29 to 1.89; P<0.001). Compared with protein-to-creatinine ratio and 24-hour UPE, addition of ACR to traditional risk factors resulted in more improvement in the predictive ability of IgAN progression (c statistic: ACR=0.70; protein-to-creatinine ratio =0.68; 24-hour UPE =0.69; Akaike information criterion: ACR=1217.85; protein-to-creatinine ratio =1229.28; 24-hour UPE =1234.96; P<0.001). Conclusions In IgAN, ACR, protein-to-creatinine ratio, and 24-hour UPE had comparable association with severe clinical and histologic findings. Compared with protein-to-creatinine ratio and 24-hour UPE, ACR showed slightly better performance in predicting IgAN progression. PMID:27026518

  16. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger

    PubMed Central

    Wright, James C; Sugden, Deana; Francis-McIntyre, Sue; Riba-Garcia, Isabel; Gaskell, Simon J; Grigoriev, Igor V; Baker, Scott E; Beynon, Robert J; Hubbard, Simon J

    2009-01-01

    Background Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institutes (JGI) and another predicted protein set from another A.niger sequence. Tandem mass spectra (MS/MS) were acquired from 1d gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). Results 405 identified peptide sequences were mapped to 214 different A.niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. Conclusion This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison of the published genome from another strain of A.niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method. PMID:19193216

  17. EnzML: multi-label prediction of enzyme classes using InterPro signatures

    PubMed Central

    2012-01-01

    Background Manual annotation of enzymatic functions cannot keep up with automatic genome sequencing. In this work we explore the capacity of InterPro sequence signatures to automatically predict enzymatic function. Results We present EnzML, a multi-label classification method that can efficiently account also for proteins with multiple enzymatic functions: 50,000 in UniProt. EnzML was evaluated using a standard set of 300,747 proteins for which the manually curated Swiss-Prot and KEGG databases have agreeing Enzyme Commission (EC) annotations. EnzML achieved more than 98% subset accuracy (exact match of all correct Enzyme Commission classes of a protein) for the entire dataset and between 87 and 97% subset accuracy in reannotating eight entire proteomes: human, mouse, rat, mouse-ear cress, fruit fly, the S. pombe yeast, the E. coli bacterium and the M. jannaschii archaebacterium. To understand the role played by the dataset size, we compared the cross-evaluation results of smaller datasets, either constructed at random or from specific taxonomic domains such as archaea, bacteria, fungi, invertebrates, plants and vertebrates. The results were confirmed even when the redundancy in the dataset was reduced using UniRef100, UniRef90 or UniRef50 clusters. Conclusions InterPro signatures are a compact and powerful attribute space for the prediction of enzymatic function. This representation makes multi-label machine learning feasible in reasonable time (30 minutes to train on 300,747 instances with 10,852 attributes and 2,201 class values) using the Mulan Binary Relevance Nearest Neighbours algorithm implementation (BR-kNN). PMID:22533924

  18. An antiapoptotic Bcl-2 family protein index predicts the response of leukaemic cells to the pan-Bcl-2 inhibitor S1

    PubMed Central

    Zhang, Z; Liu, Y; Song, T; Xue, Z; Shen, X; Liang, F; Zhao, Y; Li, Z; Sheng, H

    2013-01-01

    Background: Bcl-2-like members have been found to be inherently overexpressed in many types of haematologic malignancies. The small-molecule S1 is a BH3 mimetic and a triple inhibitor of Bcl-2, Mcl-1 and Bcl-XL. Methods: The lethal dose 50 (LD50) values of S1 in five leukaemic cell lines and 41 newly diagnosed leukaemia samples were tested. The levels of Bcl-2 family members and phosphorylated Bcl-2 were semiquantitatively measured by western blotting. The interactions between Bcl-2 family members were tested by co-immunoprecipitation. The correlation between the LD50 and expression levels of Bcl-2 family members, alone or in combination, was analysed. Results: S1 exhibited variable sensitivity with LD50 values ranging >2 logs in both established and primary leukaemic cells. The ratio of pBcl-2/(Bcl-2+Mcl-1) could predict the S1 response. Furthermore, we demonstrated that pBcl-2 antagonised S1 by sequestering the Bak and Bim proteins that were released from Mcl-1, andpBcl-2/Bak, pBcl-2/Bax and pBcl-2/Bim complexes cannot be disrupted by S1. Conclusion: A predictive index was obtained for the novel BH3 mimetic S1. The shift of proapoptotic proteins from being complexed with Mcl-1 to being complexed with pBcl-2 was revealed for the first time, which is the mechanism underlying the index value described herein. PMID:23558901

  19. Modeling synthetic lethality

    PubMed Central

    Le Meur, Nolwenn; Gentleman, Robert

    2008-01-01

    Background Synthetic lethality defines a genetic interaction where the combination of mutations in two or more genes leads to cell death. The implications of synthetic lethal screens have been discussed in the context of drug development as synthetic lethal pairs could be used to selectively kill cancer cells, but leave normal cells relatively unharmed. A challenge is to assess genome-wide experimental data and integrate the results to better understand the underlying biological processes. We propose statistical and computational tools that can be used to find relationships between synthetic lethality and cellular organizational units. Results In Saccharomyces cerevisiae, we identified multi-protein complexes and pairs of multi-protein complexes that share an unusually high number of synthetic genetic interactions. As previously predicted, we found that synthetic lethality can arise from subunits of an essential multi-protein complex or between pairs of multi-protein complexes. Finally, using multi-protein complexes allowed us to take into account the pleiotropic nature of the gene products. Conclusions Modeling synthetic lethality using current estimates of the yeast interactome is an efficient approach to disentangle some of the complex molecular interactions that drive a cell. Our model in conjunction with applied statistical methods and computational methods provides new tools to better characterize synthetic genetic interactions. PMID:18789146

  20. Repressing a Repressor

    PubMed Central

    Silverstone, Aron L.; Jung, Hou-Sung; Dill, Alyssa; Kawaide, Hiroshi; Kamiya, Yuji; Sun, Tai-ping

    2001-01-01

    RGA (for repressor of ga1-3) and SPINDLY (SPY) are likely repressors of gibberellin (GA) signaling in Arabidopsis because the recessive rga and spy mutations partially suppressed the phenotype of the GA-deficient mutant ga1-3. We found that neither rga nor spy altered the GA levels in the wild-type or the ga1-3 background. However, expression of the GA biosynthetic gene GA4 was reduced 26% by the rga mutation, suggesting that partial derepression of the GA response pathway by rga resulted in the feedback inhibition of GA4 expression. The green fluorescent protein (GFP)–RGA fusion protein was localized to nuclei in transgenic Arabidopsis. This result supports the predicted function of RGA as a transcriptional regulator based on sequence analysis. Confocal microscopy and immunoblot analyses demonstrated that the levels of both the GFP-RGA fusion protein and endogenous RGA were reduced rapidly by GA treatment. Therefore, the GA signal appears to derepress the GA signaling pathway by degrading the repressor protein RGA. The effect of rga on GA4 gene expression and the effect of GA on RGA protein level allow us to identify part of the mechanism by which GA homeostasis is achieved. PMID:11449051

  1. Identify High-Quality Protein Structural Models by Enhanced K-Means.

    PubMed

    Wu, Hongjie; Li, Haiou; Jiang, Min; Chen, Cheng; Lv, Qiang; Wu, Chuang

    2017-01-01

    Background. One critical issue in protein three-dimensional structure prediction using either ab initio or comparative modeling involves identification of high-quality protein structural models from generated decoys. Currently, clustering algorithms are widely used to identify near-native models; however, their performance is dependent upon different conformational decoys, and, for some algorithms, the accuracy declines when the decoy population increases. Results. Here, we proposed two enhanced K -means clustering algorithms capable of robustly identifying high-quality protein structural models. The first one employs the clustering algorithm SPICKER to determine the initial centroids for basic K -means clustering ( SK -means), whereas the other employs squared distance to optimize the initial centroids ( K -means++). Our results showed that SK -means and K -means++ were more robust as compared with SPICKER alone, detecting 33 (59%) and 42 (75%) of 56 targets, respectively, with template modeling scores better than or equal to those of SPICKER. Conclusions. We observed that the classic K -means algorithm showed a similar performance to that of SPICKER, which is a widely used algorithm for protein-structure identification. Both SK -means and K -means++ demonstrated substantial improvements relative to results from SPICKER and classical K -means.

  2. Identify High-Quality Protein Structural Models by Enhanced K-Means

    PubMed Central

    Li, Haiou; Chen, Cheng; Lv, Qiang; Wu, Chuang

    2017-01-01

    Background. One critical issue in protein three-dimensional structure prediction using either ab initio or comparative modeling involves identification of high-quality protein structural models from generated decoys. Currently, clustering algorithms are widely used to identify near-native models; however, their performance is dependent upon different conformational decoys, and, for some algorithms, the accuracy declines when the decoy population increases. Results. Here, we proposed two enhanced K-means clustering algorithms capable of robustly identifying high-quality protein structural models. The first one employs the clustering algorithm SPICKER to determine the initial centroids for basic K-means clustering (SK-means), whereas the other employs squared distance to optimize the initial centroids (K-means++). Our results showed that SK-means and K-means++ were more robust as compared with SPICKER alone, detecting 33 (59%) and 42 (75%) of 56 targets, respectively, with template modeling scores better than or equal to those of SPICKER. Conclusions. We observed that the classic K-means algorithm showed a similar performance to that of SPICKER, which is a widely used algorithm for protein-structure identification. Both SK-means and K-means++ demonstrated substantial improvements relative to results from SPICKER and classical K-means. PMID:28421198

  3. Long-term prediction of fish growth under varying ambient temperature using a multiscale dynamic model

    PubMed Central

    2009-01-01

    Background Feed composition has a large impact on the growth of animals, particularly marine fish. We have developed a quantitative dynamic model that can predict the growth and body composition of marine fish for a given feed composition over a timespan of several months. The model takes into consideration the effects of environmental factors, particularly temperature, on growth, and it incorporates detailed kinetics describing the main metabolic processes (protein, lipid, and central metabolism) known to play major roles in growth and body composition. Results For validation, we compared our model's predictions with the results of several experimental studies. We showed that the model gives reliable predictions of growth, nutrient utilization (including amino acid retention), and body composition over a timespan of several months, longer than most of the previously developed predictive models. Conclusion We demonstrate that, despite the difficulties involved, multiscale models in biology can yield reasonable and useful results. The model predictions are reliable over several timescales and in the presence of strong temperature fluctuations, which are crucial factors for modeling marine organism growth. The model provides important improvements over existing models. PMID:19903354

  4. Assessment of Protein Side-Chain Conformation Prediction Methods in Different Residue Environments

    PubMed Central

    Peterson, Lenna X.; Kang, Xuejiao; Kihara, Daisuke

    2016-01-01

    Computational prediction of side-chain conformation is an important component of protein structure prediction. Accurate side-chain prediction is crucial for practical applications of protein structure models that need atomic detailed resolution such as protein and ligand design. We evaluated the accuracy of eight side-chain prediction methods in reproducing the side-chain conformations of experimentally solved structures deposited to the Protein Data Bank. Prediction accuracy was evaluated for a total of four different structural environments (buried, surface, interface, and membrane-spanning) in three different protein types (monomeric, multimeric, and membrane). Overall, the highest accuracy was observed for buried residues in monomeric and multimeric proteins. Notably, side-chains at protein interfaces and membrane-spanning regions were better predicted than surface residues even though the methods did not all use multimeric and membrane proteins for training. Thus, we conclude that the current methods are as practically useful for modeling protein docking interfaces and membrane-spanning regions as for modeling monomers. PMID:24619909

  5. Calculation of the relative metastabilities of proteins using the CHNOSZ software package

    PubMed Central

    Dick, Jeffrey M

    2008-01-01

    Background Proteins of various compositions are required by organisms inhabiting different environments. The energetic demands for protein formation are a function of the compositions of proteins as well as geochemical variables including temperature, pressure, oxygen fugacity and pH. The purpose of this study was to explore the dependence of metastable equilibrium states of protein systems on changes in the geochemical variables. Results A software package called CHNOSZ implementing the revised Helgeson-Kirkham-Flowers (HKF) equations of state and group additivity for ionized unfolded aqueous proteins was developed. The program can be used to calculate standard molal Gibbs energies and other thermodynamic properties of reactions and to make chemical speciation and predominance diagrams that represent the metastable equilibrium distributions of proteins. The approach takes account of the chemical affinities of reactions in open systems characterized by the chemical potentials of basis species. The thermodynamic database included with the package permits application of the software to mineral and other inorganic systems as well as systems of proteins or other biomolecules. Conclusion Metastable equilibrium activity diagrams were generated for model cell-surface proteins from archaea and bacteria adapted to growth in environments that differ in temperature and chemical conditions. The predicted metastable equilibrium distributions of the proteins can be compared with the optimal growth temperatures of the organisms and with geochemical variables. The results suggest that a thermodynamic assessment of protein metastability may be useful for integrating bio- and geochemical observations. PMID:18834534

  6. Discrete structural features among interface residue-level classes

    PubMed Central

    2015-01-01

    Background Protein-protein interaction (PPI) is essential for molecular functions in biological cells. Investigation on protein interfaces of known complexes is an important step towards deciphering the driving forces of PPIs. Each PPI complex is specific, sensitive and selective to binding. Therefore, we have estimated the relative difference in percentage of polar residues between surface and the interface for each complex in a non-redundant heterodimer dataset of 278 complexes to understand the predominant forces driving binding. Results Our analysis showed ~60% of protein complexes with surface polarity greater than interface polarity (designated as class A). However, a considerable number of complexes (~40%) have interface polarity greater than surface polarity, (designated as class B), with a significantly different p-value of 1.66E-45 from class A. Comprehensive analyses of protein complexes show that interface features such as interface area, interface polarity abundance, solvation free energy gain upon interface formation, binding energy and the percentage of interface charged residue abundance distinguish among class A and class B complexes, while electrostatic visualization maps also help differentiate interface classes among complexes. Conclusions Class A complexes are classical with abundant non-polar interactions at the interface; however class B complexes have abundant polar interactions at the interface, similar to protein surface characteristics. Five physicochemical interface features analyzed from the protein heterodimer dataset are discriminatory among the interface residue-level classes. These novel observations find application in developing residue-level models for protein-protein binding prediction, protein-protein docking studies and interface inhibitor design as drugs. PMID:26679043

  7. Higher Leptin and Adiponectin Concentrations Predict Poorer Performance-based Physical Functioning in Midlife Women: the Michigan Study of Women’s Health Across the Nation

    PubMed Central

    Zheng, Huiyong; Mancuso, Peter; Harlow, Siobán D.

    2016-01-01

    Background. Excess fat mass is a greater contributor to functional limitations than is reduced lean mass or the presence of obesity-related conditions. The impact of fat mass on physical functioning may be due to adipokines, adipose-derived proteins that have pro- or anti-inflammatory properties. Methods. Serum samples from 1996 to 2003 that were assayed for leptin, adiponectin, and resistin were provided by 511 participants from the Michigan site of the Study of Women’s Health Across the Nation. Physical functioning performance was assessed annually during study visits from 1996 to 2003. Results. Among this population of Black and White women (mean baseline age = 45.6 years, SD = 2.7 years), all of whom were premenopausal at baseline, higher baseline leptin concentrations predicted longer stair climb, sit-to-rise, and 2-pound lift times and shorter forward reach distance (all p < .01). This relationship persisted after adjustment for age, BMI, percent skeletal muscle mass, race/ethnicity, economic strain, bodily pain, diabetes, knee osteoarthritis, and C-reactive protein. Baseline total adiponectin concentrations did not predict any mobility measures but did predict quadriceps strength; a 1 µg/mL higher adiponectin concentration was associated with 0.64 Nm lower quadriceps strength (p = .02). Resistin was not associated with any of the physical functioning performance measures. Change in the adipokines was not associated with physical functioning. Conclusion. In this population of middle-aged women, higher baseline leptin concentrations predicted poorer mobility-based functioning, whereas higher adiponectin concentrations predicted reduced quadriceps strength. These findings suggest that the relationship between the adipokines and physical functioning performance is independent of other known correlates of poor functioning. PMID:26302979

  8. Evaluation of 3D-Jury on CASP7 models

    PubMed Central

    Kaján, László; Rychlewski, Leszek

    2007-01-01

    Background 3D-Jury, the structure prediction consensus method publicly available in the Meta Server , was evaluated using models gathered in the 7th round of the Critical Assessment of Techniques for Protein Structure Prediction (CASP7). 3D-Jury is an automated expert process that generates protein structure meta-predictions from sets of models obtained from partner servers. Results The performance of 3D-Jury was analysed for three aspects. First, we examined the correlation between the 3D-Jury score and a model quality measure: the number of correctly predicted residues. The 3D-Jury score was shown to correlate significantly with the number of correctly predicted residues, the correlation is good enough to be used for prediction. 3D-Jury was also found to improve upon the competing servers' choice of the best structure model in most cases. The value of the 3D-Jury score as a generic reliability measure was also examined. We found that the 3D-Jury score separates bad models from good models better than the reliability score of the original server in 27 cases and falls short of it in only 5 cases out of a total of 38. We report the release of a new Meta Server feature: instant 3D-Jury scoring of uploaded user models. Conclusion The 3D-Jury score continues to be a good indicator of structural model quality. It also provides a generic reliability score, especially important for models that were not assigned such by the original server. Individual structure modellers can also benefit from the 3D-Jury scoring system by testing their models in the new instant scoring feature available in the Meta Server. PMID:17711571

  9. C-Reactive Protein and Prediction of 1-Year Mortality in Prevalent Hemodialysis Patients

    PubMed Central

    Bazeley, Jonathan; Bieber, Brian; Li, Yun; Morgenstern, Hal; de Sequera, Patricia; Combe, Christian; Yamamoto, Hiroyasu; Gallagher, Martin; Port, Friedrich K.

    2011-01-01

    Summary Background and objectives Measurement of C-reactive protein (CRP) levels remains uncommon in North America, although it is now routine in many countries. Using Dialysis Outcomes and Practice Patterns Study data, our primary aim was to evaluate the value of CRP for predicting mortality when measured along with other common inflammatory biomarkers. Design, setting, participants, & measurements We studied 5061 prevalent hemodialysis patients from 2005 to 2008 in 140 facilities routinely measuring CRP in 10 countries. The association of CRP with mortality was evaluated using Cox regression. Prediction of 1-year mortality was assessed in logistic regression models with differing adjustment variables. Results Median baseline CRP was lower in Japan (1.0 mg/L) than other countries (6.0 mg/L). CRP was positively, monotonically associated with mortality. No threshold below which mortality rate leveled off was identified. In prediction models, CRP performance was comparable with albumin and exceeded ferritin and white blood cell (WBC) count based on measures of model discrimination (c-statistics, net reclassification improvement [NRI]) and global model fit (generalized R2). The primary analysis included age, gender, diabetes, catheter use, and the four inflammatory markers (omitting one at a time). Specifying NRI ≥5% as appropriate reclassification of predicted mortality risk, NRI for CRP was 12.8% compared with 10.3% for albumin, 0.8% for ferritin, and <0.1% for WBC. Conclusions These findings demonstrate the value of measuring CRP in addition to standard inflammatory biomarkers to improve mortality prediction in hemodialysis patients. Future studies are indicated to identify interventions that lower CRP and to identify whether they improve clinical outcomes. PMID:21868617

  10. Knowledge Discovery in Variant Databases Using Inductive Logic Programming

    PubMed Central

    Nguyen, Hoan; Luu, Tien-Dao; Poch, Olivier; Thompson, Julie D.

    2013-01-01

    Understanding the effects of genetic variation on the phenotype of an individual is a major goal of biomedical research, especially for the development of diagnostics and effective therapeutic solutions. In this work, we describe the use of a recent knowledge discovery from database (KDD) approach using inductive logic programming (ILP) to automatically extract knowledge about human monogenic diseases. We extracted background knowledge from MSV3d, a database of all human missense variants mapped to 3D protein structure. In this study, we identified 8,117 mutations in 805 proteins with known three-dimensional structures that were known to be involved in human monogenic disease. Our results help to improve our understanding of the relationships between structural, functional or evolutionary features and deleterious mutations. Our inferred rules can also be applied to predict the impact of any single amino acid replacement on the function of a protein. The interpretable rules are available at http://decrypthon.igbmc.fr/kd4v/. PMID:23589683

  11. Knowledge discovery in variant databases using inductive logic programming.

    PubMed

    Nguyen, Hoan; Luu, Tien-Dao; Poch, Olivier; Thompson, Julie D

    2013-01-01

    Understanding the effects of genetic variation on the phenotype of an individual is a major goal of biomedical research, especially for the development of diagnostics and effective therapeutic solutions. In this work, we describe the use of a recent knowledge discovery from database (KDD) approach using inductive logic programming (ILP) to automatically extract knowledge about human monogenic diseases. We extracted background knowledge from MSV3d, a database of all human missense variants mapped to 3D protein structure. In this study, we identified 8,117 mutations in 805 proteins with known three-dimensional structures that were known to be involved in human monogenic disease. Our results help to improve our understanding of the relationships between structural, functional or evolutionary features and deleterious mutations. Our inferred rules can also be applied to predict the impact of any single amino acid replacement on the function of a protein. The interpretable rules are available at http://decrypthon.igbmc.fr/kd4v/.

  12. Multifunctionality and diversity of GDSL esterase/lipase gene family in rice (Oryza sativa L. japonica) genome: new insights from bioinformatics analysis

    PubMed Central

    2012-01-01

    Background GDSL esterases/lipases are a newly discovered subclass of lipolytic enzymes that are very important and attractive research subjects because of their multifunctional properties, such as broad substrate specificity and regiospecificity. Compared with the current knowledge regarding these enzymes in bacteria, our understanding of the plant GDSL enzymes is very limited, although the GDSL gene family in plant species include numerous members in many fully sequenced plant genomes. Only two genes from a large rice GDSL esterase/lipase gene family were previously characterised, and the majority of the members remain unknown. In the present study, we describe the rice OsGELP (Oryza sativa GDSL esterase/lipase protein) gene family at the genomic and proteomic levels, and use this knowledge to provide insights into the multifunctionality of the rice OsGELP enzymes. Results In this study, an extensive bioinformatics analysis identified 114 genes in the rice OsGELP gene family. A complete overview of this family in rice is presented, including the chromosome locations, gene structures, phylogeny, and protein motifs. Among the OsGELPs and the plant GDSL esterase/lipase proteins of known functions, 41 motifs were found that represent the core secondary structure elements or appear specifically in different phylogenetic subclades. The specification and distribution of identified putative conserved clade-common and -specific peptide motifs, and their location on the predicted protein three dimensional structure may possibly signify their functional roles. Potentially important regions for substrate specificity are highlighted, in accordance with protein three-dimensional model and location of the phylogenetic specific conserved motifs. The differential expression of some representative genes were confirmed by quantitative real-time PCR. The phylogenetic analysis, together with protein motif architectures, and the expression profiling were analysed to predict the possible biological functions of the rice OsGELP genes. Conclusions Our current genomic analysis, for the first time, presents fundamental information on the organization of the rice OsGELP gene family. With combination of the genomic, phylogenetic, microarray expression, protein motif distribution, and protein structure analyses, we were able to create supported basis for the functional prediction of many members in the rice GDSL esterase/lipase family. The present study provides a platform for the selection of candidate genes for further detailed functional study. PMID:22793791

  13. Validation of Molecular Dynamics Simulations for Prediction of Three-Dimensional Structures of Small Proteins.

    PubMed

    Kato, Koichi; Nakayoshi, Tomoki; Fukuyoshi, Shuichi; Kurimoto, Eiji; Oda, Akifumi

    2017-10-12

    Although various higher-order protein structure prediction methods have been developed, almost all of them were developed based on the three-dimensional (3D) structure information of known proteins. Here we predicted the short protein structures by molecular dynamics (MD) simulations in which only Newton's equations of motion were used and 3D structural information of known proteins was not required. To evaluate the ability of MD simulationto predict protein structures, we calculated seven short test protein (10-46 residues) in the denatured state and compared their predicted and experimental structures. The predicted structure for Trp-cage (20 residues) was close to the experimental structure by 200-ns MD simulation. For proteins shorter or longer than Trp-cage, root-mean square deviation values were larger than those for Trp-cage. However, secondary structures could be reproduced by MD simulations for proteins with 10-34 residues. Simulations by replica exchange MD were performed, but the results were similar to those from normal MD simulations. These results suggest that normal MD simulations can roughly predict short protein structures and 200-ns simulations are frequently sufficient for estimating the secondary structures of protein (approximately 20 residues). Structural prediction method using only fundamental physical laws are useful for investigating non-natural proteins, such as primitive proteins and artificial proteins for peptide-based drug delivery systems.

  14. Discriminability measures for predicting readability of text on textured backgrounds

    NASA Technical Reports Server (NTRS)

    Scharff, L. F.; Hill, A. L.; Ahumada, A. J. Jr; Watson, A. B. (Principal Investigator)

    2000-01-01

    Several discriminability measures were examined for their ability to predict reading search times for three levels of text contrast and a range of backgrounds (plain, a periodic texture, and four spatial-frequency-filtered textures created from the periodic texture). Search times indicate that these background variations only affect readability when the text contrast is low, and that spatial frequency content of the background affects readability. These results were not well predicted by the single variables of text contrast (Spearman rank correlation = -0.64) and background RMS contrast (0.08), but a global masking index and a spatial-frequency-selective masking index led to better predictions (-0.84 and -0.81, respectively). c2000 Optical Society of America.

  15. BindML/BindML+: Detecting Protein-Protein Interaction Interface Propensity from Amino Acid Substitution Patterns.

    PubMed

    Wei, Qing; La, David; Kihara, Daisuke

    2017-01-01

    Prediction of protein-protein interaction sites in a protein structure provides important information for elucidating the mechanism of protein function and can also be useful in guiding a modeling or design procedures of protein complex structures. Since prediction methods essentially assess the propensity of amino acids that are likely to be part of a protein docking interface, they can help in designing protein-protein interactions. Here, we introduce BindML and BindML+ protein-protein interaction sites prediction methods. BindML predicts protein-protein interaction sites by identifying mutation patterns found in known protein-protein complexes using phylogenetic substitution models. BindML+ is an extension of BindML for distinguishing permanent and transient types of protein-protein interaction sites. We developed an interactive web-server that provides a convenient interface to assist in structural visualization of protein-protein interactions site predictions. The input data for the web-server are a tertiary structure of interest. BindML and BindML+ are available at http://kiharalab.org/bindml/ and http://kiharalab.org/bindml/plus/ .

  16. Enhancing the Predictive Power of Mutations in the C-Terminus of the KCNQ1-Encoded Kv7.1 Voltage-Gated Potassium Channel.

    PubMed

    Kapplinger, Jamie D; Tseng, Andrew S; Salisbury, Benjamin A; Tester, David J; Callis, Thomas E; Alders, Marielle; Wilde, Arthur A M; Ackerman, Michael J

    2015-04-01

    Despite the overrepresentation of Kv7.1 mutations among patients with a robust diagnosis of long QT syndrome (LQTS), a background rate of innocuous Kv7.1 missense variants observed in healthy controls creates ambiguity in the interpretation of LQTS genetic test results. A recent study showed that the probability of pathogenicity for rare missense mutations depends in part on the topological location of the variant in Kv7.1's various structure-function domains. Since the Kv7.1's C-terminus accounts for nearly 50 % of the overall protein and nearly 50 % of the overall background rate of rare variants falls within the C-terminus, further enhancement in mutation calling may provide guidance in distinguishing pathogenic long QT syndrome type 1 (LQT1)-causing mutations from rare non-disease-causing variants in the Kv7.1's C-terminus. Therefore, we have used conservation analysis and a large case-control study to generate topology-based estimative predictive values to aid in interpretation, identifying three regions of high conservation within the Kv7.1's C-terminus which have a high probability of LQT1 pathogenicity.

  17. Molecular docking.

    PubMed

    Morris, Garrett M; Lim-Wilby, Marguerita

    2008-01-01

    Molecular docking is a key tool in structural molecular biology and computer-assisted drug design. The goal of ligand-protein docking is to predict the predominant binding mode(s) of a ligand with a protein of known three-dimensional structure. Successful docking methods search high-dimensional spaces effectively and use a scoring function that correctly ranks candidate dockings. Docking can be used to perform virtual screening on large libraries of compounds, rank the results, and propose structural hypotheses of how the ligands inhibit the target, which is invaluable in lead optimization. The setting up of the input structures for the docking is just as important as the docking itself, and analyzing the results of stochastic search methods can sometimes be unclear. This chapter discusses the background and theory of molecular docking software, and covers the usage of some of the most-cited docking software.

  18. Protein cleavage strategies for an improved analysis of the membrane proteome

    PubMed Central

    Fischer, Frank; Poetsch, Ansgar

    2006-01-01

    Background Membrane proteins still remain elusive in proteomic studies. This is in part due to the distribution of the amino acids lysine and arginine, which are less frequent in integral membrane proteins and almost absent in transmembrane helices. As these amino acids are cleavage targets for the commonly used protease trypsin, alternative cleavage conditions, which should improve membrane protein analysis, were tested by in silico digestion for the three organisms Saccharomyces cerevisiae, Halobacterium sp. NRC-1, and Corynebacterium glutamicum as hallmarks for eukaryotes, archea and eubacteria. Results For the membrane proteomes from all three analyzed organisms, we identified cleavage conditions that achieve better sequence and proteome coverage than trypsin. Greater improvement was obtained for bacteria than for yeast, which was attributed to differences in protein size and GRAVY. It was demonstrated for bacteriorhodopsin that the in silico predictions agree well with the experimental observations. Conclusion For all three examined organisms, it was found that a combination of chymotrypsin and staphylococcal peptidase I gave significantly better results than trypsin. As some of the improved cleavage conditions are not more elaborate than trypsin digestion and have been proven useful in practice, we suppose that the cleavage at both hydrophilic and hydrophobic amino acids should facilitate in general the analysis of membrane proteins for all organisms. PMID:16512920

  19. Predicting protein-protein interactions on a proteome scale by matching evolutionary and structural similarities at interfaces using PRISM.

    PubMed

    Tuncbag, Nurcan; Gursoy, Attila; Nussinov, Ruth; Keskin, Ozlem

    2011-08-11

    Prediction of protein-protein interactions at the structural level on the proteome scale is important because it allows prediction of protein function, helps drug discovery and takes steps toward genome-wide structural systems biology. We provide a protocol (termed PRISM, protein interactions by structural matching) for large-scale prediction of protein-protein interactions and assembly of protein complex structures. The method consists of two components: rigid-body structural comparisons of target proteins to known template protein-protein interfaces and flexible refinement using a docking energy function. The PRISM rationale follows our observation that globally different protein structures can interact via similar architectural motifs. PRISM predicts binding residues by using structural similarity and evolutionary conservation of putative binding residue 'hot spots'. Ultimately, PRISM could help to construct cellular pathways and functional, proteome-scale annotation. PRISM is implemented in Python and runs in a UNIX environment. The program accepts Protein Data Bank-formatted protein structures and is available at http://prism.ccbb.ku.edu.tr/prism_protocol/.

  20. Statistics for the Relative Detectability of Chemicals in Weak Gaseous Plumes in LWIR Hyperspectral Imagery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Metoyer, Candace N.; Walsh, Stephen J.; Tardiff, Mark F.

    2008-10-30

    The detection and identification of weak gaseous plumes using thermal imaging data is complicated by many factors. These include variability due to atmosphere, ground and plume temperature, and background clutter. This paper presents an analysis of one formulation of the physics-based model that describes the at-sensor observed radiance. The motivating question for the analyses performed in this paper is as follows. Given a set of backgrounds, is there a way to predict the background over which the probability of detecting a given chemical will be the highest? Two statistics were developed to address this question. These statistics incorporate data frommore » the long-wave infrared band to predict the background over which chemical detectability will be the highest. These statistics can be computed prior to data collection. As a preliminary exploration into the predictive ability of these statistics, analyses were performed on synthetic hyperspectral images. Each image contained one chemical (either carbon tetrachloride or ammonia) spread across six distinct background types. The statistics were used to generate predictions for the background ranks. Then, the predicted ranks were compared to the empirical ranks obtained from the analyses of the synthetic images. For the simplified images under consideration, the predicted and empirical ranks showed a promising amount of agreement. One statistic accurately predicted the best and worst background for detection in all of the images. Future work may include explorations of more complicated plume ingredients, background types, and noise structures.« less

  1. Murine colon proteome and characterization of the protein pathways

    PubMed Central

    2012-01-01

    Background Most of the current proteomic researches focus on proteome alteration due to pathological disorders (i.e.: colorectal cancer) rather than normal healthy state when mentioning colon. As a result, there are lacks of information regarding normal whole tissue- colon proteome. Results We report here a detailed murine (mouse) whole tissue- colon protein reference dataset composed of 1237 confident protein (FDR < 2) with comprehensive insight on its peptide properties, cellular and subcellular localization, functional network GO annotation analysis, and its relative abundances. The presented dataset includes wide spectra of pI and Mw ranged from 3–12 and 4–600 KDa, respectively. Gravy index scoring predicted 19.5% membranous and 80.5% globularly located proteins. GO hierarchies and functional network analysis illustrated proteins function together with their relevance and implication of several candidates in malignancy such as Mitogen- activated protein kinase (Mapk8, 9) in colorectal cancer, Fibroblast growth factor receptor (Fgfr 2), Glutathione S-transferase (Gstp1) in prostate cancer, and Cell division control protein (Cdc42), Ras-related protein (Rac1,2) in pancreatic cancer. Protein abundances calculated with 3 different algorithms (NSAF, PAF and emPAI) provide a relative quantification under normal condition as guidance. Conclusions This highly confidence colon proteome catalogue will not only serve as a useful reference for further experiments characterizing differentially expressed proteins induced from diseased conditions, but also will aid in better understanding the ontology and functional absorptive mechanism of the colon as well. PMID:22929016

  2. Predicting disease-related proteins based on clique backbone in protein-protein interaction network.

    PubMed

    Yang, Lei; Zhao, Xudong; Tang, Xianglong

    2014-01-01

    Network biology integrates different kinds of data, including physical or functional networks and disease gene sets, to interpret human disease. A clique (maximal complete subgraph) in a protein-protein interaction network is a topological module and possesses inherently biological significance. A disease-related clique possibly associates with complex diseases. Fully identifying disease components in a clique is conductive to uncovering disease mechanisms. This paper proposes an approach of predicting disease proteins based on cliques in a protein-protein interaction network. To tolerate false positive and negative interactions in protein networks, extending cliques and scoring predicted disease proteins with gene ontology terms are introduced to the clique-based method. Precisions of predicted disease proteins are verified by disease phenotypes and steadily keep to more than 95%. The predicted disease proteins associated with cliques can partly complement mapping between genotype and phenotype, and provide clues for understanding the pathogenesis of serious diseases.

  3. Expression Analysis of the Theileria parva Subtelomere-Encoded Variable Secreted Protein Gene Family

    PubMed Central

    Schmied, Stéfanie; Affentranger, Sarah; Parvanova, Iana; Kang'a, Simon; Nene, Vishvanath; Katzer, Frank; McKeever, Declan; Müller, Joachim; Bishop, Richard; Pain, Arnab; Dobbelaere, Dirk A. E.

    2009-01-01

    Background The intracellular protozoan parasite Theileria parva transforms bovine lymphocytes inducing uncontrolled proliferation. Proteins released from the parasite are assumed to contribute to phenotypic changes of the host cell and parasite persistence. With 85 members, genes encoding subtelomeric variable secreted proteins (SVSPs) form the largest gene family in T. parva. The majority of SVSPs contain predicted signal peptides, suggesting secretion into the host cell cytoplasm. Methodology/Principal Findings We analysed SVSP expression in T. parva-transformed cell lines established in vitro by infection of T or B lymphocytes with cloned T. parva parasites. Microarray and quantitative real-time PCR analysis revealed mRNA expression for a wide range of SVSP genes. The pattern of mRNA expression was largely defined by the parasite genotype and not by host background or cell type, and found to be relatively stable in vitro over a period of two months. Interestingly, immunofluorescence analysis carried out on cell lines established from a cloned parasite showed that expression of a single SVSP encoded by TP03_0882 is limited to only a small percentage of parasites. Epitope-tagged TP03_0882 expressed in mammalian cells was found to translocate into the nucleus, a process that could be attributed to two different nuclear localisation signals. Conclusions Our analysis reveals a complex pattern of Theileria SVSP mRNA expression, which depends on the parasite genotype. Whereas in cell lines established from a cloned parasite transcripts can be found corresponding to a wide range of SVSP genes, only a minority of parasites appear to express a particular SVSP protein. The fact that a number of SVSPs contain functional nuclear localisation signals suggests that proteins released from the parasite could contribute to phenotypic changes of the host cell. This initial characterisation will facilitate future studies on the regulation of SVSP gene expression and the potential biological role of these enigmatic proteins. PMID:19325907

  4. Gene discovery in the hamster: a comparative genomics approach for gene annotation by sequencing of hamster testis cDNAs

    PubMed Central

    Oduru, Sreedhar; Campbell, Janee L; Karri, SriTulasi; Hendry, William J; Khan, Shafiq A; Williams, Simon C

    2003-01-01

    Background Complete genome annotation will likely be achieved through a combination of computer-based analysis of available genome sequences combined with direct experimental characterization of expressed regions of individual genomes. We have utilized a comparative genomics approach involving the sequencing of randomly selected hamster testis cDNAs to begin to identify genes not previously annotated on the human, mouse, rat and Fugu (pufferfish) genomes. Results 735 distinct sequences were analyzed for their relatedness to known sequences in public databases. Eight of these sequences were derived from previously unidentified genes and expression of these genes in testis was confirmed by Northern blotting. The genomic locations of each sequence were mapped in human, mouse, rat and pufferfish, where applicable, and the structure of their cognate genes was derived using computer-based predictions, genomic comparisons and analysis of uncharacterized cDNA sequences from human and macaque. Conclusion The use of a comparative genomics approach resulted in the identification of eight cDNAs that correspond to previously uncharacterized genes in the human genome. The proteins encoded by these genes included a new member of the kinesin superfamily, a SET/MYND-domain protein, and six proteins for which no specific function could be predicted. Each gene was expressed primarily in testis, suggesting that they may play roles in the development and/or function of testicular cells. PMID:12783626

  5. Quantitative proteomic analyses of the microbial degradation of estrone under various background nitrogen and carbon conditions.

    PubMed

    Du, Zhe; Chen, Yinguang; Li, Xu

    2017-10-15

    Microbial degradation of estrogenic compounds can be affected by the nitrogen source and background carbon in the environment. However, the underlying mechanisms are not well understood. The objective of this study was to elucidate the molecular mechanisms of estrone (E1) biodegradation at the protein level under various background nitrogen (nitrate or ammonium) and carbon conditions (no background carbon, acetic acid, or humic acid as background carbon) by a newly isolated bacterial strain. The E1 degrading bacterial strain, Hydrogenophaga atypica ZD1, was isolated from river sediments and its proteome was characterized under various experimental conditions using quantitative proteomics. Results show that the E1 degradation rate was faster when ammonium was used as the nitrogen source than with nitrate. The degradation rate was also faster when either acetic acid or humic acid was present in the background. Proteomics analyses suggested that the E1 biodegradation products enter the tyrosine metabolism pathway. Compared to nitrate, ammonium likely promoted E1 degradation by increasing the activities of the branched-chain-amino-acid aminotransferase (IlvE) and enzymes involved in the glutamine synthetase-glutamine oxoglutarate aminotransferase (GS-GOGAT) pathway. The increased E1 degradation rate with acetic acid or humic acid in the background can also be attributed to the up-regulation of IlvE. Results from this study can help predict and explain E1 biodegradation kinetics under various environmental conditions. Copyright © 2017 Elsevier Ltd. All rights reserved.

  6. Connective tissue-activating peptide III: a novel blood biomarker for early lung cancer detection.

    PubMed

    Yee, John; Sadar, Marianne D; Sin, Don D; Kuzyk, Michael; Xing, Li; Kondra, Jennifer; McWilliams, Annette; Man, S F Paul; Lam, Stephen

    2009-06-10

    There are no reliable blood biomarkers to detect early lung cancer. We used a novel strategy that allows discovery of differentially present proteins against a complex and variable background. Mass spectrometry analyses of paired pulmonary venous-radial arterial blood from 16 lung cancer patients were applied to identify plasma proteins potentially derived from the tumor microenvironment. Two differentially expressed proteins were confirmed in 64 paired venous-arterial blood samples using an immunoassay. Twenty-eight pre- and postsurgical resection peripheral blood samples and two independent, blinded sets of plasma from 149 participants in a lung cancer screening study (49 lung cancers and 100 controls) and 266 participants from the National Heart Lung and Blood Institute Lung Health Study (45 lung cancer and 221 matched controls) determined the accuracy of the two protein markers to detect subclinical lung cancer. Connective tissue-activating peptide III (CTAP III)/ neutrophil activating protein-2 (NAP-2) and haptoglobin were identified to be significantly higher in venous than in arterial blood. CTAP III/NAP-2 levels decreased after tumor resection (P = .01). In two independent population cohorts, CTAP III/NAP-2 was significantly associated with lung cancer and improved the accuracy of a lung cancer risk prediction model that included age, smoking, lung function (FEV(1)), and an interaction term between FEV(1) and CTAP III/NAP-2 (area under the curve, 0.84; 95% CI, 0.77 to 0.91) compared to CAPIII/NAP-2 alone. We identified CTAP III/NAP-2 as a novel biomarker to detect preclinical lung cancer. The study underscores the importance of applying blood biomarkers as part of a multimodal lung cancer risk prediction model instead of as stand-alone tests.

  7. Brain amyloidosis ascertainment from cognitive, imaging, and peripheral blood protein measures

    PubMed Central

    Hwang, Kristy S.; Avila, David; Elashoff, David; Kohannim, Omid; Teng, Edmond; Sokolow, Sophie; Jack, Clifford R.; Jagust, William J.; Shaw, Leslie; Trojanowski, John Q.; Weiner, Michael W.; Thompson, Paul M.

    2015-01-01

    Background: The goal of this study was to identify a clinical biomarker signature of brain amyloidosis in the Alzheimer's Disease Neuroimaging Initiative 1 (ADNI1) mild cognitive impairment (MCI) cohort. Methods: We developed a multimodal biomarker classifier for predicting brain amyloidosis using cognitive, imaging, and peripheral blood protein ADNI1 MCI data. We used CSF β-amyloid 1–42 (Aβ42) ≤192 pg/mL as proxy measure for Pittsburgh compound B (PiB)-PET standard uptake value ratio ≥1.5. We trained our classifier in the subcohort with CSF Aβ42 but no PiB-PET data and tested its performance in the subcohort with PiB-PET but no CSF Aβ42 data. We also examined the utility of our biomarker signature for predicting disease progression from MCI to Alzheimer dementia. Results: The CSF training classifier selected Mini-Mental State Examination, Trails B, Auditory Verbal Learning Test delayed recall, education, APOE genotype, interleukin 6 receptor, clusterin, and ApoE protein, and achieved leave-one-out accuracy of 85% (area under the curve [AUC] = 0.8). The PiB testing classifier achieved an AUC of 0.72, and when classifier self-tuning was allowed, AUC = 0.74. The 36-month disease-progression classifier achieved AUC = 0.75 and accuracy = 71%. Conclusions: Automated classifiers based on cognitive and peripheral blood protein variables can identify the presence of brain amyloidosis with a modest level of accuracy. Such methods could have implications for clinical trial design and enrollment in the near future. Classification of evidence: This study provides Class II evidence that a classification algorithm based on cognitive, imaging, and peripheral blood protein measures identifies patients with brain amyloid on PiB-PET with moderate accuracy (sensitivity 68%, specificity 78%). PMID:25609767

  8. High performance transcription factor-DNA docking with GPU computing

    PubMed Central

    2012-01-01

    Background Protein-DNA docking is a very challenging problem in structural bioinformatics and has important implications in a number of applications, such as structure-based prediction of transcription factor binding sites and rational drug design. Protein-DNA docking is very computational demanding due to the high cost of energy calculation and the statistical nature of conformational sampling algorithms. More importantly, experiments show that the docking quality depends on the coverage of the conformational sampling space. It is therefore desirable to accelerate the computation of the docking algorithm, not only to reduce computing time, but also to improve docking quality. Methods In an attempt to accelerate the sampling process and to improve the docking performance, we developed a graphics processing unit (GPU)-based protein-DNA docking algorithm. The algorithm employs a potential-based energy function to describe the binding affinity of a protein-DNA pair, and integrates Monte-Carlo simulation and a simulated annealing method to search through the conformational space. Algorithmic techniques were developed to improve the computation efficiency and scalability on GPU-based high performance computing systems. Results The effectiveness of our approach is tested on a non-redundant set of 75 TF-DNA complexes and a newly developed TF-DNA docking benchmark. We demonstrated that the GPU-based docking algorithm can significantly accelerate the simulation process and thereby improving the chance of finding near-native TF-DNA complex structures. This study also suggests that further improvement in protein-DNA docking research would require efforts from two integral aspects: improvement in computation efficiency and energy function design. Conclusions We present a high performance computing approach for improving the prediction accuracy of protein-DNA docking. The GPU-based docking algorithm accelerates the search of the conformational space and thus increases the chance of finding more near-native structures. To the best of our knowledge, this is the first ad hoc effort of applying GPU or GPU clusters to the protein-DNA docking problem. PMID:22759575

  9. Tertiary structure prediction and identification of druggable pocket in the cancer biomarker – Osteopontin-c

    PubMed Central

    2014-01-01

    Background Osteopontin (Eta, secreted sialoprotein 1, opn) is secreted from different cell types including cancer cells. Three splice variant forms namely osteopontin-a, osteopontin-b and osteopontin-c have been identified. The main astonishing feature is that osteopontin-c is found to be elevated in almost all types of cancer cells. This was the vital point to consider it for sequence analysis and structure predictions which provide ample chances for prognostic, therapeutic and preventive cancer research. Methods Osteopontin-c gene sequence was determined from Breast Cancer sample and was translated to protein sequence. It was then analyzed using various software and web tools for binding pockets, docking and druggability analysis. Due to the lack of homological templates, tertiary structure was predicted using ab-initio method server – I-TASSER and was evaluated after refinement using web tools. Refined structure was compared with known bone sialoprotein electron microscopic structure and docked with CD44 for binding analysis and binding pockets were identified for drug designing. Results Signal sequence of about sixteen amino acid residues was identified using signal sequence prediction servers. Due to the absence of known structures of similar proteins, three dimensional structure of osteopontin-c was predicted using I-TASSER server. The predicted structure was refined with the help of SUMMA server and was validated using SAVES server. Molecular dynamic analysis was carried out using GROMACS software. The final model was built and was used for docking with CD44. Druggable pockets were identified using pocket energies. Conclusions The tertiary structure of osteopontin-c was predicted successfully using the ab-initio method and the predictions showed that osteopontin-c is of fibrous nature comparable to firbronectin. Docking studies showed the significant similarities of QSAET motif in the interaction of CD44 and osteopontins between the normal and splice variant forms of osteopontins and binding pockets analyses revealed several pockets which paved the way to the identification of a druggable pocket. PMID:24401206

  10. A Survey of Computational Intelligence Techniques in Protein Function Prediction

    PubMed Central

    Tiwari, Arvind Kumar; Srivastava, Rajeev

    2014-01-01

    During the past, there was a massive growth of knowledge of unknown proteins with the advancement of high throughput microarray technologies. Protein function prediction is the most challenging problem in bioinformatics. In the past, the homology based approaches were used to predict the protein function, but they failed when a new protein was different from the previous one. Therefore, to alleviate the problems associated with homology based traditional approaches, numerous computational intelligence techniques have been proposed in the recent past. This paper presents a state-of-the-art comprehensive review of various computational intelligence techniques for protein function predictions using sequence, structure, protein-protein interaction network, and gene expression data used in wide areas of applications such as prediction of DNA and RNA binding sites, subcellular localization, enzyme functions, signal peptides, catalytic residues, nuclear/G-protein coupled receptors, membrane proteins, and pathway analysis from gene expression datasets. This paper also summarizes the result obtained by many researchers to solve these problems by using computational intelligence techniques with appropriate datasets to improve the prediction performance. The summary shows that ensemble classifiers and integration of multiple heterogeneous data are useful for protein function prediction. PMID:25574395

  11. Prediction of Body Fluids where Proteins are Secreted into Based on Protein Interaction Network

    PubMed Central

    Hu, Le-Le; Huang, Tao; Cai, Yu-Dong; Chou, Kuo-Chen

    2011-01-01

    Determining the body fluids where secreted proteins can be secreted into is important for protein function annotation and disease biomarker discovery. In this study, we developed a network-based method to predict which kind of body fluids human proteins can be secreted into. For a newly constructed benchmark dataset that consists of 529 human-secreted proteins, the prediction accuracy for the most possible body fluid location predicted by our method via the jackknife test was 79.02%, significantly higher than the success rate by a random guess (29.36%). The likelihood that the predicted body fluids of the first four orders contain all the true body fluids where the proteins can be secreted into is 62.94%. Our method was further demonstrated with two independent datasets: one contains 57 proteins that can be secreted into blood; while the other contains 61 proteins that can be secreted into plasma/serum and were possible biomarkers associated with various cancers. For the 57 proteins in first dataset, 55 were correctly predicted as blood-secrete proteins. For the 61 proteins in the second dataset, 58 were predicted to be most possible in plasma/serum. These encouraging results indicate that the network-based prediction method is quite promising. It is anticipated that the method will benefit the relevant areas for both basic research and drug development. PMID:21829572

  12. RNase-assisted RNA chromatography

    PubMed Central

    Michlewski, Gracjan; Cáceres, Javier F.

    2010-01-01

    RNA chromatography combined with mass spectrometry represents a widely used experimental approach to identify RNA-binding proteins that recognize specific RNA targets. An important drawback of most of these protocols is the high background due to direct or indirect nonspecific binding of cellular proteins to the beads. In many cases this can hamper the detection of individual proteins due to their low levels and/or comigration with contaminating proteins. Increasing the salt concentration during washing steps can reduce background, but at the cost of using less physiological salt concentrations and the likely loss of important RNA-binding proteins that are less stringently bound to a given RNA, as well as the disassembly of protein or ribonucleoprotein complexes. Here, we describe an improved RNA chromatography method that relies on the use of a cocktail of RNases in the elution step. This results in the release of proteins specifically associated with the RNA ligand and almost complete elimination of background noise, allowing a more sensitive and thorough detection of RNA-binding proteins recognizing a specific RNA transcript. PMID:20571124

  13. Prediction of Protein Configurational Entropy (Popcoen).

    PubMed

    Goethe, Martin; Gleixner, Jan; Fita, Ignacio; Rubi, J Miguel

    2018-03-13

    A knowledge-based method for configurational entropy prediction of proteins is presented; this methodology is extremely fast, compared to previous approaches, because it does not involve any type of configurational sampling. Instead, the configurational entropy of a query fold is estimated by evaluating an artificial neural network, which was trained on molecular-dynamics simulations of ∼1000 proteins. The predicted entropy can be incorporated into a large class of protein software based on cost-function minimization/evaluation, in which configurational entropy is currently neglected for performance reasons. Software of this type is used for all major protein tasks such as structure predictions, proteins design, NMR and X-ray refinement, docking, and mutation effect predictions. Integrating the predicted entropy can yield a significant accuracy increase as we show exemplarily for native-state identification with the prominent protein software FoldX. The method has been termed Popcoen for Prediction of Protein Configurational Entropy. An implementation is freely available at http://fmc.ub.edu/popcoen/ .

  14. RaptorX-Angle: real-value prediction of protein backbone dihedral angles through a hybrid method of clustering and deep learning.

    PubMed

    Gao, Yujuan; Wang, Sheng; Deng, Minghua; Xu, Jinbo

    2018-05-08

    Protein dihedral angles provide a detailed description of protein local conformation. Predicted dihedral angles can be used to narrow down the conformational space of the whole polypeptide chain significantly, thus aiding protein tertiary structure prediction. However, direct angle prediction from sequence alone is challenging. In this article, we present a novel method (named RaptorX-Angle) to predict real-valued angles by combining clustering and deep learning. Tested on a subset of PDB25 and the targets in the latest two Critical Assessment of protein Structure Prediction (CASP), our method outperforms the existing state-of-art method SPIDER2 in terms of Pearson Correlation Coefficient (PCC) and Mean Absolute Error (MAE). Our result also shows approximately linear relationship between the real prediction errors and our estimated bounds. That is, the real prediction error can be well approximated by our estimated bounds. Our study provides an alternative and more accurate prediction of dihedral angles, which may facilitate protein structure prediction and functional study.

  15. PredictProtein—an open resource for online prediction of protein structural and functional features

    PubMed Central

    Yachdav, Guy; Kloppmann, Edda; Kajan, Laszlo; Hecht, Maximilian; Goldberg, Tatyana; Hamp, Tobias; Hönigschmid, Peter; Schafferhans, Andrea; Roos, Manfred; Bernhofer, Michael; Richter, Lothar; Ashkenazy, Haim; Punta, Marco; Schlessinger, Avner; Bromberg, Yana; Schneider, Reinhard; Vriend, Gerrit; Sander, Chris; Ben-Tal, Nir; Rost, Burkhard

    2014-01-01

    PredictProtein is a meta-service for sequence analysis that has been predicting structural and functional features of proteins since 1992. Queried with a protein sequence it returns: multiple sequence alignments, predicted aspects of structure (secondary structure, solvent accessibility, transmembrane helices (TMSEG) and strands, coiled-coil regions, disulfide bonds and disordered regions) and function. The service incorporates analysis methods for the identification of functional regions (ConSurf), homology-based inference of Gene Ontology terms (metastudent), comprehensive subcellular localization prediction (LocTree3), protein–protein binding sites (ISIS2), protein–polynucleotide binding sites (SomeNA) and predictions of the effect of point mutations (non-synonymous SNPs) on protein function (SNAP2). Our goal has always been to develop a system optimized to meet the demands of experimentalists not highly experienced in bioinformatics. To this end, the PredictProtein results are presented as both text and a series of intuitive, interactive and visually appealing figures. The web server and sources are available at http://ppopen.rostlab.org. PMID:24799431

  16. DWARF – a data warehouse system for analyzing protein families

    PubMed Central

    Fischer, Markus; Thai, Quan K; Grieb, Melanie; Pleiss, Jürgen

    2006-01-01

    Background The emerging field of integrative bioinformatics provides the tools to organize and systematically analyze vast amounts of highly diverse biological data and thus allows to gain a novel understanding of complex biological systems. The data warehouse DWARF applies integrative bioinformatics approaches to the analysis of large protein families. Description The data warehouse system DWARF integrates data on sequence, structure, and functional annotation for protein fold families. The underlying relational data model consists of three major sections representing entities related to the protein (biochemical function, source organism, classification to homologous families and superfamilies), the protein sequence (position-specific annotation, mutant information), and the protein structure (secondary structure information, superimposed tertiary structure). Tools for extracting, transforming and loading data from public available resources (ExPDB, GenBank, DSSP) are provided to populate the database. The data can be accessed by an interface for searching and browsing, and by analysis tools that operate on annotation, sequence, or structure. We applied DWARF to the family of α/β-hydrolases to host the Lipase Engineering database. Release 2.3 contains 6138 sequences and 167 experimentally determined protein structures, which are assigned to 37 superfamilies 103 homologous families. Conclusion DWARF has been designed for constructing databases of large structurally related protein families and for evaluating their sequence-structure-function relationships by a systematic analysis of sequence, structure and functional annotation. It has been applied to predict biochemical properties from sequence, and serves as a valuable tool for protein engineering. PMID:17094801

  17. The Interaction Properties of the Human Rab GTPase Family – A Comparative Analysis Reveals Determinants of Molecular Binding Selectivity

    PubMed Central

    Stein, Matthias; Pilli, Manohar; Bernauer, Sabine; Habermann, Bianca H.; Zerial, Marino; Wade, Rebecca C.

    2012-01-01

    Background Rab GTPases constitute the largest subfamily of the Ras protein superfamily. Rab proteins regulate organelle biogenesis and transport, and display distinct binding preferences for effector and activator proteins, many of which have not been elucidated yet. The underlying molecular recognition motifs, binding partner preferences and selectivities are not well understood. Methodology/Principal Findings Comparative analysis of the amino acid sequences and the three-dimensional electrostatic and hydrophobic molecular interaction fields of 62 human Rab proteins revealed a wide range of binding properties with large differences between some Rab proteins. This analysis assists the functional annotation of Rab proteins 12, 14, 26, 37 and 41 and provided an explanation for the shared function of Rab3 and 27. Rab7a and 7b have very different electrostatic potentials, indicating that they may bind to different effector proteins and thus, exert different functions. The subfamily V Rab GTPases which are associated with endosome differ subtly in the interaction properties of their switch regions, and this may explain exchange factor specificity and exchange kinetics. Conclusions/Significance We have analysed conservation of sequence and of molecular interaction fields to cluster and annotate the human Rab proteins. The analysis of three dimensional molecular interaction fields provides detailed insight that is not available from a sequence-based approach alone. Based on our results, we predict novel functions for some Rab proteins and provide insights into their divergent functions and the determinants of their binding partner selectivity. PMID:22523562

  18. Coronin-1C is a novel biomarker for hepatocellular carcinoma invasive progression identified by proteomics analysis and clinical validation

    PubMed Central

    2010-01-01

    Background To better search for potential markers for hepatocellular carcinoma (HCC) invasion and metastasis, proteomic approach was applied to identify potential metastasis biomarkers associated with HCC. Methods Membrane proteins were extracted from MHCC97L and HCCLM9 cells, with a similar genetic background and remarkably different metastasis potential, and compared by SDS-PAGE and identified by ESI-MS/MS. The results were further validated by western blot analysis, immunohistochemistry (IHC) of tumor tissues from HCCLM9- and MHCC97L-nude mice, and clinical specimens. Results Membrane proteins were extracted from MHCC97L and HCCLM9 cell and compared by SDS-PAGE analyses. A total of 14 differentially expressed proteins were identified by ESI-MS/MS. Coronin-1C, a promising candidate, was found to be overexpressed in HCCLM9 cells as compared with MHCC97L cells, and validated by western blot and IHC from both nude mice tumor tissues and clinical specimens. Coronin-1C level showed an abrupt upsurge when pulmonary metastasis occurred. Increasing coronin-1C expression was found in liver cancer tissues of HCCLM9-nude mice with spontaneous pulmonary metastasis. IHC study on human HCC specimens revealed that more patients in the higher coronin-1C group had overt larger tumor and more advanced stage. Conclusions Coronin-1C could be a candidate biomarker to predict HCC invasive behavior. PMID:20181269

  19. Contact Prediction for Beta and Alpha-Beta Proteins Using Integer Linear Optimization and its Impact on the First Principles 3D Structure Prediction Method ASTRO-FOLD

    PubMed Central

    Rajgaria, R.; Wei, Y.; Floudas, C. A.

    2010-01-01

    An integer linear optimization model is presented to predict residue contacts in β, α + β, and α/β proteins. The total energy of a protein is expressed as sum of a Cα – Cα distance dependent contact energy contribution and a hydrophobic contribution. The model selects contacts that assign lowest energy to the protein structure while satisfying a set of constraints that are included to enforce certain physically observed topological information. A new method based on hydrophobicity is proposed to find the β-sheet alignments. These β-sheet alignments are used as constraints for contacts between residues of β-sheets. This model was tested on three independent protein test sets and CASP8 test proteins consisting of β, α + β, α/β proteins and was found to perform very well. The average accuracy of the predictions (separated by at least six residues) was approximately 61%. The average true positive and false positive distances were also calculated for each of the test sets and they are 7.58 Å and 15.88 Å, respectively. Residue contact prediction can be directly used to facilitate the protein tertiary structure prediction. This proposed residue contact prediction model is incorporated into the first principles protein tertiary structure prediction approach, ASTRO-FOLD. The effectiveness of the contact prediction model was further demonstrated by the improvement in the quality of the protein structure ensemble generated using the predicted residue contacts for a test set of 10 proteins. PMID:20225257

  20. Comprehensive predictions of target proteins based on protein-chemical interaction using virtual screening and experimental verifications.

    PubMed

    Kobayashi, Hiroki; Harada, Hiroko; Nakamura, Masaomi; Futamura, Yushi; Ito, Akihiro; Yoshida, Minoru; Iemura, Shun-Ichiro; Shin-Ya, Kazuo; Doi, Takayuki; Takahashi, Takashi; Natsume, Tohru; Imoto, Masaya; Sakakibara, Yasubumi

    2012-04-05

    Identification of the target proteins of bioactive compounds is critical for elucidating the mode of action; however, target identification has been difficult in general, mostly due to the low sensitivity of detection using affinity chromatography followed by CBB staining and MS/MS analysis. We applied our protocol of predicting target proteins combining in silico screening and experimental verification for incednine, which inhibits the anti-apoptotic function of Bcl-xL by an unknown mechanism. One hundred eighty-two target protein candidates were computationally predicted to bind to incednine by the statistical prediction method, and the predictions were verified by in vitro binding of incednine to seven proteins, whose expression can be confirmed in our cell system.As a result, 40% accuracy of the computational predictions was achieved successfully, and we newly found 3 incednine-binding proteins. This study revealed that our proposed protocol of predicting target protein combining in silico screening and experimental verification is useful, and provides new insight into a strategy for identifying target proteins of small molecules.

  1. SVM-Prot 2016: A Web-Server for Machine Learning Prediction of Protein Functional Families from Sequence Irrespective of Similarity.

    PubMed

    Li, Ying Hong; Xu, Jing Yu; Tao, Lin; Li, Xiao Feng; Li, Shuang; Zeng, Xian; Chen, Shang Ying; Zhang, Peng; Qin, Chu; Zhang, Cheng; Chen, Zhe; Zhu, Feng; Chen, Yu Zong

    2016-01-01

    Knowledge of protein function is important for biological, medical and therapeutic studies, but many proteins are still unknown in function. There is a need for more improved functional prediction methods. Our SVM-Prot web-server employed a machine learning method for predicting protein functional families from protein sequences irrespective of similarity, which complemented those similarity-based and other methods in predicting diverse classes of proteins including the distantly-related proteins and homologous proteins of different functions. Since its publication in 2003, we made major improvements to SVM-Prot with (1) expanded coverage from 54 to 192 functional families, (2) more diverse protein descriptors protein representation, (3) improved predictive performances due to the use of more enriched training datasets and more variety of protein descriptors, (4) newly integrated BLAST analysis option for assessing proteins in the SVM-Prot predicted functional families that were similar in sequence to a query protein, and (5) newly added batch submission option for supporting the classification of multiple proteins. Moreover, 2 more machine learning approaches, K nearest neighbor and probabilistic neural networks, were added for facilitating collective assessment of protein functions by multiple methods. SVM-Prot can be accessed at http://bidd2.nus.edu.sg/cgi-bin/svmprot/svmprot.cgi.

  2. The abundant extrachromosomal DNA content of the Spiroplasma citri GII3-3X genome

    PubMed Central

    Saillard, Colette; Carle, Patricia; Duret-Nurbel, Sybille; Henri, Raphaël; Killiny, Nabil; Carrère, Sébastien; Gouzy, Jérome; Bové, Joseph-Marie; Renaudin, Joël; Foissac, Xavier

    2008-01-01

    Background Spiroplama citri, the causal agent of citrus stubborn disease, is a bacterium of the class Mollicutes and is transmitted by phloem-feeding leafhopper vectors. In order to characterize candidate genes potentially involved in spiroplasma transmission and pathogenicity, the genome of S. citri strain GII3-3X is currently being deciphered. Results Assembling 20,000 sequencing reads generated seven circular contigs, none of which fit the 1.8 Mb chromosome map or carried chromosomal markers. These contigs correspond to seven plasmids: pSci1 to pSci6, with sizes ranging from 12.9 to 35.3 kbp and pSciA of 7.8 kbp. Plasmids pSci were detected as multiple copies in strain GII3-3X. Plasmid copy numbers of pSci1-6, as deduced from sequencing coverage, were estimated at 10 to 14 copies per spiroplasma cell, representing 1.6 Mb of extrachromosomal DNA. Genes encoding proteins of the TrsE-TraE, Mob, TraD-TraG, and Soj-ParA protein families were predicted in most of the pSci sequences, in addition to members of 14 protein families of unknown function. Plasmid pSci6 encodes protein P32, a marker of insect transmissibility. Plasmids pSci1-5 code for eight different S. citri adhesion-related proteins (ScARPs) that are homologous to the previously described protein P89 and the S. kunkelii SkARP1. Conserved signal peptides and C-terminal transmembrane alpha helices were predicted in all ScARPs. The predicted surface-exposed N-terminal region possesses the following elements: (i) 6 to 8 repeats of 39 to 42 amino acids each (sarpin repeats), (ii) a central conserved region of 330 amino acids followed by (iii) a more variable domain of about 110 amino acids. The C-terminus, predicted to be cytoplasmic, consists of a 27 amino acid stretch enriched in arginine and lysine (KR) and an optional 23 amino acid stretch enriched in lysine, aspartate and glutamate (KDE). Plasmids pSci mainly present a linear increase of cumulative GC skew except in regions presenting conserved hairpin structures. Conclusion The genome of S. citri GII3-3X is characterized by abundant extrachromosomal elements. The pSci plasmids could not only be vertically inherited but also horizontally transmitted, as they encode proteins usually involved in DNA element partitioning and cell to cell DNA transfer. Because plasmids pSci1-5 encode surface proteins of the ScARP family and pSci6 was recently shown to confer insect transmissibility, diversity and abundance of S. citri plasmids may essentially aid the rapid adaptation of S. citri to more efficient transmission by different insect vectors and to various plant hosts. PMID:18442384

  3. Building protein-protein interaction networks for Leishmania species through protein structural information.

    PubMed

    Dos Santos Vasconcelos, Crhisllane Rafaele; de Lima Campos, Túlio; Rezende, Antonio Mauro

    2018-03-06

    Systematic analysis of a parasite interactome is a key approach to understand different biological processes. It makes possible to elucidate disease mechanisms, to predict protein functions and to select promising targets for drug development. Currently, several approaches for protein interaction prediction for non-model species incorporate only small fractions of the entire proteomes and their interactions. Based on this perspective, this study presents an integration of computational methodologies, protein network predictions and comparative analysis of the protozoan species Leishmania braziliensis and Leishmania infantum. These parasites cause Leishmaniasis, a worldwide distributed and neglected disease, with limited treatment options using currently available drugs. The predicted interactions were obtained from a meta-approach, applying rigid body docking tests and template-based docking on protein structures predicted by different comparative modeling techniques. In addition, we trained a machine-learning algorithm (Gradient Boosting) using docking information performed on a curated set of positive and negative protein interaction data. Our final model obtained an AUC = 0.88, with recall = 0.69, specificity = 0.88 and precision = 0.83. Using this approach, it was possible to confidently predict 681 protein structures and 6198 protein interactions for L. braziliensis, and 708 protein structures and 7391 protein interactions for L. infantum. The predicted networks were integrated to protein interaction data already available, analyzed using several topological features and used to classify proteins as essential for network stability. The present study allowed to demonstrate the importance of integrating different methodologies of interaction prediction to increase the coverage of the protein interaction of the studied protocols, besides it made available protein structures and interactions not previously reported.

  4. DrugECs: An Ensemble System with Feature Subspaces for Accurate Drug-Target Interaction Prediction

    PubMed Central

    Jiang, Jinjian; Wang, Nian; Zhang, Jun

    2017-01-01

    Background Drug-target interaction is key in drug discovery, especially in the design of new lead compound. However, the work to find a new lead compound for a specific target is complicated and hard, and it always leads to many mistakes. Therefore computational techniques are commonly adopted in drug design, which can save time and costs to a significant extent. Results To address the issue, a new prediction system is proposed in this work to identify drug-target interaction. First, drug-target pairs are encoded with a fragment technique and the software “PaDEL-Descriptor.” The fragment technique is for encoding target proteins, which divides each protein sequence into several fragments in order and encodes each fragment with several physiochemical properties of amino acids. The software “PaDEL-Descriptor” creates encoding vectors for drug molecules. Second, the dataset of drug-target pairs is resampled and several overlapped subsets are obtained, which are then input into kNN (k-Nearest Neighbor) classifier to build an ensemble system. Conclusion Experimental results on the drug-target dataset showed that our method performs better and runs faster than the state-of-the-art predictors. PMID:28744468

  5. Liver Cirrhosis: Evaluation, Nutritional Status, and Prognosis

    PubMed Central

    Nishikawa, Hiroki; Osaki, Yukio

    2015-01-01

    The liver is the major organ for the metabolism of three major nutrients: protein, fat, and carbohydrate. Chronic hepatitis C virus infection is the major cause of chronic liver disease. Liver cirrhosis (LC) results from different mechanisms of liver injury that lead to necroinflammation and fibrosis. LC has been seen to be not a single disease entity but one that can be graded into distinct clinical stages related to clinical outcome. Several noninvasive methods have been developed for assessing liver fibrosis and these methods have been used for predicting prognosis in patients with LC. On the other hand, subjects with LC often have protein-energy malnutrition (PEM) and poor physical activity. These conditions often result in sarcopenia, which is the loss of skeletal muscle volume and increased muscle weakness. Recent studies have demonstrated that PEM and sarcopenia are predictive factors for poorer survival in patients with LC. Based on these backgrounds, several methods for evaluating nutritional status in patients with chronic liver disease have been developed and they have been preferably used in the clinical field practice. In this review, we will summarize the current knowledge in the field of LC from the viewpoints of diagnostic method, nutritional status, and clinical outcomes. PMID:26494949

  6. [Complete genome sequencing of polymalic acid-producing strain Aureobasidium pullulans CCTCC M2012223].

    PubMed

    Wang, Yongkang; Song, Xiaodan; Li, Xiaorong; Yang, Sang-tian; Zou, Xiang

    2017-01-04

    To explore the genome sequence of Aureobasidium pullulans CCTCC M2012223, analyze the key genes related to the biosynthesis of important metabolites, and provide genetic background for metabolic engineering. Complete genome of A. pullulans CCTCC M2012223 was sequenced by Illumina HiSeq high throughput sequencing platform. Then, fragment assembly, gene prediction, functional annotation, and GO/COG cluster were analyzed in comparison with those of other five A. pullulans varieties. The complete genome sequence of A. pullulans CCTCC M2012223 was 30756831 bp with an average GC content of 47.49%, and 9452 genes were successfully predicted. Genome-wide analysis showed that A. pullulans CCTCC M2012223 had the biggest genome assembly size. Protein sequences involved in the pullulan and polymalic acid pathway were highly conservative in all of six A. pullulans varieties. Although both A. pullulans CCTCC M2012223 and A. pullulans var. melanogenum have a close affinity, some point mutation and inserts were occurred in protein sequences involved in melanin biosynthesis. Genome information of A. pullulans CCTCC M2012223 was annotated and genes involved in melanin, pullulan and polymalic acid pathway were compared, which would provide a theoretical basis for genetic modification of metabolic pathway in A. pullulans.

  7. Predictors of Relapse in Patients with Organizing Pneumonia

    PubMed Central

    Kim, Minjung; Seo, Hyewon; Shin, Kyung-Min; Lim, Jae-Kwang; Kim, Hyera; Yoo, Seung-Soo; Lee, Jaehee; Lee, Shin-Yup; Kim, Chang-Ho; Park, Jae-Yong

    2015-01-01

    Background Although organizing pneumonia (OP) responds well to corticosteroid therapy, relapse is common during dose reduction or follow-up. Predictors of relapse in OP patients remain to be established. The aim of the present study was to identify factors related to relapse in OP patients. Methods This study was retrospectively performed in a tertiary referral center. Of 66 OP patients who were improved with or without treatment, 20 (30%) experienced relapse. The clinical and radiologic parameters in the relapse patient group (n=20) were compared to that in the non-relapse group (n=46). Results Multivariate analysis demonstrated that percent predicted forced vital capacity (FVC), PaO2/FiO2, and serum protein level were significant predictors of relapse in OP patients (odds ratio [OR], 0.82; 95% confidence interval [CI], 0.70-0.97; p=0.018; OR, 1.02; 95% CI, 1.00-1.04; p=0.042; and OR, 0.06; 95% CI, 0.01-0.87; p=0.039, respectively). Conclusion This study shows that FVC, PaO2/FiO2 and serum protein level at presentation can significantly predict relapse in OP patients. PMID:26175771

  8. The Smad4/PTEN Expression Pattern Predicts Clinical Outcomes in Colorectal Adenocarcinoma

    PubMed Central

    Chung, Yumin; Wi, Young Chan; Kim, Yeseul; Bang, Seong Sik; Yang, Jung-Ho; Jang, Kiseok; Min, Kyueng-Whan; Paik, Seung Sam

    2018-01-01

    Background Smad4 and PTEN are prognostic indicators for various tumor types. Smad4 regulates tumor suppression, whereas PTEN inhibits cell proliferation. We analyzed and compared the performance of Smad4 and PTEN for predicting the prognosis of patients with colorectal adenocarcinoma. Methods Combined expression patterns based on Smad4+/– and PTEN+/– status were evaluated by immunostaining using a tissue microarray of colorectal adenocarcinoma. The relationships between the protein expression and clinicopathological variables were analyzed. Results Smad4–/PTEN– status was most frequently observed in metastatic adenocarcinoma, followed by primary adenocarcinoma and tubular adenoma (p<.001). When Smad4–/PTEN– and Smad4+/PTEN+ groups were compared, Smad4–/PTEN– status was associated with high N stage (p=.018) and defective mismatch repair proteins (p=.006). Significant differences in diseasefree survival and overall survival were observed among the three groups (Smad4+/PTEN+, Smad4–/PTEN+ or Smad4+/PTEN–, and Smad4–/PTEN–) (all p<.05). Conclusions Concurrent loss of Smad4 and PTEN may lead to more aggressive disease and poor prognosis in patients with colorectal adenocarcinoma compared to the loss of Smad4 or PTEN alone. PMID:29056035

  9. Characterization and Prediction of the SPI Background

    NASA Technical Reports Server (NTRS)

    Teegarden, B. J.; Jean, P.; Knodlseder, J.; Skinner, G. K.; Weidenspointer, G.

    2003-01-01

    The INTEGRAL Spectrometer, like most gamma-ray instruments, is background dominated. Signal-to-background ratios of a few percent are typical. The background is primarily due to interactions of cosmic rays in the instrument and spacecraft. It characteristically varies by +/- 5% on time scales of days. This variation is caused mainly by fluctuations in the interplanetary magnetic field that modulates the cosmic ray intensity. To achieve the maximum performance from SPI it is essential to have a high quality model of this background that can predict its value to a fraction of a percent. In this poster we characterize the background and its variability, explore various models, and evaluate the accuracy of their predictions.

  10. Hepatoerythropoietic porphyria due to a novel mutation in the uroporphyrinogen decarboxylase gene

    PubMed Central

    To-Figueras, J.; Phillips, J.; Gonzalez-López, J.M.; Badenas, C.; Madrigal, I.; González-Romarís, E.M.; Ramos, C.; Aguirre, J.M.; Herrero, C.

    2013-01-01

    Summary Background Hepatoerythropoietic porphyria (HEP) is a rare form of porphyria that results from a deficiency of uroporphyrinogen decarboxylase (UROD). The disease is caused by homoallelism or heteroallelism for mutations in the UROD gene. Objective To study a 19 year-old woman from Equatorial Guinea, one of the few cases of HEP of African descent and to characterize a new mutation causing HEP. Methods Excretion of porphyrins and residual UROD activity in erythrocytes were measured and compared to other HEP patients. UROD gene of the proband was sequenced and a new mutation identified. The recombinant UROD protein was purified and assayed for enzymatic activity. The aminoacid change mapped to the UROD protein and the functional consequences were predicted. Results The patient presented a novel G170D missense mutation in homozygosity. Porphyrin excretion showed an atypical pattern in stool with a high pentaporphyrin III to isocoproporphyrin ratio. Erythrocyte UROD activity was 42 % of normal and higher than the activity found in HEP patients with a G281E mutation. The recombinant UROD protein showed a relative activity of 17 % and 60 % of wild-type towards uroporphyrinogen I and III respectively. Molecular modelling showed that glycine 170 is located on the dimer interface of UROD, in a loop containing residues 167-172 that are critical for optimal enzymatic activity and that carboxyl side chain from aspartic acid is predicted to cause negative interactions between the protein and the substrate. Conclusions The results emphasize the complex relationship between the genetic defects and the biochemical phenotype in homozygous porphyria. PMID:21668429

  11. Rigid-Docking Approaches to Explore Protein-Protein Interaction Space.

    PubMed

    Matsuzaki, Yuri; Uchikoga, Nobuyuki; Ohue, Masahito; Akiyama, Yutaka

    Protein-protein interactions play core roles in living cells, especially in the regulatory systems. As information on proteins has rapidly accumulated on publicly available databases, much effort has been made to obtain a better picture of protein-protein interaction networks using protein tertiary structure data. Predicting relevant interacting partners from their tertiary structure is a challenging task and computer science methods have the potential to assist with this. Protein-protein rigid docking has been utilized by several projects, docking-based approaches having the advantages that they can suggest binding poses of predicted binding partners which would help in understanding the interaction mechanisms and that comparing docking results of both non-binders and binders can lead to understanding the specificity of protein-protein interactions from structural viewpoints. In this review we focus on explaining current computational prediction methods to predict pairwise direct protein-protein interactions that form protein complexes.

  12. Quality assessment of protein model-structures based on structural and functional similarities

    PubMed Central

    2012-01-01

    Background Experimental determination of protein 3D structures is expensive, time consuming and sometimes impossible. A gap between number of protein structures deposited in the World Wide Protein Data Bank and the number of sequenced proteins constantly broadens. Computational modeling is deemed to be one of the ways to deal with the problem. Although protein 3D structure prediction is a difficult task, many tools are available. These tools can model it from a sequence or partial structural information, e.g. contact maps. Consequently, biologists have the ability to generate automatically a putative 3D structure model of any protein. However, the main issue becomes evaluation of the model quality, which is one of the most important challenges of structural biology. Results GOBA - Gene Ontology-Based Assessment is a novel Protein Model Quality Assessment Program. It estimates the compatibility between a model-structure and its expected function. GOBA is based on the assumption that a high quality model is expected to be structurally similar to proteins functionally similar to the prediction target. Whereas DALI is used to measure structure similarity, protein functional similarity is quantified using standardized and hierarchical description of proteins provided by Gene Ontology combined with Wang's algorithm for calculating semantic similarity. Two approaches are proposed to express the quality of protein model-structures. One is a single model quality assessment method, the other is its modification, which provides a relative measure of model quality. Exhaustive evaluation is performed on data sets of model-structures submitted to the CASP8 and CASP9 contests. Conclusions The validation shows that the method is able to discriminate between good and bad model-structures. The best of tested GOBA scores achieved 0.74 and 0.8 as a mean Pearson correlation to the observed quality of models in our CASP8 and CASP9-based validation sets. GOBA also obtained the best result for two targets of CASP8, and one of CASP9, compared to the contest participants. Consequently, GOBA offers a novel single model quality assessment program that addresses the practical needs of biologists. In conjunction with other Model Quality Assessment Programs (MQAPs), it would prove useful for the evaluation of single protein models. PMID:22998498

  13. Predicting the helix packing of globular proteins by self-correcting distance geometry.

    PubMed

    Mumenthaler, C; Braun, W

    1995-05-01

    A new self-correcting distance geometry method for predicting the three-dimensional structure of small globular proteins was assessed with a test set of 8 helical proteins. With the knowledge of the amino acid sequence and the helical segments, our completely automated method calculated the correct backbone topology of six proteins. The accuracy of the predicted structures ranged from 2.3 A to 3.1 A for the helical segments compared to the experimentally determined structures. For two proteins, the predicted constraints were not restrictive enough to yield a conclusive prediction. The method can be applied to all small globular proteins, provided the secondary structure is known from NMR analysis or can be predicted with high reliability.

  14. Exploring Human Diseases and Biological Mechanisms by Protein Structure Prediction and Modeling.

    PubMed

    Wang, Juexin; Luttrell, Joseph; Zhang, Ning; Khan, Saad; Shi, NianQing; Wang, Michael X; Kang, Jing-Qiong; Wang, Zheng; Xu, Dong

    2016-01-01

    Protein structure prediction and modeling provide a tool for understanding protein functions by computationally constructing protein structures from amino acid sequences and analyzing them. With help from protein prediction tools and web servers, users can obtain the three-dimensional protein structure models and gain knowledge of functions from the proteins. In this chapter, we will provide several examples of such studies. As an example, structure modeling methods were used to investigate the relation between mutation-caused misfolding of protein and human diseases including epilepsy and leukemia. Protein structure prediction and modeling were also applied in nucleotide-gated channels and their interaction interfaces to investigate their roles in brain and heart cells. In molecular mechanism studies of plants, rice salinity tolerance mechanism was studied via structure modeling on crucial proteins identified by systems biology analysis; trait-associated protein-protein interactions were modeled, which sheds some light on the roles of mutations in soybean oil/protein content. In the age of precision medicine, we believe protein structure prediction and modeling will play more and more important roles in investigating biomedical mechanism of diseases and drug design.

  15. Parafoveal Target Detectability Reversal Predicted by Local Luminance and Contrast Gain Control

    NASA Technical Reports Server (NTRS)

    Ahumada, Albert J., Jr.; Beard, Bettina L.; Null, Cynthia H. (Technical Monitor)

    1996-01-01

    This project is part of a program to develop image discrimination models for the prediction of the detectability of objects in a range of backgrounds. We wanted to see if the models could predict parafoveal object detection as well as they predict detection in foveal vision. We also wanted to make our simplified models more general by local computation of luminance and contrast gain control. A signal image (0.78 x 0.17 deg) was made by subtracting a simulated airport runway scene background image (2.7 deg square) from the same scene containing an obstructing aircraft. Signal visibility contrast thresholds were measured in a fully crossed factorial design with three factors: eccentricity (0 deg or 4 deg), background (uniform or runway scene background), and fixed-pattern white noise contrast (0%, 5%, or 10%). Three experienced observers responded to three repetitions of 60 2IFC trials in each condition and thresholds were estimated by maximum likelihood probit analysis. In the fovea the average detection contrast threshold was 4 dB lower for the runway background than for the uniform background, but in the parafovea, the average threshold was 6 dB higher for the runway background than for the uniform background. This interaction was similar across the different noise levels and for all three observers. A likely reason for the runway background giving a lower threshold in the fovea is the low luminance near the signal in that scene. In our model, the local luminance computation is controlled by a spatial spread parameter. When this parameter and a corresponding parameter for the spatial spread of contrast gain were increased for the parafoveal predictions, the model predicts the interaction of background with eccentricity.

  16. Plasma protein absolute quantification by nano-LC Q-TOF UDMSE for clinical biomarker verification

    PubMed Central

    ILIES, MARIA; IUGA, CRISTINA ADELA; LOGHIN, FELICIA; DHOPLE, VISHNU MUKUND; HAMMER, ELKE

    2017-01-01

    Background and aims Proteome-based biomarker studies are targeting proteins that could serve as diagnostic, prognosis, and prediction molecules. In the clinical routine, immunoassays are currently used for the absolute quantification of such biomarkers, with the major limitation that only one molecule can be targeted per assay. The aim of our study was to test a mass spectrometry based absolute quantification method for the verification of plasma protein sets which might serve as reliable biomarker panels for the clinical practice. Methods Six EDTA plasma samples were analyzed after tryptic digestion using a high throughput data independent acquisition nano-LC Q-TOF UDMSE proteomics approach. Synthetic Escherichia coli standard peptides were spiked in each sample for the absolute quantification. Data analysis was performed using ProgenesisQI v2.0 software (Waters Corporation). Results Our method ensured absolute quantification of 242 non redundant plasma proteins in a single run analysis. The dynamic range covered was 105. 86% were represented by classical plasma proteins. The overall median coefficient of variation was 0.36, while a set of 63 proteins was found to be highly stable. Absolute protein concentrations strongly correlated with values reviewed in the literature. Conclusions Nano-LC Q-TOF UDMSE proteomic analysis can be used for a simple and rapid determination of absolute amounts of plasma proteins. A large number of plasma proteins could be analyzed, while a wide dynamic range was covered with low coefficient of variation at protein level. The method proved to be a reliable tool for the quantification of protein panel for biomarker verification in the clinical practice. PMID:29151793

  17. RNAseq versus genome-predicted transcriptomes: a large population of novel transcripts identified in an Illumina-454 Hydra transcriptome

    PubMed Central

    2013-01-01

    Background Evolutionary studies benefit from deep sequencing technologies that generate genomic and transcriptomic sequences from a variety of organisms. Genome sequencing and RNAseq have complementary strengths. In this study, we present the assembly of the most complete Hydra transcriptome to date along with a comparative analysis of the specific features of RNAseq and genome-predicted transcriptomes currently available in the freshwater hydrozoan Hydra vulgaris. Results To produce an accurate and extensive Hydra transcriptome, we combined Illumina and 454 Titanium reads, giving the primacy to Illumina over 454 reads to correct homopolymer errors. This strategy yielded an RNAseq transcriptome that contains 48’909 unique sequences including splice variants, representing approximately 24’450 distinct genes. Comparative analysis to the available genome-predicted transcriptomes identified 10’597 novel Hydra transcripts that encode 529 evolutionarily-conserved proteins. The annotation of 170 human orthologs points to critical functions in protein biosynthesis, FGF and TOR signaling, vesicle transport, immunity, cell cycle regulation, cell death, mitochondrial metabolism, transcription and chromatin regulation. However, a majority of these novel transcripts encodes short ORFs, at least 767 of them corresponding to pseudogenes. This RNAseq transcriptome also lacks 11’270 predicted transcripts that correspond either to silent genes or to genes expressed below the detection level of this study. Conclusions We established a simple and powerful strategy to combine Illumina and 454 reads and we produced, with genome assistance, an extensive and accurate Hydra transcriptome. The comparative analysis of the RNAseq transcriptome with genome-predicted transcriptomes lead to the identification of large populations of novel as well as missing transcripts that might reflect Hydra-specific evolutionary events. PMID:23530871

  18. Contribution to the Prediction of the Fold Code: Application to Immunoglobulin and Flavodoxin Cases

    PubMed Central

    Banach, Mateusz; Prudhomme, Nicolas; Carpentier, Mathilde; Duprat, Elodie; Papandreou, Nikolaos; Kalinowska, Barbara; Chomilier, Jacques; Roterman, Irena

    2015-01-01

    Background Folding nucleus of globular proteins formation starts by the mutual interaction of a group of hydrophobic amino acids whose close contacts allow subsequent formation and stability of the 3D structure. These early steps can be predicted by simulation of the folding process through a Monte Carlo (MC) coarse grain model in a discrete space. We previously defined MIRs (Most Interacting Residues), as the set of residues presenting a large number of non-covalent neighbour interactions during such simulation. MIRs are good candidates to define the minimal number of residues giving rise to a given fold instead of another one, although their proportion is rather high, typically [15-20]% of the sequences. Having in mind experiments with two sequences of very high levels of sequence identity (up to 90%) but different folds, we combined the MIR method, which takes sequence as single input, with the “fuzzy oil drop” (FOD) model that requires a 3D structure, in order to estimate the residues coding for the fold. FOD assumes that a globular protein follows an idealised 3D Gaussian distribution of hydrophobicity density, with the maximum in the centre and minima at the surface of the “drop”. If the actual local density of hydrophobicity around a given amino acid is as high as the ideal one, then this amino acid is assigned to the core of the globular protein, and it is assumed to follow the FOD model. Therefore one obtains a distribution of the amino acids of a protein according to their agreement or rejection with the FOD model. Results We compared and combined MIR and FOD methods to define the minimal nucleus, or keystone, of two populated folds: immunoglobulin-like (Ig) and flavodoxins (Flav). The combination of these two approaches defines some positions both predicted as a MIR and assigned as accordant with the FOD model. It is shown here that for these two folds, the intersection of the predicted sets of residues significantly differs from random selection. It reduces the number of selected residues by each individual method and allows a reasonable agreement with experimentally determined key residues coding for the particular fold. In addition, the intersection of the two methods significantly increases the specificity of the prediction, providing a robust set of residues that constitute the folding nucleus. PMID:25915049

  19. An inventory of the Aspergillus niger secretome by combining in silico predictions with shotgun proteomics data.

    PubMed

    Braaksma, Machtelt; Martens-Uzunova, Elena S; Punt, Peter J; Schaap, Peter J

    2010-10-19

    The ecological niche occupied by a fungal species, its pathogenicity and its usefulness as a microbial cell factory to a large degree depends on its secretome. Protein secretion usually requires the presence of a N-terminal signal peptide (SP) and by scanning for this feature using available highly accurate SP-prediction tools, the fraction of potentially secreted proteins can be directly predicted. However, prediction of a SP does not guarantee that the protein is actually secreted and current in silico prediction methods suffer from gene-model errors introduced during genome annotation. A majority rule based classifier that also evaluates signal peptide predictions from the best homologs of three neighbouring Aspergillus species was developed to create an improved list of potential signal peptide containing proteins encoded by the Aspergillus niger genome. As a complement to these in silico predictions, the secretome associated with growth and upon carbon source depletion was determined using a shotgun proteomics approach. Overall, some 200 proteins with a predicted signal peptide were identified to be secreted proteins. Concordant changes in the secretome state were observed as a response to changes in growth/culture conditions. Additionally, two proteins secreted via a non-classical route operating in A. niger were identified. We were able to improve the in silico inventory of A. niger secretory proteins by combining different gene-model predictions from neighbouring Aspergilli and thereby avoiding prediction conflicts associated with inaccurate gene-models. The expected accuracy of signal peptide prediction for proteins that lack homologous sequences in the proteomes of related species is 85%. An experimental validation of the predicted proteome confirmed in silico predictions.

  20. Proteins and Their Interacting Partners: An Introduction to Protein-Ligand Binding Site Prediction Methods.

    PubMed

    Roche, Daniel Barry; Brackenridge, Danielle Allison; McGuffin, Liam James

    2015-12-15

    Elucidating the biological and biochemical roles of proteins, and subsequently determining their interacting partners, can be difficult and time consuming using in vitro and/or in vivo methods, and consequently the majority of newly sequenced proteins will have unknown structures and functions. However, in silico methods for predicting protein-ligand binding sites and protein biochemical functions offer an alternative practical solution. The characterisation of protein-ligand binding sites is essential for investigating new functional roles, which can impact the major biological research spheres of health, food, and energy security. In this review we discuss the role in silico methods play in 3D modelling of protein-ligand binding sites, along with their role in predicting biochemical functionality. In addition, we describe in detail some of the key alternative in silico prediction approaches that are available, as well as discussing the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and the Continuous Automated Model EvaluatiOn (CAMEO) projects, and their impact on developments in the field. Furthermore, we discuss the importance of protein function prediction methods for tackling 21st century problems.

  1. A computational tool to predict the evolutionarily conserved protein-protein interaction hot-spot residues from the structure of the unbound protein.

    PubMed

    Agrawal, Neeraj J; Helk, Bernhard; Trout, Bernhardt L

    2014-01-21

    Identifying hot-spot residues - residues that are critical to protein-protein binding - can help to elucidate a protein's function and assist in designing therapeutic molecules to target those residues. We present a novel computational tool, termed spatial-interaction-map (SIM), to predict the hot-spot residues of an evolutionarily conserved protein-protein interaction from the structure of an unbound protein alone. SIM can predict the protein hot-spot residues with an accuracy of 36-57%. Thus, the SIM tool can be used to predict the yet unknown hot-spot residues for many proteins for which the structure of the protein-protein complexes are not available, thereby providing a clue to their functions and an opportunity to design therapeutic molecules to target these proteins. Copyright © 2013 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.

  2. Computational prediction of host-pathogen protein-protein interactions.

    PubMed

    Dyer, Matthew D; Murali, T M; Sobral, Bruno W

    2007-07-01

    Infectious diseases such as malaria result in millions of deaths each year. An important aspect of any host-pathogen system is the mechanism by which a pathogen can infect its host. One method of infection is via protein-protein interactions (PPIs) where pathogen proteins target host proteins. Developing computational methods that identify which PPIs enable a pathogen to infect a host has great implications in identifying potential targets for therapeutics. We present a method that integrates known intra-species PPIs with protein-domain profiles to predict PPIs between host and pathogen proteins. Given a set of intra-species PPIs, we identify the functional domains in each of the interacting proteins. For every pair of functional domains, we use Bayesian statistics to assess the probability that two proteins with that pair of domains will interact. We apply our method to the Homo sapiens-Plasmodium falciparum host-pathogen system. Our system predicts 516 PPIs between proteins from these two organisms. We show that pairs of human proteins we predict to interact with the same Plasmodium protein are close to each other in the human PPI network and that Plasmodium pairs predicted to interact with same human protein are co-expressed in DNA microarray datasets measured during various stages of the Plasmodium life cycle. Finally, we identify functionally enriched sub-networks spanned by the predicted interactions and discuss the plausibility of our predictions. Supplementary data are available at http://staff.vbi.vt.edu/dyermd/publications/dyer2007a.html. Supplementary data are available at Bioinformatics online.

  3. Inflammation and hemostasis biomarkers for predicting stroke in postmenopausal women: The Women’s Health Initiative Observational Study

    PubMed Central

    Kaplan, Robert C; McGinn, Aileen P; Baird, Alison E; Hendrix, Susan L; Kooperberg, Charles; Lynch, John; Rosenbaum, Daniel M; Johnson, Karen C; Strickler, Howard D; Wassertheil-Smoller, Sylvia

    2009-01-01

    Background Inflammatory and hemostasis-related biomarkers may identify women at risk of stroke. Methods Hormones and Biomarkers Predicting Stroke is a study of ischemic stroke among postmenopausal women participating in the Women’s Health Initiative Observational Study (n = 972 case-control pairs). A Biomarker Risk Score was derived from levels of seven inflammatory and hemostasis-related biomarkers that appeared individually to predict risk of ischemic stroke: C-reactive protein, interleukin-6, tissue plasminogen activator, D-dimer, white blood cell count, neopterin, and homocysteine. The c index was used to evaluate discrimination. Results Of all the individual biomarkers examined, C-reactive protein emerged as the only independent single predictor of ischemic stroke (adjusted odds ratio comparing Q4 versus Q1 = 1.64, 95% confidence interval: 1.15–2.32, p = 0.01) after adjustment for other biomarkers and standard stroke risk factors. The Biomarker Risk Score identified a gradient of increasing stroke risk with a greater number of elevated inflammatory/hemostasis biomarkers, and improved the c index significantly compared with standard stroke risk factors (p = 0.02). Among the subset of individuals who met current criteria for “high risk” levels of C-reactive protein (> 3.0 mg/L), the Biomarker Risk Score defined an approximately two-fold gradient of risk. We found no evidence for a relationship between stroke and levels of E-selectin, fibrinogen, tumor necrosis factor-alpha, vascular cell adhesion molecule-1, prothrombin fragment 1+2, Factor VIIC, or plasminogen activator inhibitor-1 antigen (p >0.15). Discussion The findings support the further exploration of multiple-biomarker panels to develop approaches for stratifying an individual’s risk of stroke. PMID:18984425

  4. Benchmarking protein-protein interface predictions: why you should care about protein size.

    PubMed

    Martin, Juliette

    2014-07-01

    A number of predictive methods have been developed to predict protein-protein binding sites. Each new method is traditionally benchmarked using sets of protein structures of various sizes, and global statistics are used to assess the quality of the prediction. Little attention has been paid to the potential bias due to protein size on these statistics. Indeed, small proteins involve proportionally more residues at interfaces than large ones. If a predictive method is biased toward small proteins, this can lead to an over-estimation of its performance. Here, we investigate the bias due to the size effect when benchmarking protein-protein interface prediction on the widely used docking benchmark 4.0. First, we simulate random scores that favor small proteins over large ones. Instead of the 0.5 AUC (Area Under the Curve) value expected by chance, these biased scores result in an AUC equal to 0.6 using hypergeometric distributions, and up to 0.65 using constant scores. We then use real prediction results to illustrate how to detect the size bias by shuffling, and subsequently correct it using a simple conversion of the scores into normalized ranks. In addition, we investigate the scores produced by eight published methods and show that they are all affected by the size effect, which can change their relative ranking. The size effect also has an impact on linear combination scores by modifying the relative contributions of each method. In the future, systematic corrections should be applied when benchmarking predictive methods using data sets with mixed protein sizes. © 2014 Wiley Periodicals, Inc.

  5. Local Geometry and Evolutionary Conservation of Protein Surfaces Reveal the Multiple Recognition Patches in Protein-Protein Interactions

    PubMed Central

    Laine, Elodie; Carbone, Alessandra

    2015-01-01

    Protein-protein interactions (PPIs) are essential to all biological processes and they represent increasingly important therapeutic targets. Here, we present a new method for accurately predicting protein-protein interfaces, understanding their properties, origins and binding to multiple partners. Contrary to machine learning approaches, our method combines in a rational and very straightforward way three sequence- and structure-based descriptors of protein residues: evolutionary conservation, physico-chemical properties and local geometry. The implemented strategy yields very precise predictions for a wide range of protein-protein interfaces and discriminates them from small-molecule binding sites. Beyond its predictive power, the approach permits to dissect interaction surfaces and unravel their complexity. We show how the analysis of the predicted patches can foster new strategies for PPIs modulation and interaction surface redesign. The approach is implemented in JET2, an automated tool based on the Joint Evolutionary Trees (JET) method for sequence-based protein interface prediction. JET2 is freely available at www.lcqb.upmc.fr/JET2. PMID:26690684

  6. Accurate Prediction of Contact Numbers for Multi-Spanning Helical Membrane Proteins

    PubMed Central

    Li, Bian; Mendenhall, Jeffrey; Nguyen, Elizabeth Dong; Weiner, Brian E.; Fischer, Axel W.; Meiler, Jens

    2017-01-01

    Prediction of the three-dimensional (3D) structures of proteins by computational methods is acknowledged as an unsolved problem. Accurate prediction of important structural characteristics such as contact number is expected to accelerate the otherwise slow progress being made in the prediction of 3D structure of proteins. Here, we present a dropout neural network-based method, TMH-Expo, for predicting the contact number of transmembrane helix (TMH) residues from sequence. Neuronal dropout is a strategy where certain neurons of the network are excluded from back-propagation to prevent co-adaptation of hidden-layer neurons. By using neuronal dropout, overfitting was significantly reduced and performance was noticeably improved. For multi-spanning helical membrane proteins, TMH-Expo achieved a remarkable Pearson correlation coefficient of 0.69 between predicted and experimental values and a mean absolute error of only 1.68. In addition, among those membrane protein–membrane protein interface residues, 76.8% were correctly predicted. Mapping of predicted contact numbers onto structures indicates that contact numbers predicted by TMH-Expo reflect the exposure patterns of TMHs and reveal membrane protein–membrane protein interfaces, reinforcing the potential of predicted contact numbers to be used as restraints for 3D structure prediction and protein–protein docking. TMH-Expo can be accessed via a Web server at www.meilerlab.org. PMID:26804342

  7. Space-related pharma-motifs for fast search of protein binding motifs and polypharmacological targets

    PubMed Central

    2012-01-01

    Background To discover a compound inhibiting multiple proteins (i.e. polypharmacological targets) is a new paradigm for the complex diseases (e.g. cancers and diabetes). In general, the polypharmacological proteins often share similar local binding environments and motifs. As the exponential growth of the number of protein structures, to find the similar structural binding motifs (pharma-motifs) is an emergency task for drug discovery (e.g. side effects and new uses for old drugs) and protein functions. Results We have developed a Space-Related Pharmamotifs (called SRPmotif) method to recognize the binding motifs by searching against protein structure database. SRPmotif is able to recognize conserved binding environments containing spatially discontinuous pharma-motifs which are often short conserved peptides with specific physico-chemical properties for protein functions. Among 356 pharma-motifs, 56.5% interacting residues are highly conserved. Experimental results indicate that 81.1% and 92.7% polypharmacological targets of each protein-ligand complex are annotated with same biological process (BP) and molecular function (MF) terms, respectively, based on Gene Ontology (GO). Our experimental results show that the identified pharma-motifs often consist of key residues in functional (active) sites and play the key roles for protein functions. The SRPmotif is available at http://gemdock.life.nctu.edu.tw/SRP/. Conclusions SRPmotif is able to identify similar pharma-interfaces and pharma-motifs sharing similar binding environments for polypharmacological targets by rapidly searching against the protein structure database. Pharma-motifs describe the conservations of binding environments for drug discovery and protein functions. Additionally, these pharma-motifs provide the clues for discovering new sequence-based motifs to predict protein functions from protein sequence databases. We believe that SRPmotif is useful for elucidating protein functions and drug discovery. PMID:23281852

  8. Predicting disulfide connectivity from protein sequence using multiple sequence feature vectors and secondary structure.

    PubMed

    Song, Jiangning; Yuan, Zheng; Tan, Hao; Huber, Thomas; Burrage, Kevin

    2007-12-01

    Disulfide bonds are primary covalent crosslinks between two cysteine residues in proteins that play critical roles in stabilizing the protein structures and are commonly found in extracy-toplasmatic or secreted proteins. In protein folding prediction, the localization of disulfide bonds can greatly reduce the search in conformational space. Therefore, there is a great need to develop computational methods capable of accurately predicting disulfide connectivity patterns in proteins that could have potentially important applications. We have developed a novel method to predict disulfide connectivity patterns from protein primary sequence, using a support vector regression (SVR) approach based on multiple sequence feature vectors and predicted secondary structure by the PSIPRED program. The results indicate that our method could achieve a prediction accuracy of 74.4% and 77.9%, respectively, when averaged on proteins with two to five disulfide bridges using 4-fold cross-validation, measured on the protein and cysteine pair on a well-defined non-homologous dataset. We assessed the effects of different sequence encoding schemes on the prediction performance of disulfide connectivity. It has been shown that the sequence encoding scheme based on multiple sequence feature vectors coupled with predicted secondary structure can significantly improve the prediction accuracy, thus enabling our method to outperform most of other currently available predictors. Our work provides a complementary approach to the current algorithms that should be useful in computationally assigning disulfide connectivity patterns and helps in the annotation of protein sequences generated by large-scale whole-genome projects. The prediction web server and Supplementary Material are accessible at http://foo.maths.uq.edu.au/~huber/disulfide

  9. A Method for WD40 Repeat Detection and Secondary Structure Prediction

    PubMed Central

    Wang, Yang; Jiang, Fan; Zhuo, Zhu; Wu, Xian-Hui; Wu, Yun-Dong

    2013-01-01

    WD40-repeat proteins (WD40s), as one of the largest protein families in eukaryotes, play vital roles in assembling protein-protein/DNA/RNA complexes. WD40s fold into similar β-propeller structures despite diversified sequences. A program WDSP (WD40 repeat protein Structure Predictor) has been developed to accurately identify WD40 repeats and predict their secondary structures. The method is designed specifically for WD40 proteins by incorporating both local residue information and non-local family-specific structural features. It overcomes the problem of highly diversified protein sequences and variable loops. In addition, WDSP achieves a better prediction in identifying multiple WD40-domain proteins by taking the global combination of repeats into consideration. In secondary structure prediction, the average Q3 accuracy of WDSP in jack-knife test reaches 93.7%. A disease related protein LRRK2 was used as a representive example to demonstrate the structure prediction. PMID:23776530

  10. PPCM: Combing multiple classifiers to improve protein-protein interaction prediction

    DOE PAGES

    Yao, Jianzhuang; Guo, Hong; Yang, Xiaohan

    2015-08-01

    Determining protein-protein interaction (PPI) in biological systems is of considerable importance, and prediction of PPI has become a popular research area. Although different classifiers have been developed for PPI prediction, no single classifier seems to be able to predict PPI with high confidence. We postulated that by combining individual classifiers the accuracy of PPI prediction could be improved. We developed a method called protein-protein interaction prediction classifiers merger (PPCM), and this method combines output from two PPI prediction tools, GO2PPI and Phyloprof, using Random Forests algorithm. The performance of PPCM was tested by area under the curve (AUC) using anmore » assembled Gold Standard database that contains both positive and negative PPI pairs. Our AUC test showed that PPCM significantly improved the PPI prediction accuracy over the corresponding individual classifiers. We found that additional classifiers incorporated into PPCM could lead to further improvement in the PPI prediction accuracy. Furthermore, cross species PPCM could achieve competitive and even better prediction accuracy compared to the single species PPCM. This study established a robust pipeline for PPI prediction by integrating multiple classifiers using Random Forests algorithm. Ultimately, this pipeline will be useful for predicting PPI in nonmodel species.« less

  11. Gene Expression Changes in Phosphorus Deficient Potato (Solanum tuberosum L.) Leaves and the Potential for Diagnostic Gene Expression Markers

    PubMed Central

    Hammond, John P.; Broadley, Martin R.; Bowen, Helen C.; Spracklen, William P.; Hayden, Rory M.; White, Philip J.

    2011-01-01

    Background There are compelling economic and environmental reasons to reduce our reliance on inorganic phosphate (Pi) fertilisers. Better management of Pi fertiliser applications is one option to improve the efficiency of Pi fertiliser use, whilst maintaining crop yields. Application rates of Pi fertilisers are traditionally determined from analyses of soil or plant tissues. Alternatively, diagnostic genes with altered expression under Pi limiting conditions that suggest a physiological requirement for Pi fertilisation, could be used to manage Pifertiliser applications, and might be more precise than indirect measurements of soil or tissue samples. Results We grew potato (Solanum tuberosum L.) plants hydroponically, under glasshouse conditions, to control their nutrient status accurately. Samples of total leaf RNA taken periodically after Pi was removed from the nutrient solution were labelled and hybridised to potato oligonucleotide arrays. A total of 1,659 genes were significantly differentially expressed following Pi withdrawal. These included genes that encode proteins involved in lipid, protein, and carbohydrate metabolism, characteristic of Pi deficient leaves and included potential novel roles for genes encoding patatin like proteins in potatoes. The array data were analysed using a support vector machine algorithm to identify groups of genes that could predict the Pi status of the crop. These groups of diagnostic genes were tested using field grown potatoes that had either been fertilised or unfertilised. A group of 200 genes could correctly predict the Pi status of field grown potatoes. Conclusions This paper provides a proof-of-concept demonstration for using microarrays and class prediction tools to predict the Pi status of a field grown potato crop. There is potential to develop this technology for other biotic and abiotic stresses in field grown crops. Ultimately, a better understanding of crop stresses may improve our management of the crop, improving the sustainability of agriculture. PMID:21935429

  12. Distribution of nitrogen fixation and nitrogenase-like sequences amongst microbial genomes

    PubMed Central

    2012-01-01

    Background The metabolic capacity for nitrogen fixation is known to be present in several prokaryotic species scattered across taxonomic groups. Experimental detection of nitrogen fixation in microbes requires species-specific conditions, making it difficult to obtain a comprehensive census of this trait. The recent and rapid increase in the availability of microbial genome sequences affords novel opportunities to re-examine the occurrence and distribution of nitrogen fixation genes. The current practice for computational prediction of nitrogen fixation is to use the presence of the nifH and/or nifD genes. Results Based on a careful comparison of the repertoire of nitrogen fixation genes in known diazotroph species we propose a new criterion for computational prediction of nitrogen fixation: the presence of a minimum set of six genes coding for structural and biosynthetic components, namely NifHDK and NifENB. Using this criterion, we conducted a comprehensive search in fully sequenced genomes and identified 149 diazotrophic species, including 82 known diazotrophs and 67 species not known to fix nitrogen. The taxonomic distribution of nitrogen fixation in Archaea was limited to the Euryarchaeota phylum; within the Bacteria domain we predict that nitrogen fixation occurs in 13 different phyla. Of these, seven phyla had not hitherto been known to contain species capable of nitrogen fixation. Our analyses also identified protein sequences that are similar to nitrogenase in organisms that do not meet the minimum-gene-set criteria. The existence of nitrogenase-like proteins lacking conserved co-factor ligands in both diazotrophs and non-diazotrophs suggests their potential for performing other, as yet unidentified, metabolic functions. Conclusions Our predictions expand the known phylogenetic diversity of nitrogen fixation, and suggest that this trait may be much more common in nature than it is currently thought. The diverse phylogenetic distribution of nitrogenase-like proteins indicates potential new roles for anciently duplicated and divergent members of this group of enzymes. PMID:22554235

  13. AIR SCORE ASSESSMENT FOR ACUTE APPENDICITIS

    PubMed Central

    VON-MÜHLEN, Bruno; FRANZON, Orli; BEDUSCHI, Murilo Gamba; KRUEL, Nicolau; LUPSELO, Daniel

    2015-01-01

    Background: Acute appendicitis is the most common cause of acute abdomen. Approximately 7% of the population will be affected by this condition during full life. The development of AIR score may contribute to diagnosis associating easy clinical criteria and two simple laboratory tests. Aim: To evaluate the score AIR (Appendicitis Inflammatory Response score) as a tool for the diagnosis and prediction of severity of acute appendicitis. Method: Were evaluated all patients undergoing surgical appendectomy. From 273 patients, 126 were excluded due to exclusion criteria. All patients were submitted o AIR score. Results: The value of the C-reactive protein and the percentage of leukocytes segmented blood count showed a direct relationship with the phase of acute appendicitis. Conclusion: As for the laboratory criteria, serum C-reactive protein and assessment of the percentage of the polymorphonuclear leukocytes count were important to diagnosis and disease stratification. PMID:26537139

  14. The Lack of Utility of Circulating Biomarkers of Inflammation and Endothelial Dysfunction for Type 2 Diabetes Risk Prediction Among Postmenopausal Women

    PubMed Central

    Chao, Chun; Song, Yiqing; Cook, Nancy; Tseng, Chi-Hong; Manson, JoAnn E.; Eaton, Charles; Margolis, Karen L.; Rodriguez, Beatriz; Phillips, Lawrence S.; Tinker, Lesley F.; Liu, Simin

    2011-01-01

    Background Recent studies have linked plasma markers of inflammation and endothelial dysfunction to type 2 diabetes mellitus (DM) development. However, the utility of these novel biomarkers for type 2 DM risk prediction remains uncertain. Methods The Women’s Health Initiative Observational Study (WHIOS), a prospective cohort, and a nested case-control study within the WHIOS of 1584 incident type 2 DM cases and 2198 matched controls were used to evaluate the utility of plasma markers of inflammation and endothelial dysfunction for type 2 DM risk prediction. Between September 1994 and December 1998, 93 676 women aged 50 to 79 years were enrolled in the WHIOS. Fasting plasma levels of glucose, insulin, white blood cells, tumor necrosis factor receptor 2, interleukin 6, high-sensitivity C-reactive protein, E-selectin, soluble intercellular adhesion molecule 1, and vascular cell adhesion molecule 1 were measured using blood samples collected at baseline. A series of prediction models including traditional risk factors and novel plasma markers were evaluated on the basis of global model fit, model discrimination, net reclassification improvement, and positive and negative predictive values. Results Although white blood cell count and levels of interleukin 6, high-sensitivity C-reactive protein, and soluble intercellular adhesion molecule 1 significantly enhanced model fit, none of the inflammatory and endothelial dysfunction markers improved the ability of model discrimination (area under the receiver operating characteristic curve, 0.93 vs 0.93), net reclassification, or predictive values (positive, 0.22 vs 0.24; negative, 0.99 vs 0.99 [using 15% 6-year type 2 DM risk as the cutoff]) compared with traditional risk factors. Similar results were obtained in ethnic-specific analyses. Conclusion Beyond traditional risk factors, measurement of plasma markers of systemic inflammation and endothelial dysfunction contribute relatively little additional value in clinical type 2 DM risk prediction in a multiethnic cohort of postmenopausal women. PMID:20876407

  15. Predicting Ligand Binding Sites on Protein Surfaces by 3-Dimensional Probability Density Distributions of Interacting Atoms

    PubMed Central

    Jian, Jhih-Wei; Elumalai, Pavadai; Pitti, Thejkiran; Wu, Chih Yuan; Tsai, Keng-Chang; Chang, Jeng-Yih; Peng, Hung-Pin; Yang, An-Suei

    2016-01-01

    Predicting ligand binding sites (LBSs) on protein structures, which are obtained either from experimental or computational methods, is a useful first step in functional annotation or structure-based drug design for the protein structures. In this work, the structure-based machine learning algorithm ISMBLab-LIG was developed to predict LBSs on protein surfaces with input attributes derived from the three-dimensional probability density maps of interacting atoms, which were reconstructed on the query protein surfaces and were relatively insensitive to local conformational variations of the tentative ligand binding sites. The prediction accuracy of the ISMBLab-LIG predictors is comparable to that of the best LBS predictors benchmarked on several well-established testing datasets. More importantly, the ISMBLab-LIG algorithm has substantial tolerance to the prediction uncertainties of computationally derived protein structure models. As such, the method is particularly useful for predicting LBSs not only on experimental protein structures without known LBS templates in the database but also on computationally predicted model protein structures with structural uncertainties in the tentative ligand binding sites. PMID:27513851

  16. A Particle Swarm Optimization-Based Approach with Local Search for Predicting Protein Folding.

    PubMed

    Yang, Cheng-Hong; Lin, Yu-Shiun; Chuang, Li-Yeh; Chang, Hsueh-Wei

    2017-10-01

    The hydrophobic-polar (HP) model is commonly used for predicting protein folding structures and hydrophobic interactions. This study developed a particle swarm optimization (PSO)-based algorithm combined with local search algorithms; specifically, the high exploration PSO (HEPSO) algorithm (which can execute global search processes) was combined with three local search algorithms (hill-climbing algorithm, greedy algorithm, and Tabu table), yielding the proposed HE-L-PSO algorithm. By using 20 known protein structures, we evaluated the performance of the HE-L-PSO algorithm in predicting protein folding in the HP model. The proposed HE-L-PSO algorithm exhibited favorable performance in predicting both short and long amino acid sequences with high reproducibility and stability, compared with seven reported algorithms. The HE-L-PSO algorithm yielded optimal solutions for all predicted protein folding structures. All HE-L-PSO-predicted protein folding structures possessed a hydrophobic core that is similar to normal protein folding.

  17. Reductive evolution and the loss of PDC/PAS domains from the genus Staphylococcus

    PubMed Central

    2013-01-01

    Background The Per-Arnt-Sim (PAS) domain represents a ubiquitous structural fold that is involved in bacterial sensing and adaptation systems, including several virulence related functions. Although PAS domains and the subclass of PhoQ-DcuS-CitA (PDC) domains have a common structure, there is limited amino acid sequence similarity. To gain greater insight into the evolution of PDC/PAS domains present in the bacterial kingdom and staphylococci in specific, the PDC/PAS domains from the genomic sequences of 48 bacteria, representing 5 phyla, were identified using the sensitive search method based on HMM-to-HMM comparisons (HHblits). Results A total of 1,007 PAS domains and 686 PDC domains distributed over 1,174 proteins were identified. For 28 Gram-positive bacteria, the distribution, organization, and molecular evolution of PDC/PAS domains were analyzed in greater detail, with a special emphasis on the genus Staphylococcus. Compared to other bacteria the staphylococci have relatively fewer proteins (6–9) containing PDC/PAS domains. As a general rule, the staphylococcal genomes examined in this study contain a core group of seven PDC/PAS domain-containing proteins consisting of WalK, SrrB, PhoR, ArlS, HssS, NreB, and GdpP. The exceptions to this rule are: 1) S. saprophyticus lacks the core NreB protein; 2) S. carnosus has two additional PAS domain containing proteins; 3) S. epidermidis, S. aureus, and S. pseudintermedius have an additional protein with two PDC domains that is predicted to code for a sensor histidine kinase; 4) S. lugdunensis has an additional PDC containing protein predicted to be a sensor histidine kinase. Conclusions This comprehensive analysis demonstrates that variation in PDC/PAS domains among bacteria has limited correlations to the genome size or pathogenicity; however, our analysis established that bacteria having a motile phase in their life cycle have significantly more PDC/PAS-containing proteins. In addition, our analysis revealed a tremendous amount of variation in the number of PDC/PAS-containing proteins within genera. This variation extended to the Staphylococcus genus, which had between 6 and 9 PDC/PAS proteins and some of these appear to be previously undescribed signaling proteins. This latter point is important because most staphylococcal proteins that contain PDC/PAS domains regulate virulence factor synthesis or antibiotic resistance. PMID:23902280

  18. Improved method for predicting protein fold patterns with ensemble classifiers.

    PubMed

    Chen, W; Liu, X; Huang, Y; Jiang, Y; Zou, Q; Lin, C

    2012-01-27

    Protein folding is recognized as a critical problem in the field of biophysics in the 21st century. Predicting protein-folding patterns is challenging due to the complex structure of proteins. In an attempt to solve this problem, we employed ensemble classifiers to improve prediction accuracy. In our experiments, 188-dimensional features were extracted based on the composition and physical-chemical property of proteins and 20-dimensional features were selected using a coupled position-specific scoring matrix. Compared with traditional prediction methods, these methods were superior in terms of prediction accuracy. The 188-dimensional feature-based method achieved 71.2% accuracy in five cross-validations. The accuracy rose to 77% when we used a 20-dimensional feature vector. These methods were used on recent data, with 54.2% accuracy. Source codes and dataset, together with web server and software tools for prediction, are available at: http://datamining.xmu.edu.cn/main/~cwc/ProteinPredict.html.

  19. Protein-Protein Interface Predictions by Data-Driven Methods: A Review

    PubMed Central

    Xue, Li C; Dobbs, Drena; Bonvin, Alexandre M.J.J.; Honavar, Vasant

    2015-01-01

    Reliably pinpointing which specific amino acid residues form the interface(s) between a protein and its binding partner(s) is critical for understanding the structural and physicochemical determinants of protein recognition and binding affinity, and has wide applications in modeling and validating protein interactions predicted by high-throughput methods, in engineering proteins, and in prioritizing drug targets. Here, we review the basic concepts, principles and recent advances in computational approaches to the analysis and prediction of protein-protein interfaces. We point out caveats for objectively evaluating interface predictors, and discuss various applications of data-driven interface predictors for improving energy model-driven protein-protein docking. Finally, we stress the importance of exploiting binding partner information in reliably predicting interfaces and highlight recent advances in this emerging direction. PMID:26460190

  20. Blind predictions of protein interfaces by docking calculations in CAPRI.

    PubMed

    Lensink, Marc F; Wodak, Shoshana J

    2010-11-15

    Reliable prediction of the amino acid residues involved in protein-protein interfaces can provide valuable insight into protein function, and inform mutagenesis studies, and drug design applications. A fast-growing number of methods are being proposed for predicting protein interfaces, using structural information, energetic criteria, or sequence conservation or by integrating multiple criteria and approaches. Overall however, their performance remains limited, especially when applied to nonobligate protein complexes, where the individual components are also stable on their own. Here, we evaluate interface predictions derived from protein-protein docking calculations. To this end we measure the overlap between the interfaces in models of protein complexes submitted by 76 participants in CAPRI (Critical Assessment of Predicted Interactions) and those of 46 observed interfaces in 20 CAPRI targets corresponding to nonobligate complexes. Our evaluation considers multiple models for each target interface, submitted by different participants, using a variety of docking methods. Although this results in a substantial variability in the prediction performance across participants and targets, clear trends emerge. Docking methods that perform best in our evaluation predict interfaces with average recall and precision levels of about 60%, for a small majority (60%) of the analyzed interfaces. These levels are significantly higher than those obtained for nonobligate complexes by most extant interface prediction methods. We find furthermore that a sizable fraction (24%) of the interfaces in models ranked as incorrect in the CAPRI assessment are actually correctly predicted (recall and precision ≥50%), and that these models contribute to 70% of the correct docking-based interface predictions overall. Our analysis proves that docking methods are much more successful in identifying interfaces than in predicting complexes, and suggests that these methods have an excellent potential of addressing the interface prediction challenge. © 2010 Wiley-Liss, Inc.

  1. TANGLE: Two-Level Support Vector Regression Approach for Protein Backbone Torsion Angle Prediction from Primary Sequences

    PubMed Central

    Song, Jiangning; Tan, Hao; Wang, Mingjun; Webb, Geoffrey I.; Akutsu, Tatsuya

    2012-01-01

    Protein backbone torsion angles (Phi) and (Psi) involve two rotation angles rotating around the Cα-N bond (Phi) and the Cα-C bond (Psi). Due to the planarity of the linked rigid peptide bonds, these two angles can essentially determine the backbone geometry of proteins. Accordingly, the accurate prediction of protein backbone torsion angle from sequence information can assist the prediction of protein structures. In this study, we develop a new approach called TANGLE (Torsion ANGLE predictor) to predict the protein backbone torsion angles from amino acid sequences. TANGLE uses a two-level support vector regression approach to perform real-value torsion angle prediction using a variety of features derived from amino acid sequences, including the evolutionary profiles in the form of position-specific scoring matrices, predicted secondary structure, solvent accessibility and natively disordered region as well as other global sequence features. When evaluated based on a large benchmark dataset of 1,526 non-homologous proteins, the mean absolute errors (MAEs) of the Phi and Psi angle prediction are 27.8° and 44.6°, respectively, which are 1% and 3% respectively lower than that using one of the state-of-the-art prediction tools ANGLOR. Moreover, the prediction of TANGLE is significantly better than a random predictor that was built on the amino acid-specific basis, with the p-value<1.46e-147 and 7.97e-150, respectively by the Wilcoxon signed rank test. As a complementary approach to the current torsion angle prediction algorithms, TANGLE should prove useful in predicting protein structural properties and assisting protein fold recognition by applying the predicted torsion angles as useful restraints. TANGLE is freely accessible at http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/TANGLE/. PMID:22319565

  2. Synthesis of chlorophyll b: Localization of chlorophyllide a oxygenase and discovery of a stable radical in the catalytic subunit

    PubMed Central

    Eggink, Laura L; LoBrutto, Russell; Brune, Daniel C; Brusslan, Judy; Yamasato, Akihiro; Tanaka, Ayumi; Hoober, J Kenneth

    2004-01-01

    Background Assembly of stable light-harvesting complexes (LHCs) in the chloroplast of green algae and plants requires synthesis of chlorophyll (Chl) b, a reaction that involves oxygenation of the 7-methyl group of Chl a to a formyl group. This reaction uses molecular oxygen and is catalyzed by chlorophyllide a oxygenase (CAO). The amino acid sequence of CAO predicts mononuclear iron and Rieske iron-sulfur centers in the protein. The mechanism of synthesis of Chl b and localization of this reaction in the chloroplast are essential steps toward understanding LHC assembly. Results Fluorescence of a CAO-GFP fusion protein, transiently expressed in young pea leaves, was found at the periphery of mature chloroplasts and on thylakoid membranes by confocal fluorescence microscopy. However, when membranes from partially degreened cells of Chlamydomonas reinhardtii cw15 were resolved on sucrose gradients, full-length CAO was detected by immunoblot analysis only on the chloroplast envelope inner membrane. The electron paramagnetic resonance spectrum of CAO included a resonance at g = 4.3, assigned to the predicted mononuclear iron center. Instead of a spectrum of the predicted Rieske iron-sulfur center, a nearly symmetrical, approximately 100 Gauss peak-to-trough signal was observed at g = 2.057, with a sensitivity to temperature characteristic of an iron-sulfur center. A remarkably stable radical in the protein was revealed by an isotropic, 9 Gauss peak-to-trough signal at g = 2.0042. Fragmentation of the protein after incorporation of 125I- identified a conserved tyrosine residue (Tyr-422 in Chlamydomonas and Tyr-518 in Arabidopsis) as the radical species. The radical was quenched by chlorophyll a, an indication that it may be involved in the enzymatic reaction. Conclusion CAO was found on the chloroplast envelope and thylakoid membranes in mature chloroplasts but only on the envelope inner membrane in dark-grown C. reinhardtii cells. Such localization provides further support for the envelope membranes as the initial site of Chl b synthesis and assembly of LHCs during chloroplast development. Identification of a tyrosine radical in the protein provides insight into the mechanism of Chl b synthesis. PMID:15086960

  3. Knowledge-based computational intelligence development for predicting protein secondary structures from sequences.

    PubMed

    Shen, Hong-Bin; Yi, Dong-Liang; Yao, Li-Xiu; Yang, Jie; Chou, Kuo-Chen

    2008-10-01

    In the postgenomic age, with the avalanche of protein sequences generated and relatively slow progress in determining their structures by experiments, it is important to develop automated methods to predict the structure of a protein from its sequence. The membrane proteins are a special group in the protein family that accounts for approximately 30% of all proteins; however, solved membrane protein structures only represent less than 1% of known protein structures to date. Although a great success has been achieved for developing computational intelligence techniques to predict secondary structures in both globular and membrane proteins, there is still much challenging work in this regard. In this review article, we firstly summarize the recent progress of automation methodology development in predicting protein secondary structures, especially in membrane proteins; we will then give some future directions in this research field.

  4. Characterizing informative sequence descriptors and predicting binding affinities of heterodimeric protein complexes.

    PubMed

    Srinivasulu, Yerukala Sathipati; Wang, Jyun-Rong; Hsu, Kai-Ti; Tsai, Ming-Ju; Charoenkwan, Phasit; Huang, Wen-Lin; Huang, Hui-Ling; Ho, Shinn-Ying

    2015-01-01

    Protein-protein interactions (PPIs) are involved in various biological processes, and underlying mechanism of the interactions plays a crucial role in therapeutics and protein engineering. Most machine learning approaches have been developed for predicting the binding affinity of protein-protein complexes based on structure and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. This work proposes a support vector machine (SVM) based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity. SVM-BAC yielded the training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, better than existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (Pkd) of 200 heterodimeric protein complexes. Prediction performance of a Jackknife test was the correlation coefficient of 0.34 and mean absolute error of 1.4. We further analyze three informative physicochemical properties according to their contribution to prediction performance. Results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn. The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features to classify and predict binding affinity of heterodimeric protein complexes. The characterization analysis revealed that the average numbers of beta turns and hydrogen bonds at protein-protein interfaces in high binding affinity complexes are more than those in low binding affinity complexes.

  5. Protein complex prediction in large ontology attributed protein-protein interaction networks.

    PubMed

    Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian; Li, Yanpeng; Xu, Bo

    2013-01-01

    Protein complexes are important for unraveling the secrets of cellular organization and function. Many computational approaches have been developed to predict protein complexes in protein-protein interaction (PPI) networks. However, most existing approaches focus mainly on the topological structure of PPI networks, and largely ignore the gene ontology (GO) annotation information. In this paper, we constructed ontology attributed PPI networks with PPI data and GO resource. After constructing ontology attributed networks, we proposed a novel approach called CSO (clustering based on network structure and ontology attribute similarity). Structural information and GO attribute information are complementary in ontology attributed networks. CSO can effectively take advantage of the correlation between frequent GO annotation sets and the dense subgraph for protein complex prediction. Our proposed CSO approach was applied to four different yeast PPI data sets and predicted many well-known protein complexes. The experimental results showed that CSO was valuable in predicting protein complexes and achieved state-of-the-art performance.

  6. Nutrient-Mediated Architectural Plasticity of a Predatory Trap

    PubMed Central

    Blamires, Sean J.; Tso, I-Min

    2013-01-01

    Background Nutrients such as protein may be actively sought by foraging animals. Many predators exhibit foraging plasticity, but how their foraging strategies are affected when faced with nutrient deprivation is largely unknown. In spiders, the assimilation of protein into silk may be in conflict with somatic processes so we predicted web building to be affected under protein depletion. Methodology/Principal Findings To assess the influence of protein intake on foraging plasticity we fed the orb-web spiders Argiope aemula and Cyclosa mulmeinensis high, low or no protein solutions over 10 days and allowed them to build webs. We compared post-feeding web architectural components and major ampullate (MA) silk amino acid compositions. We found that the number of radii in webs increased in both species when fed high protein solutions. Mesh size increased in A. aemula when fed a high protein solution. MA silk proline and alanine compositions varied in each species with contrasting variations in alanine between the two species. Glycine compositions only varied in C. mulmeinensis silk. No spiders significantly lost or gained mass on any feeding treatment, so they did not sacrifice somatic maintenance for amino acid investment in silk. Conclusions/Significance Our results show that the amount of protein taken in significantly affects the foraging decisions of trap-building predators, such as orb web spiders. Nevertheless, the subtle differences found between species in the association between protein intake, the amino acids invested in silk and web architectural plasticity show that the influence of protein deprivation on specific foraging strategies differs among different spiders. PMID:23349928

  7. Construction of ontology augmented networks for protein complex prediction.

    PubMed

    Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian

    2013-01-01

    Protein complexes are of great importance in understanding the principles of cellular organization and function. The increase in available protein-protein interaction data, gene ontology and other resources make it possible to develop computational methods for protein complex prediction. Most existing methods focus mainly on the topological structure of protein-protein interaction networks, and largely ignore the gene ontology annotation information. In this article, we constructed ontology augmented networks with protein-protein interaction data and gene ontology, which effectively unified the topological structure of protein-protein interaction networks and the similarity of gene ontology annotations into unified distance measures. After constructing ontology augmented networks, a novel method (clustering based on ontology augmented networks) was proposed to predict protein complexes, which was capable of taking into account the topological structure of the protein-protein interaction network, as well as the similarity of gene ontology annotations. Our method was applied to two different yeast protein-protein interaction datasets and predicted many well-known complexes. The experimental results showed that (i) ontology augmented networks and the unified distance measure can effectively combine the structure closeness and gene ontology annotation similarity; (ii) our method is valuable in predicting protein complexes and has higher F1 and accuracy compared to other competing methods.

  8. Post processing of protein-compound docking for fragment-based drug discovery (FBDD): in-silico structure-based drug screening and ligand-binding pose prediction.

    PubMed

    Fukunishi, Yoshifumi

    2010-01-01

    For fragment-based drug development, both hit (active) compound prediction and docking-pose (protein-ligand complex structure) prediction of the hit compound are important, since chemical modification (fragment linking, fragment evolution) subsequent to the hit discovery must be performed based on the protein-ligand complex structure. However, the naïve protein-compound docking calculation shows poor accuracy in terms of docking-pose prediction. Thus, post-processing of the protein-compound docking is necessary. Recently, several methods for the post-processing of protein-compound docking have been proposed. In FBDD, the compounds are smaller than those for conventional drug screening. This makes it difficult to perform the protein-compound docking calculation. A method to avoid this problem has been reported. Protein-ligand binding free energy estimation is useful to reduce the procedures involved in the chemical modification of the hit fragment. Several prediction methods have been proposed for high-accuracy estimation of protein-ligand binding free energy. This paper summarizes the various computational methods proposed for docking-pose prediction and their usefulness in FBDD.

  9. Predicting protein-protein interactions from protein domains using a set cover approach.

    PubMed

    Huang, Chengbang; Morcos, Faruck; Kanaan, Simon P; Wuchty, Stefan; Chen, Danny Z; Izaguirre, Jesús A

    2007-01-01

    One goal of contemporary proteome research is the elucidation of cellular protein interactions. Based on currently available protein-protein interaction and domain data, we introduce a novel method, Maximum Specificity Set Cover (MSSC), for the prediction of protein-protein interactions. In our approach, we map the relationship between interactions of proteins and their corresponding domain architectures to a generalized weighted set cover problem. The application of a greedy algorithm provides sets of domain interactions which explain the presence of protein interactions to the largest degree of specificity. Utilizing domain and protein interaction data of S. cerevisiae, MSSC enables prediction of previously unknown protein interactions, links that are well supported by a high tendency of coexpression and functional homogeneity of the corresponding proteins. Focusing on concrete examples, we show that MSSC reliably predicts protein interactions in well-studied molecular systems, such as the 26S proteasome and RNA polymerase II of S. cerevisiae. We also show that the quality of the predictions is comparable to the Maximum Likelihood Estimation while MSSC is faster. This new algorithm and all data sets used are accessible through a Web portal at http://ppi.cse.nd.edu.

  10. Determinants of BH3 Binding Specificity for Mcl-1 versus Bcl-x[subscript L

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dutta, Sanjib; Gullá, Stefano; Chen, T. Scott

    2010-06-25

    Interactions among Bcl-2 family proteins are important for regulating apoptosis. Prosurvival members of the family interact with proapoptotic BH3 (Bcl-2-homology-3)-only members, inhibiting execution of cell death through the mitochondrial pathway. Structurally, this interaction is mediated by binding of the {alpha}-helical BH3 region of the proapoptotic proteins to a conserved hydrophobic groove on the prosurvival proteins. Native BH3-only proteins exhibit selectivity in binding prosurvival members, as do small molecules that block these interactions. Understanding the sequence and structural basis of interaction specificity in this family is important, as it may allow the prediction of new Bcl-2 family associations and/or the designmore » of new classes of selective inhibitors to serve as reagents or therapeutics. In this work, we used two complementary techniques - yeast surface display screening from combinatorial peptide libraries and SPOT peptide array analysis - to elucidate specificity determinants for binding to Bcl-x{sub L} versus Mcl-1, two prominent prosurvival proteins. We screened a randomized library and identified BH3 peptides that bound to either Mcl-1 or Bcl-x{sub L} selectively or to both with high affinity. The peptides competed with native ligands for binding into the conserved hydrophobic groove, as illustrated in detail by a crystal structure of a specific peptide bound to Mcl-1. Mcl-1-selective peptides from the screen were highly specific for binding Mcl-1 in preference to Bcl-x{sub L}, Bcl-2, Bcl-w, and Bfl-1, whereas Bcl-x{sub L}-selective peptides showed some cross-interaction with related proteins Bcl-2 and Bcl-w. Mutational analyses using SPOT arrays revealed the effects of 170 point mutations made in the background of a peptide derived from the BH3 region of Bim, and a simple predictive model constructed using these data explained much of the specificity observed in our Mcl-1 versus Bcl-x{sub L} binders.« less

  11. Determinants of BH3 binding specificity for Mcl-1 vs. Bcl-xL

    PubMed Central

    Dutta, Sanjib; Gullá, Stefano; Chen, T. Scott; Fire, Emiko; Grant, Robert A.; Keating, Amy E.

    2010-01-01

    Interactions among Bcl-2 family proteins are important for regulating apoptosis. Pro-survival members of the family interact with pro-apoptotic BH3-only members, inhibiting execution of cell death through the mitochondrial pathway. Structurally, this interaction is mediated by binding of the alpha-helical BH3 region of the pro-apoptotic proteins to a conserved hydrophobic groove on the pro-survival proteins. Native BH3-only proteins exhibit selectivity in binding pro-survival members, as do small molecules that block these interactions. Understanding the sequence and structural basis of interaction specificity in this family is important, as it may allow the prediction of new Bcl-2 family associations and/or the design of new classes of selective inhibitors to serve as reagents or therapeutics. In this work we used two complementary techniques, yeast surface display screening from combinatorial peptide libraries and SPOT peptide array analysis, to elucidate specificity determinants for binding to Bcl-xL vs. Mcl-1, two prominent pro-survival proteins. We screened a randomized library and identified BH3 peptides that bound to either Mcl-1 or Bcl-xL selectively, or to both with high affinity. The peptides competed with native ligands for binding into the conserved hydrophobic groove, as illustrated in detail by a crystal structure of a specific peptide bound to Mcl-1. Mcl-1 selective peptides from the screen were highly specific for binding Mcl-1 in preference to Bcl-xL, Bcl-2, Bcl-w and Bfl-1, whereas Bcl-xL selective peptides showed some cross-interaction with related proteins Bcl-2 and Bcl-w. Mutational analyses using SPOT arrays revealed the effects of 170 point mutations made in the background of a peptide derived from the BH3 region of Bim, and a simple predictive model constructed using these data explained much of the specificity observed in our Mcl-1 vs. Bcl-xL binders. PMID:20363230

  12. Predicting protein-binding RNA nucleotides with consideration of binding partners.

    PubMed

    Tuvshinjargal, Narankhuu; Lee, Wook; Park, Byungkyu; Han, Kyungsook

    2015-06-01

    In recent years several computational methods have been developed to predict RNA-binding sites in protein. Most of these methods do not consider interacting partners of a protein, so they predict the same RNA-binding sites for a given protein sequence even if the protein binds to different RNAs. Unlike the problem of predicting RNA-binding sites in protein, the problem of predicting protein-binding sites in RNA has received little attention mainly because it is much more difficult and shows a lower accuracy on average. In our previous study, we developed a method that predicts protein-binding nucleotides from an RNA sequence. In an effort to improve the prediction accuracy and usefulness of the previous method, we developed a new method that uses both RNA and protein sequence data. In this study, we identified effective features of RNA and protein molecules and developed a new support vector machine (SVM) model to predict protein-binding nucleotides from RNA and protein sequence data. The new model that used both protein and RNA sequence data achieved a sensitivity of 86.5%, a specificity of 86.2%, a positive predictive value (PPV) of 72.6%, a negative predictive value (NPV) of 93.8% and Matthews correlation coefficient (MCC) of 0.69 in a 10-fold cross validation; it achieved a sensitivity of 58.8%, a specificity of 87.4%, a PPV of 65.1%, a NPV of 84.2% and MCC of 0.48 in independent testing. For comparative purpose, we built another prediction model that used RNA sequence data alone and ran it on the same dataset. In a 10 fold-cross validation it achieved a sensitivity of 85.7%, a specificity of 80.5%, a PPV of 67.7%, a NPV of 92.2% and MCC of 0.63; in independent testing it achieved a sensitivity of 67.7%, a specificity of 78.8%, a PPV of 57.6%, a NPV of 85.2% and MCC of 0.45. In both cross-validations and independent testing, the new model that used both RNA and protein sequences showed a better performance than the model that used RNA sequence data alone in most performance measures. To the best of our knowledge, this is the first sequence-based prediction of protein-binding nucleotides in RNA which considers the binding partner of RNA. The new model will provide valuable information for designing biochemical experiments to find putative protein-binding sites in RNA with unknown structure. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  13. Computational prediction of protein-protein interactions in Leishmania predicted proteomes.

    PubMed

    Rezende, Antonio M; Folador, Edson L; Resende, Daniela de M; Ruiz, Jeronimo C

    2012-01-01

    The Trypanosomatids parasites Leishmania braziliensis, Leishmania major and Leishmania infantum are important human pathogens. Despite of years of study and genome availability, effective vaccine has not been developed yet, and the chemotherapy is highly toxic. Therefore, it is clear just interdisciplinary integrated studies will have success in trying to search new targets for developing of vaccines and drugs. An essential part of this rationale is related to protein-protein interaction network (PPI) study which can provide a better understanding of complex protein interactions in biological system. Thus, we modeled PPIs for Trypanosomatids through computational methods using sequence comparison against public database of protein or domain interaction for interaction prediction (Interolog Mapping) and developed a dedicated combined system score to address the predictions robustness. The confidence evaluation of network prediction approach was addressed using gold standard positive and negative datasets and the AUC value obtained was 0.94. As result, 39,420, 43,531 and 45,235 interactions were predicted for L. braziliensis, L. major and L. infantum respectively. For each predicted network the top 20 proteins were ranked by MCC topological index. In addition, information related with immunological potential, degree of protein sequence conservation among orthologs and degree of identity compared to proteins of potential parasite hosts was integrated. This information integration provides a better understanding and usefulness of the predicted networks that can be valuable to select new potential biological targets for drug and vaccine development. Network modularity which is a key when one is interested in destabilizing the PPIs for drug or vaccine purposes along with multiple alignments of the predicted PPIs were performed revealing patterns associated with protein turnover. In addition, around 50% of hypothetical protein present in the networks received some degree of functional annotation which represents an important contribution since approximately 60% of Leishmania predicted proteomes has no predicted function.

  14. Sequence-based prediction of protein-binding sites in DNA: comparative study of two SVM models.

    PubMed

    Park, Byungkyu; Im, Jinyong; Tuvshinjargal, Narankhuu; Lee, Wook; Han, Kyungsook

    2014-11-01

    As many structures of protein-DNA complexes have been known in the past years, several computational methods have been developed to predict DNA-binding sites in proteins. However, its inverse problem (i.e., predicting protein-binding sites in DNA) has received much less attention. One of the reasons is that the differences between the interaction propensities of nucleotides are much smaller than those between amino acids. Another reason is that DNA exhibits less diverse sequence patterns than protein. Therefore, predicting protein-binding DNA nucleotides is much harder than predicting DNA-binding amino acids. We computed the interaction propensity (IP) of nucleotide triplets with amino acids using an extensive dataset of protein-DNA complexes, and developed two support vector machine (SVM) models that predict protein-binding nucleotides from sequence data alone. One SVM model predicts protein-binding nucleotides using DNA sequence data alone, and the other SVM model predicts protein-binding nucleotides using both DNA and protein sequences. In a 10-fold cross-validation with 1519 DNA sequences, the SVM model that uses DNA sequence data only predicted protein-binding nucleotides with an accuracy of 67.0%, an F-measure of 67.1%, and a Matthews correlation coefficient (MCC) of 0.340. With an independent dataset of 181 DNAs that were not used in training, it achieved an accuracy of 66.2%, an F-measure 66.3% and a MCC of 0.324. Another SVM model that uses both DNA and protein sequences achieved an accuracy of 69.6%, an F-measure of 69.6%, and a MCC of 0.383 in a 10-fold cross-validation with 1519 DNA sequences and 859 protein sequences. With an independent dataset of 181 DNAs and 143 proteins, it showed an accuracy of 67.3%, an F-measure of 66.5% and a MCC of 0.329. Both in cross-validation and independent testing, the second SVM model that used both DNA and protein sequence data showed better performance than the first model that used DNA sequence data. To the best of our knowledge, this is the first attempt to predict protein-binding nucleotides in a given DNA sequence from the sequence data alone. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  15. A sentence sliding window approach to extract protein annotations from biomedical articles

    PubMed Central

    Krallinger, Martin; Padron, Maria; Valencia, Alfonso

    2005-01-01

    Background Within the emerging field of text mining and statistical natural language processing (NLP) applied to biomedical articles, a broad variety of techniques have been developed during the past years. Nevertheless, there is still a great ned of comparative assessment of the performance of the proposed methods and the development of common evaluation criteria. This issue was addressed by the Critical Assessment of Text Mining Methods in Molecular Biology (BioCreative) contest. The aim of this contest was to assess the performance of text mining systems applied to biomedical texts including tools which recognize named entities such as genes and proteins, and tools which automatically extract protein annotations. Results The "sentence sliding window" approach proposed here was found to efficiently extract text fragments from full text articles containing annotations on proteins, providing the highest number of correctly predicted annotations. Moreover, the number of correct extractions of individual entities (i.e. proteins and GO terms) involved in the relationships used for the annotations was significantly higher than the correct extractions of the complete annotations (protein-function relations). Conclusion We explored the use of averaging sentence sliding windows for information extraction, especially in a context where conventional training data is unavailable. The combination of our approach with more refined statistical estimators and machine learning techniques might be a way to improve annotation extraction for future biomedical text mining applications. PMID:15960831

  16. Restricted N-glycan conformational space in the PDB and its implication in glycan structure modeling.

    PubMed

    Jo, Sunhwan; Lee, Hui Sun; Skolnick, Jeffrey; Im, Wonpil

    2013-01-01

    Understanding glycan structure and dynamics is central to understanding protein-carbohydrate recognition and its role in protein-protein interactions. Given the difficulties in obtaining the glycan's crystal structure in glycoconjugates due to its flexibility and heterogeneity, computational modeling could play an important role in providing glycosylated protein structure models. To address if glycan structures available in the PDB can be used as templates or fragments for glycan modeling, we present a survey of the N-glycan structures of 35 different sequences in the PDB. Our statistical analysis shows that the N-glycan structures found on homologous glycoproteins are significantly conserved compared to the random background, suggesting that N-glycan chains can be confidently modeled with template glycan structures whose parent glycoproteins share sequence similarity. On the other hand, N-glycan structures found on non-homologous glycoproteins do not show significant global structural similarity. Nonetheless, the internal substructures of these N-glycans, particularly, the substructures that are closer to the protein, show significantly similar structures, suggesting that such substructures can be used as fragments in glycan modeling. Increased interactions with protein might be responsible for the restricted conformational space of N-glycan chains. Our results suggest that structure prediction/modeling of N-glycans of glycoconjugates using structure database could be effective and different modeling approaches would be needed depending on the availability of template structures.

  17. Restricted N-glycan Conformational Space in the PDB and Its Implication in Glycan Structure Modeling

    PubMed Central

    Jo, Sunhwan; Lee, Hui Sun; Skolnick, Jeffrey; Im, Wonpil

    2013-01-01

    Understanding glycan structure and dynamics is central to understanding protein-carbohydrate recognition and its role in protein-protein interactions. Given the difficulties in obtaining the glycan's crystal structure in glycoconjugates due to its flexibility and heterogeneity, computational modeling could play an important role in providing glycosylated protein structure models. To address if glycan structures available in the PDB can be used as templates or fragments for glycan modeling, we present a survey of the N-glycan structures of 35 different sequences in the PDB. Our statistical analysis shows that the N-glycan structures found on homologous glycoproteins are significantly conserved compared to the random background, suggesting that N-glycan chains can be confidently modeled with template glycan structures whose parent glycoproteins share sequence similarity. On the other hand, N-glycan structures found on non-homologous glycoproteins do not show significant global structural similarity. Nonetheless, the internal substructures of these N-glycans, particularly, the substructures that are closer to the protein, show significantly similar structures, suggesting that such substructures can be used as fragments in glycan modeling. Increased interactions with protein might be responsible for the restricted conformational space of N-glycan chains. Our results suggest that structure prediction/modeling of N-glycans of glycoconjugates using structure database could be effective and different modeling approaches would be needed depending on the availability of template structures. PMID:23516343

  18. Protein structure analysis of mutations causing inheritable diseases. An e-Science approach with life scientist friendly interfaces.

    PubMed

    Venselaar, Hanka; Te Beek, Tim A H; Kuipers, Remko K P; Hekkelman, Maarten L; Vriend, Gert

    2010-11-08

    Many newly detected point mutations are located in protein-coding regions of the human genome. Knowledge of their effects on the protein's 3D structure provides insight into the protein's mechanism, can aid the design of further experiments, and eventually can lead to the development of new medicines and diagnostic tools. In this article we describe HOPE, a fully automatic program that analyzes the structural and functional effects of point mutations. HOPE collects information from a wide range of information sources including calculations on the 3D coordinates of the protein by using WHAT IF Web services, sequence annotations from the UniProt database, and predictions by DAS services. Homology models are built with YASARA. Data is stored in a database and used in a decision scheme to identify the effects of a mutation on the protein's 3D structure and function. HOPE builds a report with text, figures, and animations that is easy to use and understandable for (bio)medical researchers. We tested HOPE by comparing its output to the results of manually performed projects. In all straightforward cases HOPE performed similar to a trained bioinformatician. The use of 3D structures helps optimize the results in terms of reliability and details. HOPE's results are easy to understand and are presented in a way that is attractive for researchers without an extensive bioinformatics background.

  19. Thermodynamic prediction of protein neutrality.

    PubMed

    Bloom, Jesse D; Silberg, Jonathan J; Wilke, Claus O; Drummond, D Allan; Adami, Christoph; Arnold, Frances H

    2005-01-18

    We present a simple theory that uses thermodynamic parameters to predict the probability that a protein retains the wild-type structure after one or more random amino acid substitutions. Our theory predicts that for large numbers of substitutions the probability that a protein retains its structure will decline exponentially with the number of substitutions, with the severity of this decline determined by properties of the structure. Our theory also predicts that a protein can gain extra robustness to the first few substitutions by increasing its thermodynamic stability. We validate our theory with simulations on lattice protein models and by showing that it quantitatively predicts previously published experimental measurements on subtilisin and our own measurements on variants of TEM1 beta-lactamase. Our work unifies observations about the clustering of functional proteins in sequence space, and provides a basis for interpreting the response of proteins to substitutions in protein engineering applications.

  20. Thermodynamic prediction of protein neutrality

    PubMed Central

    Bloom, Jesse D.; Silberg, Jonathan J.; Wilke, Claus O.; Drummond, D. Allan; Adami, Christoph; Arnold, Frances H.

    2005-01-01

    We present a simple theory that uses thermodynamic parameters to predict the probability that a protein retains the wild-type structure after one or more random amino acid substitutions. Our theory predicts that for large numbers of substitutions the probability that a protein retains its structure will decline exponentially with the number of substitutions, with the severity of this decline determined by properties of the structure. Our theory also predicts that a protein can gain extra robustness to the first few substitutions by increasing its thermodynamic stability. We validate our theory with simulations on lattice protein models and by showing that it quantitatively predicts previously published experimental measurements on subtilisin and our own measurements on variants of TEM1 β-lactamase. Our work unifies observations about the clustering of functional proteins in sequence space, and provides a basis for interpreting the response of proteins to substitutions in protein engineering applications. PMID:15644440

  1. Recovering Protein-Protein and Domain-Domain Interactions from Aggregation of IP-MS Proteomics of Coregulator Complexes

    PubMed Central

    Mazloom, Amin R.; Dannenfelser, Ruth; Clark, Neil R.; Grigoryan, Arsen V.; Linder, Kathryn M.; Cardozo, Timothy J.; Bond, Julia C.; Boran, Aislyn D. W.; Iyengar, Ravi; Malovannaya, Anna; Lanz, Rainer B.; Ma'ayan, Avi

    2011-01-01

    Coregulator proteins (CoRegs) are part of multi-protein complexes that transiently assemble with transcription factors and chromatin modifiers to regulate gene expression. In this study we analyzed data from 3,290 immuno-precipitations (IP) followed by mass spectrometry (MS) applied to human cell lines aimed at identifying CoRegs complexes. Using the semi-quantitative spectral counts, we scored binary protein-protein and domain-domain associations with several equations. Unlike previous applications, our methods scored prey-prey protein-protein interactions regardless of the baits used. We also predicted domain-domain interactions underlying predicted protein-protein interactions. The quality of predicted protein-protein and domain-domain interactions was evaluated using known binary interactions from the literature, whereas one protein-protein interaction, between STRN and CTTNBP2NL, was validated experimentally; and one domain-domain interaction, between the HEAT domain of PPP2R1A and the Pkinase domain of STK25, was validated using molecular docking simulations. The scoring schemes presented here recovered known, and predicted many new, complexes, protein-protein, and domain-domain interactions. The networks that resulted from the predictions are provided as a web-based interactive application at http://maayanlab.net/HT-IP-MS-2-PPI-DDI/. PMID:22219718

  2. Prediction of beta-turns from amino acid sequences using the residue-coupled model.

    PubMed

    Guruprasad, K; Shukla, S

    2003-04-01

    We evaluated the prediction of beta-turns from amino acid sequences using the residue-coupled model with an enlarged representative protein data set selected from the Protein Data Bank. Our results show that the probability values derived from a data set comprising 425 protein chains yielded an overall beta-turn prediction accuracy 68.74%, compared with 94.7% reported earlier on a data set of 30 proteins using the same method. However, we noted that the overall beta-turn prediction accuracy using probability values derived from the 30-protein data set reduces to 40.74% when tested on the data set comprising 425 protein chains. In contrast, using probability values derived from the 425 data set used in this analysis, the overall beta-turn prediction accuracy yielded consistent results when tested on either the 30-protein data set (64.62%) used earlier or a more recent representative data set comprising 619 protein chains (64.66%) or on a jackknife data set comprising 476 representative protein chains (63.38%). We therefore recommend the use of probability values derived from the 425 representative protein chains data set reported here, which gives more realistic and consistent predictions of beta-turns from amino acid sequences.

  3. Physics-based protein-structure prediction using a hierarchical protocol based on the UNRES force field: assessment in two blind tests.

    PubMed

    Ołdziej, S; Czaplewski, C; Liwo, A; Chinchio, M; Nanias, M; Vila, J A; Khalili, M; Arnautova, Y A; Jagielska, A; Makowski, M; Schafroth, H D; Kaźmierkiewicz, R; Ripoll, D R; Pillardy, J; Saunders, J A; Kang, Y K; Gibson, K D; Scheraga, H A

    2005-05-24

    Recent improvements in the protein-structure prediction method developed in our laboratory, based on the thermodynamic hypothesis, are described. The conformational space is searched extensively at the united-residue level by using our physics-based UNRES energy function and the conformational space annealing method of global optimization. The lowest-energy coarse-grained structures are then converted to an all-atom representation and energy-minimized with the ECEPP/3 force field. The procedure was assessed in two recent blind tests of protein-structure prediction. During the first blind test, we predicted large fragments of alpha and alpha+beta proteins [60-70 residues with C(alpha) rms deviation (rmsd) <6 A]. However, for alpha+beta proteins, significant topological errors occurred despite low rmsd values. In the second exercise, we predicted whole structures of five proteins (two alpha and three alpha+beta, with sizes of 53-235 residues) with remarkably good accuracy. In particular, for the genomic target TM0487 (a 102-residue alpha+beta protein from Thermotoga maritima), we predicted the complete, topologically correct structure with 7.3-A C(alpha) rmsd. So far this protein is the largest alpha+beta protein predicted based solely on the amino acid sequence and a physics-based potential-energy function and search procedure. For target T0198, a phosphate transport system regulator PhoU from T. maritima (a 235-residue mainly alpha-helical protein), we predicted the topology of the whole six-helix bundle correctly within 8 A rmsd, except the 32 C-terminal residues, most of which form a beta-hairpin. These and other examples described in this work demonstrate significant progress in physics-based protein-structure prediction.

  4. Hill-Climbing search and diversification within an evolutionary approach to protein structure prediction.

    PubMed

    Chira, Camelia; Horvath, Dragos; Dumitrescu, D

    2011-07-30

    Proteins are complex structures made of amino acids having a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a protein structure with minimum-energy starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic - polar (HP) model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optimum. The main features of the resulting evolutionary algorithm - hill-climbing mechanism and diversification strategy - are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact to the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.

  5. Mechanistic and quantitative insight into cell surface targeted molecular imaging agent design.

    PubMed

    Zhang, Liang; Bhatnagar, Sumit; Deschenes, Emily; Thurber, Greg M

    2016-05-05

    Molecular imaging agent design involves simultaneously optimizing multiple probe properties. While several desired characteristics are straightforward, including high affinity and low non-specific background signal, in practice there are quantitative trade-offs between these properties. These include plasma clearance, where fast clearance lowers background signal but can reduce target uptake, and binding, where high affinity compounds sometimes suffer from lower stability or increased non-specific interactions. Further complicating probe development, many of the optimal parameters vary depending on both target tissue and imaging agent properties, making empirical approaches or previous experience difficult to translate. Here, we focus on low molecular weight compounds targeting extracellular receptors, which have some of the highest contrast values for imaging agents. We use a mechanistic approach to provide a quantitative framework for weighing trade-offs between molecules. Our results show that specific target uptake is well-described by quantitative simulations for a variety of targeting agents, whereas non-specific background signal is more difficult to predict. Two in vitro experimental methods for estimating background signal in vivo are compared - non-specific cellular uptake and plasma protein binding. Together, these data provide a quantitative method to guide probe design and focus animal work for more cost-effective and time-efficient development of molecular imaging agents.

  6. ComplexContact: a web server for inter-protein contact prediction using deep learning.

    PubMed

    Zeng, Hong; Wang, Sheng; Zhou, Tianming; Zhao, Feifeng; Li, Xiufeng; Wu, Qing; Xu, Jinbo

    2018-05-22

    ComplexContact (http://raptorx2.uchicago.edu/ComplexContact/) is a web server for sequence-based interfacial residue-residue contact prediction of a putative protein complex. Interfacial residue-residue contacts are critical for understanding how proteins form complex and interact at residue level. When receiving a pair of protein sequences, ComplexContact first searches for their sequence homologs and builds two paired multiple sequence alignments (MSA), then it applies co-evolution analysis and a CASP-winning deep learning (DL) method to predict interfacial contacts from paired MSAs and visualizes the prediction as an image. The DL method was originally developed for intra-protein contact prediction and performed the best in CASP12. Our large-scale experimental test further shows that ComplexContact greatly outperforms pure co-evolution methods for inter-protein contact prediction, regardless of the species.

  7. Diagnostic Accuracy of Urine Protein/Creatinine Ratio Is Influenced by Urine Concentration

    PubMed Central

    Yang, Chih-Yu; Chen, Fu-An; Chen, Chun-Fan; Liu, Wen-Sheng; Shih, Chia-Jen; Ou, Shuo-Ming; Yang, Wu-Chang; Lin, Chih-Ching; Yang, An-Hang

    2015-01-01

    Background The usage of urine protein/creatinine ratio to estimate daily urine protein excretion is prevalent, but relatively little attention has been paid to the influence of urine concentration and its impact on test accuracy. We took advantage of 24-hour urine collection to examine both urine protein/creatinine ratio (UPCR) and daily urine protein excretion, with the latter as the reference standard. Specific gravity from a concomitant urinalysis of the same urine sample was used to indicate the urine concentration. Methods During 2010 to 2014, there were 540 adequately collected 24h urine samples with protein concentration, creatinine concentration, total volume, and a concomitant urinalysis of the same sample. Variables associated with an accurate UPCR estimation were determined by multivariate linear regression analysis. Receiver operating characteristic (ROC) curves were generated to determine the discriminant cut-off values of urine creatinine concentration for predicting an accurate UPCR estimation in either dilute or concentrated urine samples. Results Our findings indicated that for dilute urine, as indicated by a low urine specific gravity, UPCR is more likely to overestimate the actual daily urine protein excretion. On the contrary, UPCR of concentrated urine is more likely to result in an underestimation. By ROC curve analysis, the best cut-off value of urine creatinine concentration for predicting overestimation by UPCR of dilute urine (specific gravity ≦ 1.005) was ≦ 38.8 mg/dL, whereas the best cut-off values of urine creatinine for predicting underestimation by UPCR of thick urine were ≧ 63.6 mg/dL (specific gravity ≧ 1.015), ≧ 62.1 mg/dL (specific gravity ≧ 1.020), ≧ 61.5 mg/dL (specific gravity ≧ 1.025), respectively. We also compared distribution patterns of urine creatinine concentration of 24h urine cohort with a concurrent spot urine cohort and found that the underestimation might be more profound in single voided samples. Conclusions The UPCR in samples with low or high specific gravity is more likely to overestimate or underestimate actual daily urine protein amount, respectively, especially in a dilute urine sample with its creatinine below 38.8 mg/dL or a concentrated sample with its creatinine above 61.5 mg/dL. In particular, UPCR results should be interpreted with caution in cases that involve dilute urine samples because its overestimation may lead to an erroneous diagnosis of proteinuric renal disease or an incorrect staging of chronic kidney disease. PMID:26353117

  8. Systematic Prediction of Scaffold Proteins Reveals New Design Principles in Scaffold-Mediated Signal Transduction

    PubMed Central

    Hu, Jianfei; Neiswinger, Johnathan; Zhang, Jin; Zhu, Heng; Qian, Jiang

    2015-01-01

    Scaffold proteins play a crucial role in facilitating signal transduction in eukaryotes by bringing together multiple signaling components. In this study, we performed a systematic analysis of scaffold proteins in signal transduction by integrating protein-protein interaction and kinase-substrate relationship networks. We predicted 212 scaffold proteins that are involved in 605 distinct signaling pathways. The computational prediction was validated using a protein microarray-based approach. The predicted scaffold proteins showed several interesting characteristics, as we expected from the functionality of scaffold proteins. We found that the scaffold proteins are likely to interact with each other, which is consistent with previous finding that scaffold proteins tend to form homodimers and heterodimers. Interestingly, a single scaffold protein can be involved in multiple signaling pathways by interacting with other scaffold protein partners. Furthermore, we propose two possible regulatory mechanisms by which the activity of scaffold proteins is coordinated with their associated pathways through phosphorylation process. PMID:26393507

  9. Proteins and carbohydrates in nipple aspirate fluid predict the presence of atypia and cancer in women requiring diagnostic breast biopsy

    PubMed Central

    2012-01-01

    Background Herein we present the results of two related investigations. The first study determined if concentrations in breast nipple discharge (ND) of two proteins (urinary plasminogen activator, uPA and its inhibitor, PAI-1) predicted the presence of breast atypia and cancer in pre- and/or postmenopausal women requiring surgery because of a suspicious breast lesion. The second study assessed if these proteins increased the predictive ability of a carbohydrate (Thomsen Friedenreich, TF) which we previously demonstrated predicted the presence of disease in postmenopausal women requiring surgery. Methods In the first study we prospectively enrolled 79 participants from whom we collected ND, measured uPA and PAI-1 and correlated expression with pathologic findings. In the second study we analyzed 35 (uPA and PAI-1 in 24, uPA in an additional 11) ND samples collected from different participants requiring breast surgery, all of whom also had TF results. Results uPA expression was higher in pre- and PAI-1 in postmenopausal women with 1) cancer (DCIS or invasive) vs. either no cancer (atypia or benign pathology, p = .018 and .025, respectively), or benign pathology (p = .017 and .033, respectively); and 2) abnormal (atypia or cancer) versus benign pathology (p = .018 and .052, respectively). High uPA and PAI-1 concentrations and age were independent predictors of disease in premenopausal women, with an area under the curve (AUC) of 83-87% when comparing diseased vs. benign pathology. uPA, TF, and age correctly classified 35 pre- and postmenopausal women as having disease or not 84-91% of the time, whereas combining uPA+PAI-1+TF correctly classified 24 women 97-100% of the time. Conclusions uPA and PAI-1 concentrations in ND were higher in women with atypia and cancer compared to women with benign disease. Combining uPA, PAI-1 and TF in the assessment of women requiring diagnostic breast surgery maximized disease prediction. The assessment of these markers may prove useful in early breast cancer detection. PMID:22296682

  10. Applications of Protein Thermodynamic Database for Understanding Protein Mutant Stability and Designing Stable Mutants.

    PubMed

    Gromiha, M Michael; Anoosha, P; Huang, Liang-Tsung

    2016-01-01

    Protein stability is the free energy difference between unfolded and folded states of a protein, which lies in the range of 5-25 kcal/mol. Experimentally, protein stability is measured with circular dichroism, differential scanning calorimetry, and fluorescence spectroscopy using thermal and denaturant denaturation methods. These experimental data have been accumulated in the form of a database, ProTherm, thermodynamic database for proteins and mutants. It also contains sequence and structure information of a protein, experimental methods and conditions, and literature information. Different features such as search, display, and sorting options and visualization tools have been incorporated in the database. ProTherm is a valuable resource for understanding/predicting the stability of proteins and it can be accessed at http://www.abren.net/protherm/ . ProTherm has been effectively used to examine the relationship among thermodynamics, structure, and function of proteins. We describe the recent progress on the development of methods for understanding/predicting protein stability, such as (1) general trends on mutational effects on stability, (2) relationship between the stability of protein mutants and amino acid properties, (3) applications of protein three-dimensional structures for predicting their stability upon point mutations, (4) prediction of protein stability upon single mutations from amino acid sequence, and (5) prediction methods for addressing double mutants. A list of online resources for predicting has also been provided.

  11. Fast computational methods for predicting protein structure from primary amino acid sequence

    DOEpatents

    Agarwal, Pratul Kumar [Knoxville, TN

    2011-07-19

    The present invention provides a method utilizing primary amino acid sequence of a protein, energy minimization, molecular dynamics and protein vibrational modes to predict three-dimensional structure of a protein. The present invention also determines possible intermediates in the protein folding pathway. The present invention has important applications to the design of novel drugs as well as protein engineering. The present invention predicts the three-dimensional structure of a protein independent of size of the protein, overcoming a significant limitation in the prior art.

  12. Akirins in sea lice: first steps towards a deeper understanding.

    PubMed

    Carpio, Yamila; García, Claudia; Pons, Tirso; Haussmann, Denise; Rodríguez-Ramos, Tania; Basabe, Liliana; Acosta, Jannel; Estrada, Mario Pablo

    2013-10-01

    Sea lice (Copepoda, Caligidae) are the most widely distributed marine pathogens in the salmon industry. Vaccination could be an environmentally friendly alternative for sea lice control; however, research on the development of such vaccines is still at an early stage of development. Recent results have suggested that subolesin/akirin/my32 are good candidate antigens for the control of arthropod infestations, including sea lice, but background knowledge about these genes in crustaceans is limited. Herein, we characterize the my32 gene/protein from two important sea lice species, Caligus rogercresseyi and Lepeophtheirus salmonis, based on cDNA sequence isolation, phylogenetic relationships, three dimensional structure prediction and expression analysis. The results show that these genes/proteins have the main characteristics of akirins from invertebrates. In addition, immunization with purified recombinant my32 from L. salmonis elicited a specific antibody response in mice and fish. These results provide an improvement to our current knowledge about my32 proteins and their potential use as vaccine candidates against sea lice in fish. Copyright © 2013 Elsevier Inc. All rights reserved.

  13. Assembling a Protein-Protein Interaction Map of the SSU Processome from Existing Datasets

    PubMed Central

    Baserga, Susan J.

    2011-01-01

    Background The small subunit (SSU) processome is a large ribonucleoprotein complex involved in small ribosomal subunit assembly. It consists of the U3 snoRNA and ∼72 proteins. While most of its components have been identified, the protein-protein interactions (PPIs) among them remain largely unknown, and thus the assembly, architecture and function of the SSU processome remains unclear. Methodology We queried PPI databases for SSU processome proteins to quantify the degree to which the three genome-wide high-throughput yeast two-hybrid (HT-Y2H) studies, the genome-wide protein fragment complementation assay (PCA) and the literature-curated (LC) datasets cover the SSU processome interactome. Conclusions We find that coverage of the SSU processome PPI network is remarkably sparse. Two of the three HT-Y2H studies each account for four and six PPIs between only six of the 72 proteins, while the third study accounts for as little as one PPI and two proteins. The PCA dataset has the highest coverage among the genome-wide studies with 27 PPIs between 25 proteins. The LC dataset was the most extensive, accounting for 34 proteins and 38 PPIs, many of which were validated by independent methods, thereby further increasing their reliability. When the collected data were merged, we found that at least 70% of the predicted PPIs have yet to be determined and 26 proteins (36%) have no known partners. Since the SSU processome is conserved in all Eukaryotes, we also queried HT-Y2H datasets from six additional model organisms, but only four orthologues and three previously known interologous interactions were found. This provides a starting point for further work on SSU processome assembly, and spotlights the need for a more complete genome-wide Y2H analysis. PMID:21423703

  14. Discovery-2: an interactive resource for the rational selection and comparison of putative drug target proteins in malaria

    PubMed Central

    2013-01-01

    Background Drug resistance to anti-malarial compounds remains a serious problem, with resistance to newer pharmaceuticals developing at an alarming rate. The development of new anti-malarials remains a priority, and the rational selection of putative targets is a key element of this process. Discovery-2 is an update of the original Discovery in silico resource for the rational selection of putative drug target proteins, enabling researchers to obtain information for a protein which may be useful for the selection of putative drug targets, and to perform advanced filtering of proteins encoded by the malaria genome based on a series of molecular properties. Methods An updated in silico resource has been developed where researchers are able to mine information on malaria proteins and predicted ligands, as well as perform comparisons to the human and mosquito host characteristics. Protein properties used include: domains, motifs, EC numbers, GO terms, orthologs, protein-protein interactions, protein-ligand interactions. Newly added features include drugability measures from ChEMBL, automated literature relations and links to clinical trial information. Searching by chemical structure is also available. Results The updated functionality of the Discovery-2 resource is presented, together with a detailed case study of the Plasmodium falciparum S-adenosyl-L-homocysteine hydrolase (PfSAHH) protein. A short example of a chemical search with pyrimethamine is also illustrated. Conclusion The updated Discovery-2 resource allows researchers to obtain detailed properties of proteins from the malaria genome, which may be of interest in the target selection process, and to perform advanced filtering and selection of proteins based on a relevant range of molecular characteristics. PMID:23537208

  15. Phylogenetic distribution and expression of a penicillin-binding protein homologue, Ear and its significance in virulence of Staphylococcus aureus.

    PubMed

    Singh, Vineet K; Ring, Robert P; Aswani, Vijay; Stemper, Mary E; Kislow, Jennifer; Ye, Zhan; Shukla, Sanjay K

    2017-12-01

    Staphylococcus aureus is an opportunistic human pathogen that can cause serious infections in humans. A plethora of known and putative virulence factors are produced by staphylococci that collectively orchestrate pathogenesis. Ear protein (Escherichia coli ampicillin resistance) in S. aureus is an exoprotein in COL strain, predicted to be a superantigen, and speculated to play roles in antibiotic resistance and virulence. The goal of this study was to determine if expression of ear is modulated by single nucleotide polymorphisms in its promoter and coding sequences and whether this gene plays roles in antibiotic resistance and virulence. Promoter, coding sequences and expression of the ear gene in clinical and carriage S. aureus strains with distinct genetic backgrounds were analysed. The JE2 strain and its isogenic ear mutant were used in a systemic infection mouse model to determine the competiveness of the ear mutant.Results/Key findings. The ear gene showed a variable expression, with USA300FPR3757 showing a high-level expression compared to many of the other strains tested including some showing negligible expression. Higher expression was associated with agr type 1 but not correlated with phylogenetic relatedness of the ear gene based upon single nucleotide polymorphisms in the promoter or coding regions suggesting a complex regulation. An isogenic JE2 (USA300 background) ear mutant showed no significant difference in its growth, antibiotic susceptibility or virulence in a mouse model. Our data suggests that despite being highly expressed in a USA300 genetic background, Ear is not a significant contributor to virulence in that strain.

  16. An ensemble framework for identifying essential proteins.

    PubMed

    Zhang, Xue; Xiao, Wangxin; Acencio, Marcio Luis; Lemke, Ney; Wang, Xujing

    2016-08-25

    Many centrality measures have been proposed to mine and characterize the correlations between network topological properties and protein essentiality. However, most of them show limited prediction accuracy, and the number of common predicted essential proteins by different methods is very small. In this paper, an ensemble framework is proposed which integrates gene expression data and protein-protein interaction networks (PINs). It aims to improve the prediction accuracy of basic centrality measures. The idea behind this ensemble framework is that different protein-protein interactions (PPIs) may show different contributions to protein essentiality. Five standard centrality measures (degree centrality, betweenness centrality, closeness centrality, eigenvector centrality, and subgraph centrality) are integrated into the ensemble framework respectively. We evaluated the performance of the proposed ensemble framework using yeast PINs and gene expression data. The results show that it can considerably improve the prediction accuracy of the five centrality measures individually. It can also remarkably increase the number of common predicted essential proteins among those predicted by each centrality measure individually and enable each centrality measure to find more low-degree essential proteins. This paper demonstrates that it is valuable to differentiate the contributions of different PPIs for identifying essential proteins based on network topological characteristics. The proposed ensemble framework is a successful paradigm to this end.

  17. CONFOLD2: improved contact-driven ab initio protein structure modeling.

    PubMed

    Adhikari, Badri; Cheng, Jianlin

    2018-01-25

    Contact-guided protein structure prediction methods are becoming more and more successful because of the latest advances in residue-residue contact prediction. To support contact-driven structure prediction, effective tools that can quickly build tertiary structural models of good quality from predicted contacts need to be developed. We develop an improved contact-driven protein modelling method, CONFOLD2, and study how it may be effectively used for ab initio protein structure prediction with predicted contacts as input. It builds models using various subsets of input contacts to explore the fold space under the guidance of a soft square energy function, and then clusters the models to obtain the top five models. CONFOLD2 obtains an average reconstruction accuracy of 0.57 TM-score for the 150 proteins in the PSICOV contact prediction dataset. When benchmarked on the CASP11 contacts predicted using CONSIP2 and CASP12 contacts predicted using Raptor-X, CONFOLD2 achieves a mean TM-score of 0.41 on both datasets. CONFOLD2 allows to quickly generate top five structural models for a protein sequence when its secondary structures and contacts predictions at hand. The source code of CONFOLD2 is publicly available at https://github.com/multicom-toolbox/CONFOLD2/ .

  18. ProTSAV: A protein tertiary structure analysis and validation server.

    PubMed

    Singh, Ankita; Kaushik, Rahul; Mishra, Avinash; Shanker, Asheesh; Jayaram, B

    2016-01-01

    Quality assessment of predicted model structures of proteins is as important as the protein tertiary structure prediction. A highly efficient quality assessment of predicted model structures directs further research on function. Here we present a new server ProTSAV, capable of evaluating predicted model structures based on some popular online servers and standalone tools. ProTSAV furnishes the user with a single quality score in case of individual protein structure along with a graphical representation and ranking in case of multiple protein structure assessment. The server is validated on ~64,446 protein structures including experimental structures from RCSB and predicted model structures for CASP targets and from public decoy sets. ProTSAV succeeds in predicting quality of protein structures with a specificity of 100% and a sensitivity of 98% on experimentally solved structures and achieves a specificity of 88%and a sensitivity of 91% on predicted protein structures of CASP11 targets under 2Å.The server overcomes the limitations of any single server/method and is seen to be robust in helping in quality assessment. ProTSAV is freely available at http://www.scfbio-iitd.res.in/software/proteomics/protsav.jsp. Copyright © 2015 Elsevier B.V. All rights reserved.

  19. SPRINT: ultrafast protein-protein interaction prediction of the entire human interactome.

    PubMed

    Li, Yiwei; Ilie, Lucian

    2017-11-15

    Proteins perform their functions usually by interacting with other proteins. Predicting which proteins interact is a fundamental problem. Experimental methods are slow, expensive, and have a high rate of error. Many computational methods have been proposed among which sequence-based ones are very promising. However, so far no such method is able to predict effectively the entire human interactome: they require too much time or memory. We present SPRINT (Scoring PRotein INTeractions), a new sequence-based algorithm and tool for predicting protein-protein interactions. We comprehensively compare SPRINT with state-of-the-art programs on seven most reliable human PPI datasets and show that it is more accurate while running orders of magnitude faster and using very little memory. SPRINT is the only sequence-based program that can effectively predict the entire human interactome: it requires between 15 and 100 min, depending on the dataset. Our goal is to transform the very challenging problem of predicting the entire human interactome into a routine task. The source code of SPRINT is freely available from https://github.com/lucian-ilie/SPRINT/ and the datasets and predicted PPIs from www.csd.uwo.ca/faculty/ilie/SPRINT/ .

  20. Accurate prediction of subcellular location of apoptosis proteins combining Chou's PseAAC and PsePSSM based on wavelet denoising.

    PubMed

    Yu, Bin; Li, Shan; Qiu, Wen-Ying; Chen, Cheng; Chen, Rui-Xin; Wang, Lei; Wang, Ming-Hui; Zhang, Yan

    2017-12-08

    Apoptosis proteins subcellular localization information are very important for understanding the mechanism of programmed cell death and the development of drugs. The prediction of subcellular localization of an apoptosis protein is still a challenging task because the prediction of apoptosis proteins subcellular localization can help to understand their function and the role of metabolic processes. In this paper, we propose a novel method for protein subcellular localization prediction. Firstly, the features of the protein sequence are extracted by combining Chou's pseudo amino acid composition (PseAAC) and pseudo-position specific scoring matrix (PsePSSM), then the feature information of the extracted is denoised by two-dimensional (2-D) wavelet denoising. Finally, the optimal feature vectors are input to the SVM classifier to predict subcellular location of apoptosis proteins. Quite promising predictions are obtained using the jackknife test on three widely used datasets and compared with other state-of-the-art methods. The results indicate that the method proposed in this paper can remarkably improve the prediction accuracy of apoptosis protein subcellular localization, which will be a supplementary tool for future proteomics research.

  1. Accurate prediction of subcellular location of apoptosis proteins combining Chou’s PseAAC and PsePSSM based on wavelet denoising

    PubMed Central

    Chen, Cheng; Chen, Rui-Xin; Wang, Lei; Wang, Ming-Hui; Zhang, Yan

    2017-01-01

    Apoptosis proteins subcellular localization information are very important for understanding the mechanism of programmed cell death and the development of drugs. The prediction of subcellular localization of an apoptosis protein is still a challenging task because the prediction of apoptosis proteins subcellular localization can help to understand their function and the role of metabolic processes. In this paper, we propose a novel method for protein subcellular localization prediction. Firstly, the features of the protein sequence are extracted by combining Chou's pseudo amino acid composition (PseAAC) and pseudo-position specific scoring matrix (PsePSSM), then the feature information of the extracted is denoised by two-dimensional (2-D) wavelet denoising. Finally, the optimal feature vectors are input to the SVM classifier to predict subcellular location of apoptosis proteins. Quite promising predictions are obtained using the jackknife test on three widely used datasets and compared with other state-of-the-art methods. The results indicate that the method proposed in this paper can remarkably improve the prediction accuracy of apoptosis protein subcellular localization, which will be a supplementary tool for future proteomics research. PMID:29296195

  2. A cyber-linked undergraduate research experience in computational biomolecular structure prediction and design

    PubMed Central

    Alford, Rebecca F.; Dolan, Erin L.

    2017-01-01

    Computational biology is an interdisciplinary field, and many computational biology research projects involve distributed teams of scientists. To accomplish their work, these teams must overcome both disciplinary and geographic barriers. Introducing new training paradigms is one way to facilitate research progress in computational biology. Here, we describe a new undergraduate program in biomolecular structure prediction and design in which students conduct research at labs located at geographically-distributed institutions while remaining connected through an online community. This 10-week summer program begins with one week of training on computational biology methods development, transitions to eight weeks of research, and culminates in one week at the Rosetta annual conference. To date, two cohorts of students have participated, tackling research topics including vaccine design, enzyme design, protein-based materials, glycoprotein modeling, crowd-sourced science, RNA processing, hydrogen bond networks, and amyloid formation. Students in the program report outcomes comparable to students who participate in similar in-person programs. These outcomes include the development of a sense of community and increases in their scientific self-efficacy, scientific identity, and science values, all predictors of continuing in a science research career. Furthermore, the program attracted students from diverse backgrounds, which demonstrates the potential of this approach to broaden the participation of young scientists from backgrounds traditionally underrepresented in computational biology. PMID:29216185

  3. A cyber-linked undergraduate research experience in computational biomolecular structure prediction and design.

    PubMed

    Alford, Rebecca F; Leaver-Fay, Andrew; Gonzales, Lynda; Dolan, Erin L; Gray, Jeffrey J

    2017-12-01

    Computational biology is an interdisciplinary field, and many computational biology research projects involve distributed teams of scientists. To accomplish their work, these teams must overcome both disciplinary and geographic barriers. Introducing new training paradigms is one way to facilitate research progress in computational biology. Here, we describe a new undergraduate program in biomolecular structure prediction and design in which students conduct research at labs located at geographically-distributed institutions while remaining connected through an online community. This 10-week summer program begins with one week of training on computational biology methods development, transitions to eight weeks of research, and culminates in one week at the Rosetta annual conference. To date, two cohorts of students have participated, tackling research topics including vaccine design, enzyme design, protein-based materials, glycoprotein modeling, crowd-sourced science, RNA processing, hydrogen bond networks, and amyloid formation. Students in the program report outcomes comparable to students who participate in similar in-person programs. These outcomes include the development of a sense of community and increases in their scientific self-efficacy, scientific identity, and science values, all predictors of continuing in a science research career. Furthermore, the program attracted students from diverse backgrounds, which demonstrates the potential of this approach to broaden the participation of young scientists from backgrounds traditionally underrepresented in computational biology.

  4. Annotation of Alternatively Spliced Proteins and Transcripts with Protein-Folding Algorithms and Isoform-Level Functional Networks.

    PubMed

    Li, Hongdong; Zhang, Yang; Guan, Yuanfang; Menon, Rajasree; Omenn, Gilbert S

    2017-01-01

    Tens of thousands of splice isoforms of proteins have been catalogued as predicted sequences from transcripts in humans and other species. Relatively few have been characterized biochemically or structurally. With the extensive development of protein bioinformatics, the characterization and modeling of isoform features, isoform functions, and isoform-level networks have advanced notably. Here we present applications of the I-TASSER family of algorithms for folding and functional predictions and the IsoFunc, MIsoMine, and Hisonet data resources for isoform-level analyses of network and pathway-based functional predictions and protein-protein interactions. Hopefully, predictions and insights from protein bioinformatics will stimulate many experimental validation studies.

  5. PreSSAPro: a software for the prediction of secondary structure by amino acid properties.

    PubMed

    Costantini, Susan; Colonna, Giovanni; Facchiano, Angelo M

    2007-10-01

    PreSSAPro is a software, available to the scientific community as a free web service designed to provide predictions of secondary structures starting from the amino acid sequence of a given protein. Predictions are based on our recently published work on the amino acid propensities for secondary structures in either large but not homogeneous protein data sets, as well as in smaller but homogeneous data sets corresponding to protein structural classes, i.e. all-alpha, all-beta, or alpha-beta proteins. Predictions result improved by the use of propensities evaluated for the right protein class. PreSSAPro predicts the secondary structure according to the right protein class, if known, or gives a multiple prediction with reference to the different structural classes. The comparison of these predictions represents a novel tool to evaluate what sequence regions can assume different secondary structures depending on the structural class assignment, in the perspective of identifying proteins able to fold in different conformations. The service is available at the URL http://bioinformatica.isa.cnr.it/PRESSAPRO/.

  6. When the Lowest Energy Does Not Induce Native Structures: Parallel Minimization of Multi-Energy Values by Hybridizing Searching Intelligences

    PubMed Central

    Lü, Qiang; Xia, Xiao-Yan; Chen, Rong; Miao, Da-Jun; Chen, Sha-Sha; Quan, Li-Jun; Li, Hai-Ou

    2012-01-01

    Background Protein structure prediction (PSP), which is usually modeled as a computational optimization problem, remains one of the biggest challenges in computational biology. PSP encounters two difficult obstacles: the inaccurate energy function problem and the searching problem. Even if the lowest energy has been luckily found by the searching procedure, the correct protein structures are not guaranteed to obtain. Results A general parallel metaheuristic approach is presented to tackle the above two problems. Multi-energy functions are employed to simultaneously guide the parallel searching threads. Searching trajectories are in fact controlled by the parameters of heuristic algorithms. The parallel approach allows the parameters to be perturbed during the searching threads are running in parallel, while each thread is searching the lowest energy value determined by an individual energy function. By hybridizing the intelligences of parallel ant colonies and Monte Carlo Metropolis search, this paper demonstrates an implementation of our parallel approach for PSP. 16 classical instances were tested to show that the parallel approach is competitive for solving PSP problem. Conclusions This parallel approach combines various sources of both searching intelligences and energy functions, and thus predicts protein conformations with good quality jointly determined by all the parallel searching threads and energy functions. It provides a framework to combine different searching intelligence embedded in heuristic algorithms. It also constructs a container to hybridize different not-so-accurate objective functions which are usually derived from the domain expertise. PMID:23028708

  7. Immunogenicity of therapeutics: a matter of efficacy and safety.

    PubMed

    Nechansky, Andreas; Kircheis, Ralf

    2010-11-01

    The unwanted immunogenicity of therapeutic proteins is a major concern regarding patient safety. Furthermore, pharmacokinetic, pharmacodynamic and clinical efficacy can be seriously affected by the immunogenicity of therapeutic proteins. Authorities have fully recognized this issue and demand appropriate and well-characterized assays to detect anti-drug antibodies (ADAs). We provide an overview of the immunogenicity topic in general, the regulatory background and insight into underlying immunological mechanisms and the limited ability to predict clinical immunogenicity a priori. Furthermore, we comment on the analytical testing approach and the status-quo of appropriate method validation. The review provides insight regarding the analytical approach that is expected by regulatory authorities overseeing immunogenicity testing requirements. Additionally, the factors influencing immunogenicity are summarized and key references regarding immunogenicity testing approaches and method validation are discussed. The unwanted immunogenicity of protein therapeutics is of major concern because of its potential to affect patient safety and drug efficacy. Analytical testing is sophisticated and requires more than one assay. Because immunogenicity in humans is hardly predictable, assay development has to start in a timely fashion and for clinical studies immunogenicity assay validation is mandatory prior to analyzing patient serum samples. Regarding ADAs, the question remains as to when such antibodies are regarded of clinical relevance and what levels are, if at all, acceptable. In summary, the detection of ADAs should raise the awareness of the physician concerning patient safety and of the sponsor/manufacture concerning the immunogenic potential of the drug product.

  8. Computational prediction of protein interactions related to the invasion of erythrocytes by malarial parasites.

    PubMed

    Liu, Xuewu; Huang, Yuxiao; Liang, Jiao; Zhang, Shuai; Li, Yinghui; Wang, Jun; Shen, Yan; Xu, Zhikai; Zhao, Ya

    2014-11-30

    The invasion of red blood cells (RBCs) by malarial parasites is an essential step in the life cycle of Plasmodium falciparum. Human-parasite surface protein interactions play a critical role in this process. Although several interactions between human and parasite proteins have been discovered, the mechanism related to invasion remains poorly understood because numerous human-parasite protein interactions have not yet been identified. High-throughput screening experiments are not feasible for malarial parasites due to difficulty in expressing the parasite proteins. Here, we performed computational prediction of the PPIs involved in malaria parasite invasion to elucidate the mechanism by which invasion occurs. In this study, an expectation maximization algorithm was used to estimate the probabilities of domain-domain interactions (DDIs). Estimates of DDI probabilities were then used to infer PPI probabilities. We found that our prediction performance was better than that based on the information of D. melanogaster alone when information related to the six species was used. Prediction performance was assessed using protein interaction data from S. cerevisiae, indicating that the predicted results were reliable. We then used the estimates of DDI probabilities to infer interactions between 490 parasite and 3,787 human membrane proteins. A small-scale dataset was used to illustrate the usability of our method in predicting interactions between human and parasite proteins. The positive predictive value (PPV) was lower than that observed in S. cerevisiae. We integrated gene expression data to improve prediction accuracy and to reduce false positives. We identified 80 membrane proteins highly expressed in the schizont stage by fast Fourier transform method. Approximately 221 erythrocyte membrane proteins were identified using published mass spectral datasets. A network consisting of 205 interactions was predicted. Results of network analysis suggest that SNARE proteins of parasites and APP of humans may function in the invasion of RBCs by parasites. We predicted a small-scale PPI network that may be involved in parasite invasion of RBCs by integrating DDI information and expression profiles. Experimental studies should be conducted to validate the predicted interactions. The predicted PPIs help elucidate the mechanism of parasite invasion and provide directions for future experimental investigations.

  9. A chronic fatigue syndrome – related proteome in human cerebrospinal fluid

    PubMed Central

    Baraniuk, James N; Casado, Begona; Maibach, Hilda; Clauw, Daniel J; Pannell, Lewis K; Hess S, Sonja

    2005-01-01

    Background Chronic Fatigue Syndrome (CFS), Persian Gulf War Illness (PGI), and fibromyalgia are overlapping symptom complexes without objective markers or known pathophysiology. Neurological dysfunction is common. We assessed cerebrospinal fluid to find proteins that were differentially expressed in this CFS-spectrum of illnesses compared to control subjects. Methods Cerebrospinal fluid specimens from 10 CFS, 10 PGI, and 10 control subjects (50 μl/subject) were pooled into one sample per group (cohort 1). Cohort 2 of 12 control and 9 CFS subjects had their fluids (200 μl/subject) assessed individually. After trypsin digestion, peptides were analyzed by capillary chromatography, quadrupole-time-of-flight mass spectrometry, peptide sequencing, bioinformatic protein identification, and statistical analysis. Results Pooled CFS and PGI samples shared 20 proteins that were not detectable in the pooled control sample (cohort 1 CFS-related proteome). Multilogistic regression analysis (GLM) of cohort 2 detected 10 proteins that were shared by CFS individuals and the cohort 1 CFS-related proteome, but were not detected in control samples. Detection of ≥1 of a select set of 5 CFS-related proteins predicted CFS status with 80% concordance (logistic model). The proteins were α-1-macroglobulin, amyloid precursor-like protein 1, keratin 16, orosomucoid 2 and pigment epithelium-derived factor. Overall, 62 of 115 proteins were newly described. Conclusion This pilot study detected an identical set of central nervous system, innate immune and amyloidogenic proteins in cerebrospinal fluids from two independent cohorts of subjects with overlapping CFS, PGI and fibromyalgia. Although syndrome names and definitions were different, the proteome and presumed pathological mechanism(s) may be shared. PMID:16321154

  10. MDcons: Intermolecular contact maps as a tool to analyze the interface of protein complexes from molecular dynamics trajectories

    PubMed Central

    2014-01-01

    Background Molecular Dynamics (MD) simulations of protein complexes suffer from the lack of specific tools in the analysis step. Analyses of MD trajectories of protein complexes indeed generally rely on classical measures, such as the RMSD, RMSF and gyration radius, conceived and developed for single macromolecules. As a matter of fact, instead, researchers engaged in simulating the dynamics of a protein complex are mainly interested in characterizing the conservation/variation of its biological interface. Results On these bases, herein we propose a novel approach to the analysis of MD trajectories or other conformational ensembles of protein complexes, MDcons, which uses the conservation of inter-residue contacts at the interface as a measure of the similarity between different snapshots. A "consensus contact map" is also provided, where the conservation of the different contacts is drawn in a grey scale. Finally, the interface area of the complex is monitored during the simulations. To show its utility, we used this novel approach to study two protein-protein complexes with interfaces of comparable size and both dominated by hydrophilic interactions, but having binding affinities at the extremes of the experimental range. MDcons is demonstrated to be extremely useful to analyse the MD trajectories of the investigated complexes, adding important insight into the dynamic behavior of their biological interface. Conclusions MDcons specifically allows the user to highlight and characterize the dynamics of the interface in protein complexes and can thus be used as a complementary tool for the analysis of MD simulations of both experimental and predicted structures of protein complexes. PMID:25077693

  11. Three-dimensional (3D) structure prediction of the American and African oil-palms β-ketoacyl-[ACP] synthase-II protein by comparative modelling.

    PubMed

    Wang, Edina; Chinni, Suresh; Bhore, Subhash Janardhan

    2014-01-01

    The fatty-acid profile of the vegetable oils determines its properties and nutritional value. Palm-oil obtained from the African oil-palm [Elaeis guineensis Jacq. (Tenera)] contains 44% palmitic acid (C16:0), but, palm-oil obtained from the American oilpalm [Elaeis oleifera] contains only 25% C16:0. In part, the b-ketoacyl-[ACP] synthase II (KASII) [EC: 2.3.1.179] protein is responsible for the high level of C16:0 in palm-oil derived from the African oil-palm. To understand more about E. guineensis KASII (EgKASII) and E. oleifera KASII (EoKASII) proteins, it is essential to know its structures. Hence, this study was undertaken. The objective of this study was to predict three-dimensional (3D) structure of EgKASII and EoKASII proteins using molecular modelling tools. The amino-acid sequences for KASII proteins were retrieved from the protein database of National Center for Biotechnology Information (NCBI), USA. The 3D structures were predicted for both proteins using homology modelling and ab-initio technique approach of protein structure prediction. The molecular dynamics (MD) simulation was performed to refine the predicted structures. The predicted structure models were evaluated and root mean square deviation (RMSD) and root mean square fluctuation (RMSF) values were calculated. The homology modelling showed that EgKASII and EoKASII proteins are 78% and 74% similar with Streptococcus pneumonia KASII and Brucella melitensis KASII, respectively. The EgKASII and EoKASII structures predicted by using ab-initio technique approach shows 6% and 9% deviation to its structures predicted by homology modelling, respectively. The structure refinement and validation confirmed that the predicted structures are accurate. The 3D structures for EgKASII and EoKASII proteins were predicted. However, further research is essential to understand the interaction of EgKASII and EoKASII proteins with its substrates.

  12. Genomic selection across multiple breeding cycles in applied bread wheat breeding.

    PubMed

    Michel, Sebastian; Ametz, Christian; Gungor, Huseyin; Epure, Doru; Grausgruber, Heinrich; Löschenberger, Franziska; Buerstmayr, Hermann

    2016-06-01

    We evaluated genomic selection across five breeding cycles of bread wheat breeding. Bias of within-cycle cross-validation and methods for improving the prediction accuracy were assessed. The prospect of genomic selection has been frequently shown by cross-validation studies using the same genetic material across multiple environments, but studies investigating genomic selection across multiple breeding cycles in applied bread wheat breeding are lacking. We estimated the prediction accuracy of grain yield, protein content and protein yield of 659 inbred lines across five independent breeding cycles and assessed the bias of within-cycle cross-validation. We investigated the influence of outliers on the prediction accuracy and predicted protein yield by its components traits. A high average heritability was estimated for protein content, followed by grain yield and protein yield. The bias of the prediction accuracy using populations from individual cycles using fivefold cross-validation was accordingly substantial for protein yield (17-712 %) and less pronounced for protein content (8-86 %). Cross-validation using the cycles as folds aimed to avoid this bias and reached a maximum prediction accuracy of [Formula: see text] = 0.51 for protein content, [Formula: see text] = 0.38 for grain yield and [Formula: see text] = 0.16 for protein yield. Dropping outlier cycles increased the prediction accuracy of grain yield to [Formula: see text] = 0.41 as estimated by cross-validation, while dropping outlier environments did not have a significant effect on the prediction accuracy. Independent validation suggests, on the other hand, that careful consideration is necessary before an outlier correction is undertaken, which removes lines from the training population. Predicting protein yield by multiplying genomic estimated breeding values of grain yield and protein content raised the prediction accuracy to [Formula: see text] = 0.19 for this derived trait.

  13. The Protein Interactome of Streptococcus pneumoniae and Bacterial Meta-interactomes Improve Function Predictions.

    PubMed

    Wuchty, S; Rajagopala, S V; Blazie, S M; Parrish, J R; Khuri, S; Finley, R L; Uetz, P

    2017-01-01

    The functions of roughly a third of all proteins in Streptococcus pneumoniae , a significant human-pathogenic bacterium, are unknown. Using a yeast two-hybrid approach, we have determined more than 2,000 novel protein interactions in this organism. We augmented this network with meta-interactome data that we defined as the pool of all interactions between evolutionarily conserved proteins in other bacteria. We found that such interactions significantly improved our ability to predict a protein's function, allowing us to provide functional predictions for 299 S. pneumoniae proteins with previously unknown functions. IMPORTANCE Identification of protein interactions in bacterial species can help define the individual roles that proteins play in cellular pathways and pathogenesis. Very few protein interactions have been identified for the important human pathogen S. pneumoniae . We used an experimental approach to identify over 2,000 new protein interactions for S. pneumoniae , the most extensive interactome data for this bacterium to date. To predict protein function, we used our interactome data augmented with interactions from other closely related bacteria. The combination of the experimental data and meta-interactome data significantly improved the prediction results, allowing us to assign possible functions to a large number of poorly characterized proteins.

  14. Deep learning methods for protein torsion angle prediction.

    PubMed

    Li, Haiou; Hou, Jie; Adhikari, Badri; Lyu, Qiang; Cheng, Jianlin

    2017-09-18

    Deep learning is one of the most powerful machine learning methods that has achieved the state-of-the-art performance in many domains. Since deep learning was introduced to the field of bioinformatics in 2012, it has achieved success in a number of areas such as protein residue-residue contact prediction, secondary structure prediction, and fold recognition. In this work, we developed deep learning methods to improve the prediction of torsion (dihedral) angles of proteins. We design four different deep learning architectures to predict protein torsion angles. The architectures including deep neural network (DNN) and deep restricted Boltzmann machine (DRBN), deep recurrent neural network (DRNN) and deep recurrent restricted Boltzmann machine (DReRBM) since the protein torsion angle prediction is a sequence related problem. In addition to existing protein features, two new features (predicted residue contact number and the error distribution of torsion angles extracted from sequence fragments) are used as input to each of the four deep learning architectures to predict phi and psi angles of protein backbone. The mean absolute error (MAE) of phi and psi angles predicted by DRNN, DReRBM, DRBM and DNN is about 20-21° and 29-30° on an independent dataset. The MAE of phi angle is comparable to the existing methods, but the MAE of psi angle is 29°, 2° lower than the existing methods. On the latest CASP12 targets, our methods also achieved the performance better than or comparable to a state-of-the art method. Our experiment demonstrates that deep learning is a valuable method for predicting protein torsion angles. The deep recurrent network architecture performs slightly better than deep feed-forward architecture, and the predicted residue contact number and the error distribution of torsion angles extracted from sequence fragments are useful features for improving prediction accuracy.

  15. BetaTPred: prediction of beta-TURNS in a protein using statistical algorithms.

    PubMed

    Kaur, Harpreet; Raghava, G P S

    2002-03-01

    beta-turns play an important role from a structural and functional point of view. beta-turns are the most common type of non-repetitive structures in proteins and comprise on average, 25% of the residues. In the past numerous methods have been developed to predict beta-turns in a protein. Most of these prediction methods are based on statistical approaches. In order to utilize the full potential of these methods, there is a need to develop a web server. This paper describes a web server called BetaTPred, developed for predicting beta-TURNS in a protein from its amino acid sequence. BetaTPred allows the user to predict turns in a protein using existing statistical algorithms. It also allows to predict different types of beta-TURNS e.g. type I, I', II, II', VI, VIII and non-specific. This server assists the users in predicting the consensus beta-TURNS in a protein. The server is accessible from http://imtech.res.in/raghava/betatpred/

  16. Prediction of hot regions in protein-protein interaction by combining density-based incremental clustering with feature-based classification.

    PubMed

    Hu, Jing; Zhang, Xiaolong; Liu, Xiaoming; Tang, Jinshan

    2015-06-01

    Discovering hot regions in protein-protein interaction is important for drug and protein design, while experimental identification of hot regions is a time-consuming and labor-intensive effort; thus, the development of predictive models can be very helpful. In hot region prediction research, some models are based on structure information, and others are based on a protein interaction network. However, the prediction accuracy of these methods can still be improved. In this paper, a new method is proposed for hot region prediction, which combines density-based incremental clustering with feature-based classification. The method uses density-based incremental clustering to obtain rough hot regions, and uses feature-based classification to remove the non-hot spot residues from the rough hot regions. Experimental results show that the proposed method significantly improves the prediction performance of hot regions. Copyright © 2015 Elsevier Ltd. All rights reserved.

  17. An object programming based environment for protein secondary structure prediction.

    PubMed

    Giacomini, M; Ruggiero, C; Sacile, R

    1996-01-01

    The most frequently used methods for protein secondary structure prediction are empirical statistical methods and rule based methods. A consensus system based on object-oriented programming is presented, which integrates the two approaches with the aim of improving the prediction quality. This system uses an object-oriented knowledge representation based on the concepts of conformation, residue and protein, where the conformation class is the basis, the residue class derives from it and the protein class derives from the residue class. The system has been tested with satisfactory results on several proteins of the Brookhaven Protein Data Bank. Its results have been compared with the results of the most widely used prediction methods, and they show a higher prediction capability and greater stability. Moreover, the system itself provides an index of the reliability of its current prediction. This system can also be regarded as a basis structure for programs of this kind.

  18. COMBREX-DB: an experiment centered database of protein function: knowledge, predictions and knowledge gaps.

    PubMed

    Chang, Yi-Chien; Hu, Zhenjun; Rachlin, John; Anton, Brian P; Kasif, Simon; Roberts, Richard J; Steffen, Martin

    2016-01-04

    The COMBREX database (COMBREX-DB; combrex.bu.edu) is an online repository of information related to (i) experimentally determined protein function, (ii) predicted protein function, (iii) relationships among proteins of unknown function and various types of experimental data, including molecular function, protein structure, and associated phenotypes. The database was created as part of the novel COMBREX (COMputational BRidges to EXperiments) effort aimed at accelerating the rate of gene function validation. It currently holds information on ∼ 3.3 million known and predicted proteins from over 1000 completely sequenced bacterial and archaeal genomes. The database also contains a prototype recommendation system for helping users identify those proteins whose experimental determination of function would be most informative for predicting function for other proteins within protein families. The emphasis on documenting experimental evidence for function predictions, and the prioritization of uncharacterized proteins for experimental testing distinguish COMBREX from other publicly available microbial genomics resources. This article describes updates to COMBREX-DB since an initial description in the 2011 NAR Database Issue. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. Computational analysis of multimorbidity between asthma, eczema and rhinitis

    PubMed Central

    Aguilar, Daniel; Pinart, Mariona; Koppelman, Gerard H.; Saeys, Yvan; Nawijn, Martijn C.; Postma, Dirkje S.; Akdis, Mübeccel; Auffray, Charles; Ballereau, Stéphane; Benet, Marta; García-Aymerich, Judith; González, Juan Ramón; Guerra, Stefano; Keil, Thomas; Kogevinas, Manolis; Lambrecht, Bart; Lemonnier, Nathanael; Melen, Erik; Sunyer, Jordi; Valenta, Rudolf; Valverde, Sergi; Wickman, Magnus; Bousquet, Jean; Oliva, Baldo; Antó, Josep M.

    2017-01-01

    Background The mechanisms explaining the co-existence of asthma, eczema and rhinitis (allergic multimorbidity) are largely unknown. We investigated the mechanisms underlying multimorbidity between three main allergic diseases at a molecular level by identifying the proteins and cellular processes that are common to them. Methods An in silico study based on computational analysis of the topology of the protein interaction network was performed in order to characterize the molecular mechanisms of multimorbidity of asthma, eczema and rhinitis. As a first step, proteins associated to either disease were identified using data mining approaches, and their overlap was calculated. Secondly, a functional interaction network was built, allowing to identify cellular pathways involved in allergic multimorbidity. Finally, a network-based algorithm generated a ranked list of newly predicted multimorbidity-associated proteins. Results Asthma, eczema and rhinitis shared a larger number of associated proteins than expected by chance, and their associated proteins exhibited a significant degree of interconnectedness in the interaction network. There were 15 pathways involved in the multimorbidity of asthma, eczema and rhinitis, including IL4 signaling and GATA3-related pathways. A number of proteins potentially associated to these multimorbidity processes were also obtained. Conclusions These results strongly support the existence of an allergic multimorbidity cluster between asthma, eczema and rhinitis, and suggest that type 2 signaling pathways represent a relevant multimorbidity mechanism of allergic diseases. Furthermore, we identified new candidates contributing to multimorbidity that may assist in identifying new targets for multimorbid allergic diseases. PMID:28598986

  20. Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs

    PubMed Central

    2011-01-01

    Background One of the strategies for protein function annotation is to search particular structural motifs that are known to be shared by proteins with a given function. Results Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Conclusions Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins. PMID:21689388

  1. Computational prediction of protein hot spot residues.

    PubMed

    Morrow, John Kenneth; Zhang, Shuxing

    2012-01-01

    Most biological processes involve multiple proteins interacting with each other. It has been recently discovered that certain residues in these protein-protein interactions, which are called hot spots, contribute more significantly to binding affinity than others. Hot spot residues have unique and diverse energetic properties that make them challenging yet important targets in the modulation of protein-protein complexes. Design of therapeutic agents that interact with hot spot residues has proven to be a valid methodology in disrupting unwanted protein-protein interactions. Using biological methods to determine which residues are hot spots can be costly and time consuming. Recent advances in computational approaches to predict hot spots have incorporated a myriad of features, and have shown increasing predictive successes. Here we review the state of knowledge around protein-protein interactions, hot spots, and give an overview of multiple in silico prediction techniques of hot spot residues.

  2. Applying Recovery Biomarkers to Calibrate Self-Report Measures of Energy and Protein in the Hispanic Community Health Study/Study of Latinos

    PubMed Central

    Mossavar-Rahmani, Yasmin; Shaw, Pamela A.; Wong, William W.; Sotres-Alvarez, Daniela; Gellman, Marc D.; Van Horn, Linda; Stoutenberg, Mark; Daviglus, Martha L.; Wylie-Rosett, Judith; Siega-Riz, Anna Maria; Ou, Fang-Shu; Prentice, Ross L.

    2015-01-01

    We investigated measurement error in the self-reported diets of US Hispanics/Latinos, who are prone to obesity and related comorbidities, by background (Central American, Cuban, Dominican, Mexican, Puerto Rican, and South American) in 2010–2012. In 477 participants aged 18–74 years, doubly labeled water and urinary nitrogen were used as objective recovery biomarkers of energy and protein intakes. Self-report was captured from two 24-hour dietary recalls. All measures were repeated in a subsample of 98 individuals. We examined the bias of dietary recalls and their associations with participant characteristics using generalized estimating equations. Energy intake was underestimated by 25.3% (men, 21.8%; women, 27.3%), and protein intake was underestimated by 18.5% (men, 14.7%; women, 20.7%). Protein density was overestimated by 10.7% (men, 11.3%; women, 10.1%). Higher body mass index and Hispanic/Latino background were associated with underestimation of energy (P < 0.05). For protein intake, higher body mass index, older age, nonsmoking, Spanish speaking, and Hispanic/Latino background were associated with underestimation (P < 0.05). Systematic underreporting of energy and protein intakes and overreporting of protein density were found to vary significantly by Hispanic/Latino background. We developed calibration equations that correct for subject-specific error in reporting that can be used to reduce bias in diet-disease association studies. PMID:25995289

  3. Prediction of beta-turns in proteins using the first-order Markov models.

    PubMed

    Lin, Thy-Hou; Wang, Ging-Ming; Wang, Yen-Tseng

    2002-01-01

    We present a method based on the first-order Markov models for predicting simple beta-turns and loops containing multiple turns in proteins. Sequences of 338 proteins in a database are divided using the published turn criteria into the following three regions, namely, the turn, the boundary, and the nonturn ones. A transition probability matrix is constructed for either the turn or the nonturn region using the weighted transition probabilities computed for dipeptides identified from each region. There are two such matrices constructed for the boundary region since the transition probabilities for dipeptides immediately preceding or following a turn are different. The window used for scanning a protein sequence from amino (N-) to carboxyl (C-) terminal is a hexapeptide since the transition probability computed for a turn tetrapeptide is capped at both the N- and C- termini with a boundary transition probability indexed respectively from the two boundary transition matrices. A sum of the averaged product of the transition probabilities of all the hexapeptides involving each residue is computed. This is then weighted with a probability computed from assuming that all the hexapeptides are from the nonturn region to give the final prediction quantity. Both simple beta-turns and loops containing multiple turns in a protein are then identified by the rising of the prediction quantity computed. The performance of the prediction scheme or the percentage (%) of correct prediction is evaluated through computation of Matthews correlation coefficients for each protein predicted. It is found that the prediction method is capable of giving prediction results with better correlation between the percent of correct prediction and the Matthews correlation coefficients for a group of test proteins as compared with those predicted using some secondary structural prediction methods. The prediction accuracy for about 40% of proteins in the database or 50% of proteins in the test set is better than 70%. Such a percentage for the test set is reduced to 30 if the structures of all the proteins in the set are treated as unknown.

  4. BiFCROS: A Low-Background Fluorescence Repressor Operator System for Labeling of Genomic Loci.

    PubMed

    Milbredt, Sarah; Waldminghaus, Torsten

    2017-06-07

    Fluorescence-based methods are widely used to analyze elementary cell processes such as DNA replication or chromosomal folding and segregation. Labeling DNA with a fluorescent protein allows the visualization of its temporal and spatial organization. One popular approach is FROS (fluorescence repressor operator system). This method specifically labels DNA in vivo through binding of a fusion of a fluorescent protein and a repressor protein to an operator array, which contains numerous copies of the repressor binding site integrated into the genomic site of interest. Bound fluorescent proteins are then visible as foci in microscopic analyses and can be distinguished from the background fluorescence caused by unbound fusion proteins. Even though this method is widely used, no attempt has been made so far to decrease the background fluorescence to facilitate analysis of the actual signal of interest. Here, we present a new method that greatly reduces the background signal of FROS. BiFCROS (Bimolecular Fluorescence Complementation and Repressor Operator System) is based on fusions of repressor proteins to halves of a split fluorescent protein. Binding to a hybrid FROS array results in fluorescence signals due to bimolecular fluorescence complementation. Only proteins bound to the hybrid FROS array fluoresce, greatly improving the signal to noise ratio compared to conventional FROS. We present the development of BiFCROS and discuss its potential to be used as a fast and single-cell readout for copy numbers of genetic loci. Copyright © 2017 Milbredt and Waldminghaus.

  5. BiFCROS: A Low-Background Fluorescence Repressor Operator System for Labeling of Genomic Loci

    PubMed Central

    Milbredt, Sarah; Waldminghaus, Torsten

    2017-01-01

    Fluorescence-based methods are widely used to analyze elementary cell processes such as DNA replication or chromosomal folding and segregation. Labeling DNA with a fluorescent protein allows the visualization of its temporal and spatial organization. One popular approach is FROS (fluorescence repressor operator system). This method specifically labels DNA in vivo through binding of a fusion of a fluorescent protein and a repressor protein to an operator array, which contains numerous copies of the repressor binding site integrated into the genomic site of interest. Bound fluorescent proteins are then visible as foci in microscopic analyses and can be distinguished from the background fluorescence caused by unbound fusion proteins. Even though this method is widely used, no attempt has been made so far to decrease the background fluorescence to facilitate analysis of the actual signal of interest. Here, we present a new method that greatly reduces the background signal of FROS. BiFCROS (Bimolecular Fluorescence Complementation and Repressor Operator System) is based on fusions of repressor proteins to halves of a split fluorescent protein. Binding to a hybrid FROS array results in fluorescence signals due to bimolecular fluorescence complementation. Only proteins bound to the hybrid FROS array fluoresce, greatly improving the signal to noise ratio compared to conventional FROS. We present the development of BiFCROS and discuss its potential to be used as a fast and single-cell readout for copy numbers of genetic loci. PMID:28450375

  6. Structure and non-structure of centrosomal proteins.

    PubMed

    Dos Santos, Helena G; Abia, David; Janowski, Robert; Mortuza, Gulnahar; Bertero, Michela G; Boutin, Maïlys; Guarín, Nayibe; Méndez-Giraldez, Raúl; Nuñez, Alfonso; Pedrero, Juan G; Redondo, Pilar; Sanz, María; Speroni, Silvia; Teichert, Florian; Bruix, Marta; Carazo, José M; Gonzalez, Cayetano; Reina, José; Valpuesta, José M; Vernos, Isabelle; Zabala, Juan C; Montoya, Guillermo; Coll, Miquel; Bastolla, Ugo; Serrano, Luis

    2013-01-01

    Here we perform a large-scale study of the structural properties and the expression of proteins that constitute the human Centrosome. Centrosomal proteins tend to be larger than generic human proteins (control set), since their genes contain in average more exons (20.3 versus 14.6). They are rich in predicted disordered regions, which cover 57% of their length, compared to 39% in the general human proteome. They also contain several regions that are dually predicted to be disordered and coiled-coil at the same time: 55 proteins (15%) contain disordered and coiled-coil fragments that cover more than 20% of their length. Helices prevail over strands in regions homologous to known structures (47% predicted helical residues against 17% predicted as strands), and even more in the whole centrosomal proteome (52% against 7%), while for control human proteins 34.5% of the residues are predicted as helical and 12.8% are predicted as strands. This difference is mainly due to residues predicted as disordered and helical (30% in centrosomal and 9.4% in control proteins), which may correspond to alpha-helix forming molecular recognition features (α-MoRFs). We performed expression assays for 120 full-length centrosomal proteins and 72 domain constructs that we have predicted to be globular. These full-length proteins are often insoluble: Only 39 out of 120 expressed proteins (32%) and 19 out of 72 domains (26%) were soluble. We built or retrieved structural models for 277 out of 361 human proteins whose centrosomal localization has been experimentally verified. We could not find any suitable structural template with more than 20% sequence identity for 84 centrosomal proteins (23%), for which around 74% of the residues are predicted to be disordered or coiled-coils. The three-dimensional models that we built are available at http://ub.cbm.uam.es/centrosome/models/index.php.

  7. Exploring Mouse Protein Function via Multiple Approaches.

    PubMed

    Huang, Guohua; Chu, Chen; Huang, Tao; Kong, Xiangyin; Zhang, Yunhua; Zhang, Ning; Cai, Yu-Dong

    2016-01-01

    Although the number of available protein sequences is growing exponentially, functional protein annotations lag far behind. Therefore, accurate identification of protein functions remains one of the major challenges in molecular biology. In this study, we presented a novel approach to predict mouse protein functions. The approach was a sequential combination of a similarity-based approach, an interaction-based approach and a pseudo amino acid composition-based approach. The method achieved an accuracy of about 0.8450 for the 1st-order predictions in the leave-one-out and ten-fold cross-validations. For the results yielded by the leave-one-out cross-validation, although the similarity-based approach alone achieved an accuracy of 0.8756, it was unable to predict the functions of proteins with no homologues. Comparatively, the pseudo amino acid composition-based approach alone reached an accuracy of 0.6786. Although the accuracy was lower than that of the previous approach, it could predict the functions of almost all proteins, even proteins with no homologues. Therefore, the combined method balanced the advantages and disadvantages of both approaches to achieve efficient performance. Furthermore, the results yielded by the ten-fold cross-validation indicate that the combined method is still effective and stable when there are no close homologs are available. However, the accuracy of the predicted functions can only be determined according to known protein functions based on current knowledge. Many protein functions remain unknown. By exploring the functions of proteins for which the 1st-order predicted functions are wrong but the 2nd-order predicted functions are correct, the 1st-order wrongly predicted functions were shown to be closely associated with the genes encoding the proteins. The so-called wrongly predicted functions could also potentially be correct upon future experimental verification. Therefore, the accuracy of the presented method may be much higher in reality.

  8. Exploring Mouse Protein Function via Multiple Approaches

    PubMed Central

    Huang, Tao; Kong, Xiangyin; Zhang, Yunhua; Zhang, Ning

    2016-01-01

    Although the number of available protein sequences is growing exponentially, functional protein annotations lag far behind. Therefore, accurate identification of protein functions remains one of the major challenges in molecular biology. In this study, we presented a novel approach to predict mouse protein functions. The approach was a sequential combination of a similarity-based approach, an interaction-based approach and a pseudo amino acid composition-based approach. The method achieved an accuracy of about 0.8450 for the 1st-order predictions in the leave-one-out and ten-fold cross-validations. For the results yielded by the leave-one-out cross-validation, although the similarity-based approach alone achieved an accuracy of 0.8756, it was unable to predict the functions of proteins with no homologues. Comparatively, the pseudo amino acid composition-based approach alone reached an accuracy of 0.6786. Although the accuracy was lower than that of the previous approach, it could predict the functions of almost all proteins, even proteins with no homologues. Therefore, the combined method balanced the advantages and disadvantages of both approaches to achieve efficient performance. Furthermore, the results yielded by the ten-fold cross-validation indicate that the combined method is still effective and stable when there are no close homologs are available. However, the accuracy of the predicted functions can only be determined according to known protein functions based on current knowledge. Many protein functions remain unknown. By exploring the functions of proteins for which the 1st-order predicted functions are wrong but the 2nd-order predicted functions are correct, the 1st-order wrongly predicted functions were shown to be closely associated with the genes encoding the proteins. The so-called wrongly predicted functions could also potentially be correct upon future experimental verification. Therefore, the accuracy of the presented method may be much higher in reality. PMID:27846315

  9. Bacillus anthracis secretome time course under host-simulated conditions and identification of immunogenic proteins.

    PubMed

    Walz, Alexander; Mujer, Cesar V; Connolly, Joseph P; Alefantis, Tim; Chafin, Ryan; Dake, Clarissa; Whittington, Jessica; Kumar, Srikanta P; Khan, Akbar S; DelVecchio, Vito G

    2007-07-27

    The secretion time course of Bacillus anthracis strain RA3R (pXO1+/pXO2-) during early, mid, and late log phase were investigated under conditions that simulate those encountered in the host. All of the identified proteins were analyzed by different software algorithms to characterize their predicted mode of secretion and cellular localization. In addition, immunogenic proteins were identified using sera from humans with cutaneous anthrax. A total of 275 extracellular proteins were identified by a combination of LC MS/MS and MALDI-TOF MS. All of the identified proteins were analyzed by SignalP, SecretomeP, PSORT, LipoP, TMHMM, and PROSITE to characterize their predicted mode of secretion, cellular localization, and protein domains. Fifty-three proteins were predicted by SignalP to harbor the cleavable N-terminal signal peptides and were therefore secreted via the classical Sec pathway. Twenty-three proteins were predicted by SecretomeP for secretion by the alternative Sec pathway characterized by the lack of typical export signal. In contrast to SignalP and SecretomeP predictions, PSORT predicted 171 extracellular proteins, 7 cell wall-associated proteins, and 6 cytoplasmic proteins. Moreover, 51 proteins were predicted by LipoP to contain putative Sec signal peptides (38 have SpI sites), lipoprotein signal peptides (13 have SpII sites), and N-terminal membrane helices (9 have transmembrane helices). The TMHMM algorithm predicted 25 membrane-associated proteins with one to ten transmembrane helices. Immunogenic proteins were also identified using sera from patients who have recovered from anthrax. The charge variants (83 and 63 kDa) of protective antigen (PA) were the most immunodominant secreted antigens, followed by charge variants of enolase and transketolase. This is the first description of the time course of protein secretion for the pathogen Bacillus anthracis. Time course studies of protein secretion and accumulation may be relevant in elucidation of the progression of pathogenicity, identification of therapeutics and diagnostic markers, and vaccine development. This study also adds to the continuously growing list of identified Bacillus anthracis secretome proteins.

  10. Comparing side chain packing in soluble proteins, protein-protein interfaces, and transmembrane proteins.

    PubMed

    Gaines, J C; Acebes, S; Virrueta, A; Butler, M; Regan, L; O'Hern, C S

    2018-05-01

    We compare side chain prediction and packing of core and non-core regions of soluble proteins, protein-protein interfaces, and transmembrane proteins. We first identified or created comparable databases of high-resolution crystal structures of these 3 protein classes. We show that the solvent-inaccessible cores of the 3 classes of proteins are equally densely packed. As a result, the side chains of core residues at protein-protein interfaces and in the membrane-exposed regions of transmembrane proteins can be predicted by the hard-sphere plus stereochemical constraint model with the same high prediction accuracies (>90%) as core residues in soluble proteins. We also find that for all 3 classes of proteins, as one moves away from the solvent-inaccessible core, the packing fraction decreases as the solvent accessibility increases. However, the side chain predictability remains high (80% within 30°) up to a relative solvent accessibility, rSASA≲0.3, for all 3 protein classes. Our results show that ≈40% of the interface regions in protein complexes are "core", that is, densely packed with side chain conformations that can be accurately predicted using the hard-sphere model. We propose packing fraction as a metric that can be used to distinguish real protein-protein interactions from designed, non-binding, decoys. Our results also show that cores of membrane proteins are the same as cores of soluble proteins. Thus, the computational methods we are developing for the analysis of the effect of hydrophobic core mutations in soluble proteins will be equally applicable to analyses of mutations in membrane proteins. © 2018 Wiley Periodicals, Inc.

  11. Update of the ATTRACT force field for the prediction of protein-protein binding affinity.

    PubMed

    Chéron, Jean-Baptiste; Zacharias, Martin; Antonczak, Serge; Fiorucci, Sébastien

    2017-06-05

    Determining the protein-protein interactions is still a major challenge for molecular biology. Docking protocols has come of age in predicting the structure of macromolecular complexes. However, they still lack accuracy to estimate the binding affinities, the thermodynamic quantity that drives the formation of a complex. Here, an updated version of the protein-protein ATTRACT force field aiming at predicting experimental binding affinities is reported. It has been designed on a dataset of 218 protein-protein complexes. The correlation between the experimental and predicted affinities reaches 0.6, outperforming most of the available protocols. Focusing on a subset of rigid and flexible complexes, the performance raises to 0.76 and 0.69, respectively. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.

  12. Protein Sub-Nuclear Localization Prediction Using SVM and Pfam Domain Information

    PubMed Central

    Kumar, Ravindra; Jain, Sohni; Kumari, Bandana; Kumar, Manish

    2014-01-01

    The nucleus is the largest and the highly organized organelle of eukaryotic cells. Within nucleus exist a number of pseudo-compartments, which are not separated by any membrane, yet each of them contains only a specific set of proteins. Understanding protein sub-nuclear localization can hence be an important step towards understanding biological functions of the nucleus. Here we have described a method, SubNucPred developed by us for predicting the sub-nuclear localization of proteins. This method predicts protein localization for 10 different sub-nuclear locations sequentially by combining presence or absence of unique Pfam domain and amino acid composition based SVM model. The prediction accuracy during leave-one-out cross-validation for centromeric proteins was 85.05%, for chromosomal proteins 76.85%, for nuclear speckle proteins 81.27%, for nucleolar proteins 81.79%, for nuclear envelope proteins 79.37%, for nuclear matrix proteins 77.78%, for nucleoplasm proteins 76.98%, for nuclear pore complex proteins 88.89%, for PML body proteins 75.40% and for telomeric proteins it was 83.33%. Comparison with other reported methods showed that SubNucPred performs better than existing methods. A web-server for predicting protein sub-nuclear localization named SubNucPred has been established at http://14.139.227.92/mkumar/subnucpred/. Standalone version of SubNucPred can also be downloaded from the web-server. PMID:24897370

  13. Characterization of essential proteins based on network topology in proteins interaction networks

    NASA Astrophysics Data System (ADS)

    Bakar, Sakhinah Abu; Taheri, Javid; Zomaya, Albert Y.

    2014-06-01

    The identification of essential proteins is theoretically and practically important as (1) it is essential to understand the minimal surviving requirements for cellular lives, and (2) it provides fundamental for development of drug. As conducting experimental studies to identify essential proteins are both time and resource consuming, here we present a computational approach in predicting them based on network topology properties from protein-protein interaction networks of Saccharomyces cerevisiae. The proposed method, namely EP3NN (Essential Proteins Prediction using Probabilistic Neural Network) employed a machine learning algorithm called Probabilistic Neural Network as a classifier to identify essential proteins of the organism of interest; it uses degree centrality, closeness centrality, local assortativity and local clustering coefficient of each protein in the network for such predictions. Results show that EP3NN managed to successfully predict essential proteins with an accuracy of 95% for our studied organism. Results also show that most of the essential proteins are close to other proteins, have assortativity behavior and form clusters/sub-graph in the network.

  14. Integrated web visualizations for protein-protein interaction databases.

    PubMed

    Jeanquartier, Fleur; Jean-Quartier, Claire; Holzinger, Andreas

    2015-06-16

    Understanding living systems is crucial for curing diseases. To achieve this task we have to understand biological networks based on protein-protein interactions. Bioinformatics has come up with a great amount of databases and tools that support analysts in exploring protein-protein interactions on an integrated level for knowledge discovery. They provide predictions and correlations, indicate possibilities for future experimental research and fill the gaps to complete the picture of biochemical processes. There are numerous and huge databases of protein-protein interactions used to gain insights into answering some of the many questions of systems biology. Many computational resources integrate interaction data with additional information on molecular background. However, the vast number of diverse Bioinformatics resources poses an obstacle to the goal of understanding. We present a survey of databases that enable the visual analysis of protein networks. We selected M=10 out of N=53 resources supporting visualization, and we tested against the following set of criteria: interoperability, data integration, quantity of possible interactions, data visualization quality and data coverage. The study reveals differences in usability, visualization features and quality as well as the quantity of interactions. StringDB is the recommended first choice. CPDB presents a comprehensive dataset and IntAct lets the user change the network layout. A comprehensive comparison table is available via web. The supplementary table can be accessed on http://tinyurl.com/PPI-DB-Comparison-2015. Only some web resources featuring graph visualization can be successfully applied to interactive visual analysis of protein-protein interaction. Study results underline the necessity for further enhancements of visualization integration in biochemical analysis tools. Identified challenges are data comprehensiveness, confidence, interactive feature and visualization maturing.

  15. Comparative proteomics of cerebrospinal fluid reveals a predictive model for differential diagnosis of pneumococcal, meningococcal, and enteroviral meningitis, and novel putative therapeutic targets

    PubMed Central

    2015-01-01

    Background Meningitis is the inflammation of the meninges in response to infection or chemical agents. While aseptic meningitis, most frequently caused by enteroviruses, is usually benign with a self-limiting course, bacterial meningitis remains associated with high morbidity and mortality rates, despite advances in antimicrobial therapy and intensive care. Fast and accurate differential diagnosis is crucial for assertive choice of the appropriate therapeutic approach for each form of meningitis. Methods We used 2D-PAGE and mass spectrometry to identify the cerebrospinal fluid proteome specifically related to the host response to pneumococcal, meningococcal, and enteroviral meningitis. The disease-specific proteome signatures were inspected by pathway analysis. Results Unique cerebrospinal fluid proteome signatures were found to the three aetiological forms of meningitis investigated, and a qualitative predictive model with four protein markers was developed for the differential diagnosis of these diseases. Nevertheless, pathway analysis of the disease-specific proteomes unveiled that Kallikrein-kinin system may play a crucial role in the pathophysiological mechanisms leading to brain damage in bacterial meningitis. Proteins taking part in this cellular process are proposed as putative targets to novel adjunctive therapies. Conclusions Comparative proteomics of cerebrospinal fluid disclosed candidate biomarkers, which were combined in a qualitative and sequential predictive model with potential to improve the differential diagnosis of pneumococcal, meningococcal and enteroviral meningitis. Moreover, we present the first evidence of the possible implication of Kallikrein-kinin system in the pathophysiology of bacterial meningitis. PMID:26040285

  16. Integrated cellular network of transcription regulations and protein-protein interactions

    PubMed Central

    2010-01-01

    Background With the accumulation of increasing omics data, a key goal of systems biology is to construct networks at different cellular levels to investigate cellular machinery of the cell. However, there is currently no satisfactory method to construct an integrated cellular network that combines the gene regulatory network and the signaling regulatory pathway. Results In this study, we integrated different kinds of omics data and developed a systematic method to construct the integrated cellular network based on coupling dynamic models and statistical assessments. The proposed method was applied to S. cerevisiae stress responses, elucidating the stress response mechanism of the yeast. From the resulting integrated cellular network under hyperosmotic stress, the highly connected hubs which are functionally relevant to the stress response were identified. Beyond hyperosmotic stress, the integrated network under heat shock and oxidative stress were also constructed and the crosstalks of these networks were analyzed, specifying the significance of some transcription factors to serve as the decision-making devices at the center of the bow-tie structure and the crucial role for rapid adaptation scheme to respond to stress. In addition, the predictive power of the proposed method was also demonstrated. Conclusions We successfully construct the integrated cellular network which is validated by literature evidences. The integration of transcription regulations and protein-protein interactions gives more insight into the actual biological network and is more predictive than those without integration. The method is shown to be powerful and flexible and can be used under different conditions and for different species. The coupling dynamic models of the whole integrated cellular network are very useful for theoretical analyses and for further experiments in the fields of network biology and synthetic biology. PMID:20211003

  17. Predicting Glucose Sensor Behavior in Blood Using Transport Modeling: Relative Impacts of Protein Biofouling and Cellular Metabolic Effects

    PubMed Central

    Novak, Matthew T.; Yuan, Fan; Reichert, William M.

    2013-01-01

    Background Tissue response to indwelling glucose sensors remains a confounding barrier to clinical application. While the effects of fully formed capsular tissue on sensor response have been studied, little has been done to understand how tissue interactions occurring before capsule formation hinder sensor performance. Upon insertion in subcutaneous tissue, the sensor is initially exposed to blood, blood borne constituents, and interstitial fluid. Using human whole blood as a simple ex vivo experimental system, the effects of protein accumulation at the sensor surface (biofouling effects) and cellular consumption of glucose in both the biofouling layer and in the bulk (metabolic effects) on sensor response were assessed. Methods Medtronic MiniMed SofSensor glucose sensors were incubated in whole blood, plasma-diluted whole blood, and cell-free platelet-poor plasma (PPP) to analyze the impact of different blood constituents on sensor function. Experimental conditions were then simulated using MATLAB to predict the relative impacts of biofouling and metabolic effects on the observed sensor responses. Results Protein biofouling in PPP in both the experiments and the simulations was found to have no interfering effect upon sensor response. Experimental results obtained with whole and dilute blood showed that the sensor response was markedly affected by blood borne glucose-consuming cells accumulated in the biofouling layer and in the surrounding bulk. Conclusions The physical barrier to glucose transport presented by protein biofouling does not hinder glucose movement to the sensor surface, and the consumption of glucose by inflammatory cells, and not erythrocytes, proximal to the sensor surface has a substantial effect on sensor response and may be the main culprit for anomalous sensor behavior immediately following implantation. PMID:24351181

  18. Quantification of pancreatic cancer proteome and phosphorylome: indicates molecular events likely contributing to cancer and activity of drug targets.

    PubMed

    Britton, David; Zen, Yoh; Quaglia, Alberto; Selzer, Stefan; Mitra, Vikram; Löβner, Christopher; Jung, Stephan; Böhm, Gitte; Schmid, Peter; Prefot, Petra; Hoehle, Claudia; Koncarevic, Sasa; Gee, Julia; Nicholson, Robert; Ward, Malcolm; Castellano, Leandro; Stebbing, Justin; Zucht, Hans Dieter; Sarker, Debashis; Heaton, Nigel; Pike, Ian

    2014-01-01

    LC-MS/MS phospho-proteomics is an essential technology to help unravel the complex molecular events that lead to and propagate cancer. We have developed a global phospho-proteomic workflow to determine activity of signaling pathways and drug targets in pancreatic cancer tissue for clinical application. Peptides resulting from tryptic digestion of proteins extracted from frozen tissue of pancreatic ductal adenocarcinoma and background pancreas (n = 12), were labelled with tandem mass tags (TMT 8-plex), separated by strong cation exchange chromatography, then were analysed by LC-MS/MS directly or first enriched for phosphopeptides using IMAC and TiO2, prior to analysis. In-house, commercial and freeware bioinformatic platforms were used to identify relevant biological events from the complex dataset. Of 2,101 proteins identified, 152 demonstrated significant difference in abundance between tumor and non-tumor tissue. They included proteins that are known to be up-regulated in pancreatic cancer (e.g. Mucin-1), but the majority were new candidate markers such as HIPK1 & MLCK. Of the 6,543 unique phosphopeptides identified (6,284 unique phosphorylation sites), 635 showed significant regulation, particularly those from proteins involved in cell migration (Rho guanine nucleotide exchange factors & MRCKα) and formation of focal adhesions. Activator phosphorylation sites on FYN, AKT1, ERK2, HDAC1 and other drug targets were found to be highly modulated (≥2 fold) in different cases highlighting their predictive power. Here we provided critical information enabling us to identify the common and unique molecular events likely contributing to cancer in each case. Such information may be used to help predict more bespoke therapy suitable for an individual case.

  19. Transcriptome analysis of the honey bee fungal pathogen, Ascosphaera apis: implications for host pathogenesis

    PubMed Central

    2012-01-01

    Background We present a comprehensive transcriptome analysis of the fungus Ascosphaera apis, an economically important pathogen of the Western honey bee (Apis mellifera) that causes chalkbrood disease. Our goals were to further annotate the A. apis reference genome and to identify genes that are candidates for being differentially expressed during host infection versus axenic culture. Results We compared A. apis transcriptome sequence from mycelia grown on liquid or solid media with that dissected from host-infected tissue. 454 pyrosequencing provided 252 Mb of filtered sequence reads from both culture types that were assembled into 10,087 contigs. Transcript contigs, protein sequences from multiple fungal species, and ab initio gene predictions were included as evidence sources in the Maker gene prediction pipeline, resulting in 6,992 consensus gene models. A phylogeny based on 12 of these protein-coding loci further supported the taxonomic placement of Ascosphaera as sister to the core Onygenales. Several common protein domains were less abundant in A. apis compared with related ascomycete genomes, particularly cytochrome p450 and protein kinase domains. A novel gene family was identified that has expanded in some ascomycete lineages, but not others. We manually annotated genes with homologs in other fungal genomes that have known relevance to fungal virulence and life history. Functional categories of interest included genes involved in mating-type specification, intracellular signal transduction, and stress response. Computational and manual annotations have been made publicly available on the Bee Pests and Pathogens website. Conclusions This comprehensive transcriptome analysis substantially enhances our understanding of the A. apis genome and its expression during infection of honey bee larvae. It also provides resources for future molecular studies of chalkbrood disease and ultimately improved disease management. PMID:22747707

  20. TIMPs of parasitic helminths – a large-scale analysis of high-throughput sequence datasets

    PubMed Central

    2013-01-01

    Background Tissue inhibitors of metalloproteases (TIMPs) are a multifunctional family of proteins that orchestrate extracellular matrix turnover, tissue remodelling and other cellular processes. In parasitic helminths, such as hookworms, TIMPs have been proposed to play key roles in the host-parasite interplay, including invasion of and establishment in the vertebrate animal hosts. Currently, knowledge of helminth TIMPs is limited to a small number of studies on canine hookworms, whereas no information is available on the occurrence of TIMPs in other parasitic helminths causing neglected diseases. Methods In the present study, we conducted a large-scale investigation of TIMP proteins of a range of neglected human parasites including the hookworm Necator americanus, the roundworm Ascaris suum, the liver flukes Clonorchis sinensis and Opisthorchis viverrini, as well as the schistosome blood flukes. This entailed mining available transcriptomic and/or genomic sequence datasets for the presence of homologues of known TIMPs, predicting secondary structures of defined protein sequences, systematic phylogenetic analyses and assessment of differential expression of genes encoding putative TIMPs in the developmental stages of A. suum, N. americanus and Schistosoma haematobium which infect the mammalian hosts. Results A total of 15 protein sequences with high homology to known eukaryotic TIMPs were predicted from the complement of sequence data available for parasitic helminths and subjected to in-depth bioinformatic analyses. Conclusions Supported by the availability of gene manipulation technologies such as RNA interference and/or transgenesis, this work provides a basis for future functional explorations of helminth TIMPs and, in particular, of their role/s in fundamental biological pathways linked to long-term establishment in the vertebrate hosts, with a view towards the development of novel approaches for the control of neglected helminthiases. PMID:23721526

  1. Challenging the state-of-the-art in protein structure prediction: Highlights of experimental target structures for the 10th Critical Assessment of Techniques for Protein Structure Prediction Experiment CASP10

    PubMed Central

    Kryshtafovych, Andriy; Moult, John; Bales, Patrick; Bazan, J. Fernando; Biasini, Marco; Burgin, Alex; Chen, Chen; Cochran, Frank V.; Craig, Timothy K.; Das, Rhiju; Fass, Deborah; Garcia-Doval, Carmela; Herzberg, Osnat; Lorimer, Donald; Luecke, Hartmut; Ma, Xiaolei; Nelson, Daniel C.; van Raaij, Mark J.; Rohwer, Forest; Segall, Anca; Seguritan, Victor; Zeth, Kornelius; Schwede, Torsten

    2014-01-01

    For the last two decades, CASP has assessed the state of the art in techniques for protein structure prediction and identified areas which required further development. CASP would not have been possible without the prediction targets provided by the experimental structural biology community. In the latest experiment, CASP10, over 100 structures were suggested as prediction targets, some of which appeared to be extraordinarily difficult for modeling. In this paper, authors of some of the most challenging targets discuss which specific scientific question motivated the experimental structure determination of the target protein, which structural features were especially interesting from a structural or functional perspective, and to what extent these features were correctly reproduced in the predictions submitted to CASP10. Specifically, the following targets will be presented: the acid-gated urea channel, a difficult to predict trans-membrane protein from the important human pathogen Helicobacter pylori; the structure of human interleukin IL-34, a recently discovered helical cytokine; the structure of a functionally uncharacterized enzyme OrfY from Thermoproteus tenax formed by a gene duplication and a novel fold; an ORFan domain of mimivirus sulfhydryl oxidase R596; the fibre protein gp17 from bacteriophage T7; the Bacteriophage CBA-120 tailspike protein; a virus coat protein from metagenomic samples of the marine environment; and finally an unprecedented class of structure prediction targets based on engineered disulfide-rich small proteins. PMID:24318984

  2. Predicting protein subcellular locations using hierarchical ensemble of Bayesian classifiers based on Markov chains.

    PubMed

    Bulashevska, Alla; Eils, Roland

    2006-06-14

    The subcellular location of a protein is closely related to its function. It would be worthwhile to develop a method to predict the subcellular location for a given protein when only the amino acid sequence of the protein is known. Although many efforts have been made to predict subcellular location from sequence information only, there is the need for further research to improve the accuracy of prediction. A novel method called HensBC is introduced to predict protein subcellular location. HensBC is a recursive algorithm which constructs a hierarchical ensemble of classifiers. The classifiers used are Bayesian classifiers based on Markov chain models. We tested our method on six various datasets; among them are Gram-negative bacteria dataset, data for discriminating outer membrane proteins and apoptosis proteins dataset. We observed that our method can predict the subcellular location with high accuracy. Another advantage of the proposed method is that it can improve the accuracy of the prediction of some classes with few sequences in training and is therefore useful for datasets with imbalanced distribution of classes. This study introduces an algorithm which uses only the primary sequence of a protein to predict its subcellular location. The proposed recursive scheme represents an interesting methodology for learning and combining classifiers. The method is computationally efficient and competitive with the previously reported approaches in terms of prediction accuracies as empirical results indicate. The code for the software is available upon request.

  3. Template-based modeling and ab initio refinement of protein oligomer structures using GALAXY in CAPRI round 30.

    PubMed

    Lee, Hasup; Baek, Minkyung; Lee, Gyu Rie; Park, Sangwoo; Seok, Chaok

    2017-03-01

    Many proteins function as homo- or hetero-oligomers; therefore, attempts to understand and regulate protein functions require knowledge of protein oligomer structures. The number of available experimental protein structures is increasing, and oligomer structures can be predicted using the experimental structures of related proteins as templates. However, template-based models may have errors due to sequence differences between the target and template proteins, which can lead to functional differences. Such structural differences may be predicted by loop modeling of local regions or refinement of the overall structure. In CAPRI (Critical Assessment of PRotein Interactions) round 30, we used recently developed features of the GALAXY protein modeling package, including template-based structure prediction, loop modeling, model refinement, and protein-protein docking to predict protein complex structures from amino acid sequences. Out of the 25 CAPRI targets, medium and acceptable quality models were obtained for 14 and 1 target(s), respectively, for which proper oligomer or monomer templates could be detected. Symmetric interface loop modeling on oligomer model structures successfully improved model quality, while loop modeling on monomer model structures failed. Overall refinement of the predicted oligomer structures consistently improved the model quality, in particular in interface contacts. Proteins 2017; 85:399-407. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  4. Semi-supervised protein subcellular localization.

    PubMed

    Xu, Qian; Hu, Derek Hao; Xue, Hong; Yu, Weichuan; Yang, Qiang

    2009-01-30

    Protein subcellular localization is concerned with predicting the location of a protein within a cell using computational method. The location information can indicate key functionalities of proteins. Accurate predictions of subcellular localizations of protein can aid the prediction of protein function and genome annotation, as well as the identification of drug targets. Computational methods based on machine learning, such as support vector machine approaches, have already been widely used in the prediction of protein subcellular localization. However, a major drawback of these machine learning-based approaches is that a large amount of data should be labeled in order to let the prediction system learn a classifier of good generalization ability. However, in real world cases, it is laborious, expensive and time-consuming to experimentally determine the subcellular localization of a protein and prepare instances of labeled data. In this paper, we present an approach based on a new learning framework, semi-supervised learning, which can use much fewer labeled instances to construct a high quality prediction model. We construct an initial classifier using a small set of labeled examples first, and then use unlabeled instances to refine the classifier for future predictions. Experimental results show that our methods can effectively reduce the workload for labeling data using the unlabeled data. Our method is shown to enhance the state-of-the-art prediction results of SVM classifiers by more than 10%.

  5. Topology of membrane proteins-predictions, limitations and variations.

    PubMed

    Tsirigos, Konstantinos D; Govindarajan, Sudha; Bassot, Claudio; Västermark, Åke; Lamb, John; Shu, Nanjiang; Elofsson, Arne

    2017-10-26

    Transmembrane proteins perform a variety of important biological functions necessary for the survival and growth of the cells. Membrane proteins are built up by transmembrane segments that span the lipid bilayer. The segments can either be in the form of hydrophobic alpha-helices or beta-sheets which create a barrel. A fundamental aspect of the structure of transmembrane proteins is the membrane topology, that is, the number of transmembrane segments, their position in the protein sequence and their orientation in the membrane. Along these lines, many predictive algorithms for the prediction of the topology of alpha-helical and beta-barrel transmembrane proteins exist. The newest algorithms obtain an accuracy close to 80% both for alpha-helical and beta-barrel transmembrane proteins. However, lately it has been shown that the simplified picture presented when describing a protein family by its topology is limited. To demonstrate this, we highlight examples where the topology is either not conserved in a protein superfamily or where the structure cannot be described solely by the topology of a protein. The prediction of these non-standard features from sequence alone was not successful until the recent revolutionary progress in 3D-structure prediction of proteins. Copyright © 2017 Elsevier Ltd. All rights reserved.

  6. MCL-CAw: a refinement of MCL for detecting yeast complexes from weighted PPI networks by incorporating core-attachment structure

    PubMed Central

    2010-01-01

    Background The reconstruction of protein complexes from the physical interactome of organisms serves as a building block towards understanding the higher level organization of the cell. Over the past few years, several independent high-throughput experiments have helped to catalogue enormous amount of physical protein interaction data from organisms such as yeast. However, these individual datasets show lack of correlation with each other and also contain substantial number of false positives (noise). Over these years, several affinity scoring schemes have also been devised to improve the qualities of these datasets. Therefore, the challenge now is to detect meaningful as well as novel complexes from protein interaction (PPI) networks derived by combining datasets from multiple sources and by making use of these affinity scoring schemes. In the attempt towards tackling this challenge, the Markov Clustering algorithm (MCL) has proved to be a popular and reasonably successful method, mainly due to its scalability, robustness, and ability to work on scored (weighted) networks. However, MCL produces many noisy clusters, which either do not match known complexes or have additional proteins that reduce the accuracies of correctly predicted complexes. Results Inspired by recent experimental observations by Gavin and colleagues on the modularity structure in yeast complexes and the distinctive properties of "core" and "attachment" proteins, we develop a core-attachment based refinement method coupled to MCL for reconstruction of yeast complexes from scored (weighted) PPI networks. We combine physical interactions from two recent "pull-down" experiments to generate an unscored PPI network. We then score this network using available affinity scoring schemes to generate multiple scored PPI networks. The evaluation of our method (called MCL-CAw) on these networks shows that: (i) MCL-CAw derives larger number of yeast complexes and with better accuracies than MCL, particularly in the presence of natural noise; (ii) Affinity scoring can effectively reduce the impact of noise on MCL-CAw and thereby improve the quality (precision and recall) of its predicted complexes; (iii) MCL-CAw responds well to most available scoring schemes. We discuss several instances where MCL-CAw was successful in deriving meaningful complexes, and where it missed a few proteins or whole complexes due to affinity scoring of the networks. We compare MCL-CAw with several recent complex detection algorithms on unscored and scored networks, and assess the relative performance of the algorithms on these networks. Further, we study the impact of augmenting physical datasets with computationally inferred interactions for complex detection. Finally, we analyse the essentiality of proteins within predicted complexes to understand a possible correlation between protein essentiality and their ability to form complexes. Conclusions We demonstrate that core-attachment based refinement in MCL-CAw improves the predictions of MCL on yeast PPI networks. We show that affinity scoring improves the performance of MCL-CAw. PMID:20939868

  7. Protein side chain conformation predictions with an MMGBSA energy function.

    PubMed

    Gaillard, Thomas; Panel, Nicolas; Simonson, Thomas

    2016-06-01

    The prediction of protein side chain conformations from backbone coordinates is an important task in structural biology, with applications in structure prediction and protein design. It is a difficult problem due to its combinatorial nature. We study the performance of an "MMGBSA" energy function, implemented in our protein design program Proteus, which combines molecular mechanics terms, a Generalized Born and Surface Area (GBSA) solvent model, with approximations that make the model pairwise additive. Proteus is not a competitor to specialized side chain prediction programs due to its cost, but it allows protein design applications, where side chain prediction is an important step and MMGBSA an effective energy model. We predict the side chain conformations for 18 proteins. The side chains are first predicted individually, with the rest of the protein in its crystallographic conformation. Next, all side chains are predicted together. The contributions of individual energy terms are evaluated and various parameterizations are compared. We find that the GB and SA terms, with an appropriate choice of the dielectric constant and surface energy coefficients, are beneficial for single side chain predictions. For the prediction of all side chains, however, errors due to the pairwise additive approximation overcome the improvement brought by these terms. We also show the crucial contribution of side chain minimization to alleviate the rigid rotamer approximation. Even without GB and SA terms, we obtain accuracies comparable to SCWRL4, a specialized side chain prediction program. In particular, we obtain a better RMSD than SCWRL4 for core residues (at a higher cost), despite our simpler rotamer library. Proteins 2016; 84:803-819. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  8. The Prediction of Botulinum Toxin Structure Based on in Silico and in Vitro Analysis

    NASA Astrophysics Data System (ADS)

    Suzuki, Tomonori; Miyazaki, Satoru

    2011-01-01

    Many of biological system mediated through protein-protein interactions. Knowledge of protein-protein complex structure is required for understanding the function. The determination of huge size and flexible protein-protein complex structure by experimental studies remains difficult, costly and five-consuming, therefore computational prediction of protein structures by homolog modeling and docking studies is valuable method. In addition, MD simulation is also one of the most powerful methods allowing to see the real dynamics of proteins. Here, we predict protein-protein complex structure of botulinum toxin to analyze its property. These bioinformatics methods are useful to report the relation between the flexibility of backbone structure and the activity.

  9. Great interactions: How binding incorrect partners can teach us about protein recognition and function.

    PubMed

    Vamparys, Lydie; Laurent, Benoist; Carbone, Alessandra; Sacquin-Mora, Sophie

    2016-10-01

    Protein-protein interactions play a key part in most biological processes and understanding their mechanism is a fundamental problem leading to numerous practical applications. The prediction of protein binding sites in particular is of paramount importance since proteins now represent a major class of therapeutic targets. Amongst others methods, docking simulations between two proteins known to interact can be a useful tool for the prediction of likely binding patches on a protein surface. From the analysis of the protein interfaces generated by a massive cross-docking experiment using the 168 proteins of the Docking Benchmark 2.0, where all possible protein pairs, and not only experimental ones, have been docked together, we show that it is also possible to predict a protein's binding residues without having any prior knowledge regarding its potential interaction partners. Evaluating the performance of cross-docking predictions using the area under the specificity-sensitivity ROC curve (AUC) leads to an AUC value of 0.77 for the complete benchmark (compared to the 0.5 AUC value obtained for random predictions). Furthermore, a new clustering analysis performed on the binding patches that are scattered on the protein surface show that their distribution and growth will depend on the protein's functional group. Finally, in several cases, the binding-site predictions resulting from the cross-docking simulations will lead to the identification of an alternate interface, which corresponds to the interaction with a biomolecular partner that is not included in the original benchmark. Proteins 2016; 84:1408-1421. © 2016 The Authors Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc. © 2016 The Authors Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.

  10. Computational Prediction of Hot Spot Residues

    PubMed Central

    Morrow, John Kenneth; Zhang, Shuxing

    2013-01-01

    Most biological processes involve multiple proteins interacting with each other. It has been recently discovered that certain residues in these protein-protein interactions, which are called hot spots, contribute more significantly to binding affinity than others. Hot spot residues have unique and diverse energetic properties that make them challenging yet important targets in the modulation of protein-protein complexes. Design of therapeutic agents that interact with hot spot residues has proven to be a valid methodology in disrupting unwanted protein-protein interactions. Using biological methods to determine which residues are hot spots can be costly and time consuming. Recent advances in computational approaches to predict hot spots have incorporated a myriad of features, and have shown increasing predictive successes. Here we review the state of knowledge around protein-protein interactions, hot spots, and give an overview of multiple in silico prediction techniques of hot spot residues. PMID:22316154

  11. Predict and Analyze Protein Glycation Sites with the mRMR and IFS Methods

    PubMed Central

    Gu, Wenxiang; Zhang, Wenyi; Wang, Jianan

    2015-01-01

    Glycation is a nonenzymatic process in which proteins react with reducing sugar molecules. The identification of glycation sites in protein may provide guidelines to understand the biological function of protein glycation. In this study, we developed a computational method to predict protein glycation sites by using the support vector machine classifier. The experimental results showed that the prediction accuracy was 85.51% and an overall MCC was 0.70. Feature analysis indicated that the composition of k-spaced amino acid pairs feature contributed the most for glycation sites prediction. PMID:25961025

  12. Protein Secondary Structure Prediction Using AutoEncoder Network and Bayes Classifier

    NASA Astrophysics Data System (ADS)

    Wang, Leilei; Cheng, Jinyong

    2018-03-01

    Protein secondary structure prediction is belong to bioinformatics,and it's important in research area. In this paper, we propose a new prediction way of protein using bayes classifier and autoEncoder network. Our experiments show some algorithms including the construction of the model, the classification of parameters and so on. The data set is a typical CB513 data set for protein. In terms of accuracy, the method is the cross validation based on the 3-fold. Then we can get the Q3 accuracy. Paper results illustrate that the autoencoder network improved the prediction accuracy of protein secondary structure.

  13. Sixty-five years of the long march in protein secondary structure prediction: the final stretch?

    PubMed Central

    Yang, Yuedong; Gao, Jianzhao; Wang, Jihua; Heffernan, Rhys; Hanson, Jack; Paliwal, Kuldip; Zhou, Yaoqi

    2018-01-01

    Abstract Protein secondary structure prediction began in 1951 when Pauling and Corey predicted helical and sheet conformations for protein polypeptide backbone even before the first protein structure was determined. Sixty-five years later, powerful new methods breathe new life into this field. The highest three-state accuracy without relying on structure templates is now at 82–84%, a number unthinkable just a few years ago. These improvements came from increasingly larger databases of protein sequences and structures for training, the use of template secondary structure information and more powerful deep learning techniques. As we are approaching to the theoretical limit of three-state prediction (88–90%), alternative to secondary structure prediction (prediction of backbone torsion angles and Cα-atom-based angles and torsion angles) not only has more room for further improvement but also allows direct prediction of three-dimensional fragment structures with constantly improved accuracy. About 20% of all 40-residue fragments in a database of 1199 non-redundant proteins have <6 Å root-mean-squared distance from the native conformations by SPIDER2. More powerful deep learning methods with improved capability of capturing long-range interactions begin to emerge as the next generation of techniques for secondary structure prediction. The time has come to finish off the final stretch of the long march towards protein secondary structure prediction. PMID:28040746

  14. G-LoSA for Prediction of Protein-Ligand Binding Sites and Structures.

    PubMed

    Lee, Hui Sun; Im, Wonpil

    2017-01-01

    Recent advances in high-throughput structure determination and computational protein structure prediction have significantly enriched the universe of protein structure. However, there is still a large gap between the number of available protein structures and that of proteins with annotated function in high accuracy. Computational structure-based protein function prediction has emerged to reduce this knowledge gap. The identification of a ligand binding site and its structure is critical to the determination of a protein's molecular function. We present a computational methodology for predicting small molecule ligand binding site and ligand structure using G-LoSA, our protein local structure alignment and similarity measurement tool. All the computational procedures described here can be easily implemented using G-LoSA Toolkit, a package of standalone software programs and preprocessed PDB structure libraries. G-LoSA and G-LoSA Toolkit are freely available to academic users at http://compbio.lehigh.edu/GLoSA . We also illustrate a case study to show the potential of our template-based approach harnessing G-LoSA for protein function prediction.

  15. Functional annotation from the genome sequence of the giant panda.

    PubMed

    Huo, Tong; Zhang, Yinjie; Lin, Jianping

    2012-08-01

    The giant panda is one of the most critically endangered species due to the fragmentation and loss of its habitat. Studying the functions of proteins in this animal, especially specific trait-related proteins, is therefore necessary to protect the species. In this work, the functions of these proteins were investigated using the genome sequence of the giant panda. Data on 21,001 proteins and their functions were stored in the Giant Panda Protein Database, in which the proteins were divided into two groups: 20,179 proteins whose functions can be predicted by GeneScan formed the known-function group, whereas 822 proteins whose functions cannot be predicted by GeneScan comprised the unknown-function group. For the known-function group, we further classified the proteins by molecular function, biological process, cellular component, and tissue specificity. For the unknown-function group, we developed a strategy in which the proteins were filtered by cross-Blast to identify panda-specific proteins under the assumption that proteins related to the panda-specific traits in the unknown-function group exist. After this filtering procedure, we identified 32 proteins (2 of which are membrane proteins) specific to the giant panda genome as compared against the dog and horse genomes. Based on their amino acid sequences, these 32 proteins were further analyzed by functional classification using SVM-Prot, motif prediction using MyHits, and interacting protein prediction using the Database of Interacting Proteins. Nineteen proteins were predicted to be zinc-binding proteins, thus affecting the activities of nucleic acids. The 32 panda-specific proteins will be further investigated by structural and functional analysis.

  16. Protein domain analysis of genomic sequence data reveals regulation of LRR related domains in plant transpiration in Ficus.

    PubMed

    Lang, Tiange; Yin, Kangquan; Liu, Jinyu; Cao, Kunfang; Cannon, Charles H; Du, Fang K

    2014-01-01

    Predicting protein domains is essential for understanding a protein's function at the molecular level. However, up till now, there has been no direct and straightforward method for predicting protein domains in species without a reference genome sequence. In this study, we developed a functionality with a set of programs that can predict protein domains directly from genomic sequence data without a reference genome. Using whole genome sequence data, the programming functionality mainly comprised DNA assembly in combination with next-generation sequencing (NGS) assembly methods and traditional methods, peptide prediction and protein domain prediction. The proposed new functionality avoids problems associated with de novo assembly due to micro reads and small single repeats. Furthermore, we applied our functionality for the prediction of leucine rich repeat (LRR) domains in four species of Ficus with no reference genome, based on NGS genomic data. We found that the LRRNT_2 and LRR_8 domains are related to plant transpiration efficiency, as indicated by the stomata index, in the four species of Ficus. The programming functionality established in this study provides new insights for protein domain prediction, which is particularly timely in the current age of NGS data expansion.

  17. Predicting residue-wise contact orders in proteins by support vector regression.

    PubMed

    Song, Jiangning; Burrage, Kevin

    2006-10-03

    The residue-wise contact order (RWCO) describes the sequence separations between the residues of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered as a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable important information to reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance with the CC to 0.57 and an RMSE of 0.79. In addition, combining the predicted secondary structure by PSIPRED was found to significantly improve the prediction performance and could yield the best prediction accuracy with a CC of 0.60 and RMSE of 0.78, which provided at least comparable performance compared with the other existing methods. The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.

  18. Stringent homology-based prediction of H. sapiens-M. tuberculosis H37Rv protein-protein interactions.

    PubMed

    Zhou, Hufeng; Gao, Shangzhi; Nguyen, Nam Ninh; Fan, Mengyuan; Jin, Jingjing; Liu, Bing; Zhao, Liang; Xiong, Geng; Tan, Min; Li, Shijun; Wong, Limsoon

    2014-04-08

    H. sapiens-M. tuberculosis H37Rv protein-protein interaction (PPI) data are essential for understanding the infection mechanism of the formidable pathogen M. tuberculosis H37Rv. Computational prediction is an important strategy to fill the gap in experimental H. sapiens-M. tuberculosis H37Rv PPI data. Homology-based prediction is frequently used in predicting both intra-species and inter-species PPIs. However, some limitations are not properly resolved in several published works that predict eukaryote-prokaryote inter-species PPIs using intra-species template PPIs. We develop a stringent homology-based prediction approach by taking into account (i) differences between eukaryotic and prokaryotic proteins and (ii) differences between inter-species and intra-species PPI interfaces. We compare our stringent homology-based approach to a conventional homology-based approach for predicting host-pathogen PPIs, based on cellular compartment distribution analysis, disease gene list enrichment analysis, pathway enrichment analysis and functional category enrichment analysis. These analyses support the validity of our prediction result, and clearly show that our approach has better performance in predicting H. sapiens-M. tuberculosis H37Rv PPIs. Using our stringent homology-based approach, we have predicted a set of highly plausible H. sapiens-M. tuberculosis H37Rv PPIs which might be useful for many of related studies. Based on our analysis of the H. sapiens-M. tuberculosis H37Rv PPI network predicted by our stringent homology-based approach, we have discovered several interesting properties which are reported here for the first time. We find that both host proteins and pathogen proteins involved in the host-pathogen PPIs tend to be hubs in their own intra-species PPI network. Also, both host and pathogen proteins involved in host-pathogen PPIs tend to have longer primary sequence, tend to have more domains, tend to be more hydrophilic, etc. And the protein domains from both host and pathogen proteins involved in host-pathogen PPIs tend to have lower charge, and tend to be more hydrophilic. Our stringent homology-based prediction approach provides a better strategy in predicting PPIs between eukaryotic hosts and prokaryotic pathogens than a conventional homology-based approach. The properties we have observed from the predicted H. sapiens-M. tuberculosis H37Rv PPI network are useful for understanding inter-species host-pathogen PPI networks and provide novel insights for host-pathogen interaction studies.

  19. Gene Unprediction with Spurio: A tool to identify spurious protein sequences.

    PubMed

    Höps, Wolfram; Jeffryes, Matt; Bateman, Alex

    2018-01-01

    We now have access to the sequences of tens of millions of proteins. These protein sequences are essential for modern molecular biology and computational biology. The vast majority of protein sequences are derived from gene prediction tools and have no experimental supporting evidence for their translation.  Despite the increasing accuracy of gene prediction tools there likely exists a large number of spurious protein predictions in the sequence databases.  We have developed the Spurio tool to help identify spurious protein predictions in prokaryotes.  Spurio searches the query protein sequence against a prokaryotic nucleotide database using tblastn and identifies homologous sequences. The tblastn matches are used to score the query sequence's likelihood of being a spurious protein prediction using a Gaussian process model. The most informative feature is the appearance of stop codons within the presumed translation of homologous DNA sequences. Benchmarking shows that the Spurio tool is able to distinguish spurious from true proteins. However, transposon proteins are prone to be predicted as spurious because of the frequency of degraded homologs found in the DNA sequence databases. Our initial experiments suggest that less than 1% of the proteins in the UniProtKB sequence database are likely to be spurious and that Spurio is able to identify over 60 times more spurious proteins than the AntiFam resource. The Spurio software and source code is available under an MIT license at the following URL: https://bitbucket.org/bateman-group/spurio.

  20.  PARK2 polymorphisms predict disease progression in patients infected with hepatitis C virus.

    PubMed

    Al-Qahtani, Ahmed A; Al-Anazi, Mashael R; Al-Zoghaibi, Fahad A; Abdo, Ayman A; Sanai, Faisal M; Al-Hamoudi, Waleed K; Alswat, Khalid A; Al-Ashgar, Hamad I; Khan, Mohammed Q; Albenmousa, Ali; Khalak, Hanif; Al-Ahdal, Mohammed N

     Background. The protein encoded by PARK2 gene is a component of the ubiquitin-proteasome system that mediates targeting of proteins for the degradation pathway. Genetic variations at PARK2 gene were linked to various diseases including leprosy, typhoid and cancer. The present study investigated the association of single nucleotide polymorphisms (SNPs) in the PARK2 gene with the development of hepatitis C virus (HCV) infection and its progression to severe liver diseases. A total of 800 subjects, including 400 normal healthy subjects and 400 HCV-infected patients, were analyzed in this study. The patients were classified as chronic HCV patients (group I), patients with cirrhosis (group II) and patients with hepatocellular carcinoma (HCC) in the context of cirrhosis (group III). DNA was extracted and was genotyped for the SNPs rs10945859, rs2803085, rs2276201 and rs1931223. Among these SNPs, CT genotype of rs10945859 was found to have a significant association towards the clinical progression of chronic HCV infection to cirrhosis alone (OR = 1.850; 95% C. I. 1.115-3.069; p = 0.016) or cirrhosis and HCC (OR = 1.768; 95% C. I. 1.090-2.867; p value = 0.020). SNP rs10945859 in the PARK2 gene could prove useful in predicting the clinical outcome in HCV-infected patients.

  1. Nanometer Scale Titanium Surface Texturing Are Detected by Signaling Pathways Involving Transient FAK and Src Activations

    PubMed Central

    Zambuzzi, Willian F.; Bonfante, Estevam A.; Jimbo, Ryo; Hayashi, Mariko; Andersson, Martin; Alves, Gutemberg; Takamori, Esther R.; Beltrão, Paulo J.; Coelho, Paulo G.; Granjeiro, José M.

    2014-01-01

    Background It is known that physico/chemical alterations on biomaterial surfaces have the capability to modulate cellular behavior, affecting early tissue repair. Such surface modifications are aimed to improve early healing response and, clinically, offer the possibility to shorten the time from implant placement to functional loading. Since FAK and Src are intracellular proteins able to predict the quality of osteoblast adhesion, this study evaluated the osteoblast behavior in response to nanometer scale titanium surface texturing by monitoring FAK and Src phosphorylations. Methodology Four engineered titanium surfaces were used for the study: machined (M), dual acid-etched (DAA), resorbable media microblasted and acid-etched (MBAA), and acid-etch microblasted (AAMB). Surfaces were characterized by scanning electron microscopy, interferometry, atomic force microscopy, x-ray photoelectron spectroscopy and energy dispersive X-ray spectroscopy. Thereafter, those 4 samples were used to evaluate their cytotoxicity and interference on FAK and Src phosphorylations. Both Src and FAK were investigated by using specific antibody against specific phosphorylation sites. Principal Findings The results showed that both FAK and Src activations were differently modulated as a function of titanium surfaces physico/chemical configuration and protein adsorption. Conclusions It can be suggested that signaling pathways involving both FAK and Src could provide biomarkers to predict osteoblast adhesion onto different surfaces. PMID:24999733

  2. Use of Natural Products as Chemical Library for Drug Discovery and Network Pharmacology

    PubMed Central

    Gu, Jiangyong; Gui, Yuanshen; Chen, Lirong; Yuan, Gu; Lu, Hui-Zhe; Xu, Xiaojie

    2013-01-01

    Background Natural products have been an important source of lead compounds for drug discovery. How to find and evaluate bioactive natural products is critical to the achievement of drug/lead discovery from natural products. Methodology We collected 19,7201 natural products structures, reported biological activities and virtual screening results. Principal component analysis was employed to explore the chemical space, and we found that there was a large portion of overlap between natural products and FDA-approved drugs in the chemical space, which indicated that natural products had large quantity of potential lead compounds. We also explored the network properties of natural product-target networks and found that polypharmacology was greatly enriched to those compounds with large degree and high betweenness centrality. In order to make up for a lack of experimental data, high throughput virtual screening was employed. All natural products were docked to 332 target proteins of FDA-approved drugs. The most potential natural products for drug discovery and their indications were predicted based on a docking score-weighted prediction model. Conclusions Analysis of molecular descriptors, distribution in chemical space and biological activities of natural products was conducted in this article. Natural products have vast chemical diversity, good drug-like properties and can interact with multiple cellular target proteins. PMID:23638153

  3. Spatial gradients of protein-level time delays set the pace of the traveling segmentation clock waves

    PubMed Central

    Ay, Ahmet; Holland, Jack; Sperlea, Adriana; Devakanmalai, Gnanapackiam Sheela; Knierer, Stephan; Sangervasi, Sebastian; Stevenson, Angel; Özbudak, Ertuğrul M.

    2014-01-01

    The vertebrate segmentation clock is a gene expression oscillator controlling rhythmic segmentation of the vertebral column during embryonic development. The period of oscillations becomes longer as cells are displaced along the posterior to anterior axis, which results in traveling waves of clock gene expression sweeping in the unsegmented tissue. Although various hypotheses necessitating the inclusion of additional regulatory genes into the core clock network at different spatial locations have been proposed, the mechanism underlying traveling waves has remained elusive. Here, we combined molecular-level computational modeling and quantitative experimentation to solve this puzzle. Our model predicts the existence of an increasing gradient of gene expression time delays along the posterior to anterior direction to recapitulate spatiotemporal profiles of the traveling segmentation clock waves in different genetic backgrounds in zebrafish. We validated this prediction by measuring an increased time delay of oscillatory Her1 protein production along the unsegmented tissue. Our results refuted the need for spatial expansion of the core feedback loop to explain the occurrence of traveling waves. Spatial regulation of gene expression time delays is a novel way of creating dynamic patterns; this is the first report demonstrating such a control mechanism in any tissue and future investigations will explore the presence of analogous examples in other biological systems. PMID:25336742

  4. Multi-location gram-positive and gram-negative bacterial protein subcellular localization using gene ontology and multi-label classifier ensemble.

    PubMed

    Wang, Xiao; Zhang, Jun; Li, Guo-Zheng

    2015-01-01

    It has become a very important and full of challenge task to predict bacterial protein subcellular locations using computational methods. Although there exist a lot of prediction methods for bacterial proteins, the majority of these methods can only deal with single-location proteins. But unfortunately many multi-location proteins are located in the bacterial cells. Moreover, multi-location proteins have special biological functions capable of helping the development of new drugs. So it is necessary to develop new computational methods for accurately predicting subcellular locations of multi-location bacterial proteins. In this article, two efficient multi-label predictors, Gpos-ECC-mPLoc and Gneg-ECC-mPLoc, are developed to predict the subcellular locations of multi-label gram-positive and gram-negative bacterial proteins respectively. The two multi-label predictors construct the GO vectors by using the GO terms of homologous proteins of query proteins and then adopt a powerful multi-label ensemble classifier to make the final multi-label prediction. The two multi-label predictors have the following advantages: (1) they improve the prediction performance of multi-label proteins by taking the correlations among different labels into account; (2) they ensemble multiple CC classifiers and further generate better prediction results by ensemble learning; and (3) they construct the GO vectors by using the frequency of occurrences of GO terms in the typical homologous set instead of using 0/1 values. Experimental results show that Gpos-ECC-mPLoc and Gneg-ECC-mPLoc can efficiently predict the subcellular locations of multi-label gram-positive and gram-negative bacterial proteins respectively. Gpos-ECC-mPLoc and Gneg-ECC-mPLoc can efficiently improve prediction accuracy of subcellular localization of multi-location gram-positive and gram-negative bacterial proteins respectively. The online web servers for Gpos-ECC-mPLoc and Gneg-ECC-mPLoc predictors are freely accessible at http://biomed.zzuli.edu.cn/bioinfo/gpos-ecc-mploc/ and http://biomed.zzuli.edu.cn/bioinfo/gneg-ecc-mploc/ respectively.

  5. Predicting Protein-protein Association Rates using Coarse-grained Simulation and Machine Learning

    NASA Astrophysics Data System (ADS)

    Xie, Zhong-Ru; Chen, Jiawen; Wu, Yinghao

    2017-04-01

    Protein-protein interactions dominate all major biological processes in living cells. We have developed a new Monte Carlo-based simulation algorithm to study the kinetic process of protein association. We tested our method on a previously used large benchmark set of 49 protein complexes. The predicted rate was overestimated in the benchmark test compared to the experimental results for a group of protein complexes. We hypothesized that this resulted from molecular flexibility at the interface regions of the interacting proteins. After applying a machine learning algorithm with input variables that accounted for both the conformational flexibility and the energetic factor of binding, we successfully identified most of the protein complexes with overestimated association rates and improved our final prediction by using a cross-validation test. This method was then applied to a new independent test set and resulted in a similar prediction accuracy to that obtained using the training set. It has been thought that diffusion-limited protein association is dominated by long-range interactions. Our results provide strong evidence that the conformational flexibility also plays an important role in regulating protein association. Our studies provide new insights into the mechanism of protein association and offer a computationally efficient tool for predicting its rate.

  6. Predicting Protein-protein Association Rates using Coarse-grained Simulation and Machine Learning.

    PubMed

    Xie, Zhong-Ru; Chen, Jiawen; Wu, Yinghao

    2017-04-18

    Protein-protein interactions dominate all major biological processes in living cells. We have developed a new Monte Carlo-based simulation algorithm to study the kinetic process of protein association. We tested our method on a previously used large benchmark set of 49 protein complexes. The predicted rate was overestimated in the benchmark test compared to the experimental results for a group of protein complexes. We hypothesized that this resulted from molecular flexibility at the interface regions of the interacting proteins. After applying a machine learning algorithm with input variables that accounted for both the conformational flexibility and the energetic factor of binding, we successfully identified most of the protein complexes with overestimated association rates and improved our final prediction by using a cross-validation test. This method was then applied to a new independent test set and resulted in a similar prediction accuracy to that obtained using the training set. It has been thought that diffusion-limited protein association is dominated by long-range interactions. Our results provide strong evidence that the conformational flexibility also plays an important role in regulating protein association. Our studies provide new insights into the mechanism of protein association and offer a computationally efficient tool for predicting its rate.

  7. Prediction of Protein Aggregation in High Concentration Protein Solutions Utilizing Protein-Protein Interactions Determined by Low Volume Static Light Scattering.

    PubMed

    Hofmann, Melanie; Winzer, Matthias; Weber, Christian; Gieseler, Henning

    2016-06-01

    The development of highly concentrated protein formulations is more demanding than for conventional concentrations due to an elevated protein aggregation tendency. Predictive protein-protein interaction parameters, such as the second virial coefficient B22 or the interaction parameter kD, have already been used to predict aggregation tendency and optimize protein formulations. However, these parameters can only be determined in diluted solutions, up to 20 mg/mL. And their validity at high concentrations is currently controversially discussed. This work presents a μ-scale screening approach which has been adapted to early industrial project needs. The procedure is based on static light scattering to directly determine protein-protein interactions at concentrations up to 100 mg/mL. Three different therapeutic molecules were formulated, varying in pH, salt content, and addition of excipients (e.g., sugars, amino acids, polysorbates, or other macromolecules). Validity of the predicted aggregation tendency was confirmed by stability data of selected formulations. Based on the results obtained, the new prediction method is a promising screening tool for fast and easy formulation development of highly concentrated protein solutions, consuming only microliter of sample volumes. Copyright © 2016 American Pharmacists Association®. Published by Elsevier Inc. All rights reserved.

  8. Text Mining Improves Prediction of Protein Functional Sites

    PubMed Central

    Cohn, Judith D.; Ravikumar, Komandur E.

    2012-01-01

    We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions. PMID:22393388

  9. Predicting the Dynamics of Protein Abundance

    PubMed Central

    Mehdi, Ahmed M.; Patrick, Ralph; Bailey, Timothy L.; Bodén, Mikael

    2014-01-01

    Protein synthesis is finely regulated across all organisms, from bacteria to humans, and its integrity underpins many important processes. Emerging evidence suggests that the dynamic range of protein abundance is greater than that observed at the transcript level. Technological breakthroughs now mean that sequencing-based measurement of mRNA levels is routine, but protocols for measuring protein abundance remain both complex and expensive. This paper introduces a Bayesian network that integrates transcriptomic and proteomic data to predict protein abundance and to model the effects of its determinants. We aim to use this model to follow a molecular response over time, from condition-specific data, in order to understand adaptation during processes such as the cell cycle. With microarray data now available for many conditions, the general utility of a protein abundance predictor is broad. Whereas most quantitative proteomics studies have focused on higher organisms, we developed a predictive model of protein abundance for both Saccharomyces cerevisiae and Schizosaccharomyces pombe to explore the latitude at the protein level. Our predictor primarily relies on mRNA level, mRNA–protein interaction, mRNA folding energy and half-life, and tRNA adaptation. The combination of key features, allowing for the low certainty and uneven coverage of experimental observations, gives comparatively minor but robust prediction accuracy. The model substantially improved the analysis of protein regulation during the cell cycle: predicted protein abundance identified twice as many cell-cycle-associated proteins as experimental mRNA levels. Predicted protein abundance was more dynamic than observed mRNA expression, agreeing with experimental protein abundance from a human cell line. We illustrate how the same model can be used to predict the folding energy of mRNA when protein abundance is available, lending credence to the emerging view that mRNA folding affects translation efficiency. The software and data used in this research are available at http://bioinf.scmb.uq.edu.au/proteinabundance/. PMID:24532840

  10. Predicting the dynamics of protein abundance.

    PubMed

    Mehdi, Ahmed M; Patrick, Ralph; Bailey, Timothy L; Bodén, Mikael

    2014-05-01

    Protein synthesis is finely regulated across all organisms, from bacteria to humans, and its integrity underpins many important processes. Emerging evidence suggests that the dynamic range of protein abundance is greater than that observed at the transcript level. Technological breakthroughs now mean that sequencing-based measurement of mRNA levels is routine, but protocols for measuring protein abundance remain both complex and expensive. This paper introduces a Bayesian network that integrates transcriptomic and proteomic data to predict protein abundance and to model the effects of its determinants. We aim to use this model to follow a molecular response over time, from condition-specific data, in order to understand adaptation during processes such as the cell cycle. With microarray data now available for many conditions, the general utility of a protein abundance predictor is broad. Whereas most quantitative proteomics studies have focused on higher organisms, we developed a predictive model of protein abundance for both Saccharomyces cerevisiae and Schizosaccharomyces pombe to explore the latitude at the protein level. Our predictor primarily relies on mRNA level, mRNA-protein interaction, mRNA folding energy and half-life, and tRNA adaptation. The combination of key features, allowing for the low certainty and uneven coverage of experimental observations, gives comparatively minor but robust prediction accuracy. The model substantially improved the analysis of protein regulation during the cell cycle: predicted protein abundance identified twice as many cell-cycle-associated proteins as experimental mRNA levels. Predicted protein abundance was more dynamic than observed mRNA expression, agreeing with experimental protein abundance from a human cell line. We illustrate how the same model can be used to predict the folding energy of mRNA when protein abundance is available, lending credence to the emerging view that mRNA folding affects translation efficiency. The software and data used in this research are available at http://bioinf.scmb.uq.edu.au/proteinabundance/.

  11. AptRank: an adaptive PageRank model for protein function prediction on   bi-relational graphs.

    PubMed

    Jiang, Biaobin; Kloster, Kyle; Gleich, David F; Gribskov, Michael

    2017-06-15

    Diffusion-based network models are widely used for protein function prediction using protein network data and have been shown to outperform neighborhood-based and module-based methods. Recent studies have shown that integrating the hierarchical structure of the Gene Ontology (GO) data dramatically improves prediction accuracy. However, previous methods usually either used the GO hierarchy to refine the prediction results of multiple classifiers, or flattened the hierarchy into a function-function similarity kernel. No study has taken the GO hierarchy into account together with the protein network as a two-layer network model. We first construct a Bi-relational graph (Birg) model comprised of both protein-protein association and function-function hierarchical networks. We then propose two diffusion-based methods, BirgRank and AptRank, both of which use PageRank to diffuse information on this two-layer graph model. BirgRank is a direct application of traditional PageRank with fixed decay parameters. In contrast, AptRank utilizes an adaptive diffusion mechanism to improve the performance of BirgRank. We evaluate the ability of both methods to predict protein function on yeast, fly and human protein datasets, and compare with four previous methods: GeneMANIA, TMC, ProteinRank and clusDCA. We design four different validation strategies: missing function prediction, de novo function prediction, guided function prediction and newly discovered function prediction to comprehensively evaluate predictability of all six methods. We find that both BirgRank and AptRank outperform the previous methods, especially in missing function prediction when using only 10% of the data for training. The MATLAB code is available at https://github.rcac.purdue.edu/mgribsko/aptrank . gribskov@purdue.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  12. Enhancing membrane protein subcellular localization prediction by parallel fusion of multi-view features.

    PubMed

    Yu, Dongjun; Wu, Xiaowei; Shen, Hongbin; Yang, Jian; Tang, Zhenmin; Qi, Yong; Yang, Jingyu

    2012-12-01

    Membrane proteins are encoded by ~ 30% in the genome and function importantly in the living organisms. Previous studies have revealed that membrane proteins' structures and functions show obvious cell organelle-specific properties. Hence, it is highly desired to predict membrane protein's subcellular location from the primary sequence considering the extreme difficulties of membrane protein wet-lab studies. Although many models have been developed for predicting protein subcellular locations, only a few are specific to membrane proteins. Existing prediction approaches were constructed based on statistical machine learning algorithms with serial combination of multi-view features, i.e., different feature vectors are simply serially combined to form a super feature vector. However, such simple combination of features will simultaneously increase the information redundancy that could, in turn, deteriorate the final prediction accuracy. That's why it was often found that prediction success rates in the serial super space were even lower than those in a single-view space. The purpose of this paper is investigation of a proper method for fusing multiple multi-view protein sequential features for subcellular location predictions. Instead of serial strategy, we propose a novel parallel framework for fusing multiple membrane protein multi-view attributes that will represent protein samples in complex spaces. We also proposed generalized principle component analysis (GPCA) for feature reduction purpose in the complex geometry. All the experimental results through different machine learning algorithms on benchmark membrane protein subcellular localization datasets demonstrate that the newly proposed parallel strategy outperforms the traditional serial approach. We also demonstrate the efficacy of the parallel strategy on a soluble protein subcellular localization dataset indicating the parallel technique is flexible to suite for other computational biology problems. The software and datasets are available at: http://www.csbio.sjtu.edu.cn/bioinf/mpsp.

  13. HitPredict version 4: comprehensive reliability scoring of physical protein-protein interactions from more than 100 species.

    PubMed

    López, Yosvany; Nakai, Kenta; Patil, Ashwini

    2015-01-01

    HitPredict is a consolidated resource of experimentally identified, physical protein-protein interactions with confidence scores to indicate their reliability. The study of genes and their inter-relationships using methods such as network and pathway analysis requires high quality protein-protein interaction information. Extracting reliable interactions from most of the existing databases is challenging because they either contain only a subset of the available interactions, or a mixture of physical, genetic and predicted interactions. Automated integration of interactions is further complicated by varying levels of accuracy of database content and lack of adherence to standard formats. To address these issues, the latest version of HitPredict provides a manually curated dataset of 398 696 physical associations between 70 808 proteins from 105 species. Manual confirmation was used to resolve all issues encountered during data integration. For improved reliability assessment, this version combines a new score derived from the experimental information of the interactions with the original score based on the features of the interacting proteins. The combined interaction score performs better than either of the individual scores in HitPredict as well as the reliability score of another similar database. HitPredict provides a web interface to search proteins and visualize their interactions, and the data can be downloaded for offline analysis. Data usability has been enhanced by mapping protein identifiers across multiple reference databases. Thus, the latest version of HitPredict provides a significantly larger, more reliable and usable dataset of protein-protein interactions from several species for the study of gene groups. Database URL: http://hintdb.hgc.jp/htp. © The Author(s) 2015. Published by Oxford University Press.

  14. Performance of MDockPP in CAPRI rounds 28-29 and 31-35 including the prediction of water-mediated interactions.

    PubMed

    Xu, Xianjin; Qiu, Liming; Yan, Chengfei; Ma, Zhiwei; Grinter, Sam Z; Zou, Xiaoqin

    2017-03-01

    Protein-protein interactions are either through direct contacts between two binding partners or mediated by structural waters. Both direct contacts and water-mediated interactions are crucial to the formation of a protein-protein complex. During the recent CAPRI rounds, a novel parallel searching strategy for predicting water-mediated interactions is introduced into our protein-protein docking method, MDockPP. Briefly, a FFT-based docking algorithm is employed in generating putative binding modes, and an iteratively derived statistical potential-based scoring function, ITScorePP, in conjunction with biological information is used to assess and rank the binding modes. Up to 10 binding modes are selected as the initial protein-protein complex structures for MD simulations in explicit solvent. Water molecules near the interface are clustered based on the snapshots extracted from independent equilibrated trajectories. Then, protein-ligand docking is employed for a parallel search for water molecules near the protein-protein interface. The water molecules generated by ligand docking and the clustered water molecules generated by MD simulations are merged, referred to as the predicted structural water molecules. Here, we report the performance of this protocol for CAPRI rounds 28-29 and 31-35 containing 20 valid docking targets and 11 scoring targets. In the docking experiments, we predicted correct binding modes for nine targets, including one high-accuracy, two medium-accuracy, and six acceptable predictions. Regarding the two targets for the prediction of water-mediated interactions, we achieved models ranked as "excellent" in accordance with the CAPRI evaluation criteria; one of these two targets is considered as a difficult target for structural water prediction. Proteins 2017; 85:424-434. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  15. Using Recombinant Proteins from Lutzomyia longipalpis Saliva to Estimate Human Vector Exposure in Visceral Leishmaniasis Endemic Areas

    PubMed Central

    Souza, Ana Paula; Andrade, Bruno Bezerril; Aquino, Dorlene; Entringer, Petter; Miranda, José Carlos; Alcantara, Ruan; Ruiz, Daniel; Soto, Manuel; Teixeira, Clarissa R.; Valenzuela, Jesus G.; de Oliveira, Camila Indiani; Brodskyn, Cláudia Ida; Barral-Netto, Manoel; Barral, Aldina

    2010-01-01

    Background Leishmania is transmitted by female sand flies and deposited together with saliva, which contains a vast repertoire of pharmacologically active molecules that contribute to the establishment of the infection. The exposure to vector saliva induces an immune response against its components that can be used as a marker of exposure to the vector. Performing large-scale serological studies to detect vector exposure has been limited by the difficulty in obtaining sand fly saliva. Here, we validate the use of two sand fly salivary recombinant proteins as markers for vector exposure. Methodology/principal findings ELISA was used to screen human sera, collected in an area endemic for visceral leishmaniasis, against the salivary gland sonicate (SGS) or two recombinant proteins (rLJM11 and rLJM17) from Lutzomyia longipalpis saliva. Antibody levels before and after SGS seroconversion (n = 26) were compared using the Wilcoxon signed rank paired test. Human sera from an area endemic for VL which recognize Lu. longipalpis saliva in ELISA also recognize a combination of rLJM17 and rLJM11. We then extended the analysis to include 40 sera from individuals who were seropositive and 40 seronegative to Lu. longipalpis SGS. Each recombinant protein was able to detect anti-saliva seroconversion, whereas the two proteins combined increased the detection significantly. Additionally, we evaluated the specificity of the anti-Lu. longipalpis response by testing 40 sera positive to Lutzomyia intermedia SGS, and very limited (2/40) cross-reactivity was observed. Receiver-operator characteristics (ROC) curve analysis was used to identify the effectiveness of these proteins for the prediction of anti-SGS positivity. These ROC curves evidenced the superior performance of rLJM17+rLJM11. Predicted threshold levels were confirmed for rLJM17+rLJM11 using a large panel of 1,077 serum samples. Conclusion Our results show the possibility of substituting Lu. longipalpis SGS for two recombinant proteins, LJM17 and LJM11, in order to probe for vector exposure in individuals residing in endemic areas. PMID:20351785

  16. Proteins with Novel Structure, Function and Dynamics

    NASA Technical Reports Server (NTRS)

    Pohorille, Andrew

    2014-01-01

    Recently, a small enzyme that ligates two RNA fragments with the rate of 10(exp 6) above background was evolved in vitro (Seelig and Szostak, Nature 448:828-831, 2007). This enzyme does not resemble any contemporary protein (Chao et al., Nature Chem. Biol. 9:81-83, 2013). It consists of a dynamic, catalytic loop, a small, rigid core containing two zinc ions coordinated by neighboring amino acids, and two highly flexible tails that might be unimportant for protein function. In contrast to other proteins, this enzyme does not contain ordered secondary structure elements, such as alpha-helix or beta-sheet. The loop is kept together by just two interactions of a charged residue and a histidine with a zinc ion, which they coordinate on the opposite side of the loop. Such structure appears to be very fragile. Surprisingly, computer simulations indicate otherwise. As the coordinating, charged residue is mutated to alanine, another, nearby charged residue takes its place, thus keeping the structure nearly intact. If this residue is also substituted by alanine a salt bridge involving two other, charged residues on the opposite sides of the loop keeps the loop in place. These adjustments are facilitated by high flexibility of the protein. Computational predictions have been confirmed experimentally, as both mutants retain full activity and overall structure. These results challenge our notions about what is required for protein activity and about the relationship between protein dynamics, stability and robustness. We hypothesize that small, highly dynamic proteins could be both active and fault tolerant in ways that many other proteins are not, i.e. they can adjust to retain their structure and activity even if subjected to mutations in structurally critical regions. This opens the doors for designing proteins with novel functions, structures and dynamics that have not been yet considered.

  17. Structural genomics analysis of uncharacterized protein families overrepresented in human gut bacteria identifies a novel glycoside hydrolase

    PubMed Central

    2014-01-01

    Background Bacteroides spp. form a significant part of our gut microbiome and are well known for optimized metabolism of diverse polysaccharides. Initial analysis of the archetypal Bacteroides thetaiotaomicron genome identified 172 glycosyl hydrolases and a large number of uncharacterized proteins associated with polysaccharide metabolism. Results BT_1012 from Bacteroides thetaiotaomicron VPI-5482 is a protein of unknown function and a member of a large protein family consisting entirely of uncharacterized proteins. Initial sequence analysis predicted that this protein has two domains, one on the N- and one on the C-terminal. A PSI-BLAST search found over 150 full length and over 90 half size homologs consisting only of the N-terminal domain. The experimentally determined three-dimensional structure of the BT_1012 protein confirms its two-domain architecture and structural analysis of both domains suggests their specific functions. The N-terminal domain is a putative catalytic domain with significant similarity to known glycoside hydrolases, the C-terminal domain has a beta-sandwich fold typically found in C-terminal domains of other glycosyl hydrolases, however these domains are typically involved in substrate binding. We describe the structure of the BT_1012 protein and discuss its sequence-structure relationship and their possible functional implications. Conclusions Structural and sequence analyses of the BT_1012 protein identifies it as a glycosyl hydrolase, expanding an already impressive catalog of enzymes involved in polysaccharide metabolism in Bacteroides spp. Based on this we have renamed the Pfam families representing the two domains found in the BT_1012 protein, PF13204 and PF12904, as putative glycoside hydrolase and glycoside hydrolase-associated C-terminal domain respectively. PMID:24742328

  18. Prediction of change in protein unfolding rates upon point mutations in two state proteins.

    PubMed

    Chaudhary, Priyashree; Naganathan, Athi N; Gromiha, M Michael

    2016-09-01

    Studies on protein unfolding rates are limited and challenging due to the complexity of unfolding mechanism and the larger dynamic range of the experimental data. Though attempts have been made to predict unfolding rates using protein sequence-structure information there is no available method for predicting the unfolding rates of proteins upon specific point mutations. In this work, we have systematically analyzed a set of 790 single mutants and developed a robust method for predicting protein unfolding rates upon mutations (Δlnku) in two-state proteins by combining amino acid properties and knowledge-based classification of mutants with multiple linear regression technique. We obtain a mean absolute error (MAE) of 0.79/s and a Pearson correlation coefficient (PCC) of 0.71 between predicted unfolding rates and experimental observations using jack-knife test. We have developed a web server for predicting protein unfolding rates upon mutation and it is freely available at https://www.iitm.ac.in/bioinfo/proteinunfolding/unfoldingrace.html. Prominent features that determine unfolding kinetics as well as plausible reasons for the observed outliers are also discussed. Copyright © 2016 Elsevier B.V. All rights reserved.

  19. Prediction protein structural classes with pseudo-amino acid composition: approximate entropy and hydrophobicity pattern.

    PubMed

    Zhang, Tong-Liang; Ding, Yong-Sheng; Chou, Kuo-Chen

    2008-01-07

    Compared with the conventional amino acid (AA) composition, the pseudo-amino acid (PseAA) composition as originally introduced for protein subcellular location prediction can incorporate much more information of a protein sequence, so as to remarkably enhance the power of using a discrete model to predict various attributes of a protein. In this study, based on the concept of PseAA composition, the approximate entropy and hydrophobicity pattern of a protein sequence are used to characterize the PseAA components. Also, the immune genetic algorithm (IGA) is applied to search the optimal weight factors in generating the PseAA composition. Thus, for a given protein sequence sample, a 27-D (dimensional) PseAA composition is generated as its descriptor. The fuzzy K nearest neighbors (FKNN) classifier is adopted as the prediction engine. The results thus obtained in predicting protein structural classification are quite encouraging, indicating that the current approach may also be used to improve the prediction quality of other protein attributes, or at least can play a complimentary role to the existing methods in the relevant areas. Our algorithm is written in Matlab that is available by contacting the corresponding author.

  20. Prediction of essential proteins based on gene expression programming.

    PubMed

    Zhong, Jiancheng; Wang, Jianxin; Peng, Wei; Zhang, Zhen; Pan, Yi

    2013-01-01

    Essential proteins are indispensable for cell survive. Identifying essential proteins is very important for improving our understanding the way of a cell working. There are various types of features related to the essentiality of proteins. Many methods have been proposed to combine some of them to predict essential proteins. However, it is still a big challenge for designing an effective method to predict them by integrating different features, and explaining how these selected features decide the essentiality of protein. Gene expression programming (GEP) is a learning algorithm and what it learns specifically is about relationships between variables in sets of data and then builds models to explain these relationships. In this work, we propose a GEP-based method to predict essential protein by combing some biological features and topological features. We carry out experiments on S. cerevisiae data. The experimental results show that the our method achieves better prediction performance than those methods using individual features. Moreover, our method outperforms some machine learning methods and performs as well as a method which is obtained by combining the outputs of eight machine learning methods. The accuracy of predicting essential proteins can been improved by using GEP method to combine some topological features and biological features.

  1. Protein-protein interaction site predictions with minimum covariance determinant and Mahalanobis distance.

    PubMed

    Qiu, Zhijun; Zhou, Bo; Yuan, Jiangfeng

    2017-11-21

    Protein-protein interaction site (PPIS) prediction must deal with the diversity of interaction sites that limits their prediction accuracy. Use of proteins with unknown or unidentified interactions can also lead to missing interfaces. Such data errors are often brought into the training dataset. In response to these two problems, we used the minimum covariance determinant (MCD) method to refine the training data to build a predictor with better performance, utilizing its ability of removing outliers. In order to predict test data in practice, a method based on Mahalanobis distance was devised to select proper test data as input for the predictor. With leave-one-validation and independent test, after the Mahalanobis distance screening, our method achieved higher performance according to Matthews correlation coefficient (MCC), although only a part of test data could be predicted. These results indicate that data refinement is an efficient approach to improve protein-protein interaction site prediction. By further optimizing our method, it is hopeful to develop predictors of better performance and wide range of application. Copyright © 2017 Elsevier Ltd. All rights reserved.

  2. Blind test of physics-based prediction of protein structures.

    PubMed

    Shell, M Scott; Ozkan, S Banu; Voelz, Vincent; Wu, Guohong Albert; Dill, Ken A

    2009-02-01

    We report here a multiprotein blind test of a computer method to predict native protein structures based solely on an all-atom physics-based force field. We use the AMBER 96 potential function with an implicit (GB/SA) model of solvation, combined with replica-exchange molecular-dynamics simulations. Coarse conformational sampling is performed using the zipping and assembly method (ZAM), an approach that is designed to mimic the putative physical routes of protein folding. ZAM was applied to the folding of six proteins, from 76 to 112 monomers in length, in CASP7, a community-wide blind test of protein structure prediction. Because these predictions have about the same level of accuracy as typical bioinformatics methods, and do not utilize information from databases of known native structures, this work opens up the possibility of predicting the structures of membrane proteins, synthetic peptides, or other foldable polymers, for which there is little prior knowledge of native structures. This approach may also be useful for predicting physical protein folding routes, non-native conformations, and other physical properties from amino acid sequences.

  3. Blind Test of Physics-Based Prediction of Protein Structures

    PubMed Central

    Shell, M. Scott; Ozkan, S. Banu; Voelz, Vincent; Wu, Guohong Albert; Dill, Ken A.

    2009-01-01

    We report here a multiprotein blind test of a computer method to predict native protein structures based solely on an all-atom physics-based force field. We use the AMBER 96 potential function with an implicit (GB/SA) model of solvation, combined with replica-exchange molecular-dynamics simulations. Coarse conformational sampling is performed using the zipping and assembly method (ZAM), an approach that is designed to mimic the putative physical routes of protein folding. ZAM was applied to the folding of six proteins, from 76 to 112 monomers in length, in CASP7, a community-wide blind test of protein structure prediction. Because these predictions have about the same level of accuracy as typical bioinformatics methods, and do not utilize information from databases of known native structures, this work opens up the possibility of predicting the structures of membrane proteins, synthetic peptides, or other foldable polymers, for which there is little prior knowledge of native structures. This approach may also be useful for predicting physical protein folding routes, non-native conformations, and other physical properties from amino acid sequences. PMID:19186130

  4. Computational prediction of virus-human protein-protein interactions using embedding kernelized heterogeneous data.

    PubMed

    Nourani, Esmaeil; Khunjush, Farshad; Durmuş, Saliha

    2016-05-24

    Pathogenic microorganisms exploit host cellular mechanisms and evade host defense mechanisms through molecular pathogen-host interactions (PHIs). Therefore, comprehensive analysis of these PHI networks should be an initial step for developing effective therapeutics against infectious diseases. Computational prediction of PHI data is gaining increasing demand because of scarcity of experimental data. Prediction of protein-protein interactions (PPIs) within PHI systems can be formulated as a classification problem, which requires the knowledge of non-interacting protein pairs. This is a restricting requirement since we lack datasets that report non-interacting protein pairs. In this study, we formulated the "computational prediction of PHI data" problem using kernel embedding of heterogeneous data. This eliminates the abovementioned requirement and enables us to predict new interactions without randomly labeling protein pairs as non-interacting. Domain-domain associations are used to filter the predicted results leading to 175 novel PHIs between 170 human proteins and 105 viral proteins. To compare our results with the state-of-the-art studies that use a binary classification formulation, we modified our settings to consider the same formulation. Detailed evaluations are conducted and our results provide more than 10 percent improvements for accuracy and AUC (area under the receiving operating curve) results in comparison with state-of-the-art methods.

  5. System and methods for predicting transmembrane domains in membrane proteins and mining the genome for recognizing G-protein coupled receptors

    DOEpatents

    Trabanino, Rene J; Vaidehi, Nagarajan; Hall, Spencer E; Goddard, William A; Floriano, Wely

    2013-02-05

    The invention provides computer-implemented methods and apparatus implementing a hierarchical protocol using multiscale molecular dynamics and molecular modeling methods to predict the presence of transmembrane regions in proteins, such as G-Protein Coupled Receptors (GPCR), and protein structural models generated according to the protocol. The protocol features a coarse grain sampling method, such as hydrophobicity analysis, to provide a fast and accurate procedure for predicting transmembrane regions. Methods and apparatus of the invention are useful to screen protein or polynucleotide databases for encoded proteins with transmembrane regions, such as GPCRs.

  6. Clinical global assessment of nutritional status as predictor of mortality in chronic kidney disease patients

    PubMed Central

    Dai, Lu; Mukai, Hideyuki; Lindholm, Bengt; Heimbürger, Olof; Barany, Peter; Stenvinkel, Peter

    2017-01-01

    Background The value of subjective global assessment (SGA) as nutritional assessor of protein-energy wasting (PEWSGA) in chronic kidney disease (CKD) patients depends on its mortality predictive capacity. We investigated associations of PEWSGA with markers of nutritional status and all-cause mortality in CKD patients. Methods In 1031 (732 CKD1-5 non-dialysis and 299 dialysis) patients, SGA and body (BMI), lean (LBMI) and fat (FBMI) body mass indices, % handgrip strength (% HGS), serum albumin, and high sensitivity C-reactive protein (hsCRP) were examined at baseline. The five-year all-cause mortality predictive strength of baseline PEWSGA and during follow-up were investigated. Results PEWSGA was present in 2% of CKD1-2, 16% of CKD3-4, 31% of CKD5 non-dialysis and 44% of dialysis patients. Patients with PEWSGA (n = 320; 31%) had higher hsCRP and lower BMI, LBMI, FBMI, %HGS and serum albumin. But, using receiver operating characteristics-derived cutoffs, these markers could not classify (by kappa statistic) or explain variations of (by multinomial logistic regression analysis) presence of PEWSGA. In generalized linear models, SGA independently predicted mortality after adjustments of multiple confounders (RR: 1.17; 95% CI: 1.11–1.23). Among 323 CKD5 patients who were re-assessed after median 12.6 months, 222 (69%) remained well-nourished, 37 (11%) developed PEWSGA de novo, 40 (12%) improved while 24 (8%) remained with PEWSGA. The latter independently predicted mortality (RR: 1.29; 95% CI: 1.13–1.46). Conclusions SGA, a valid assessor of nutritional status, is an independent predictor of all-cause mortality both in CKD non-dialysis and dialysis patients that outperforms non-composite nutritional markers as prognosticator. PMID:29211778

  7. Prognostic and Pathogenetic Value of Combining Clinical and Biochemical Indices in Patients With Acute Lung Injury

    PubMed Central

    Koyama, Tatsuki; Billheimer, D. Dean; Wu, William; Bernard, Gordon R.; Thompson, B. Taylor; Brower, Roy G.; Standiford, Theodore J.; Martin, Thomas R.; Matthay, Michael A.

    2010-01-01

    Background: No single clinical or biologic marker reliably predicts clinical outcomes in acute lung injury (ALI)/ARDS. We hypothesized that a combination of biologic and clinical markers would be superior to either biomarkers or clinical factors alone in predicting ALI/ARDS mortality and would provide insight into the pathogenesis of clinical ALI/ARDS. Methods: Eight biologic markers that reflect endothelial and epithelial injury, inflammation, and coagulation (von Willebrand factor antigen, surfactant protein D [SP-D]), tumor necrosis factor receptor-1, interleukin [IL]-6, IL-8, intercellular adhesion molecule-1, protein C, plasminogen activator inhibitor-1) were measured in baseline plasma from 549 patients in the ARDSNet trial of low vs high positive end-expiratory pressure. Mortality was modeled with multivariable logistic regression. Predictors were selected using backward elimination. Comparisons between candidate models were based on the receiver operating characteristics (ROC) and tests of integrated discrimination improvement. Results: Clinical predictors (Acute Physiology And Chronic Health Evaluation III [APACHE III], organ failures, age, underlying cause, alveolar-arterial oxygen gradient, plateau pressure) predicted mortality with an area under the ROC curve (AUC) of 0.82; a combination of eight biomarkers and the clinical predictors had an AUC of 0.85. The best performing biomarkers were the neutrophil chemotactic factor, IL-8, and SP-D, a product of alveolar type 2 cells, supporting the concept that acute inflammation and alveolar epithelial injury are important pathogenetic pathways in human ALI/ARDS. Conclusions: A combination of biomarkers and clinical predictors is superior to clinical predictors or biomarkers alone for predicting mortality in ALI/ARDS and may be useful for stratifying patients in clinical trials. From a pathogenesis perspective, the degree of acute inflammation and alveolar epithelial injury are highly associated with the outcome of human ALI/ARDS. PMID:19858233

  8. A Kernel for Open Source Drug Discovery in Tropical Diseases

    PubMed Central

    Ortí, Leticia; Carbajo, Rodrigo J.; Pieper, Ursula; Eswar, Narayanan; Maurer, Stephen M.; Rai, Arti K.; Taylor, Ginger; Todd, Matthew H.; Pineda-Lucena, Antonio; Sali, Andrej; Marti-Renom, Marc A.

    2009-01-01

    Background Conventional patent-based drug development incentives work badly for the developing world, where commercial markets are usually small to non-existent. For this reason, the past decade has seen extensive experimentation with alternative R&D institutions ranging from private–public partnerships to development prizes. Despite extensive discussion, however, one of the most promising avenues—open source drug discovery—has remained elusive. We argue that the stumbling block has been the absence of a critical mass of preexisting work that volunteers can improve through a series of granular contributions. Historically, open source software collaborations have almost never succeeded without such “kernels”. Methodology/Principal Findings Here, we use a computational pipeline for: (i) comparative structure modeling of target proteins, (ii) predicting the localization of ligand binding sites on their surfaces, and (iii) assessing the similarity of the predicted ligands to known drugs. Our kernel currently contains 143 and 297 protein targets from ten pathogen genomes that are predicted to bind a known drug or a molecule similar to a known drug, respectively. The kernel provides a source of potential drug targets and drug candidates around which an online open source community can nucleate. Using NMR spectroscopy, we have experimentally tested our predictions for two of these targets, confirming one and invalidating the other. Conclusions/Significance The TDI kernel, which is being offered under the Creative Commons attribution share-alike license for free and unrestricted use, can be accessed on the World Wide Web at http://www.tropicaldisease.org. We hope that the kernel will facilitate collaborative efforts towards the discovery of new drugs against parasites that cause tropical diseases. PMID:19381286

  9. Protein (multi-)location prediction: using location inter-dependencies in a probabilistic framework

    PubMed Central

    2014-01-01

    Motivation Knowing the location of a protein within the cell is important for understanding its function, role in biological processes, and potential use as a drug target. Much progress has been made in developing computational methods that predict single locations for proteins. Most such methods are based on the over-simplifying assumption that proteins localize to a single location. However, it has been shown that proteins localize to multiple locations. While a few recent systems attempt to predict multiple locations of proteins, their performance leaves much room for improvement. Moreover, they typically treat locations as independent and do not attempt to utilize possible inter-dependencies among locations. Our hypothesis is that directly incorporating inter-dependencies among locations into both the classifier-learning and the prediction process can improve location prediction performance. Results We present a new method and a preliminary system we have developed that directly incorporates inter-dependencies among locations into the location-prediction process of multiply-localized proteins. Our method is based on a collection of Bayesian network classifiers, where each classifier is used to predict a single location. Learning the structure of each Bayesian network classifier takes into account inter-dependencies among locations, and the prediction process uses estimates involving multiple locations. We evaluate our system on a dataset of single- and multi-localized proteins (the most comprehensive protein multi-localization dataset currently available, derived from the DBMLoc dataset). Our results, obtained by incorporating inter-dependencies, are significantly higher than those obtained by classifiers that do not use inter-dependencies. The performance of our system on multi-localized proteins is comparable to a top performing system (YLoc+), without being restricted only to location-combinations present in the training set. PMID:24646119

  10. Protein (multi-)location prediction: using location inter-dependencies in a probabilistic framework.

    PubMed

    Simha, Ramanuja; Shatkay, Hagit

    2014-03-19

    Knowing the location of a protein within the cell is important for understanding its function, role in biological processes, and potential use as a drug target. Much progress has been made in developing computational methods that predict single locations for proteins. Most such methods are based on the over-simplifying assumption that proteins localize to a single location. However, it has been shown that proteins localize to multiple locations. While a few recent systems attempt to predict multiple locations of proteins, their performance leaves much room for improvement. Moreover, they typically treat locations as independent and do not attempt to utilize possible inter-dependencies among locations. Our hypothesis is that directly incorporating inter-dependencies among locations into both the classifier-learning and the prediction process can improve location prediction performance. We present a new method and a preliminary system we have developed that directly incorporates inter-dependencies among locations into the location-prediction process of multiply-localized proteins. Our method is based on a collection of Bayesian network classifiers, where each classifier is used to predict a single location. Learning the structure of each Bayesian network classifier takes into account inter-dependencies among locations, and the prediction process uses estimates involving multiple locations. We evaluate our system on a dataset of single- and multi-localized proteins (the most comprehensive protein multi-localization dataset currently available, derived from the DBMLoc dataset). Our results, obtained by incorporating inter-dependencies, are significantly higher than those obtained by classifiers that do not use inter-dependencies. The performance of our system on multi-localized proteins is comparable to a top performing system (YLoc+), without being restricted only to location-combinations present in the training set.

  11. Protein (multi-)location prediction: utilizing interdependencies via a generative model.

    PubMed

    Simha, Ramanuja; Briesemeister, Sebastian; Kohlbacher, Oliver; Shatkay, Hagit

    2015-06-15

    Proteins are responsible for a multitude of vital tasks in all living organisms. Given that a protein's function and role are strongly related to its subcellular location, protein location prediction is an important research area. While proteins move from one location to another and can localize to multiple locations, most existing location prediction systems assign only a single location per protein. A few recent systems attempt to predict multiple locations for proteins, however, their performance leaves much room for improvement. Moreover, such systems do not capture dependencies among locations and usually consider locations as independent. We hypothesize that a multi-location predictor that captures location inter-dependencies can improve location predictions for proteins. We introduce a probabilistic generative model for protein localization, and develop a system based on it-which we call MDLoc-that utilizes inter-dependencies among locations to predict multiple locations for proteins. The model captures location inter-dependencies using Bayesian networks and represents dependency between features and locations using a mixture model. We use iterative processes for learning model parameters and for estimating protein locations. We evaluate our classifier MDLoc, on a dataset of single- and multi-localized proteins derived from the DBMLoc dataset, which is the most comprehensive protein multi-localization dataset currently available. Our results, obtained by using MDLoc, significantly improve upon results obtained by an initial simpler classifier, as well as on results reported by other top systems. MDLoc is available at: http://www.eecis.udel.edu/∼compbio/mdloc. © The Author 2015. Published by Oxford University Press.

  12. Hidden markov model for the prediction of transmembrane proteins using MATLAB.

    PubMed

    Chaturvedi, Navaneet; Shanker, Sudhanshu; Singh, Vinay Kumar; Sinha, Dhiraj; Pandey, Paras Nath

    2011-01-01

    Since membranous proteins play a key role in drug targeting therefore transmembrane proteins prediction is active and challenging area of biological sciences. Location based prediction of transmembrane proteins are significant for functional annotation of protein sequences. Hidden markov model based method was widely applied for transmembrane topology prediction. Here we have presented a revised and a better understanding model than an existing one for transmembrane protein prediction. Scripting on MATLAB was built and compiled for parameter estimation of model and applied this model on amino acid sequence to know the transmembrane and its adjacent locations. Estimated model of transmembrane topology was based on TMHMM model architecture. Only 7 super states are defined in the given dataset, which were converted to 96 states on the basis of their length in sequence. Accuracy of the prediction of model was observed about 74 %, is a good enough in the area of transmembrane topology prediction. Therefore we have concluded the hidden markov model plays crucial role in transmembrane helices prediction on MATLAB platform and it could also be useful for drug discovery strategy. The database is available for free at bioinfonavneet@gmail.comvinaysingh@bhu.ac.in.

  13. DeepLoc: prediction of protein subcellular localization using deep learning.

    PubMed

    Almagro Armenteros, José Juan; Sønderby, Casper Kaae; Sønderby, Søren Kaae; Nielsen, Henrik; Winther, Ole

    2017-11-01

    The prediction of eukaryotic protein subcellular localization is a well-studied topic in bioinformatics due to its relevance in proteomics research. Many machine learning methods have been successfully applied in this task, but in most of them, predictions rely on annotation of homologues from knowledge databases. For novel proteins where no annotated homologues exist, and for predicting the effects of sequence variants, it is desirable to have methods for predicting protein properties from sequence information only. Here, we present a prediction algorithm using deep neural networks to predict protein subcellular localization relying only on sequence information. At its core, the prediction model uses a recurrent neural network that processes the entire protein sequence and an attention mechanism identifying protein regions important for the subcellular localization. The model was trained and tested on a protein dataset extracted from one of the latest UniProt releases, in which experimentally annotated proteins follow more stringent criteria than previously. We demonstrate that our model achieves a good accuracy (78% for 10 categories; 92% for membrane-bound or soluble), outperforming current state-of-the-art algorithms, including those relying on homology information. The method is available as a web server at http://www.cbs.dtu.dk/services/DeepLoc. Example code is available at https://github.com/JJAlmagro/subcellular_localization. The dataset is available at http://www.cbs.dtu.dk/services/DeepLoc/data.php. jjalma@dtu.dk. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  14. Challenging the state of the art in protein structure prediction: Highlights of experimental target structures for the 10th Critical Assessment of Techniques for Protein Structure Prediction Experiment CASP10.

    PubMed

    Kryshtafovych, Andriy; Moult, John; Bales, Patrick; Bazan, J Fernando; Biasini, Marco; Burgin, Alex; Chen, Chen; Cochran, Frank V; Craig, Timothy K; Das, Rhiju; Fass, Deborah; Garcia-Doval, Carmela; Herzberg, Osnat; Lorimer, Donald; Luecke, Hartmut; Ma, Xiaolei; Nelson, Daniel C; van Raaij, Mark J; Rohwer, Forest; Segall, Anca; Seguritan, Victor; Zeth, Kornelius; Schwede, Torsten

    2014-02-01

    For the last two decades, CASP has assessed the state of the art in techniques for protein structure prediction and identified areas which required further development. CASP would not have been possible without the prediction targets provided by the experimental structural biology community. In the latest experiment, CASP10, more than 100 structures were suggested as prediction targets, some of which appeared to be extraordinarily difficult for modeling. In this article, authors of some of the most challenging targets discuss which specific scientific question motivated the experimental structure determination of the target protein, which structural features were especially interesting from a structural or functional perspective, and to what extent these features were correctly reproduced in the predictions submitted to CASP10. Specifically, the following targets will be presented: the acid-gated urea channel, a difficult to predict transmembrane protein from the important human pathogen Helicobacter pylori; the structure of human interleukin (IL)-34, a recently discovered helical cytokine; the structure of a functionally uncharacterized enzyme OrfY from Thermoproteus tenax formed by a gene duplication and a novel fold; an ORFan domain of mimivirus sulfhydryl oxidase R596; the fiber protein gene product 17 from bacteriophage T7; the bacteriophage CBA-120 tailspike protein; a virus coat protein from metagenomic samples of the marine environment; and finally, an unprecedented class of structure prediction targets based on engineered disulfide-rich small proteins. Copyright © 2013 The Authors. Wiley Periodicals, Inc.

  15. Knowledge-based prediction of protein backbone conformation using a structural alphabet.

    PubMed

    Vetrivel, Iyanar; Mahajan, Swapnil; Tyagi, Manoj; Hoffmann, Lionel; Sanejouand, Yves-Henri; Srinivasan, Narayanaswamy; de Brevern, Alexandre G; Cadet, Frédéric; Offmann, Bernard

    2017-01-01

    Libraries of structural prototypes that abstract protein local structures are known as structural alphabets and have proven to be very useful in various aspects of protein structure analyses and predictions. One such library, Protein Blocks, is composed of 16 standard 5-residues long structural prototypes. This form of analyzing proteins involves drafting its structure as a string of Protein Blocks. Predicting the local structure of a protein in terms of protein blocks is the general objective of this work. A new approach, PB-kPRED is proposed towards this aim. It involves (i) organizing the structural knowledge in the form of a database of pentapeptide fragments extracted from all protein structures in the PDB and (ii) applying a knowledge-based algorithm that does not rely on any secondary structure predictions and/or sequence alignment profiles, to scan this database and predict most probable backbone conformations for the protein local structures. Though PB-kPRED uses the structural information from homologues in preference, if available. The predictions were evaluated rigorously on 15,544 query proteins representing a non-redundant subset of the PDB filtered at 30% sequence identity cut-off. We have shown that the kPRED method was able to achieve mean accuracies ranging from 40.8% to 66.3% depending on the availability of homologues. The impact of the different strategies for scanning the database on the prediction was evaluated and is discussed. Our results highlight the usefulness of the method in the context of proteins without any known structural homologues. A scoring function that gives a good estimate of the accuracy of prediction was further developed. This score estimates very well the accuracy of the algorithm (R2 of 0.82). An online version of the tool is provided freely for non-commercial usage at http://www.bo-protscience.fr/kpred/.

  16. Computational Framework for Prediction of Peptide Sequences That May Mediate Multiple Protein Interactions in Cancer-Associated Hub Proteins.

    PubMed

    Sarkar, Debasree; Patra, Piya; Ghosh, Abhirupa; Saha, Sudipto

    2016-01-01

    A considerable proportion of protein-protein interactions (PPIs) in the cell are estimated to be mediated by very short peptide segments that approximately conform to specific sequence patterns known as linear motifs (LMs), often present in the disordered regions in the eukaryotic proteins. These peptides have been found to interact with low affinity and are able bind to multiple interactors, thus playing an important role in the PPI networks involving date hubs. In this work, PPI data and de novo motif identification based method (MEME) were used to identify such peptides in three cancer-associated hub proteins-MYC, APC and MDM2. The peptides corresponding to the significant LMs identified for each hub protein were aligned, the overlapping regions across these peptides being termed as overlapping linear peptides (OLPs). These OLPs were thus predicted to be responsible for multiple PPIs of the corresponding hub proteins and a scoring system was developed to rank them. We predicted six OLPs in MYC and five OLPs in MDM2 that scored higher than OLP predictions from randomly generated protein sets. Two OLP sequences from the C-terminal of MYC were predicted to bind with FBXW7, component of an E3 ubiquitin-protein ligase complex involved in proteasomal degradation of MYC. Similarly, we identified peptides in the C-terminal of MDM2 interacting with FKBP3, which has a specific role in auto-ubiquitinylation of MDM2. The peptide sequences predicted in MYC and MDM2 look promising for designing orthosteric inhibitors against possible disease-associated PPIs. Since these OLPs can interact with other proteins as well, these inhibitors should be specific to the targeted interactor to prevent undesired side-effects. This computational framework has been designed to predict and rank the peptide regions that may mediate multiple PPIs and can be applied to other disease-associated date hub proteins for prediction of novel therapeutic targets of small molecule PPI modulators.

  17. Estimating carcass fat and protein in northern pintails during the nonbreeding season

    USGS Publications Warehouse

    Miller, M.R.

    1989-01-01

    I used northern pintails (Anas acuta) collected from August through March 1979-82 in the Sacramento Valley, California to derive equations to predict ether-extracted carcass fat, carcass protein, and skeletal lean dry weight. Ether-extracted carcass fat was best predicted by total fat depot weight (wet skin, abdominal fat, and intestinal fat) (r2 = 0.94) and estimates based on carcass water content (r2 = 0.93-0.98). Measured carcass protein was best predicted by a multiple regression including total protein depot weight (breast muscles, leg muscles, and gizzard) and tarsus length (R2 = 0.79). I predicted skeletal lean dry weight by a multiple regression incorporating culmen, tarsus, and wing length (R2 = 0.77). Predicted carcass fat agreed well with measured carcass fat in an independent data set of 30 pintails using total fat depot (r2 = 0.92-0.96) and carcass water (r2 = 0.97-0.99), but predicted carcass protein agreed less well with measured protein.

  18. Application of Machine Learning Approaches for Protein-protein Interactions Prediction.

    PubMed

    Zhang, Mengying; Su, Qiang; Lu, Yi; Zhao, Manman; Niu, Bing

    2017-01-01

    Proteomics endeavors to study the structures, functions and interactions of proteins. Information of the protein-protein interactions (PPIs) helps to improve our knowledge of the functions and the 3D structures of proteins. Thus determining the PPIs is essential for the study of the proteomics. In this review, in order to study the application of machine learning in predicting PPI, some machine learning approaches such as support vector machine (SVM), artificial neural networks (ANNs) and random forest (RF) were selected, and the examples of its applications in PPIs were listed. SVM and RF are two commonly used methods. Nowadays, more researchers predict PPIs by combining more than two methods. This review presents the application of machine learning approaches in predicting PPI. Many examples of success in identification and prediction in the area of PPI prediction have been discussed, and the PPIs research is still in progress. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  19. Conservation of hot regions in protein-protein interaction in evolution.

    PubMed

    Hu, Jing; Li, Jiarui; Chen, Nansheng; Zhang, Xiaolong

    2016-11-01

    The hot regions of protein-protein interactions refer to the active area which formed by those most important residues to protein combination process. With the research development on protein interactions, lots of predicted hot regions can be discovered efficiently by intelligent computing methods, while performing biology experiments to verify each every prediction is hardly to be done due to the time-cost and the complexity of the experiment. This study based on the research of hot spot residue conservations, the proposed method is used to verify authenticity of predicted hot regions that using machine learning algorithm combined with protein's biological features and sequence conservation, though multiple sequence alignment, module substitute matrix and sequence similarity to create conservation scoring algorithm, and then using threshold module to verify the conservation tendency of hot regions in evolution. This research work gives an effective method to verify predicted hot regions in protein-protein interactions, which also provides a useful way to deeply investigate the functional activities of protein hot regions. Copyright © 2016. Published by Elsevier Inc.

  20. Protein asparagine deamidation prediction based on structures with machine learning methods.

    PubMed

    Jia, Lei; Sun, Yaxiong

    2017-01-01

    Chemical stability is a major concern in the development of protein therapeutics due to its impact on both efficacy and safety. Protein "hotspots" are amino acid residues that are subject to various chemical modifications, including deamidation, isomerization, glycosylation, oxidation etc. A more accurate prediction method for potential hotspot residues would allow their elimination or reduction as early as possible in the drug discovery process. In this work, we focus on prediction models for asparagine (Asn) deamidation. Sequence-based prediction method simply identifies the NG motif (amino acid asparagine followed by a glycine) to be liable to deamidation. It still dominates deamidation evaluation process in most pharmaceutical setup due to its convenience. However, the simple sequence-based method is less accurate and often causes over-engineering a protein. We introduce structure-based prediction models by mining available experimental and structural data of deamidated proteins. Our training set contains 194 Asn residues from 25 proteins that all have available high-resolution crystal structures. Experimentally measured deamidation half-life of Asn in penta-peptides as well as 3D structure-based properties, such as solvent exposure, crystallographic B-factors, local secondary structure and dihedral angles etc., were used to train prediction models with several machine learning algorithms. The prediction tools were cross-validated as well as tested with an external test data set. The random forest model had high enrichment in ranking deamidated residues higher than non-deamidated residues while effectively eliminated false positive predictions. It is possible that such quantitative protein structure-function relationship tools can also be applied to other protein hotspot predictions. In addition, we extensively discussed metrics being used to evaluate the performance of predicting unbalanced data sets such as the deamidation case.

  1. Protein Structure Prediction by Protein Threading

    NASA Astrophysics Data System (ADS)

    Xu, Ying; Liu, Zhijie; Cai, Liming; Xu, Dong

    The seminal work of Bowie, Lüthy, and Eisenberg (Bowie et al., 1991) on "the inverse protein folding problem" laid the foundation of protein structure prediction by protein threading. By using simple measures for fitness of different amino acid types to local structural environments defined in terms of solvent accessibility and protein secondary structure, the authors derived a simple and yet profoundly novel approach to assessing if a protein sequence fits well with a given protein structural fold. Their follow-up work (Elofsson et al., 1996; Fischer and Eisenberg, 1996; Fischer et al., 1996a,b) and the work by Jones, Taylor, and Thornton (Jones et al., 1992) on protein fold recognition led to the development of a new brand of powerful tools for protein structure prediction, which we now term "protein threading." These computational tools have played a key role in extending the utility of all the experimentally solved structures by X-ray crystallography and nuclear magnetic resonance (NMR), providing structural models and functional predictions for many of the proteins encoded in the hundreds of genomes that have been sequenced up to now.

  2. Characterization of rubella-specific humoral immunity following two doses of MMR vaccine using proteome microarray technology

    PubMed Central

    Haralambieva, Iana H.; Gibson, Michael J.; Kennedy, Richard B.; Ovsyannikova, Inna G.; Warner, Nathaniel D.; Grill, Diane E.

    2017-01-01

    Introduction//Background The lack of standardization of the currently used commercial anti-rubella IgG antibody assays leads to frequent misinterpretation of results for samples with low/equivocal antibody concentration. The use of alternative approaches in rubella serology could add new information leading to a fuller understanding of rubella protective immunity and neutralizing antibody response after vaccination. Methods We applied microarray technology to measure antibodies to all rubella virus proteins in 75 high and 75 low rubella virus-specific antibody responders after two MMR vaccine doses. These data were used in multivariate penalized logistic regression modeling of rubella-specific neutralizing antibody response after vaccination. Results We measured antibodies to all rubella virus structural proteins (i.e., the glycoproteins E1 and E2 and the capsid C protein) and to the non-structural protein P150. Antibody levels to each of these proteins were: correlated with the neutralizing antibody titer (p<0.006); demonstrated differences between the high and the low antibody responder groups (p<0.008); and were components of the model associated with/predictive of vaccine-induced rubella virus-specific neutralizing antibody titers (misclassification error = 0.2). Conclusion Our study supports the use of this new technology, as well as the use of antibody profiles/patterns (rather than single antibody measures) as biomarkers of neutralizing antibody response and correlates of protective immunity in rubella virus serology. PMID:29145521

  3. Evolution and functional divergence of the anoctamin family of membrane proteins

    PubMed Central

    2010-01-01

    Background The anoctamin family of transmembrane proteins are found in all eukaryotes and consists of 10 members in vertebrates. Ano1 and ano2 were observed to have Ca2+ activated Cl- channel activity. Recent findings however have revealed that ano6, and ano7 can also produce chloride currents, although with different properties. In contrast, ano9 and ano10 suppress baseline Cl- conductance when co-expressed with ano1 thus suggesting that different anoctamins can interfere with each other. In order to elucidate intrinsic functional diversity, and underlying evolutionary mechanism among anoctamins, we performed comprehensive bioinformatics analysis of anoctamin gene family. Results Our results show that anoctamin protein paralogs evolved from several gene duplication events followed by functional divergence of vertebrate anoctamins. Most of the amino acid replacements responsible for the functional divergence were fixed by adaptive evolution and this seem to be a common pattern in anoctamin gene family evolution. Strong purifying selection and the loss of many gene duplication products indicate rigid structure-function relationships among anoctamins. Conclusions Our study suggests that anoctamins have evolved by series of duplication events, and that they are constrained by purifying selection. In addition we identified a number of protein domains, and amino acid residues which contribute to predicted functional divergence. Hopefully, this work will facilitate future functional characterization of the anoctamin membrane protein family. PMID:20964844

  4. G Protein–Coupled Receptor-Type G Proteins Are Required for Light-Dependent Seedling Growth and Fertility in Arabidopsis[W

    PubMed Central

    Jaffé, Felix W.; Freschet, Gian-Enrico C.; Valdes, Billy M.; Runions, John; Terry, Matthew J.; Williams, Lorraine E.

    2012-01-01

    G protein–coupled receptor-type G proteins (GTGs) are highly conserved membrane proteins in plants, animals, and fungi that have eight to nine predicted transmembrane domains. They have been classified as G protein–coupled receptor-type G proteins that function as abscisic acid (ABA) receptors in Arabidopsis thaliana. We cloned Arabidopsis GTG1 and GTG2 and isolated new T-DNA insertion alleles of GTG1 and GTG2 in both Wassilewskija and Columbia backgrounds. These gtg1 gtg2 double mutants show defects in fertility, hypocotyl and root growth, and responses to light and sugars. Histological studies of shoot tissue reveal cellular distortions that are particularly evident in the epidermal layer. Stable expression of GTG1pro:GTG1-GFP (for green fluorescent protein) in Arabidopsis and transient expression in tobacco (Nicotiana tabacum) indicate that GTG1 is localized primarily to Golgi bodies and to the endoplasmic reticulum. Microarray analysis comparing gene expression profiles in the wild type and double mutant revealed differences in expression of genes important for cell wall function, hormone response, and amino acid metabolism. The double mutants isolated here respond normally to ABA in seed germination assays, root growth inhibition, and gene expression analysis. These results are inconsistent with their proposed role as ABA receptors but demonstrate that GTGs are fundamentally important for plant growth and development. PMID:23001037

  5. Characterization of Aryl Hydrocarbon Receptor Interacting Protein (AIP) Mutations in Familial Isolated Pituitary Adenoma Families

    PubMed Central

    Igreja, Susana; Chahal, Harvinder S; King, Peter; Bolger, Graeme B; Srirangalingam, Umasuthan; Guasti, Leonardo; Chapple, J Paul; Trivellin, Giampaolo; Gueorguiev, Maria; Guegan, Katie; Stals, Karen; Khoo, Bernard; Kumar, Ajith V; Ellard, Sian; Grossman, Ashley B; Korbonits, Márta

    2010-01-01

    Familial isolated pituitary adenoma (FIPA) is an autosomal dominant condition with variable genetic background and incomplete penetrance. Germline mutations of the aryl hydrocarbon receptor interacting protein (AIP) gene have been reported in 15–40% of FIPA patients. Limited data are available on the functional consequences of the mutations or regarding the regulation of the AIP gene. We describe a large cohort of FIPA families and characterize missense and silent mutations using minigene constructs, luciferase and β-galactosidase assays, as well as in silico predictions. Patients with AIP mutations had a lower mean age at diagnosis (23.6±11.2 years) than AIP mutation-negative patients (40.4±14.5 years). A promoter mutation showed reduced in vitro activity corresponding to lower mRNA expression in patient samples. Stimulation of the protein kinase A-pathway positively regulates the AIP promoter. Silent mutations led to abnormal splicing resulting in truncated protein or reduced AIP expression. A two-hybrid assay of protein–protein interaction of all missense variants showed variable disruption of AIP-phosphodiesterase-4A5 binding. In summary, exonic, promoter, splice-site, and large deletion mutations in AIP are implicated in 31% of families in our FIPA cohort. Functional characterization of AIP changes is important to identify the functional impact of gene sequence variants. Hum Mutat 31:1–11, 2010. © 2010 Wiley-Liss, Inc. PMID:20506337

  6. Applying Recovery Biomarkers to Calibrate Self-Report Measures of Energy and Protein in the Hispanic Community Health Study/Study of Latinos.

    PubMed

    Mossavar-Rahmani, Yasmin; Shaw, Pamela A; Wong, William W; Sotres-Alvarez, Daniela; Gellman, Marc D; Van Horn, Linda; Stoutenberg, Mark; Daviglus, Martha L; Wylie-Rosett, Judith; Siega-Riz, Anna Maria; Ou, Fang-Shu; Prentice, Ross L

    2015-06-15

    We investigated measurement error in the self-reported diets of US Hispanics/Latinos, who are prone to obesity and related comorbidities, by background (Central American, Cuban, Dominican, Mexican, Puerto Rican, and South American) in 2010-2012. In 477 participants aged 18-74 years, doubly labeled water and urinary nitrogen were used as objective recovery biomarkers of energy and protein intakes. Self-report was captured from two 24-hour dietary recalls. All measures were repeated in a subsample of 98 individuals. We examined the bias of dietary recalls and their associations with participant characteristics using generalized estimating equations. Energy intake was underestimated by 25.3% (men, 21.8%; women, 27.3%), and protein intake was underestimated by 18.5% (men, 14.7%; women, 20.7%). Protein density was overestimated by 10.7% (men, 11.3%; women, 10.1%). Higher body mass index and Hispanic/Latino background were associated with underestimation of energy (P<0.05). For protein intake, higher body mass index, older age, nonsmoking, Spanish speaking, and Hispanic/Latino background were associated with underestimation (P<0.05). Systematic underreporting of energy and protein intakes and overreporting of protein density were found to vary significantly by Hispanic/Latino background. We developed calibration equations that correct for subject-specific error in reporting that can be used to reduce bias in diet-disease association studies. © The Author 2015. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health.

  7. Protein structure based prediction of catalytic residues

    PubMed Central

    2013-01-01

    Background Worldwide structural genomics projects continue to release new protein structures at an unprecedented pace, so far nearly 6000, but only about 60% of these proteins have any sort of functional annotation. Results We explored a range of features that can be used for the prediction of functional residues given a known three-dimensional structure. These features include various centrality measures of nodes in graphs of interacting residues: closeness, betweenness and page-rank centrality. We also analyzed the distance of functional amino acids to the general center of mass (GCM) of the structure, relative solvent accessibility (RSA), and the use of relative entropy as a measure of sequence conservation. From the selected features, neural networks were trained to identify catalytic residues. We found that using distance to the GCM together with amino acid type provide a good discriminant function, when combined independently with sequence conservation. Using an independent test set of 29 annotated protein structures, the method returned 411 of the initial 9262 residues as the most likely to be involved in function. The output 411 residues contain 70 of the annotated 111 catalytic residues. This represents an approximately 14-fold enrichment of catalytic residues on the entire input set (corresponding to a sensitivity of 63% and a precision of 17%), a performance competitive with that of other state-of-the-art methods. Conclusions We found that several of the graph based measures utilize the same underlying feature of protein structures, which can be simply and more effectively captured with the distance to GCM definition. This also has the added the advantage of simplicity and easy implementation. Meanwhile sequence conservation remains by far the most influential feature in identifying functional residues. We also found that due the rapid changes in size and composition of sequence databases, conservation calculations must be recalibrated for specific reference databases. PMID:23433045

  8. Association of Urinary Biomarkers of Inflammation, Injury, and Fibrosis with Renal Function Decline: The ACCORD Trial

    PubMed Central

    Nadkarni, Girish N.; Rao, Veena; Ismail-Beigi, Faramarz; Fonseca, Vivian A.; Shah, Sudhir V.; Simonson, Michael S.; Cantley, Lloyd; Devarajan, Prasad; Parikh, Chirag R.

    2016-01-01

    Background and objectives Current measures for predicting renal functional decline in patients with type 2 diabetes with preserved renal function are unsatisfactory, and multiple markers assessing various biologic axes may improve prediction. We examined the association of four biomarker-to-creatinine ratio levels (monocyte chemotactic protein-1, IL-18, kidney injury molecule-1, and YKL-40) with renal outcome. Design, setting, participants, & measurements We used a nested case-control design in the Action to Control Cardiovascular Disease Trial by matching 190 participants with ≥40% sustained eGFR decline over the 5-year follow-up period to 190 participants with ≤10% eGFR decline in a 1:1 fashion on key characteristics (age within 5 years, sex, race, baseline albumin-to-creatinine ratio within 20 μg/mg, and baseline eGFR within 10 ml/min per 1.73 m2), with ≤10% decline. We used a Mesoscale Multiplex Platform and measured biomarkers in baseline and 24-month specimens, and we examined biomarker associations with outcome using conditional logistic regression. Results Baseline and 24-month levels of monocyte chemotactic protein-1-to-creatinine ratio levels were higher for cases versus controls. The highest quartile of baseline monocyte chemotactic protein-1-to-creatinine ratio had fivefold greater odds, and each log increment had 2.27-fold higher odds for outcome (odds ratio, 5.27; 95% confidence interval, 2.19 to 12.71 and odds ratio, 2.27; 95% confidence interval, 1.44 to 3.58, respectively). IL-18-to-creatinine ratio, kidney injury molecule-1-to-creatinine ratio, and YKL-40-to-creatinine ratio were not consistently associated with outcome. C statistic for traditional predictors of eGFR decline was 0.70, which improved significantly to 0.74 with monocyte chemotactic protein-1-to-creatinine ratio. Conclusions Urinary monocyte chemotactic protein-1-to-creatinine ratio concentrations were strongly associated with sustained renal decline in patients with type 2 diabetes with preserved renal function. PMID:27189318

  9. The Protein Interactome of Streptococcus pneumoniae and Bacterial Meta-interactomes Improve Function Predictions

    PubMed Central

    Rajagopala, S. V.; Blazie, S. M.; Parrish, J. R.; Khuri, S.; Finley, R. L.

    2017-01-01

    ABSTRACT The functions of roughly a third of all proteins in Streptococcus pneumoniae, a significant human-pathogenic bacterium, are unknown. Using a yeast two-hybrid approach, we have determined more than 2,000 novel protein interactions in this organism. We augmented this network with meta-interactome data that we defined as the pool of all interactions between evolutionarily conserved proteins in other bacteria. We found that such interactions significantly improved our ability to predict a protein’s function, allowing us to provide functional predictions for 299 S. pneumoniae proteins with previously unknown functions. IMPORTANCE Identification of protein interactions in bacterial species can help define the individual roles that proteins play in cellular pathways and pathogenesis. Very few protein interactions have been identified for the important human pathogen S. pneumoniae. We used an experimental approach to identify over 2,000 new protein interactions for S. pneumoniae, the most extensive interactome data for this bacterium to date. To predict protein function, we used our interactome data augmented with interactions from other closely related bacteria. The combination of the experimental data and meta-interactome data significantly improved the prediction results, allowing us to assign possible functions to a large number of poorly characterized proteins. PMID:28744484

  10. Predicting cancer-relevant proteins using an improved molecular similarity ensemble approach.

    PubMed

    Zhou, Bin; Sun, Qi; Kong, De-Xin

    2016-05-31

    In this study, we proposed an improved algorithm for identifying proteins relevant to cancer. The algorithm was named two-layer molecular similarity ensemble approach (TL-SEA). We applied TL-SEA to analyzing the correlation between anticancer compounds (against cell lines K562, MCF7 and A549) and active compounds against separate target proteins listed in BindingDB. Several associations between cancer types and related proteins were revealed using this chemoinformatics approach. An analysis of the literature showed that 26 of 35 predicted proteins were correlated with cancer cell proliferation, apoptosis or differentiation. Additionally, interactions between proteins in BindingDB and anticancer chemicals were also predicted. We discuss the roles of the most important predicted proteins in cancer biology and conclude that TL-SEA could be a useful tool for inferring novel proteins involved in cancer and revealing underlying molecular mechanisms.

  11. Binding ligand prediction for proteins using partial matching of local surface patches.

    PubMed

    Sael, Lee; Kihara, Daisuke

    2010-01-01

    Functional elucidation of uncharacterized protein structures is an important task in bioinformatics. We report our new approach for structure-based function prediction which captures local surface features of ligand binding pockets. Function of proteins, specifically, binding ligands of proteins, can be predicted by finding similar local surface regions of known proteins. To enable partial comparison of binding sites in proteins, a weighted bipartite matching algorithm is used to match pairs of surface patches. The surface patches are encoded with the 3D Zernike descriptors. Unlike the existing methods which compare global characteristics of the protein fold or the global pocket shape, the local surface patch method can find functional similarity between non-homologous proteins and binding pockets for flexible ligand molecules. The proposed method improves prediction results over global pocket shape-based method which was previously developed by our group.

  12. Binding Ligand Prediction for Proteins Using Partial Matching of Local Surface Patches

    PubMed Central

    Sael, Lee; Kihara, Daisuke

    2010-01-01

    Functional elucidation of uncharacterized protein structures is an important task in bioinformatics. We report our new approach for structure-based function prediction which captures local surface features of ligand binding pockets. Function of proteins, specifically, binding ligands of proteins, can be predicted by finding similar local surface regions of known proteins. To enable partial comparison of binding sites in proteins, a weighted bipartite matching algorithm is used to match pairs of surface patches. The surface patches are encoded with the 3D Zernike descriptors. Unlike the existing methods which compare global characteristics of the protein fold or the global pocket shape, the local surface patch method can find functional similarity between non-homologous proteins and binding pockets for flexible ligand molecules. The proposed method improves prediction results over global pocket shape-based method which was previously developed by our group. PMID:21614188

  13. Great interactions: How binding incorrect partners can teach us about protein recognition and function

    PubMed Central

    Vamparys, Lydie; Laurent, Benoist; Carbone, Alessandra

    2016-01-01

    ABSTRACT Protein–protein interactions play a key part in most biological processes and understanding their mechanism is a fundamental problem leading to numerous practical applications. The prediction of protein binding sites in particular is of paramount importance since proteins now represent a major class of therapeutic targets. Amongst others methods, docking simulations between two proteins known to interact can be a useful tool for the prediction of likely binding patches on a protein surface. From the analysis of the protein interfaces generated by a massive cross‐docking experiment using the 168 proteins of the Docking Benchmark 2.0, where all possible protein pairs, and not only experimental ones, have been docked together, we show that it is also possible to predict a protein's binding residues without having any prior knowledge regarding its potential interaction partners. Evaluating the performance of cross‐docking predictions using the area under the specificity‐sensitivity ROC curve (AUC) leads to an AUC value of 0.77 for the complete benchmark (compared to the 0.5 AUC value obtained for random predictions). Furthermore, a new clustering analysis performed on the binding patches that are scattered on the protein surface show that their distribution and growth will depend on the protein's functional group. Finally, in several cases, the binding‐site predictions resulting from the cross‐docking simulations will lead to the identification of an alternate interface, which corresponds to the interaction with a biomolecular partner that is not included in the original benchmark. Proteins 2016; 84:1408–1421. © 2016 The Authors Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc. PMID:27287388

  14. Multi-level machine learning prediction of protein-protein interactions in Saccharomyces cerevisiae.

    PubMed

    Zubek, Julian; Tatjewski, Marcin; Boniecki, Adam; Mnich, Maciej; Basu, Subhadip; Plewczynski, Dariusz

    2015-01-01

    Accurate identification of protein-protein interactions (PPI) is the key step in understanding proteins' biological functions, which are typically context-dependent. Many existing PPI predictors rely on aggregated features from protein sequences, however only a few methods exploit local information about specific residue contacts. In this work we present a two-stage machine learning approach for prediction of protein-protein interactions. We start with the carefully filtered data on protein complexes available for Saccharomyces cerevisiae in the Protein Data Bank (PDB) database. First, we build linear descriptions of interacting and non-interacting sequence segment pairs based on their inter-residue distances. Secondly, we train machine learning classifiers to predict binary segment interactions for any two short sequence fragments. The final prediction of the protein-protein interaction is done using the 2D matrix representation of all-against-all possible interacting sequence segments of both analysed proteins. The level-I predictor achieves 0.88 AUC for micro-scale, i.e., residue-level prediction. The level-II predictor improves the results further by a more complex learning paradigm. We perform 30-fold macro-scale, i.e., protein-level cross-validation experiment. The level-II predictor using PSIPRED-predicted secondary structure reaches 0.70 precision, 0.68 recall, and 0.70 AUC, whereas other popular methods provide results below 0.6 threshold (recall, precision, AUC). Our results demonstrate that multi-scale sequence features aggregation procedure is able to improve the machine learning results by more than 10% as compared to other sequence representations. Prepared datasets and source code for our experimental pipeline are freely available for download from: http://zubekj.github.io/mlppi/ (open source Python implementation, OS independent).

  15. The DynaMine webserver: predicting protein dynamics from sequence.

    PubMed

    Cilia, Elisa; Pancsa, Rita; Tompa, Peter; Lenaerts, Tom; Vranken, Wim F

    2014-07-01

    Protein dynamics are important for understanding protein function. Unfortunately, accurate protein dynamics information is difficult to obtain: here we present the DynaMine webserver, which provides predictions for the fast backbone movements of proteins directly from their amino-acid sequence. DynaMine rapidly produces a profile describing the statistical potential for such movements at residue-level resolution. The predicted values have meaning on an absolute scale and go beyond the traditional binary classification of residues as ordered or disordered, thus allowing for direct dynamics comparisons between protein regions. Through this webserver, we provide molecular biologists with an efficient and easy to use tool for predicting the dynamical characteristics of any protein of interest, even in the absence of experimental observations. The prediction results are visualized and can be directly downloaded. The DynaMine webserver, including instructive examples describing the meaning of the profiles, is available at http://dynamine.ibsquare.be. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. Structure-Templated Predictions of Novel Protein Interactions from Sequence Information

    PubMed Central

    Betel, Doron; Breitkreuz, Kevin E; Isserlin, Ruth; Dewar-Darch, Danielle; Tyers, Mike; Hogue, Christopher W. V

    2007-01-01

    The multitude of functions performed in the cell are largely controlled by a set of carefully orchestrated protein interactions often facilitated by specific binding of conserved domains in the interacting proteins. Interacting domains commonly exhibit distinct binding specificity to short and conserved recognition peptides called binding profiles. Although many conserved domains are known in nature, only a few have well-characterized binding profiles. Here, we describe a novel predictive method known as domain–motif interactions from structural topology (D-MIST) for elucidating the binding profiles of interacting domains. A set of domains and their corresponding binding profiles were derived from extant protein structures and protein interaction data and then used to predict novel protein interactions in yeast. A number of the predicted interactions were verified experimentally, including new interactions of the mitotic exit network, RNA polymerases, nucleotide metabolism enzymes, and the chaperone complex. These results demonstrate that new protein interactions can be predicted exclusively from sequence information. PMID:17892321

  17. Genome scale enzyme–metabolite and drug–target interaction predictions using the signature molecular descriptor

    DOE PAGES

    Faulon, Jean-Loup; Misra, Milind; Martin, Shawn; ...

    2007-11-23

    Motivation: Identifying protein enzymatic or pharmacological activities are important areas of research in biology and chemistry. Biological and chemical databases are increasingly being populated with linkages between protein sequences and chemical structures. Additionally, there is now sufficient information to apply machine-learning techniques to predict interactions between chemicals and proteins at a genome scale. Current machine-learning techniques use as input either protein sequences and structures or chemical information. We propose here a method to infer protein–chemical interactions using heterogeneous input consisting of both protein sequence and chemical information. Results: Our method relies on expressing proteins and chemicals with a common cheminformaticsmore » representation. We demonstrate our approach by predicting whether proteins can catalyze reactions not present in training sets. We also predict whether a given drug can bind a target, in the absence of prior binding information for that drug and target. Lastly, such predictions cannot be made with current machine-learning techniques requiring binding information for individual reactions or individual targets.« less

  18. MSPocket: an orientation-independent algorithm for the detection of ligand binding pockets.

    PubMed

    Zhu, Hongbo; Pisabarro, M Teresa

    2011-02-01

    Identification of ligand binding pockets on proteins is crucial for the characterization of protein functions. It provides valuable information for protein-ligand docking and rational engineering of small molecules that regulate protein functions. A major number of current prediction algorithms of ligand binding pockets are based on cubic grid representation of proteins and, thus, the results are often protein orientation dependent. We present the MSPocket program for detecting pockets on the solvent excluded surface of proteins. The core algorithm of the MSPocket approach does not use any cubic grid system to represent proteins and is therefore independent of protein orientations. We demonstrate that MSPocket is able to achieve an accuracy of 75% in predicting ligand binding pockets on a test dataset used for evaluating several existing methods. The accuracy is 92% if the top three predictions are considered. Comparison to one of the recently published best performing methods shows that MSPocket reaches similar performance with the additional feature of being protein orientation independent. Interestingly, some of the predictions are different, meaning that the two methods can be considered complementary and combined to achieve better prediction accuracy. MSPocket also provides a graphical user interface for interactive investigation of the predicted ligand binding pockets. In addition, we show that overlap criterion is a better strategy for the evaluation of predicted ligand binding pockets than the single point distance criterion. The MSPocket source code can be downloaded from http://appserver.biotec.tu-dresden.de/MSPocket/. MSPocket is also available as a PyMOL plugin with a graphical user interface.

  19. Predicting Curriculum and Test Performance at Age 7 Years from Pupil Background, Baseline Skills and Phonological Awareness at Age 5

    ERIC Educational Resources Information Center

    Savage, R.; Carless, S.

    2004-01-01

    Background: Phonological awareness tests are known to be amongst the best predictors of literacy; however their predictive validity alongside current school screening practice (baseline assessment, pupil background data) and to National Curricular outcome measures is unknown. Aim: We explored the validity of phonological awareness and orthographic…

  20. Exploring representations of protein structure for automated remote homology detection and mapping of protein structure space

    PubMed Central

    2014-01-01

    Background Due to rapid sequencing of genomes, there are now millions of deposited protein sequences with no known function. Fast sequence-based comparisons allow detecting close homologs for a protein of interest to transfer functional information from the homologs to the given protein. Sequence-based comparison cannot detect remote homologs, in which evolution has adjusted the sequence while largely preserving structure. Structure-based comparisons can detect remote homologs but most methods for doing so are too expensive to apply at a large scale over structural databases of proteins. Recently, fragment-based structural representations have been proposed that allow fast detection of remote homologs with reasonable accuracy. These representations have also been used to obtain linearly-reducible maps of protein structure space. It has been shown, as additionally supported from analysis in this paper that such maps preserve functional co-localization of the protein structure space. Methods Inspired by a recent application of the Latent Dirichlet Allocation (LDA) model for conducting structural comparisons of proteins, we propose higher-order LDA-obtained topic-based representations of protein structures to provide an alternative route for remote homology detection and organization of the protein structure space in few dimensions. Various techniques based on natural language processing are proposed and employed to aid the analysis of topics in the protein structure domain. Results We show that a topic-based representation is just as effective as a fragment-based one at automated detection of remote homologs and organization of protein structure space. We conduct a detailed analysis of the information content in the topic-based representation, showing that topics have semantic meaning. The fragment-based and topic-based representations are also shown to allow prediction of superfamily membership. Conclusions This work opens exciting venues in designing novel representations to extract information about protein structures, as well as organizing and mining protein structure space with mature text mining tools. PMID:25080993

  1. A review of machine learning methods to predict the solubility of overexpressed recombinant proteins in Escherichia coli.

    PubMed

    Habibi, Narjeskhatoon; Mohd Hashim, Siti Z; Norouzi, Alireza; Samian, Mohammed Razip

    2014-05-08

    Over the last 20 years in biotechnology, the production of recombinant proteins has been a crucial bioprocess in both biopharmaceutical and research arena in terms of human health, scientific impact and economic volume. Although logical strategies of genetic engineering have been established, protein overexpression is still an art. In particular, heterologous expression is often hindered by low level of production and frequent fail due to opaque reasons. The problem is accentuated because there is no generic solution available to enhance heterologous overexpression. For a given protein, the extent of its solubility can indicate the quality of its function. Over 30% of synthesized proteins are not soluble. In certain experimental circumstances, including temperature, expression host, etc., protein solubility is a feature eventually defined by its sequence. Until now, numerous methods based on machine learning are proposed to predict the solubility of protein merely from its amino acid sequence. In spite of the 20 years of research on the matter, no comprehensive review is available on the published methods. This paper presents an extensive review of the existing models to predict protein solubility in Escherichia coli recombinant protein overexpression system. The models are investigated and compared regarding the datasets used, features, feature selection methods, machine learning techniques and accuracy of prediction. A discussion on the models is provided at the end. This study aims to investigate extensively the machine learning based methods to predict recombinant protein solubility, so as to offer a general as well as a detailed understanding for researches in the field. Some of the models present acceptable prediction performances and convenient user interfaces. These models can be considered as valuable tools to predict recombinant protein overexpression results before performing real laboratory experiments, thus saving labour, time and cost.

  2. ProBiS-CHARMMing: Web Interface for Prediction and Optimization of Ligands in Protein Binding Sites.

    PubMed

    Konc, Janez; Miller, Benjamin T; Štular, Tanja; Lešnik, Samo; Woodcock, H Lee; Brooks, Bernard R; Janežič, Dušanka

    2015-11-23

    Proteins often exist only as apo structures (unligated) in the Protein Data Bank, with their corresponding holo structures (with ligands) unavailable. However, apoproteins may not represent the amino-acid residue arrangement upon ligand binding well, which is especially problematic for molecular docking. We developed the ProBiS-CHARMMing web interface by connecting the ProBiS ( http://probis.cmm.ki.si ) and CHARMMing ( http://www.charmming.org ) web servers into one functional unit that enables prediction of protein-ligand complexes and allows for their geometry optimization and interaction energy calculation. The ProBiS web server predicts ligands (small compounds, proteins, nucleic acids, and single-atom ligands) that may bind to a query protein. This is achieved by comparing its surface structure against a nonredundant database of protein structures and finding those that have binding sites similar to that of the query protein. Existing ligands found in the similar binding sites are then transposed to the query according to predictions from ProBiS. The CHARMMing web server enables, among other things, minimization and potential energy calculation for a wide variety of biomolecular systems, and it is used here to optimize the geometry of the predicted protein-ligand complex structures using the CHARMM force field and to calculate their interaction energies with the corresponding query proteins. We show how ProBiS-CHARMMing can be used to predict ligands and their poses for a particular binding site, and minimize the predicted protein-ligand complexes to obtain representations of holoproteins. The ProBiS-CHARMMing web interface is freely available for academic users at http://probis.nih.gov.

  3. Improving prediction of heterodimeric protein complexes using combination with pairwise kernel.

    PubMed

    Ruan, Peiying; Hayashida, Morihiro; Akutsu, Tatsuya; Vert, Jean-Philippe

    2018-02-19

    Since many proteins become functional only after they interact with their partner proteins and form protein complexes, it is essential to identify the sets of proteins that form complexes. Therefore, several computational methods have been proposed to predict complexes from the topology and structure of experimental protein-protein interaction (PPI) network. These methods work well to predict complexes involving at least three proteins, but generally fail at identifying complexes involving only two different proteins, called heterodimeric complexes or heterodimers. There is however an urgent need for efficient methods to predict heterodimers, since the majority of known protein complexes are precisely heterodimers. In this paper, we use three promising kernel functions, Min kernel and two pairwise kernels, which are Metric Learning Pairwise Kernel (MLPK) and Tensor Product Pairwise Kernel (TPPK). We also consider the normalization forms of Min kernel. Then, we combine Min kernel or its normalization form and one of the pairwise kernels by plugging. We applied kernels based on PPI, domain, phylogenetic profile, and subcellular localization properties to predicting heterodimers. Then, we evaluate our method by employing C-Support Vector Classification (C-SVC), carrying out 10-fold cross-validation, and calculating the average F-measures. The results suggest that the combination of normalized-Min-kernel and MLPK leads to the best F-measure and improved the performance of our previous work, which had been the best existing method so far. We propose new methods to predict heterodimers, using a machine learning-based approach. We train a support vector machine (SVM) to discriminate interacting vs non-interacting protein pairs, based on informations extracted from PPI, domain, phylogenetic profiles and subcellular localization. We evaluate in detail new kernel functions to encode these data, and report prediction performance that outperforms the state-of-the-art.

  4. Protein subcellular localization prediction using artificial intelligence technology.

    PubMed

    Nair, Rajesh; Rost, Burkhard

    2008-01-01

    Proteins perform many important tasks in living organisms, such as catalysis of biochemical reactions, transport of nutrients, and recognition and transmission of signals. The plethora of aspects of the role of any particular protein is referred to as its "function." One aspect of protein function that has been the target of intensive research by computational biologists is its subcellular localization. Proteins must be localized in the same subcellular compartment to cooperate toward a common physiological function. Aberrant subcellular localization of proteins can result in several diseases, including kidney stones, cancer, and Alzheimer's disease. To date, sequence homology remains the most widely used method for inferring the function of a protein. However, the application of advanced artificial intelligence (AI)-based techniques in recent years has resulted in significant improvements in our ability to predict the subcellular localization of a protein. The prediction accuracy has risen steadily over the years, in large part due to the application of AI-based methods such as hidden Markov models (HMMs), neural networks (NNs), and support vector machines (SVMs), although the availability of larger experimental datasets has also played a role. Automatic methods that mine textual information from the biological literature and molecular biology databases have considerably sped up the process of annotation for proteins for which some information regarding function is available in the literature. State-of-the-art methods based on NNs and HMMs can predict the presence of N-terminal sorting signals extremely accurately. Ab initio methods that predict subcellular localization for any protein sequence using only the native amino acid sequence and features predicted from the native sequence have shown the most remarkable improvements. The prediction accuracy of these methods has increased by over 30% in the past decade. The accuracy of these methods is now on par with high-throughput methods for predicting localization, and they are beginning to play an important role in directing experimental research. In this chapter, we review some of the most important methods for the prediction of subcellular localization.

  5. In silico prediction of a disease-associated STIL mutant and its affect on the recruitment of centromere protein J (CENPJ).

    PubMed

    Kumar, Ambuj; Rajendran, Vidya; Sethumadhavan, Rao; Purohit, Rituraj

    2012-01-01

    Human STIL (SCL/TAL1 interrupting locus) protein maintains centriole stability and spindle pole localisation. It helps in recruitment of CENPJ (Centromere protein J)/CPAP (centrosomal P4.1-associated protein) and other centrosomal proteins. Mutations in STIL protein are reported in several disorders, especially in deregulation of cell cycle cascades. In this work, we examined the non-synonymous single nucleotide polymorphisms (nsSNPs) reported in STIL protein for their disease association. Different SNP prediction tools were used to predict disease-associated nsSNPs. Our evaluation technique predicted rs147744459 (R242C) as a highly deleterious disease-associated nsSNP and its interaction behaviour with CENPJ protein. Molecular modelling, docking and molecular dynamics simulation were conducted to examine the structural consequences of the predicted disease-associated mutation. By molecular dynamic simulation we observed structural consequences of R242C mutation which affects interaction of STIL and CENPJ functional domains. The result obtained in this study will provide a biophysical insight into future investigations of pathological nsSNPs using a computational platform.

  6. A Novel Biclustering Approach to Association Rule Mining for Predicting HIV-1–Human Protein Interactions

    PubMed Central

    Mukhopadhyay, Anirban; Maulik, Ujjwal; Bandyopadhyay, Sanghamitra

    2012-01-01

    Identification of potential viral-host protein interactions is a vital and useful approach towards development of new drugs targeting those interactions. In recent days, computational tools are being utilized for predicting viral-host interactions. Recently a database containing records of experimentally validated interactions between a set of HIV-1 proteins and a set of human proteins has been published. The problem of predicting new interactions based on this database is usually posed as a classification problem. However, posing the problem as a classification one suffers from the lack of biologically validated negative interactions. Therefore it will be beneficial to use the existing database for predicting new viral-host interactions without the need of negative samples. Motivated by this, in this article, the HIV-1–human protein interaction database has been analyzed using association rule mining. The main objective is to identify a set of association rules both among the HIV-1 proteins and among the human proteins, and use these rules for predicting new interactions. In this regard, a novel association rule mining technique based on biclustering has been proposed for discovering frequent closed itemsets followed by the association rules from the adjacency matrix of the HIV-1–human interaction network. Novel HIV-1–human interactions have been predicted based on the discovered association rules and tested for biological significance. For validation of the predicted new interactions, gene ontology-based and pathway-based studies have been performed. These studies show that the human proteins which are predicted to interact with a particular viral protein share many common biological activities. Moreover, literature survey has been used for validation purpose to identify some predicted interactions that are already validated experimentally but not present in the database. Comparison with other prediction methods is also discussed. PMID:22539940

  7. Protein (multi-)location prediction: utilizing interdependencies via a generative model

    PubMed Central

    Shatkay, Hagit

    2015-01-01

    Motivation: Proteins are responsible for a multitude of vital tasks in all living organisms. Given that a protein’s function and role are strongly related to its subcellular location, protein location prediction is an important research area. While proteins move from one location to another and can localize to multiple locations, most existing location prediction systems assign only a single location per protein. A few recent systems attempt to predict multiple locations for proteins, however, their performance leaves much room for improvement. Moreover, such systems do not capture dependencies among locations and usually consider locations as independent. We hypothesize that a multi-location predictor that captures location inter-dependencies can improve location predictions for proteins. Results: We introduce a probabilistic generative model for protein localization, and develop a system based on it—which we call MDLoc—that utilizes inter-dependencies among locations to predict multiple locations for proteins. The model captures location inter-dependencies using Bayesian networks and represents dependency between features and locations using a mixture model. We use iterative processes for learning model parameters and for estimating protein locations. We evaluate our classifier MDLoc, on a dataset of single- and multi-localized proteins derived from the DBMLoc dataset, which is the most comprehensive protein multi-localization dataset currently available. Our results, obtained by using MDLoc, significantly improve upon results obtained by an initial simpler classifier, as well as on results reported by other top systems. Availability and implementation: MDLoc is available at: http://www.eecis.udel.edu/∼compbio/mdloc. Contact: shatkay@udel.edu. PMID:26072505

  8. LOCALIZER: subcellular localization prediction of both plant and effector proteins in the plant cell

    PubMed Central

    Sperschneider, Jana; Catanzariti, Ann-Maree; DeBoer, Kathleen; Petre, Benjamin; Gardiner, Donald M.; Singh, Karam B.; Dodds, Peter N.; Taylor, Jennifer M.

    2017-01-01

    Pathogens secrete effector proteins and many operate inside plant cells to enable infection. Some effectors have been found to enter subcellular compartments by mimicking host targeting sequences. Although many computational methods exist to predict plant protein subcellular localization, they perform poorly for effectors. We introduce LOCALIZER for predicting plant and effector protein localization to chloroplasts, mitochondria, and nuclei. LOCALIZER shows greater prediction accuracy for chloroplast and mitochondrial targeting compared to other methods for 652 plant proteins. For 107 eukaryotic effectors, LOCALIZER outperforms other methods and predicts a previously unrecognized chloroplast transit peptide for the ToxA effector, which we show translocates into tobacco chloroplasts. Secretome-wide predictions and confocal microscopy reveal that rust fungi might have evolved multiple effectors that target chloroplasts or nuclei. LOCALIZER is the first method for predicting effector localisation in plants and is a valuable tool for prioritizing effector candidates for functional investigations. LOCALIZER is available at http://localizer.csiro.au/. PMID:28300209

  9. A large-scale evaluation of computational protein function prediction

    PubMed Central

    Radivojac, Predrag; Clark, Wyatt T; Ronnen Oron, Tal; Schnoes, Alexandra M; Wittkop, Tobias; Sokolov, Artem; Graim, Kiley; Funk, Christopher; Verspoor, Karin; Ben-Hur, Asa; Pandey, Gaurav; Yunes, Jeffrey M; Talwalkar, Ameet S; Repo, Susanna; Souza, Michael L; Piovesan, Damiano; Casadio, Rita; Wang, Zheng; Cheng, Jianlin; Fang, Hai; Gough, Julian; Koskinen, Patrik; Törönen, Petri; Nokso-Koivisto, Jussi; Holm, Liisa; Cozzetto, Domenico; Buchan, Daniel W A; Bryson, Kevin; Jones, David T; Limaye, Bhakti; Inamdar, Harshal; Datta, Avik; Manjari, Sunitha K; Joshi, Rajendra; Chitale, Meghana; Kihara, Daisuke; Lisewski, Andreas M; Erdin, Serkan; Venner, Eric; Lichtarge, Olivier; Rentzsch, Robert; Yang, Haixuan; Romero, Alfonso E; Bhat, Prajwal; Paccanaro, Alberto; Hamp, Tobias; Kassner, Rebecca; Seemayer, Stefan; Vicedo, Esmeralda; Schaefer, Christian; Achten, Dominik; Auer, Florian; Böhm, Ariane; Braun, Tatjana; Hecht, Maximilian; Heron, Mark; Hönigschmid, Peter; Hopf, Thomas; Kaufmann, Stefanie; Kiening, Michael; Krompass, Denis; Landerer, Cedric; Mahlich, Yannick; Roos, Manfred; Björne, Jari; Salakoski, Tapio; Wong, Andrew; Shatkay, Hagit; Gatzmann, Fanny; Sommer, Ingolf; Wass, Mark N; Sternberg, Michael J E; Škunca, Nives; Supek, Fran; Bošnjak, Matko; Panov, Panče; Džeroski, Sašo; Šmuc, Tomislav; Kourmpetis, Yiannis A I; van Dijk, Aalt D J; ter Braak, Cajo J F; Zhou, Yuanpeng; Gong, Qingtian; Dong, Xinran; Tian, Weidong; Falda, Marco; Fontana, Paolo; Lavezzo, Enrico; Di Camillo, Barbara; Toppo, Stefano; Lan, Liang; Djuric, Nemanja; Guo, Yuhong; Vucetic, Slobodan; Bairoch, Amos; Linial, Michal; Babbitt, Patricia C; Brenner, Steven E; Orengo, Christine; Rost, Burkhard; Mooney, Sean D; Friedberg, Iddo

    2013-01-01

    Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based Critical Assessment of protein Function Annotation (CAFA) experiment. Fifty-four methods representing the state-of-the-art for protein function prediction were evaluated on a target set of 866 proteins from eleven organisms. Two findings stand out: (i) today’s best protein function prediction algorithms significantly outperformed widely-used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is significant need for improvement of currently available tools. PMID:23353650

  10. Accurate prediction of RNA-binding protein residues with two discriminative structural descriptors.

    PubMed

    Sun, Meijian; Wang, Xia; Zou, Chuanxin; He, Zenghui; Liu, Wei; Li, Honglin

    2016-06-07

    RNA-binding proteins participate in many important biological processes concerning RNA-mediated gene regulation, and several computational methods have been recently developed to predict the protein-RNA interactions of RNA-binding proteins. Newly developed discriminative descriptors will help to improve the prediction accuracy of these prediction methods and provide further meaningful information for researchers. In this work, we designed two structural features (residue electrostatic surface potential and triplet interface propensity) and according to the statistical and structural analysis of protein-RNA complexes, the two features were powerful for identifying RNA-binding protein residues. Using these two features and other excellent structure- and sequence-based features, a random forest classifier was constructed to predict RNA-binding residues. The area under the receiver operating characteristic curve (AUC) of five-fold cross-validation for our method on training set RBP195 was 0.900, and when applied to the test set RBP68, the prediction accuracy (ACC) was 0.868, and the F-score was 0.631. The good prediction performance of our method revealed that the two newly designed descriptors could be discriminative for inferring protein residues interacting with RNAs. To facilitate the use of our method, a web-server called RNAProSite, which implements the proposed method, was constructed and is freely available at http://lilab.ecust.edu.cn/NABind .

  11. Accurate disulfide-bonding network predictions improve ab initio structure prediction of cysteine-rich proteins

    PubMed Central

    Yang, Jing; He, Bao-Ji; Jang, Richard; Zhang, Yang; Shen, Hong-Bin

    2015-01-01

    Abstract Motivation: Cysteine-rich proteins cover many important families in nature but there are currently no methods specifically designed for modeling the structure of these proteins. The accuracy of disulfide connectivity pattern prediction, particularly for the proteins of higher-order connections, e.g. >3 bonds, is too low to effectively assist structure assembly simulations. Results: We propose a new hierarchical order reduction protocol called Cyscon for disulfide-bonding prediction. The most confident disulfide bonds are first identified and bonding prediction is then focused on the remaining cysteine residues based on SVR training. Compared with purely machine learning-based approaches, Cyscon improved the average accuracy of connectivity pattern prediction by 21.9%. For proteins with more than 5 disulfide bonds, Cyscon improved the accuracy by 585% on the benchmark set of PDBCYS. When applied to 158 non-redundant cysteine-rich proteins, Cyscon predictions helped increase (or decrease) the TM-score (or RMSD) of the ab initio QUARK modeling by 12.1% (or 14.4%). This result demonstrates a new avenue to improve the ab initio structure modeling for cysteine-rich proteins. Availability and implementation: http://www.csbio.sjtu.edu.cn/bioinf/Cyscon/ Contact: zhng@umich.edu or hbshen@sjtu.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26254435

  12. Prediction of Spontaneous Protein Deamidation from Sequence-Derived Secondary Structure and Intrinsic Disorder.

    PubMed

    Lorenzo, J Ramiro; Alonso, Leonardo G; Sánchez, Ignacio E

    2015-01-01

    Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non-enzymatic deamidation of internal asparagine residues in proteins in the absence of structural data, using sequence-based predictions of secondary structure and intrinsic disorder. Compared to previous algorithms, NGOME does not require three-dimensional structures yet yields better predictions than available sequence-only methods. Four case studies of specific proteins show how NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. A fifth case study applies NGOME at a proteomic scale and unveils a correlation between asparagine deamidation and protein degradation in yeast. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage "Protein and nucleic acid structure and sequence analysis".

  13. Prediction and Dissection of Protein-RNA Interactions by Molecular Descriptors.

    PubMed

    Liu, Zhi-Ping; Chen, Luonan

    2016-01-01

    Protein-RNA interactions play crucial roles in numerous biological processes. However, detecting the interactions and binding sites between protein and RNA by traditional experiments is still time consuming and labor costing. Thus, it is of importance to develop bioinformatics methods for predicting protein-RNA interactions and binding sites. Accurate prediction of protein-RNA interactions and recognitions will highly benefit to decipher the interaction mechanisms between protein and RNA, as well as to improve the RNA-related protein engineering and drug design. In this work, we summarize the current bioinformatics strategies of predicting protein-RNA interactions and dissecting protein-RNA interaction mechanisms from local structure binding motifs. In particular, we focus on the feature-based machine learning methods, in which the molecular descriptors of protein and RNA are extracted and integrated as feature vectors of representing the interaction events and recognition residues. In addition, the available methods are classified and compared comprehensively. The molecular descriptors are expected to elucidate the binding mechanisms of protein-RNA interaction and reveal the functional implications from structural complementary perspective.

  14. Proteome-wide Prediction of Self-interacting Proteins Based on Multiple Properties*

    PubMed Central

    Liu, Zhongyang; Guo, Feifei; Zhang, Jiyang; Wang, Jian; Lu, Liang; Li, Dong; He, Fuchu

    2013-01-01

    Self-interacting proteins, whose two or more copies can interact with each other, play important roles in cellular functions and the evolution of protein interaction networks (PINs). Knowing whether a protein can self-interact can contribute to and sometimes is crucial for the elucidation of its functions. Previous related research has mainly focused on the structures and functions of specific self-interacting proteins, whereas knowledge on their overall properties is limited. Meanwhile, the two current most common high throughput protein interaction assays have limited ability to detect self-interactions because of biological artifacts and design limitations, whereas the bioinformatic prediction method of self-interacting proteins is lacking. This study aims to systematically study and predict self-interacting proteins from an overall perspective. We find that compared with other proteins the self-interacting proteins in the structural aspect contain more domains; in the evolutionary aspect they tend to be conserved and ancient; in the functional aspect they are significantly enriched with enzyme genes, housekeeping genes, and drug targets, and in the topological aspect tend to occupy important positions in PINs. Furthermore, based on these features, after feature selection, we use logistic regression to integrate six representative features, including Gene Ontology term, domain, paralogous interactor, enzyme, model organism self-interacting protein, and betweenness centrality in the PIN, to develop a proteome-wide prediction model of self-interacting proteins. Using 5-fold cross-validation and an independent test, this model shows good performance. Finally, the prediction model is developed into a user-friendly web service SLIPPER (SeLf-Interacting Protein PrEdictoR). Users may submit a list of proteins, and then SLIPPER will return the probability_scores measuring their possibility to be self-interacting proteins and various related annotation information. This work helps us understand the role self-interacting proteins play in cellular functions from an overall perspective, and the constructed prediction model may contribute to the high throughput finding of self-interacting proteins and provide clues for elucidating their functions. PMID:23422585

  15. Correlation of chemical shifts predicted by molecular dynamics simulations for partially disordered proteins.

    PubMed

    Karp, Jerome M; Eryilmaz, Ertan; Erylimaz, Ertan; Cowburn, David

    2015-01-01

    There has been a longstanding interest in being able to accurately predict NMR chemical shifts from structural data. Recent studies have focused on using molecular dynamics (MD) simulation data as input for improved prediction. Here we examine the accuracy of chemical shift prediction for intein systems, which have regions of intrinsic disorder. We find that using MD simulation data as input for chemical shift prediction does not consistently improve prediction accuracy over use of a static X-ray crystal structure. This appears to result from the complex conformational ensemble of the disordered protein segments. We show that using accelerated molecular dynamics (aMD) simulations improves chemical shift prediction, suggesting that methods which better sample the conformational ensemble like aMD are more appropriate tools for use in chemical shift prediction for proteins with disordered regions. Moreover, our study suggests that data accurately reflecting protein dynamics must be used as input for chemical shift prediction in order to correctly predict chemical shifts in systems with disorder.

  16. The extracellular Leucine-Rich Repeat superfamily; a comparative survey and analysis of evolutionary relationships and expression patterns

    PubMed Central

    Dolan, Jackie; Walshe, Karen; Alsbury, Samantha; Hokamp, Karsten; O'Keeffe, Sean; Okafuji, Tatsuya; Miller, Suzanne FC; Tear, Guy; Mitchell, Kevin J

    2007-01-01

    Background Leucine-rich repeats (LRRs) are highly versatile and evolvable protein-ligand interaction motifs found in a large number of proteins with diverse functions, including innate immunity and nervous system development. Here we catalogue all of the extracellular LRR (eLRR) proteins in worms, flies, mice and humans. We use convergent evidence from several transmembrane-prediction and motif-detection programs, including a customised algorithm, LRRscan, to identify eLRR proteins, and a hierarchical clustering method based on TribeMCL to establish their evolutionary relationships. Results This yields a total of 369 proteins (29 in worm, 66 in fly, 135 in mouse and 139 in human), many of them of unknown function. We group eLRR proteins into several classes: those with only LRRs, those that cluster with Toll-like receptors (Tlrs), those with immunoglobulin or fibronectin-type 3 (FN3) domains and those with some other domain. These groups show differential patterns of expansion and diversification across species. Our analyses reveal several clusters of novel genes, including two Elfn genes, encoding transmembrane proteins with eLRRs and an FN3 domain, and six genes encoding transmembrane proteins with eLRRs only (the Elron cluster). Many of these are expressed in discrete patterns in the developing mouse brain, notably in the thalamus and cortex. We have also identified a number of novel fly eLRR proteins with discrete expression in the embryonic nervous system. Conclusion This study provides the necessary foundation for a systematic analysis of the functions of this class of genes, which are likely to include prominently innate immunity, inflammation and neural development, especially the specification of neuronal connectivity. PMID:17868438

  17. Structural domains and main-chain flexibility in prion proteins.

    PubMed

    Blinov, N; Berjanskii, M; Wishart, D S; Stepanova, M

    2009-02-24

    In this study we describe a novel approach to define structural domains and to characterize the local flexibility in both human and chicken prion proteins. The approach we use is based on a comprehensive theory of collective dynamics in proteins that was recently developed. This method determines the essential collective coordinates, which can be found from molecular dynamics trajectories via principal component analysis. Under this particular framework, we are able to identify the domains where atoms move coherently while at the same time to determine the local main-chain flexibility for each residue. We have verified this approach by comparing our results for the predicted dynamic domain systems with the computed main-chain flexibility profiles and the NMR-derived random coil indexes for human and chicken prion proteins. The three sets of data show excellent agreement. Additionally, we demonstrate that the dynamic domains calculated in this fashion provide a highly sensitive measure of protein collective structure and dynamics. Furthermore, such an analysis is capable of revealing structural and dynamic properties of proteins that are inaccessible to the conventional assessment of secondary structure. Using the collective dynamic simulation approach described here along with a high-temperature simulations of unfolding of human prion protein, we have explored whether locations of relatively low stability could be identified where the unfolding process could potentially be facilitated. According to our analysis, the locations of relatively low stability may be associated with the beta-sheet formed by strands S1 and S2 and the adjacent loops, whereas helix HC appears to be a relatively stable part of the protein. We suggest that this kind of structural analysis may provide a useful background for a more quantitative assessment of potential routes of spontaneous misfolding in prion proteins.

  18. Graphene as a protein crystal mounting material to reduce background scatter.

    PubMed

    Wierman, Jennifer L; Alden, Jonathan S; Kim, Chae Un; McEuen, Paul L; Gruner, Sol M

    2013-10-01

    The overall signal-to-noise ratio per unit dose for X-ray diffraction data from protein crystals can be improved by reducing the mass and density of all material surrounding the crystals. This article demonstrates a path towards the practical ultimate in background reduction by use of atomically thin graphene sheets as a crystal mounting platform for protein crystals. The results show the potential for graphene in protein crystallography and other cases where X-ray scatter from the mounting material must be reduced and specimen dehydration prevented, such as in coherent X-ray diffraction imaging of microscopic objects.

  19. Graphene as a protein crystal mounting material to reduce background scatter

    PubMed Central

    Wierman, Jennifer L.; Alden, Jonathan S.; Kim, Chae Un; McEuen, Paul L.; Gruner, Sol M.

    2013-01-01

    The overall signal-to-noise ratio per unit dose for X-ray diffraction data from protein crystals can be improved by reducing the mass and density of all material surrounding the crystals. This article demonstrates a path towards the practical ultimate in background reduction by use of atomically thin graphene sheets as a crystal mounting platform for protein crystals. The results show the potential for graphene in protein crystallography and other cases where X-ray scatter from the mounting material must be reduced and specimen dehydration prevented, such as in coherent X-ray diffraction imaging of microscopic objects. PMID:24068843

  20. Protein function prediction using neighbor relativity in protein-protein interaction network.

    PubMed

    Moosavi, Sobhan; Rahgozar, Masoud; Rahimi, Amir

    2013-04-01

    There is a large gap between the number of discovered proteins and the number of functionally annotated ones. Due to the high cost of determining protein function by wet-lab research, function prediction has become a major task for computational biology and bioinformatics. Some researches utilize the proteins interaction information to predict function for un-annotated proteins. In this paper, we propose a novel approach called "Neighbor Relativity Coefficient" (NRC) based on interaction network topology which estimates the functional similarity between two proteins. NRC is calculated for each pair of proteins based on their graph-based features including distance, common neighbors and the number of paths between them. In order to ascribe function to an un-annotated protein, NRC estimates a weight for each neighbor to transfer its annotation to the unknown protein. Finally, the unknown protein will be annotated by the top score transferred functions. We also investigate the effect of using different coefficients for various types of functions. The proposed method has been evaluated on Saccharomyces cerevisiae and Homo sapiens interaction networks. The performance analysis demonstrates that NRC yields better results in comparison with previous protein function prediction approaches that utilize interaction network. Copyright © 2012 Elsevier Ltd. All rights reserved.

  1. Functional classification of protein structures by local structure matching in graph representation.

    PubMed

    Mills, Caitlyn L; Garg, Rohan; Lee, Joslynn S; Tian, Liang; Suciu, Alexandru; Cooperman, Gene; Beuning, Penny J; Ondrechen, Mary Jo

    2018-03-31

    As a result of high-throughput protein structure initiatives, over 14,400 protein structures have been solved by structural genomics (SG) centers and participating research groups. While the totality of SG data represents a tremendous contribution to genomics and structural biology, reliable functional information for these proteins is generally lacking. Better functional predictions for SG proteins will add substantial value to the structural information already obtained. Our method described herein, Graph Representation of Active Sites for Prediction of Function (GRASP-Func), predicts quickly and accurately the biochemical function of proteins by representing residues at the predicted local active site as graphs rather than in Cartesian coordinates. We compare the GRASP-Func method to our previously reported method, structurally aligned local sites of activity (SALSA), using the ribulose phosphate binding barrel (RPBB), 6-hairpin glycosidase (6-HG), and Concanavalin A-like Lectins/Glucanase (CAL/G) superfamilies as test cases. In each of the superfamilies, SALSA and the much faster method GRASP-Func yield similar correct classification of previously characterized proteins, providing a validated benchmark for the new method. In addition, we analyzed SG proteins using our SALSA and GRASP-Func methods to predict function. Forty-one SG proteins in the RPBB superfamily, nine SG proteins in the 6-HG superfamily, and one SG protein in the CAL/G superfamily were successfully classified into one of the functional families in their respective superfamily by both methods. This improved, faster, validated computational method can yield more reliable predictions of function that can be used for a wide variety of applications by the community. © 2018 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.

  2. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model.

    PubMed

    Wang, Sheng; Sun, Siqi; Li, Zhen; Zhang, Renyu; Xu, Jinbo

    2017-01-01

    Protein contacts contain key information for the understanding of protein structure and function and thus, contact prediction from sequence is an important problem. Recently exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs is still of low quality and not very useful for de novo structure prediction. This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformation of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformation of pairwise information including output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationship and thus, obtain higher-quality contact prediction regardless of how many sequence homologs are available for proteins in question. Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively. Our contact-assisted models also have much better quality than template-based models especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even if trained mostly by soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully-automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then. http://raptorx.uchicago.edu/ContactMap/.

  3. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model

    PubMed Central

    Li, Zhen; Zhang, Renyu

    2017-01-01

    Motivation Protein contacts contain key information for the understanding of protein structure and function and thus, contact prediction from sequence is an important problem. Recently exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs is still of low quality and not very useful for de novo structure prediction. Method This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformation of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformation of pairwise information including output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationship and thus, obtain higher-quality contact prediction regardless of how many sequence homologs are available for proteins in question. Results Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively. Our contact-assisted models also have much better quality than template-based models especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even if trained mostly by soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully-automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then. Availability http://raptorx.uchicago.edu/ContactMap/ PMID:28056090

  4. The cosmic microwave background radiation

    NASA Technical Reports Server (NTRS)

    Silk, Joseph

    1992-01-01

    A review the implications of the spectrum and anisotropy of the cosmic microwave background for cosmology. Thermalization and processes generating spectral distortions are discussed. Anisotropy predictions are described and compared with observational constraints. If the evidence for large-scale power in the galaxy distribution in excess of that predicted by the cold dark matter model is vindicated, and the observed structure originated via gravitational instabilities of primordial density fluctuations, the predicted amplitude of microwave background anisotropies on angular scales of a degree and larger must be at least several parts in 10 exp 6.

  5. LocFuse: human protein-protein interaction prediction via classifier fusion using protein localization information.

    PubMed

    Zahiri, Javad; Mohammad-Noori, Morteza; Ebrahimpour, Reza; Saadat, Samaneh; Bozorgmehr, Joseph H; Goldberg, Tatyana; Masoudi-Nejad, Ali

    2014-12-01

    Protein-protein interaction (PPI) detection is one of the central goals of functional genomics and systems biology. Knowledge about the nature of PPIs can help fill the widening gap between sequence information and functional annotations. Although experimental methods have produced valuable PPI data, they also suffer from significant limitations. Computational PPI prediction methods have attracted tremendous attentions. Despite considerable efforts, PPI prediction is still in its infancy in complex multicellular organisms such as humans. Here, we propose a novel ensemble learning method, LocFuse, which is useful in human PPI prediction. This method uses eight different genomic and proteomic features along with four types of different classifiers. The prediction performance of this classifier selection method was found to be considerably better than methods employed hitherto. This confirms the complex nature of the PPI prediction problem and also the necessity of using biological information for classifier fusion. The LocFuse is available at: http://lbb.ut.ac.ir/Download/LBBsoft/LocFuse. The results revealed that if we divide proteome space according to the cellular localization of proteins, then the utility of some classifiers in PPI prediction can be improved. Therefore, to predict the interaction for any given protein pair, we can select the most accurate classifier with regard to the cellular localization information. Based on the results, we can say that the importance of different features for PPI prediction varies between differently localized proteins; however in general, our novel features, which were extracted from position-specific scoring matrices (PSSMs), are the most important ones and the Random Forest (RF) classifier performs best in most cases. LocFuse was developed with a user-friendly graphic interface and it is freely available for Linux, Mac OSX and MS Windows operating systems. Copyright © 2014 Elsevier Inc. All rights reserved.

  6. Structural features that predict real-value fluctuations of globular proteins.

    PubMed

    Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke

    2012-05-01

    It is crucial to consider dynamics for understanding the biological function of proteins. We used a large number of molecular dynamics (MD) trajectories of nonhomologous proteins as references and examined static structural features of proteins that are most relevant to fluctuations. We examined correlation of individual structural features with fluctuations and further investigated effective combinations of features for predicting the real value of residue fluctuations using the support vector regression (SVR). It was found that some structural features have higher correlation than crystallographic B-factors with fluctuations observed in MD trajectories. Moreover, SVR that uses combinations of static structural features showed accurate prediction of fluctuations with an average Pearson's correlation coefficient of 0.669 and a root mean square error of 1.04 Å. This correlation coefficient is higher than the one observed in predictions by the Gaussian network model (GNM). An advantage of the developed method over the GNMs is that the former predicts the real value of fluctuation. The results help improve our understanding of relationships between protein structure and fluctuation. Furthermore, the developed method provides a convienient practial way to predict fluctuations of proteins using easily computed static structural features of proteins. Copyright © 2012 Wiley Periodicals, Inc.

  7. Structural features that predict real-value fluctuations of globular proteins

    PubMed Central

    Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke

    2012-01-01

    It is crucial to consider dynamics for understanding the biological function of proteins. We used a large number of molecular dynamics trajectories of non-homologous proteins as references and examined static structural features of proteins that are most relevant to fluctuations. We examined correlation of individual structural features with fluctuations and further investigated effective combinations of features for predicting the real-value of residue fluctuations using the support vector regression. It was found that some structural features have higher correlation than crystallographic B-factors with fluctuations observed in molecular dynamics trajectories. Moreover, support vector regression that uses combinations of static structural features showed accurate prediction of fluctuations with an average Pearson’s correlation coefficient of 0.669 and a root mean square error of 1.04 Å. This correlation coefficient is higher than the one observed for the prediction by the Gaussian network model. An advantage of the developed method over the Gaussian network models is that the former predicts the real-value of fluctuation. The results help improve our understanding of relationships between protein structure and fluctuation. Furthermore, the developed method provides a convienient practial way to predict fluctuations of proteins using easily computed static structural features of proteins. PMID:22328193

  8. Predicting Protein–protein Association Rates using Coarse-grained Simulation and Machine Learning

    PubMed Central

    Xie, Zhong-Ru; Chen, Jiawen; Wu, Yinghao

    2017-01-01

    Protein–protein interactions dominate all major biological processes in living cells. We have developed a new Monte Carlo-based simulation algorithm to study the kinetic process of protein association. We tested our method on a previously used large benchmark set of 49 protein complexes. The predicted rate was overestimated in the benchmark test compared to the experimental results for a group of protein complexes. We hypothesized that this resulted from molecular flexibility at the interface regions of the interacting proteins. After applying a machine learning algorithm with input variables that accounted for both the conformational flexibility and the energetic factor of binding, we successfully identified most of the protein complexes with overestimated association rates and improved our final prediction by using a cross-validation test. This method was then applied to a new independent test set and resulted in a similar prediction accuracy to that obtained using the training set. It has been thought that diffusion-limited protein association is dominated by long-range interactions. Our results provide strong evidence that the conformational flexibility also plays an important role in regulating protein association. Our studies provide new insights into the mechanism of protein association and offer a computationally efficient tool for predicting its rate. PMID:28418043

  9. MultiP-Apo: A Multilabel Predictor for Identifying Subcellular Locations of Apoptosis Proteins

    PubMed Central

    Li, Hui; Wang, Rong; Gan, Yong

    2017-01-01

    Apoptosis proteins play an important role in the mechanism of programmed cell death. Predicting subcellular localization of apoptosis proteins is an essential step to understand their functions and identify drugs target. Many computational prediction methods have been developed for apoptosis protein subcellular localization. However, these existing works only focus on the proteins that have one location; proteins with multiple locations are either not considered or assumed as not existing when constructing prediction models, so that they cannot completely predict all the locations of the apoptosis proteins with multiple locations. To address this problem, this paper proposes a novel multilabel predictor named MultiP-Apo, which can predict not only apoptosis proteins with single subcellular location but also those with multiple subcellular locations. Specifically, given a query protein, GO-based feature extraction method is used to extract its feature vector. Subsequently, the GO feature vector is classified by a new multilabel classifier based on the label-specific features. It is the first multilabel predictor ever established for identifying subcellular locations of multilocation apoptosis proteins. As an initial study, MultiP-Apo achieves an overall accuracy of 58.49% by jackknife test, which indicates that our proposed predictor may become a very useful high-throughput tool in this area. PMID:28744305

  10. Prediction of Protein-Protein Interaction Sites by Random Forest Algorithm with mRMR and IFS

    PubMed Central

    Li, Bi-Qing; Feng, Kai-Yan; Chen, Lei; Huang, Tao; Cai, Yu-Dong

    2012-01-01

    Prediction of protein-protein interaction (PPI) sites is one of the most challenging problems in computational biology. Although great progress has been made by employing various machine learning approaches with numerous characteristic features, the problem is still far from being solved. In this study, we developed a novel predictor based on Random Forest (RF) algorithm with the Minimum Redundancy Maximal Relevance (mRMR) method followed by incremental feature selection (IFS). We incorporated features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility. We also included five 3D structural features to predict protein-protein interaction sites and achieved an overall accuracy of 0.672997 and MCC of 0.347977. Feature analysis showed that 3D structural features such as Depth Index (DPX) and surface curvature (SC) contributed most to the prediction of protein-protein interaction sites. It was also shown via site-specific feature analysis that the features of individual residues from PPI sites contribute most to the determination of protein-protein interaction sites. It is anticipated that our prediction method will become a useful tool for identifying PPI sites, and that the feature analysis described in this paper will provide useful insights into the mechanisms of interaction. PMID:22937126

  11. MultiP-Apo: A Multilabel Predictor for Identifying Subcellular Locations of Apoptosis Proteins.

    PubMed

    Wang, Xiao; Li, Hui; Wang, Rong; Zhang, Qiuwen; Zhang, Weiwei; Gan, Yong

    2017-01-01

    Apoptosis proteins play an important role in the mechanism of programmed cell death. Predicting subcellular localization of apoptosis proteins is an essential step to understand their functions and identify drugs target. Many computational prediction methods have been developed for apoptosis protein subcellular localization. However, these existing works only focus on the proteins that have one location; proteins with multiple locations are either not considered or assumed as not existing when constructing prediction models, so that they cannot completely predict all the locations of the apoptosis proteins with multiple locations. To address this problem, this paper proposes a novel multilabel predictor named MultiP-Apo, which can predict not only apoptosis proteins with single subcellular location but also those with multiple subcellular locations. Specifically, given a query protein, GO-based feature extraction method is used to extract its feature vector. Subsequently, the GO feature vector is classified by a new multilabel classifier based on the label-specific features. It is the first multilabel predictor ever established for identifying subcellular locations of multilocation apoptosis proteins. As an initial study, MultiP-Apo achieves an overall accuracy of 58.49% by jackknife test, which indicates that our proposed predictor may become a very useful high-throughput tool in this area.

  12. Identification of Surprisingly Diverse Type IV Pili, across a Broad Range of Gram-Positive Bacteria

    PubMed Central

    Roos, David S.; Pohlschröder, Mechthild

    2011-01-01

    Background In Gram-negative bacteria, type IV pili (TFP) have long been known to play important roles in such diverse biological phenomena as surface adhesion, motility, and DNA transfer, with significant consequences for pathogenicity. More recently it became apparent that Gram-positive bacteria also express type IV pili; however, little is known about the diversity and abundance of these structures in Gram-positives. Computational tools for automated identification of type IV pilins are not currently available. Results To assess TFP diversity in Gram-positive bacteria and facilitate pilin identification, we compiled a comprehensive list of putative Gram-positive pilins encoded by operons containing highly conserved pilus biosynthetic genes (pilB, pilC). A surprisingly large number of species were found to contain multiple TFP operons (pil, com and/or tad). The N-terminal sequences of predicted pilins were exploited to develop PilFind, a rule-based algorithm for genome-wide identification of otherwise poorly conserved type IV pilins in any species, regardless of their association with TFP biosynthetic operons (http://signalfind.org). Using PilFind to scan 53 Gram-positive genomes (encoding >187,000 proteins), we identified 286 candidate pilins, including 214 in operons containing TFP biosynthetic genes (TBG+ operons). Although trained on Gram-positive pilins, PilFind identified 55 of 58 manually curated Gram-negative pilins in TBG+ operons, as well as 53 additional pilin candidates in operons lacking biosynthetic genes in ten species (>38,000 proteins), including 27 of 29 experimentally verified pilins. False positive rates appear to be low, as PilFind predicted only four pilin candidates in eleven bacterial species (>13,000 proteins) lacking TFP biosynthetic genes. Conclusions We have shown that Gram-positive bacteria contain a highly diverse set of type IV pili. PilFind can be an invaluable tool to study bacterial cellular processes known to involve type IV pilus-like structures. Its use in combination with other currently available computational tools should improve the accuracy of predicting the subcellular localization of bacterial proteins. PMID:22216142

  13. A versatile palindromic amphipathic repeat coding sequence horizontally distributed among diverse bacterial and eucaryotic microbes

    PubMed Central

    2010-01-01

    Background Intragenic tandem repeats occur throughout all domains of life and impart functional and structural variability to diverse translation products. Repeat proteins confer distinctive surface phenotypes to many unicellular organisms, including those with minimal genomes such as the wall-less bacterial monoderms, Mollicutes. One such repeat pattern in this clade is distributed in a manner suggesting its exchange by horizontal gene transfer (HGT). Expanding genome sequence databases reveal the pattern in a widening range of bacteria, and recently among eucaryotic microbes. We examined the genomic flux and consequences of the motif by determining its distribution, predicted structural features and association with membrane-targeted proteins. Results Using a refined hidden Markov model, we document a 25-residue protein sequence motif tandemly arrayed in variable-number repeats in ORFs lacking assigned functions. It appears sporadically in unicellular microbes from disparate bacterial and eucaryotic clades, representing diverse lifestyles and ecological niches that include host parasitic, marine and extreme environments. Tracts of the repeats predict a malleable configuration of recurring domains, with conserved hydrophobic residues forming an amphipathic secondary structure in which hydrophilic residues endow extensive sequence variation. Many ORFs with these domains also have membrane-targeting sequences that predict assorted topologies; others may comprise reservoirs of sequence variants. We demonstrate expressed variants among surface lipoproteins that distinguish closely related animal pathogens belonging to a subgroup of the Mollicutes. DNA sequences encoding the tandem domains display dyad symmetry. Moreover, in some taxa the domains occur in ORFs selectively associated with mobile elements. These features, a punctate phylogenetic distribution, and different patterns of dispersal in genomes of related taxa, suggest that the repeat may be disseminated by HGT and intra-genomic shuffling. Conclusions We describe novel features of PARCELs (Palindromic Amphipathic Repeat Coding ELements), a set of widely distributed repeat protein domains and coding sequences that were likely acquired through HGT by diverse unicellular microbes, further mobilized and diversified within genomes, and co-opted for expression in the membrane proteome of some taxa. Disseminated by multiple gene-centric vehicles, ORFs harboring these elements enhance accessory gene pools as part of the "mobilome" connecting genomes of various clades, in taxa sharing common niches. PMID:20626840

  14. In silico platform for predicting and initiating β-turns in a protein at desired locations.

    PubMed

    Singh, Harinder; Singh, Sandeep; Raghava, Gajendra P S

    2015-05-01

    Numerous studies have been performed for analysis and prediction of β-turns in a protein. This study focuses on analyzing, predicting, and designing of β-turns to understand the preference of amino acids in β-turn formation. We analyzed around 20,000 PDB chains to understand the preference of residues or pair of residues at different positions in β-turns. Based on the results, a propensity-based method has been developed for predicting β-turns with an accuracy of 82%. We introduced a new approach entitled "Turn level prediction method," which predicts the complete β-turn rather than focusing on the residues in a β-turn. Finally, we developed BetaTPred3, a Random forest based method for predicting β-turns by utilizing various features of four residues present in β-turns. The BetaTPred3 achieved an accuracy of 79% with 0.51 MCC that is comparable or better than existing methods on BT426 dataset. Additionally, models were developed to predict β-turn types with better performance than other methods available in the literature. In order to improve the quality of prediction of turns, we developed prediction models on a large and latest dataset of 6376 nonredundant protein chains. Based on this study, a web server has been developed for prediction of β-turns and their types in proteins. This web server also predicts minimum number of mutations required to initiate or break a β-turn in a protein at specified location of a protein. © 2015 Wiley Periodicals, Inc.

  15. MQAPRank: improved global protein model quality assessment by learning-to-rank.

    PubMed

    Jing, Xiaoyang; Dong, Qiwen

    2017-05-25

    Protein structure prediction has achieved a lot of progress during the last few decades and a greater number of models for a certain sequence can be predicted. Consequently, assessing the qualities of predicted protein models in perspective is one of the key components of successful protein structure prediction. Over the past years, a number of methods have been developed to address this issue, which could be roughly divided into three categories: single methods, quasi-single methods and clustering (or consensus) methods. Although these methods achieve much success at different levels, accurate protein model quality assessment is still an open problem. Here, we present the MQAPRank, a global protein model quality assessment program based on learning-to-rank. The MQAPRank first sorts the decoy models by using single method based on learning-to-rank algorithm to indicate their relative qualities for the target protein. And then it takes the first five models as references to predict the qualities of other models by using average GDT_TS scores between reference models and other models. Benchmarked on CASP11 and 3DRobot datasets, the MQAPRank achieved better performances than other leading protein model quality assessment methods. Recently, the MQAPRank participated in the CASP12 under the group name FDUBio and achieved the state-of-the-art performances. The MQAPRank provides a convenient and powerful tool for protein model quality assessment with the state-of-the-art performances, it is useful for protein structure prediction and model quality assessment usages.

  16. Protein contact prediction using patterns of correlation.

    PubMed

    Hamilton, Nicholas; Burrage, Kevin; Ragan, Mark A; Huber, Thomas

    2004-09-01

    We describe a new method for using neural networks to predict residue contact pairs in a protein. The main inputs to the neural network are a set of 25 measures of correlated mutation between all pairs of residues in two "windows" of size 5 centered on the residues of interest. While the individual pair-wise correlations are a relatively weak predictor of contact, by training the network on windows of correlation the accuracy of prediction is significantly improved. The neural network is trained on a set of 100 proteins and then tested on a disjoint set of 1033 proteins of known structure. An average predictive accuracy of 21.7% is obtained taking the best L/2 predictions for each protein, where L is the sequence length. Taking the best L/10 predictions gives an average accuracy of 30.7%. The predictor is also tested on a set of 59 proteins from the CASP5 experiment. The accuracy is found to be relatively consistent across different sequence lengths, but to vary widely according to the secondary structure. Predictive accuracy is also found to improve by using multiple sequence alignments containing many sequences to calculate the correlations. Copyright 2004 Wiley-Liss, Inc.

  17. A phylogenetic analysis of normal modes evolution in enzymes and its relationship to enzyme function

    PubMed Central

    Lai, Jason; Jin, Jing; Kubelka, Jan; Liberles, David A.

    2012-01-01

    Since the dynamic nature of protein structures is essential for enzymatic function, it is expected that the functional evolution can be inferred from the changes in the protein dynamics. However, dynamics can also diverge neutrally with sequence substitution between enzymes without changes of function. In this study, a phylogenetic approach is implemented to explore the relationship between enzyme dynamics and function through evolutionary history. Protein dynamics are described by normal mode analysis based on a simplified harmonic potential force field applied to the reduced Cα representation of the protein structure while enzymatic function is described by Enzyme Commission (EC) numbers. Similarity of the binding pocket dynamics at each branch of the protein family’s phylogeny was analyzed in two ways: 1) explicitly by quantifying the normal mode overlap calculated for the reconstructed ancestral proteins at each end and 2) implicitly using a diffusion model to obtain the reconstructed lineage-specific changes in the normal modes. Both explicit and implicit ancestral reconstruction identified generally faster rates of change in dynamics compared with the expected change from neutral evolution at the branches of potential functional divergences for the alpha-amylase, D-isomer specific 2-hydroxyacid dehydrogenase, and copper-containing amine oxidase protein families. Normal modes analysis added additional information over just comparing the RMSD of static structures. However, the branch-specific changes were not statistically significant compared to background function-independent neutral rates of change of dynamic properties and blind application of the analysis would not enable prediction of changes in enzyme specificity. PMID:22651983

  18. A phylogenetic analysis of normal modes evolution in enzymes and its relationship to enzyme function.

    PubMed

    Lai, Jason; Jin, Jing; Kubelka, Jan; Liberles, David A

    2012-09-21

    Since the dynamic nature of protein structures is essential for enzymatic function, it is expected that functional evolution can be inferred from the changes in protein dynamics. However, dynamics can also diverge neutrally with sequence substitution between enzymes without changes of function. In this study, a phylogenetic approach is implemented to explore the relationship between enzyme dynamics and function through evolutionary history. Protein dynamics are described by normal mode analysis based on a simplified harmonic potential force field applied to the reduced C(α) representation of the protein structure while enzymatic function is described by Enzyme Commission numbers. Similarity of the binding pocket dynamics at each branch of the protein family's phylogeny was analyzed in two ways: (1) explicitly by quantifying the normal mode overlap calculated for the reconstructed ancestral proteins at each end and (2) implicitly using a diffusion model to obtain the reconstructed lineage-specific changes in the normal modes. Both explicit and implicit ancestral reconstruction identified generally faster rates of change in dynamics compared with the expected change from neutral evolution at the branches of potential functional divergences for the α-amylase, D-isomer-specific 2-hydroxyacid dehydrogenase, and copper-containing amine oxidase protein families. Normal mode analysis added additional information over just comparing the RMSD of static structures. However, the branch-specific changes were not statistically significant compared to background function-independent neutral rates of change of dynamic properties and blind application of the analysis would not enable prediction of changes in enzyme specificity. Copyright © 2012 Elsevier Ltd. All rights reserved.

  19. Comparison of molecular dynamics and superfamily spaces of protein domain deformation

    PubMed Central

    Velázquez-Muriel, Javier A; Rueda, Manuel; Cuesta, Isabel; Pascual-Montano, Alberto; Orozco, Modesto; Carazo, José-María

    2009-01-01

    Background It is well known the strong relationship between protein structure and flexibility, on one hand, and biological protein function, on the other hand. Technically, protein flexibility exploration is an essential task in many applications, such as protein structure prediction and modeling. In this contribution we have compared two different approaches to explore the flexibility space of protein domains: i) molecular dynamics (MD-space), and ii) the study of the structural changes within superfamily (SF-space). Results Our analysis indicates that the MD-space and the SF-space display a significant overlap, but are still different enough to be considered as complementary. The SF-space space is wider but less complex than the MD-space, irrespective of the number of members in the superfamily. Also, the SF-space does not sample all possibilities offered by the MD-space, but often introduces very large changes along just a few deformation modes, whose number tend to a plateau as the number of related folds in the superfamily increases. Conclusion Theoretically, we obtained two conclusions. First, that function restricts the access to some flexibility patterns to evolution, as we observe that when a superfamily member changes to become another, the path does not completely overlap with the physical deformability. Second, that conformational changes from variation in a superfamily are larger and much simpler than those allowed by physical deformability. Methodologically, the conclusion is that both spaces studied are complementary, and have different size and complexity. We expect this fact to have application in fields as 3D-EM/X-ray hybrid models or ab initio protein folding. PMID:19220918

  20. A scoring function based on solvation thermodynamics for protein structure prediction

    PubMed Central

    Du, Shiqiao; Harano, Yuichi; Kinoshita, Masahiro; Sakurai, Minoru

    2012-01-01

    We predict protein structure using our recently developed free energy function for describing protein stability, which is focused on solvation thermodynamics. The function is combined with the current most reliable sampling methods, i.e., fragment assembly (FA) and comparative modeling (CM). The prediction is tested using 11 small proteins for which high-resolution crystal structures are available. For 8 of these proteins, sequence similarities are found in the database, and the prediction is performed with CM. Fairly accurate models with average Cα root mean square deviation (RMSD) ∼ 2.0 Å are successfully obtained for all cases. For the rest of the target proteins, we perform the prediction following FA protocols. For 2 cases, we obtain predicted models with an RMSD ∼ 3.0 Å as the best-scored structures. For the other case, the RMSD remains larger than 7 Å. For all the 11 target proteins, our scoring function identifies the experimentally determined native structure as the best structure. Starting from the predicted structure, replica exchange molecular dynamics is performed to further refine the structures. However, we are unable to improve its RMSD toward the experimental structure. The exhaustive sampling by coarse-grained normal mode analysis around the native structures reveals that our function has a linear correlation with RMSDs < 3.0 Å. These results suggest that the function is quite reliable for the protein structure prediction while the sampling method remains one of the major limiting factors in it. The aspects through which the methodology could further be improved are discussed. PMID:27493529

  1. The Homeobox Protein CEH-23 Mediates Prolonged Longevity in Response to Impaired Mitochondrial Electron Transport Chain in C. elegans

    PubMed Central

    Walter, Ludivine; Baruah, Aiswarya; Chang, Hsin-Wen; Pace, Heather Mae; Lee, Siu Sylvia

    2011-01-01

    Recent findings indicate that perturbations of the mitochondrial electron transport chain (METC) can cause extended longevity in evolutionarily diverse organisms. To uncover the molecular basis of how altered METC increases lifespan in C. elegans, we performed an RNAi screen and revealed that three predicted transcription factors are specifically required for the extended longevity of mitochondrial mutants. In particular, we demonstrated that the nuclear homeobox protein CEH-23 uniquely mediates the longevity but not the slow development, reduced brood size, or resistance to oxidative stress associated with mitochondrial mutations. Furthermore, we showed that ceh-23 expression levels are responsive to altered METC, and enforced overexpression of ceh-23 is sufficient to extend lifespan in wild-type background. Our data point to mitochondria-to-nucleus communications to be key for longevity determination and highlight CEH-23 as a novel longevity factor capable of responding to mitochondrial perturbations. These findings provide a new paradigm for how mitochondria impact aging and age-dependent diseases. PMID:21713031

  2. Systems analysis of the single photon response in invertebrate photoreceptors.

    PubMed

    Pumir, Alain; Graves, Jennifer; Ranganathan, Rama; Shraiman, Boris I

    2008-07-29

    Photoreceptors of Drosophila compound eye employ a G protein-mediated signaling pathway that transduces single photons into transient electrical responses called "quantum bumps" (QB). Although most of the molecular components of this pathway are already known, the system-level understanding of the mechanism of QB generation has remained elusive. Here, we present a quantitative model explaining how QBs emerge from stochastic nonlinear dynamics of the signaling cascade. The model shows that the cascade acts as an "integrate and fire" device and explains how photoreceptors achieve reliable responses to light although keeping low background in the dark. The model predicts the nontrivial behavior of mutants that enhance or suppress signaling and explains the dependence on external calcium, which controls feedback regulation. The results provide insight into physiological questions such as single-photon response efficiency and the adaptation of response to high incident-light level. The system-level analysis enabled by modeling phototransduction provides a foundation for understanding G protein signaling pathways less amenable to quantitative approaches.

  3. Protein location prediction using atomic composition and global features of the amino acid sequence

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cherian, Betsy Sheena, E-mail: betsy.skb@gmail.com; Nair, Achuthsankar S.

    2010-01-22

    Subcellular location of protein is constructive information in determining its function, screening for drug candidates, vaccine design, annotation of gene products and in selecting relevant proteins for further studies. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. For a computational localization prediction method to be more accurate, it should exploit all possible relevant biological features that contribute to the subcellular localization. In this work, we extracted the biological features from the full length protein sequence to incorporate more biological information. A new biological feature, distribution of atomic composition is effectivelymore » used with, multiple physiochemical properties, amino acid composition, three part amino acid composition, and sequence similarity for predicting the subcellular location of the protein. Support Vector Machines are designed for four modules and prediction is made by a weighted voting system. Our system makes prediction with an accuracy of 100, 82.47, 88.81 for self-consistency test, jackknife test and independent data test respectively. Our results provide evidence that the prediction based on the biological features derived from the full length amino acid sequence gives better accuracy than those derived from N-terminal alone. Considering the features as a distribution within the entire sequence will bring out underlying property distribution to a greater detail to enhance the prediction accuracy.« less

  4. Using support vector machine to predict beta- and gamma-turns in proteins.

    PubMed

    Hu, Xiuzhen; Li, Qianzhong

    2008-09-01

    By using the composite vector with increment of diversity, position conservation scoring function, and predictive secondary structures to express the information of sequence, a support vector machine (SVM) algorithm for predicting beta- and gamma-turns in the proteins is proposed. The 426 and 320 nonhomologous protein chains described by Guruprasad and Rajkumar (Guruprasad and Rajkumar J. Biosci 2000, 25,143) are used for training and testing the predictive model of the beta- and gamma-turns, respectively. The overall prediction accuracy and the Matthews correlation coefficient in 7-fold cross-validation are 79.8% and 0.47, respectively, for the beta-turns. The overall prediction accuracy in 5-fold cross-validation is 61.0% for the gamma-turns. These results are significantly higher than the other algorithms in the prediction of beta- and gamma-turns using the same datasets. In addition, the 547 and 823 nonhomologous protein chains described by Fuchs and Alix (Fuchs and Alix Proteins: Struct Funct Bioinform 2005, 59, 828) are used for training and testing the predictive model of the beta- and gamma-turns, and better results are obtained. This algorithm may be helpful to improve the performance of protein turns' prediction. To ensure the ability of the SVM method to correctly classify beta-turn and non-beta-turn (gamma-turn and non-gamma-turn), the receiver operating characteristic threshold independent measure curves are provided. (c) 2008 Wiley Periodicals, Inc.

  5. Predicting protein interactions by Brownian dynamics simulations.

    PubMed

    Meng, Xuan-Yu; Xu, Yu; Zhang, Hong-Xing; Mezei, Mihaly; Cui, Meng

    2012-01-01

    We present a newly adapted Brownian-Dynamics (BD)-based protein docking method for predicting native protein complexes. The approach includes global BD conformational sampling, compact complex selection, and local energy minimization. In order to reduce the computational costs for energy evaluations, a shell-based grid force field was developed to represent the receptor protein and solvation effects. The performance of this BD protein docking approach has been evaluated on a test set of 24 crystal protein complexes. Reproduction of experimental structures in the test set indicates the adequate conformational sampling and accurate scoring of this BD protein docking approach. Furthermore, we have developed an approach to account for the flexibility of proteins, which has been successfully applied to reproduce the experimental complex structure from the structure of two unbounded proteins. These results indicate that this adapted BD protein docking approach can be useful for the prediction of protein-protein interactions.

  6. A predicted protein interactome identifies conserved global networks and disease resistance subnetworks in maize

    PubMed Central

    Musungu, Bryan; Bhatnagar, Deepak; Brown, Robert L.; Fakhoury, Ahmad M.; Geisler, Matt

    2015-01-01

    Interactomes are genome-wide roadmaps of protein-protein interactions. They have been produced for humans, yeast, the fruit fly, and Arabidopsis thaliana and have become invaluable tools for generating and testing hypotheses. A predicted interactome for Zea mays (PiZeaM) is presented here as an aid to the research community for this valuable crop species. PiZeaM was built using a proven method of interologs (interacting orthologs) that were identified using both one-to-one and many-to-many orthology between genomes of maize and reference species. Where both maize orthologs occurred for an experimentally determined interaction in the reference species, we predicted a likely interaction in maize. A total of 49,026 unique interactions for 6004 maize proteins were predicted. These interactions are enriched for processes that are evolutionarily conserved, but include many otherwise poorly annotated proteins in maize. The predicted maize interactions were further analyzed by comparing annotation of interacting proteins, including different layers of ontology. A map of pairwise gene co-expression was also generated and compared to predicted interactions. Two global subnetworks were constructed for highly conserved interactions. These subnetworks showed clear clustering of proteins by function. Another subnetwork was created for disease response using a bait and prey strategy to capture interacting partners for proteins that respond to other organisms. Closer examination of this subnetwork revealed the connectivity between biotic and abiotic hormone stress pathways. We believe PiZeaM will provide a useful tool for the prediction of protein function and analysis of pathways for Z. mays researchers and is presented in this paper as a reference tool for the exploration of protein interactions in maize. PMID:26089837

  7. A protein block based fold recognition method for the annotation of twilight zone sequences.

    PubMed

    Suresh, V; Ganesan, K; Parthasarathy, S

    2013-03-01

    The description of protein backbone was recently improved with a group of structural fragments called Structural Alphabets instead of the regular three states (Helix, Sheet and Coil) secondary structure description. Protein Blocks is one of the Structural Alphabets used to describe each and every region of protein backbone including the coil. According to de Brevern (2000) the Protein Blocks has 16 structural fragments and each one has 5 residues in length. Protein Blocks fragments are highly informative among the available Structural Alphabets and it has been used for many applications. Here, we present a protein fold recognition method based on Protein Blocks for the annotation of twilight zone sequences. In our method, we align the predicted Protein Blocks of a query amino acid sequence with a library of assigned Protein Blocks of 953 known folds using the local pair-wise alignment. The alignment results with z-value ≥ 2.5 and P-value ≤ 0.08 are predicted as possible folds. Our method is able to recognize the possible folds for nearly 35.5% of the twilight zone sequences with their predicted Protein Block sequence obtained by pb_prediction, which is available at Protein Block Export server.

  8. Hot-spot identification on a broad class of proteins and RNA suggest unifying principles of molecular recognition

    PubMed Central

    Kulp, John L.; Cloudsdale, Ian S.; Kulp, John L.

    2017-01-01

    Chemically diverse fragments tend to collectively bind at localized sites on proteins, which is a cornerstone of fragment-based techniques. A central question is how general are these strategies for predicting a wide variety of molecular interactions such as small molecule-protein, protein-protein and protein-nucleic acid for both experimental and computational methods. To address this issue, we recently proposed three governing principles, (1) accurate prediction of fragment-macromolecule binding free energy, (2) accurate prediction of water-macromolecule binding free energy, and (3) locating sites on a macromolecule that have high affinity for a diversity of fragments and low affinity for water. To test the generality of these concepts we used the computational technique of Simulated Annealing of Chemical Potential to design one small fragment to break the RecA-RecA protein-protein interaction and three fragments that inhibit peptide-deformylase via water-mediated multi-body interactions. Experiments confirm the predictions that 6-hydroxydopamine potently inhibits RecA and that PDF inhibition quantitatively tracks the water-mediated binding predictions. Additionally, the principles correctly predict the essential bound waters in HIV Protease, the surprisingly extensive binding site of elastase, the pinpoint location of electron transfer in dihydrofolate reductase, the HIV TAT-TAR protein-RNA interactions, and the MDM2-MDM4 differential binding to p53. The experimental confirmations of highly non-obvious predictions combined with the precise characterization of a broad range of known phenomena lend strong support to the generality of fragment-based methods for characterizing molecular recognition. PMID:28837642

  9. Hot-spot identification on a broad class of proteins and RNA suggest unifying principles of molecular recognition.

    PubMed

    Kulp, John L; Cloudsdale, Ian S; Kulp, John L; Guarnieri, Frank

    2017-01-01

    Chemically diverse fragments tend to collectively bind at localized sites on proteins, which is a cornerstone of fragment-based techniques. A central question is how general are these strategies for predicting a wide variety of molecular interactions such as small molecule-protein, protein-protein and protein-nucleic acid for both experimental and computational methods. To address this issue, we recently proposed three governing principles, (1) accurate prediction of fragment-macromolecule binding free energy, (2) accurate prediction of water-macromolecule binding free energy, and (3) locating sites on a macromolecule that have high affinity for a diversity of fragments and low affinity for water. To test the generality of these concepts we used the computational technique of Simulated Annealing of Chemical Potential to design one small fragment to break the RecA-RecA protein-protein interaction and three fragments that inhibit peptide-deformylase via water-mediated multi-body interactions. Experiments confirm the predictions that 6-hydroxydopamine potently inhibits RecA and that PDF inhibition quantitatively tracks the water-mediated binding predictions. Additionally, the principles correctly predict the essential bound waters in HIV Protease, the surprisingly extensive binding site of elastase, the pinpoint location of electron transfer in dihydrofolate reductase, the HIV TAT-TAR protein-RNA interactions, and the MDM2-MDM4 differential binding to p53. The experimental confirmations of highly non-obvious predictions combined with the precise characterization of a broad range of known phenomena lend strong support to the generality of fragment-based methods for characterizing molecular recognition.

  10. A novel representation for apoptosis protein subcellular localization prediction using support vector machine.

    PubMed

    Zhang, Li; Liao, Bo; Li, Dachao; Zhu, Wen

    2009-07-21

    Apoptosis, or programmed cell death, plays an important role in development of an organism. Obtaining information on subcellular location of apoptosis proteins is very helpful to understand the apoptosis mechanism. In this paper, based on the concept that the position distribution information of amino acids is closely related with the structure and function of proteins, we introduce the concept of distance frequency [Matsuda, S., Vert, J.P., Ueda, N., Toh, H., Akutsu, T., 2005. A novel representation of protein sequences for prediction of subcellular location using support vector machines. Protein Sci. 14, 2804-2813] and propose a novel way to calculate distance frequencies. In order to calculate the local features, each protein sequence is separated into p parts with the same length in our paper. Then we use the novel representation of protein sequences and adopt support vector machine to predict subcellular location. The overall prediction accuracy is significantly improved by jackknife test.

  11. Improved annotation through genome-scale metabolic modeling of Aspergillus oryzae

    PubMed Central

    Vongsangnak, Wanwipa; Olsen, Peter; Hansen, Kim; Krogsgaard, Steen; Nielsen, Jens

    2008-01-01

    Background Since ancient times the filamentous fungus Aspergillus oryzae has been used in the fermentation industry for the production of fermented sauces and the production of industrial enzymes. Recently, the genome sequence of A. oryzae with 12,074 annotated genes was released but the number of hypothetical proteins accounted for more than 50% of the annotated genes. Considering the industrial importance of this fungus, it is therefore valuable to improve the annotation and further integrate genomic information with biochemical and physiological information available for this microorganism and other related fungi. Here we proposed the gene prediction by construction of an A. oryzae Expressed Sequence Tag (EST) library, sequencing and assembly. We enhanced the function assignment by our developed annotation strategy. The resulting better annotation was used to reconstruct the metabolic network leading to a genome scale metabolic model of A. oryzae. Results Our assembled EST sequences we identified 1,046 newly predicted genes in the A. oryzae genome. Furthermore, it was possible to assign putative protein functions to 398 of the newly predicted genes. Noteworthy, our annotation strategy resulted in assignment of new putative functions to 1,469 hypothetical proteins already present in the A. oryzae genome database. Using the substantially improved annotated genome we reconstructed the metabolic network of A. oryzae. This network contains 729 enzymes, 1,314 enzyme-encoding genes, 1,073 metabolites and 1,846 (1,053 unique) biochemical reactions. The metabolic reactions are compartmentalized into the cytosol, the mitochondria, the peroxisome and the extracellular space. Transport steps between the compartments and the extracellular space represent 281 reactions, of which 161 are unique. The metabolic model was validated and shown to correctly describe the phenotypic behavior of A. oryzae grown on different carbon sources. Conclusion A much enhanced annotation of the A. oryzae genome was performed and a genome-scale metabolic model of A. oryzae was reconstructed. The model accurately predicted the growth and biomass yield on different carbon sources. The model serves as an important resource for gaining further insight into our understanding of A. oryzae physiology. PMID:18500999

  12. Effector prediction in host-pathogen interaction based on a Markov model of a ubiquitous EPIYA motif

    PubMed Central

    2010-01-01

    Background Effector secretion is a common strategy of pathogen in mediating host-pathogen interaction. Eight EPIYA-motif containing effectors have recently been discovered in six pathogens. Once these effectors enter host cells through type III/IV secretion systems (T3SS/T4SS), tyrosine in the EPIYA motif is phosphorylated, which triggers effectors binding other proteins to manipulate host-cell functions. The objectives of this study are to evaluate the distribution pattern of EPIYA motif in broad biological species, to predict potential effectors with EPIYA motif, and to suggest roles and biological functions of potential effectors in host-pathogen interactions. Results A hidden Markov model (HMM) of five amino acids was built for the EPIYA-motif based on the eight known effectors. Using this HMM to search the non-redundant protein database containing 9,216,047 sequences, we obtained 107,231 sequences with at least one EPIYA motif occurrence and 3115 sequences with multiple repeats of the EPIYA motif. Although the EPIYA motif exists among broad species, it is significantly over-represented in some particular groups of species. For those proteins containing at least four copies of EPIYA motif, most of them are from intracellular bacteria, extracellular bacteria with T3SS or T4SS or intracellular protozoan parasites. By combining the EPIYA motif and the adjacent SH2 binding motifs (KK, R4, Tarp and Tir), we built HMMs of nine amino acids and predicted many potential effectors in bacteria and protista by the HMMs. Some potential effectors for pathogens (such as Lawsonia intracellularis, Plasmodium falciparum and Leishmania major) are suggested. Conclusions Our study indicates that the EPIYA motif may be a ubiquitous functional site for effectors that play an important pathogenicity role in mediating host-pathogen interactions. We suggest that some intracellular protozoan parasites could secrete EPIYA-motif containing effectors through secretion systems similar to the T3SS/T4SS in bacteria. Our predicted effectors provide useful hypotheses for further studies. PMID:21143776

  13. Using simple artificial intelligence methods for predicting amyloidogenesis in antibodies

    PubMed Central

    2010-01-01

    Background All polypeptide backbones have the potential to form amyloid fibrils, which are associated with a number of degenerative disorders. However, the likelihood that amyloidosis would actually occur under physiological conditions depends largely on the amino acid composition of a protein. We explore using a naive Bayesian classifier and a weighted decision tree for predicting the amyloidogenicity of immunoglobulin sequences. Results The average accuracy based on leave-one-out (LOO) cross validation of a Bayesian classifier generated from 143 amyloidogenic sequences is 60.84%. This is consistent with the average accuracy of 61.15% for a holdout test set comprised of 103 AM and 28 non-amyloidogenic sequences. The LOO cross validation accuracy increases to 81.08% when the training set is augmented by the holdout test set. In comparison, the average classification accuracy for the holdout test set obtained using a decision tree is 78.64%. Non-amyloidogenic sequences are predicted with average LOO cross validation accuracies between 74.05% and 77.24% using the Bayesian classifier, depending on the training set size. The accuracy for the holdout test set was 89%. For the decision tree, the non-amyloidogenic prediction accuracy is 75.00%. Conclusions This exploratory study indicates that both classification methods may be promising in providing straightforward predictions on the amyloidogenicity of a sequence. Nevertheless, the number of available sequences that satisfy the premises of this study are limited, and are consequently smaller than the ideal training set size. Increasing the size of the training set clearly increases the accuracy, and the expansion of the training set to include not only more derivatives, but more alignments, would make the method more sound. The accuracy of the classifiers may also be improved when additional factors, such as structural and physico-chemical data, are considered. The development of this type of classifier has significant applications in evaluating engineered antibodies, and may be adapted for evaluating engineered proteins in general. PMID:20144194

  14. UDoNC: An Algorithm for Identifying Essential Proteins Based on Protein Domains and Protein-Protein Interaction Networks.

    PubMed

    Peng, Wei; Wang, Jianxin; Cheng, Yingjiao; Lu, Yu; Wu, Fangxiang; Pan, Yi

    2015-01-01

    Prediction of essential proteins which are crucial to an organism's survival is important for disease analysis and drug design, as well as the understanding of cellular life. The majority of prediction methods infer the possibility of proteins to be essential by using the network topology. However, these methods are limited to the completeness of available protein-protein interaction (PPI) data and depend on the network accuracy. To overcome these limitations, some computational methods have been proposed. However, seldom of them solve this problem by taking consideration of protein domains. In this work, we first analyze the correlation between the essentiality of proteins and their domain features based on data of 13 species. We find that the proteins containing more protein domain types which rarely occur in other proteins tend to be essential. Accordingly, we propose a new prediction method, named UDoNC, by combining the domain features of proteins with their topological properties in PPI network. In UDoNC, the essentiality of proteins is decided by the number and the frequency of their protein domain types, as well as the essentiality of their adjacent edges measured by edge clustering coefficient. The experimental results on S. cerevisiae data show that UDoNC outperforms other existing methods in terms of area under the curve (AUC). Additionally, UDoNC can also perform well in predicting essential proteins on data of E. coli.

  15. PconsFold: improved contact predictions improve protein models.

    PubMed

    Michel, Mirco; Hayat, Sikander; Skwark, Marcin J; Sander, Chris; Marks, Debora S; Elofsson, Arne

    2014-09-01

    Recently it has been shown that the quality of protein contact prediction from evolutionary information can be improved significantly if direct and indirect information is separated. Given sufficiently large protein families, the contact predictions contain sufficient information to predict the structure of many protein families. However, since the first studies contact prediction methods have improved. Here, we ask how much the final models are improved if improved contact predictions are used. In a small benchmark of 15 proteins, we show that the TM-scores of top-ranked models are improved by on average 33% using PconsFold compared with the original version of EVfold. In a larger benchmark, we find that the quality is improved with 15-30% when using PconsC in comparison with earlier contact prediction methods. Further, using Rosetta instead of CNS does not significantly improve global model accuracy, but the chemistry of models generated with Rosetta is improved. PconsFold is a fully automated pipeline for ab initio protein structure prediction based on evolutionary information. PconsFold is based on PconsC contact prediction and uses the Rosetta folding protocol. Due to its modularity, the contact prediction tool can be easily exchanged. The source code of PconsFold is available on GitHub at https://www.github.com/ElofssonLab/pcons-fold under the MIT license. PconsC is available from http://c.pcons.net/. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.

  16. DNABP: Identification of DNA-Binding Proteins Based on Feature Selection Using a Random Forest and Predicting Binding Residues.

    PubMed

    Ma, Xin; Guo, Jing; Sun, Xiao

    2016-01-01

    DNA-binding proteins are fundamentally important in cellular processes. Several computational-based methods have been developed to improve the prediction of DNA-binding proteins in previous years. However, insufficient work has been done on the prediction of DNA-binding proteins from protein sequence information. In this paper, a novel predictor, DNABP (DNA-binding proteins), was designed to predict DNA-binding proteins using the random forest (RF) classifier with a hybrid feature. The hybrid feature contains two types of novel sequence features, which reflect information about the conservation of physicochemical properties of the amino acids, and the binding propensity of DNA-binding residues and non-binding propensities of non-binding residues. The comparisons with each feature demonstrated that these two novel features contributed most to the improvement in predictive ability. Furthermore, to improve the prediction performance of the DNABP model, feature selection using the minimum redundancy maximum relevance (mRMR) method combined with incremental feature selection (IFS) was carried out during the model construction. The results showed that the DNABP model could achieve 86.90% accuracy, 83.76% sensitivity, 90.03% specificity and a Matthews correlation coefficient of 0.727. High prediction accuracy and performance comparisons with previous research suggested that DNABP could be a useful approach to identify DNA-binding proteins from sequence information. The DNABP web server system is freely available at http://www.cbi.seu.edu.cn/DNABP/.

  17. Search for Functional Flexible Regions in the G-protein Family: New Reading of the FoldUnfold Program.

    PubMed

    Galzitskaya, Oxana; Deryusheva, Eugenia; Machulin, Andrey; Nemashkalova, Ekaterina; Glyakina, Anna

    2018-06-21

    High prediction accuracy of flexible loops in different protein families is a challenge because of the crucial functions associated with these regions. Results of the currently available programs for prediction of loops vary from protein to protein. For prediction of flexible regions in the G-domain for 23 representatives of G-proteins with the known 3D structure we have used eight programs. The results of predictions demonstrate that the FoldUnfold program predicts better loop positions than the PONDR, RОNN, DisEMBL, IUPred, GlobPlot 2, FoldIndex, and MobiDB programs. When classifying the predicted loops (rigid/flexible) according to the Debye-Waller fluctuation factors, our data reveal the existing weak correlation between the B-factors and the average number of closed residues according to the FoldUnfold program; the percentage of overlapping characteristics (residue fold/unfold status) of the protein residues from the two methods is about 60-70%. According to the FoldUnfold program, for G-proteins with the posttranslational modifications, the surrounding binding site residues by disordered-promoting glycine and alanine residues conduces to a more flexible position of the binding sites for fatty acid, while methionine, cysteine and isoleucine residues provide more rigid binding sites. Thus, our research demonstrates additional possibilities of the FoldUnfold program for prediction of flexible regions and characteristics of individual residues in a different protein family. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  18. PSSP-RFE: accurate prediction of protein structural class by recursive feature extraction from PSI-BLAST profile, physical-chemical property and functional annotations.

    PubMed

    Li, Liqi; Cui, Xiang; Yu, Sanjiu; Zhang, Yuan; Luo, Zhong; Yang, Hua; Zhou, Yue; Zheng, Xiaoqi

    2014-01-01

    Protein structure prediction is critical to functional annotation of the massively accumulated biological sequences, which prompts an imperative need for the development of high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction becomes an increasingly challenging task. Amongst most homological-based approaches, the accuracies of protein structural class prediction are sufficiently high for high similarity datasets, but still far from being satisfactory for low similarity datasets, i.e., below 40% in pairwise sequence similarity. Therefore, we present a novel method for accurate and reliable protein structural class prediction for both high and low similarity datasets. This method is based on Support Vector Machine (SVM) in conjunction with integrated features from position-specific score matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is also used to rank the integrated feature vectors through recursively removing the feature with the lowest ranking score. The definitive top features selected by SVM-RFE are input into the SVM engines to predict the structural class of a query protein. To validate our method, jackknife tests were applied to seven widely used benchmark datasets, reaching overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods in protein structural classification, especially for low similarity datasets.

  19. M-Finder: Uncovering functionally associated proteins from interactome data integrated with GO annotations

    PubMed Central

    2013-01-01

    Background Protein-protein interactions (PPIs) play a key role in understanding the mechanisms of cellular processes. The availability of interactome data has catalyzed the development of computational approaches to elucidate functional behaviors of proteins on a system level. Gene Ontology (GO) and its annotations are a significant resource for functional characterization of proteins. Because of wide coverage, GO data have often been adopted as a benchmark for protein function prediction on the genomic scale. Results We propose a computational approach, called M-Finder, for functional association pattern mining. This method employs semantic analytics to integrate the genome-wide PPIs with GO data. We also introduce an interactive web application tool that visualizes a functional association network linked to a protein specified by a user. The proposed approach comprises two major components. First, the PPIs that have been generated by high-throughput methods are weighted in terms of their functional consistency using GO and its annotations. We assess two advanced semantic similarity metrics which quantify the functional association level of each interacting protein pair. We demonstrate that these measures outperform the other existing methods by evaluating their agreement to other biological features, such as sequence similarity, the presence of common Pfam domains, and core PPIs. Second, the information flow-based algorithm is employed to discover a set of proteins functionally associated with the protein in a query and their links efficiently. This algorithm reconstructs a functional association network of the query protein. The output network size can be flexibly determined by parameters. Conclusions M-Finder provides a useful framework to investigate functional association patterns with any protein. This software will also allow users to perform further systematic analysis of a set of proteins for any specific function. It is available online at http://bionet.ecs.baylor.edu/mfinder PMID:24565382

  20. An Improved Method of Predicting Extinction Coefficients for the Determination of Protein Concentration.

    PubMed

    Hilario, Eric C; Stern, Alan; Wang, Charlie H; Vargas, Yenny W; Morgan, Charles J; Swartz, Trevor E; Patapoff, Thomas W

    2017-01-01

    Concentration determination is an important method of protein characterization required in the development of protein therapeutics. There are many known methods for determining the concentration of a protein solution, but the easiest to implement in a manufacturing setting is absorption spectroscopy in the ultraviolet region. For typical proteins composed of the standard amino acids, absorption at wavelengths near 280 nm is due to the three amino acid chromophores tryptophan, tyrosine, and phenylalanine in addition to a contribution from disulfide bonds. According to the Beer-Lambert law, absorbance is proportional to concentration and path length, with the proportionality constant being the extinction coefficient. Typically the extinction coefficient of proteins is experimentally determined by measuring a solution absorbance then experimentally determining the concentration, a measurement with some inherent variability depending on the method used. In this study, extinction coefficients were calculated based on the measured absorbance of model compounds of the four amino acid chromophores. These calculated values for an unfolded protein were then compared with an experimental concentration determination based on enzymatic digestion of proteins. The experimentally determined extinction coefficient for the native proteins was consistently found to be 1.05 times the calculated value for the unfolded proteins for a wide range of proteins with good accuracy and precision under well-controlled experimental conditions. The value of 1.05 times the calculated value was termed the predicted extinction coefficient. Statistical analysis shows that the differences between predicted and experimentally determined coefficients are scattered randomly, indicating no systematic bias between the values among the proteins measured. The predicted extinction coefficient was found to be accurate and not subject to the inherent variability of experimental methods. We propose the use of a predicted extinction coefficient for determining the protein concentration of therapeutic proteins starting from early development through the lifecycle of the product. LAY ABSTRACT: Knowing the concentration of a protein in a pharmaceutical solution is important to the drug's development and posology. There are many ways to determine the concentration, but the easiest one to use in a testing lab employs absorption spectroscopy. Absorbance of ultraviolet light by a protein solution is proportional to its concentration and path length; the proportionality constant is the extinction coefficient. The extinction coefficient of a protein therapeutic is usually determined experimentally during early product development and has some inherent method variability. In this study, extinction coefficients of several proteins were calculated based on the measured absorbance of model compounds. These calculated values for an unfolded protein were then compared with experimental concentration determinations based on enzymatic digestion of the proteins. The experimentally determined extinction coefficient for the native protein was 1.05 times the calculated value for the unfolded protein with good accuracy and precision under controlled experimental conditions, so the value of 1.05 times the calculated coefficient was called the predicted extinction coefficient. Comparison of predicted and measured extinction coefficients indicated that the predicted value was very close to the experimentally determined values for the proteins. The predicted extinction coefficient was accurate and removed the variability inherent in experimental methods. © PDA, Inc. 2017.

Top