Science.gov

Sample records for accurately predict protein

  1. Accurate Prediction of Docked Protein Structure Similarity.

    PubMed

    Akbal-Delibas, Bahar; Pomplun, Marc; Haspel, Nurit

    2015-09-01

    One of the major challenges for protein-protein docking methods is to accurately discriminate nativelike structures. The protein docking community agrees on the existence of a relationship between various favorable intermolecular interactions (e.g. Van der Waals, electrostatic, desolvation forces, etc.) and the similarity of a conformation to its native structure. Different docking algorithms often formulate this relationship as a weighted sum of selected terms and calibrate their weights against specific training data to evaluate and rank candidate structures. However, the exact form of this relationship is unknown and the accuracy of such methods is impaired by the pervasiveness of false positives. Unlike the conventional scoring functions, we propose a novel machine learning approach that not only ranks the candidate structures relative to each other but also indicates how similar each candidate is to the native conformation. We trained the AccuRMSD neural network with an extensive dataset using the back-propagation learning algorithm. Our method achieved predicting RMSDs of unbound docked complexes with 0.4Å error margin. PMID:26335807

  2. An Overview of Practical Applications of Protein Disorder Prediction and Drive for Faster, More Accurate Predictions.

    PubMed

    Deng, Xin; Gumm, Jordan; Karki, Suman; Eickholt, Jesse; Cheng, Jianlin

    2015-01-01

    Protein disordered regions are segments of a protein chain that do not adopt a stable structure. Thus far, a variety of protein disorder prediction methods have been developed and have been widely used, not only in traditional bioinformatics domains, including protein structure prediction, protein structure determination and function annotation, but also in many other biomedical fields. The relationship between intrinsically-disordered proteins and some human diseases has played a significant role in disorder prediction in disease identification and epidemiological investigations. Disordered proteins can also serve as potential targets for drug discovery with an emphasis on the disordered-to-ordered transition in the disordered binding regions, and this has led to substantial research in drug discovery or design based on protein disordered region prediction. Furthermore, protein disorder prediction has also been applied to healthcare by predicting the disease risk of mutations in patients and studying the mechanistic basis of diseases. As the applications of disorder prediction increase, so too does the need to make quick and accurate predictions. To fill this need, we also present a new approach to predict protein residue disorder using wide sequence windows that is applicable on the genomic scale. PMID:26198229

  3. An Overview of Practical Applications of Protein Disorder Prediction and Drive for Faster, More Accurate Predictions

    PubMed Central

    Deng, Xin; Gumm, Jordan; Karki, Suman; Eickholt, Jesse; Cheng, Jianlin

    2015-01-01

    Protein disordered regions are segments of a protein chain that do not adopt a stable structure. Thus far, a variety of protein disorder prediction methods have been developed and have been widely used, not only in traditional bioinformatics domains, including protein structure prediction, protein structure determination and function annotation, but also in many other biomedical fields. The relationship between intrinsically-disordered proteins and some human diseases has played a significant role in disorder prediction in disease identification and epidemiological investigations. Disordered proteins can also serve as potential targets for drug discovery with an emphasis on the disordered-to-ordered transition in the disordered binding regions, and this has led to substantial research in drug discovery or design based on protein disordered region prediction. Furthermore, protein disorder prediction has also been applied to healthcare by predicting the disease risk of mutations in patients and studying the mechanistic basis of diseases. As the applications of disorder prediction increase, so too does the need to make quick and accurate predictions. To fill this need, we also present a new approach to predict protein residue disorder using wide sequence windows that is applicable on the genomic scale. PMID:26198229

  4. Highly Accurate Prediction of Protein-Protein Interactions via Incorporating Evolutionary Information and Physicochemical Characteristics.

    PubMed

    Li, Zheng-Wei; You, Zhu-Hong; Chen, Xing; Gui, Jie; Nie, Ru

    2016-01-01

    Protein-protein interactions (PPIs) occur at almost all levels of cell functions and play crucial roles in various cellular processes. Thus, identification of PPIs is critical for deciphering the molecular mechanisms and further providing insight into biological processes. Although a variety of high-throughput experimental techniques have been developed to identify PPIs, existing PPI pairs by experimental approaches only cover a small fraction of the whole PPI networks, and further, those approaches hold inherent disadvantages, such as being time-consuming, expensive, and having high false positive rate. Therefore, it is urgent and imperative to develop automatic in silico approaches to predict PPIs efficiently and accurately. In this article, we propose a novel mixture of physicochemical and evolutionary-based feature extraction method for predicting PPIs using our newly developed discriminative vector machine (DVM) classifier. The improvements of the proposed method mainly consist in introducing an effective feature extraction method that can capture discriminative features from the evolutionary-based information and physicochemical characteristics, and then a powerful and robust DVM classifier is employed. To the best of our knowledge, it is the first time that DVM model is applied to the field of bioinformatics. When applying the proposed method to the Yeast and Helicobacter pylori (H. pylori) datasets, we obtain excellent prediction accuracies of 94.35% and 90.61%, respectively. The computational results indicate that our method is effective and robust for predicting PPIs, and can be taken as a useful supplementary tool to the traditional experimental methods for future proteomics research. PMID:27571061

  5. Accurate prediction of protein–protein interactions from sequence alignments using a Bayesian method

    PubMed Central

    Burger, Lukas; van Nimwegen, Erik

    2008-01-01

    Accurate and large-scale prediction of protein–protein interactions directly from amino-acid sequences is one of the great challenges in computational biology. Here we present a new Bayesian network method that predicts interaction partners using only multiple alignments of amino-acid sequences of interacting protein domains, without tunable parameters, and without the need for any training examples. We first apply the method to bacterial two-component systems and comprehensively reconstruct two-component signaling networks across all sequenced bacteria. Comparisons of our predictions with known interactions show that our method infers interaction partners genome-wide with high accuracy. To demonstrate the general applicability of our method we show that it also accurately predicts interaction partners in a recent dataset of polyketide synthases. Analysis of the predicted genome-wide two-component signaling networks shows that cognates (interacting kinase/regulator pairs, which lie adjacent on the genome) and orphans (which lie isolated) form two relatively independent components of the signaling network in each genome. In addition, while most genes are predicted to have only a small number of interaction partners, we find that 10% of orphans form a separate class of ‘hub' nodes that distribute and integrate signals to and from up to tens of different interaction partners. PMID:18277381

  6. NMRDSP: an accurate prediction of protein shape strings from NMR chemical shifts and sequence data.

    PubMed

    Mao, Wusong; Cong, Peisheng; Wang, Zhiheng; Lu, Longjian; Zhu, Zhongliang; Li, Tonghua

    2013-01-01

    Shape string is structural sequence and is an extremely important structure representation of protein backbone conformations. Nuclear magnetic resonance chemical shifts give a strong correlation with the local protein structure, and are exploited to predict protein structures in conjunction with computational approaches. Here we demonstrate a novel approach, NMRDSP, which can accurately predict the protein shape string based on nuclear magnetic resonance chemical shifts and structural profiles obtained from sequence data. The NMRDSP uses six chemical shifts (HA, H, N, CA, CB and C) and eight elements of structure profiles as features, a non-redundant set (1,003 entries) as the training set, and a conditional random field as a classification algorithm. For an independent testing set (203 entries), we achieved an accuracy of 75.8% for S8 (the eight states accuracy) and 87.8% for S3 (the three states accuracy). This is higher than only using chemical shifts or sequence data, and confirms that the chemical shift and the structure profile are significant features for shape string prediction and their combination prominently improves the accuracy of the predictor. We have constructed the NMRDSP web server and believe it could be employed to provide a solid platform to predict other protein structures and functions. The NMRDSP web server is freely available at http://cal.tongji.edu.cn/NMRDSP/index.jsp. PMID:24376713

  7. NMRDSP: An Accurate Prediction of Protein Shape Strings from NMR Chemical Shifts and Sequence Data

    PubMed Central

    Mao, Wusong; Cong, Peisheng; Wang, Zhiheng; Lu, Longjian; Zhu, Zhongliang; Li, Tonghua

    2013-01-01

    Shape string is structural sequence and is an extremely important structure representation of protein backbone conformations. Nuclear magnetic resonance chemical shifts give a strong correlation with the local protein structure, and are exploited to predict protein structures in conjunction with computational approaches. Here we demonstrate a novel approach, NMRDSP, which can accurately predict the protein shape string based on nuclear magnetic resonance chemical shifts and structural profiles obtained from sequence data. The NMRDSP uses six chemical shifts (HA, H, N, CA, CB and C) and eight elements of structure profiles as features, a non-redundant set (1,003 entries) as the training set, and a conditional random field as a classification algorithm. For an independent testing set (203 entries), we achieved an accuracy of 75.8% for S8 (the eight states accuracy) and 87.8% for S3 (the three states accuracy). This is higher than only using chemical shifts or sequence data, and confirms that the chemical shift and the structure profile are significant features for shape string prediction and their combination prominently improves the accuracy of the predictor. We have constructed the NMRDSP web server and believe it could be employed to provide a solid platform to predict other protein structures and functions. The NMRDSP web server is freely available at http://cal.tongji.edu.cn/NMRDSP/index.jsp. PMID:24376713

  8. SIFTER search: a web server for accurate phylogeny-based protein function prediction.

    PubMed

    Sahraeian, Sayed M; Luo, Kevin R; Brenner, Steven E

    2015-07-01

    We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access to precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. The SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded. PMID:25979264

  9. SIFTER search: a web server for accurate phylogeny-based protein function prediction

    SciTech Connect

    Sahraeian, Sayed M.; Luo, Kevin R.; Brenner, Steven E.

    2015-05-15

    We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access to precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. Lastly, the SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded.

  10. SIFTER search: a web server for accurate phylogeny-based protein function prediction

    DOE PAGESBeta

    Sahraeian, Sayed M.; Luo, Kevin R.; Brenner, Steven E.

    2015-05-15

    We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access tomore » precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. Lastly, the SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded.« less

  11. Accurate prediction of helix interactions and residue contacts in membrane proteins.

    PubMed

    Hönigschmid, Peter; Frishman, Dmitrij

    2016-04-01

    Accurate prediction of intra-molecular interactions from amino acid sequence is an important pre-requisite for obtaining high-quality protein models. Over the recent years, remarkable progress in this area has been achieved through the application of novel co-variation algorithms, which eliminate transitive evolutionary connections between residues. In this work we present a new contact prediction method for α-helical transmembrane proteins, MemConP, in which evolutionary couplings are combined with a machine learning approach. MemConP achieves a substantially improved accuracy (precision: 56.0%, recall: 17.5%, MCC: 0.288) compared to the use of either machine learning or co-evolution methods alone. The method also achieves 91.4% precision, 42.1% recall and a MCC of 0.490 in predicting helix-helix interactions based on predicted contacts. The approach was trained and rigorously benchmarked by cross-validation and independent testing on up-to-date non-redundant datasets of 90 and 30 experimental three dimensional structures, respectively. MemConP is a standalone tool that can be downloaded together with the associated training data from http://webclu.bio.wzw.tum.de/MemConP. PMID:26851352

  12. FastRNABindR: Fast and Accurate Prediction of Protein-RNA Interface Residues.

    PubMed

    El-Manzalawy, Yasser; Abbas, Mostafa; Malluhi, Qutaibah; Honavar, Vasant

    2016-01-01

    A wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses are mediated by RNA-protein interactions. However, experimental determination of the structures of protein-RNA complexes is expensive and technically challenging. Hence, a number of computational tools have been developed for predicting protein-RNA interfaces. Some of the state-of-the-art protein-RNA interface predictors rely on position-specific scoring matrix (PSSM)-based encoding of the protein sequences. The computational efforts needed for generating PSSMs severely limits the practical utility of protein-RNA interface prediction servers. In this work, we experiment with two approaches, random sampling and sequence similarity reduction, for extracting a representative reference database of protein sequences from more than 50 million protein sequences in UniRef100. Our results suggest that random sampled databases produce better PSSM profiles (in terms of the number of hits used to generate the profile and the distance of the generated profile to the corresponding profile generated using the entire UniRef100 data as well as the accuracy of the machine learning classifier trained using these profiles). Based on our results, we developed FastRNABindR, an improved version of RNABindR for predicting protein-RNA interface residues using PSSM profiles generated using 1% of the UniRef100 sequences sampled uniformly at random. To the best of our knowledge, FastRNABindR is the only protein-RNA interface residue prediction online server that requires generation of PSSM profiles for query sequences and accepts hundreds of protein sequences per submission. Our approach for determining the optimal BLAST database for a protein-RNA interface residue classification task has the potential of substantially speeding up, and hence increasing the practical utility of, other amino acid sequence based predictors of protein-protein and protein

  13. FastRNABindR: Fast and Accurate Prediction of Protein-RNA Interface Residues

    PubMed Central

    EL-Manzalawy, Yasser; Abbas, Mostafa; Malluhi, Qutaibah; Honavar, Vasant

    2016-01-01

    A wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses are mediated by RNA-protein interactions. However, experimental determination of the structures of protein-RNA complexes is expensive and technically challenging. Hence, a number of computational tools have been developed for predicting protein-RNA interfaces. Some of the state-of-the-art protein-RNA interface predictors rely on position-specific scoring matrix (PSSM)-based encoding of the protein sequences. The computational efforts needed for generating PSSMs severely limits the practical utility of protein-RNA interface prediction servers. In this work, we experiment with two approaches, random sampling and sequence similarity reduction, for extracting a representative reference database of protein sequences from more than 50 million protein sequences in UniRef100. Our results suggest that random sampled databases produce better PSSM profiles (in terms of the number of hits used to generate the profile and the distance of the generated profile to the corresponding profile generated using the entire UniRef100 data as well as the accuracy of the machine learning classifier trained using these profiles). Based on our results, we developed FastRNABindR, an improved version of RNABindR for predicting protein-RNA interface residues using PSSM profiles generated using 1% of the UniRef100 sequences sampled uniformly at random. To the best of our knowledge, FastRNABindR is the only protein-RNA interface residue prediction online server that requires generation of PSSM profiles for query sequences and accepts hundreds of protein sequences per submission. Our approach for determining the optimal BLAST database for a protein-RNA interface residue classification task has the potential of substantially speeding up, and hence increasing the practical utility of, other amino acid sequence based predictors of protein-protein and protein

  14. Combining Evolutionary Information and an Iterative Sampling Strategy for Accurate Protein Structure Prediction

    PubMed Central

    Braun, Tatjana; Koehler Leman, Julia; Lange, Oliver F.

    2015-01-01

    Recent work has shown that the accuracy of ab initio structure prediction can be significantly improved by integrating evolutionary information in form of intra-protein residue-residue contacts. Following this seminal result, much effort is put into the improvement of contact predictions. However, there is also a substantial need to develop structure prediction protocols tailored to the type of restraints gained by contact predictions. Here, we present a structure prediction protocol that combines evolutionary information with the resolution-adapted structural recombination approach of Rosetta, called RASREC. Compared to the classic Rosetta ab initio protocol, RASREC achieves improved sampling, better convergence and higher robustness against incorrect distance restraints, making it the ideal sampling strategy for the stated problem. To demonstrate the accuracy of our protocol, we tested the approach on a diverse set of 28 globular proteins. Our method is able to converge for 26 out of the 28 targets and improves the average TM-score of the entire benchmark set from 0.55 to 0.72 when compared to the top ranked models obtained by the EVFold web server using identical contact predictions. Using a smaller benchmark, we furthermore show that the prediction accuracy of our method is only slightly reduced when the contact prediction accuracy is comparatively low. This observation is of special interest for protein sequences that only have a limited number of homologs. PMID:26713437

  15. DISPLAR: an accurate method for predicting DNA-binding sites on protein surfaces

    PubMed Central

    Tjong, Harianto; Zhou, Huan-Xiang

    2007-01-01

    Structural and physical properties of DNA provide important constraints on the binding sites formed on surfaces of DNA-targeting proteins. Characteristics of such binding sites may form the basis for predicting DNA-binding sites from the structures of proteins alone. Such an approach has been successfully developed for predicting protein–protein interface. Here this approach is adapted for predicting DNA-binding sites. We used a representative set of 264 protein–DNA complexes from the Protein Data Bank to analyze characteristics and to train and test a neural network predictor of DNA-binding sites. The input to the predictor consisted of PSI-blast sequence profiles and solvent accessibilities of each surface residue and 14 of its closest neighboring residues. Predicted DNA-contacting residues cover 60% of actual DNA-contacting residues and have an accuracy of 76%. This method significantly outperforms previous attempts of DNA-binding site predictions. Its application to the prion protein yielded a DNA-binding site that is consistent with recent NMR chemical shift perturbation data, suggesting that it can complement experimental techniques in characterizing protein–DNA interfaces. PMID:17284455

  16. LOCUSTRA: accurate prediction of local protein structure using a two-layer support vector machine approach.

    PubMed

    Zimmermann, Olav; Hansmann, Ulrich H E

    2008-09-01

    Constraint generation for 3d structure prediction and structure-based database searches benefit from fine-grained prediction of local structure. In this work, we present LOCUSTRA, a novel scheme for the multiclass prediction of local structure that uses two layers of support vector machines (SVM). Using a 16-letter structural alphabet from de Brevern et al. (Proteins: Struct., Funct., Bioinf. 2000, 41, 271-287), we assess its prediction ability for an independent test set of 222 proteins and compare our method to three-class secondary structure prediction and direct prediction of dihedral angles. The prediction accuracy is Q16=61.0% for the 16 classes of the structural alphabet and Q3=79.2% for a simple mapping to the three secondary classes helix, sheet, and coil. We achieve a mean phi(psi) error of 24.74 degrees (38.35 degrees) and a median RMSDA (root-mean-square deviation of the (dihedral) angles) per protein chain of 52.1 degrees. These results compare favorably with related approaches. The LOCUSTRA web server is freely available to researchers at http://www.fz-juelich.de/nic/cbb/service/service.php. PMID:18763837

  17. Accurate microRNA target prediction correlates with protein repression levels

    PubMed Central

    Maragkakis, Manolis; Alexiou, Panagiotis; Papadopoulos, Giorgio L; Reczko, Martin; Dalamagas, Theodore; Giannopoulos, George; Goumas, George; Koukis, Evangelos; Kourtis, Kornilios; Simossis, Victor A; Sethupathy, Praveen; Vergoulis, Thanasis; Koziris, Nectarios; Sellis, Timos; Tsanakas, Panagiotis; Hatzigeorgiou, Artemis G

    2009-01-01

    Background MicroRNAs are small endogenously expressed non-coding RNA molecules that regulate target gene expression through translation repression or messenger RNA degradation. MicroRNA regulation is performed through pairing of the microRNA to sites in the messenger RNA of protein coding genes. Since experimental identification of miRNA target genes poses difficulties, computational microRNA target prediction is one of the key means in deciphering the role of microRNAs in development and disease. Results DIANA-microT 3.0 is an algorithm for microRNA target prediction which is based on several parameters calculated individually for each microRNA and combines conserved and non-conserved microRNA recognition elements into a final prediction score, which correlates with protein production fold change. Specifically, for each predicted interaction the program reports a signal to noise ratio and a precision score which can be used as an indication of the false positive rate of the prediction. Conclusion Recently, several computational target prediction programs were benchmarked based on a set of microRNA target genes identified by the pSILAC method. In this assessment DIANA-microT 3.0 was found to achieve the highest precision among the most widely used microRNA target prediction programs reaching approximately 66%. The DIANA-microT 3.0 prediction results are available online in a user friendly web server at PMID:19765283

  18. Accurate prediction of cellular co-translational folding indicates proteins can switch from post- to co-translational folding

    NASA Astrophysics Data System (ADS)

    Nissley, Daniel A.; Sharma, Ajeet K.; Ahmed, Nabeel; Friedrich, Ulrike A.; Kramer, Günter; Bukau, Bernd; O'Brien, Edward P.

    2016-02-01

    The rates at which domains fold and codons are translated are important factors in determining whether a nascent protein will co-translationally fold and function or misfold and malfunction. Here we develop a chemical kinetic model that calculates a protein domain's co-translational folding curve during synthesis using only the domain's bulk folding and unfolding rates and codon translation rates. We show that this model accurately predicts the course of co-translational folding measured in vivo for four different protein molecules. We then make predictions for a number of different proteins in yeast and find that synonymous codon substitutions, which change translation-elongation rates, can switch some protein domains from folding post-translationally to folding co-translationally--a result consistent with previous experimental studies. Our approach explains essential features of co-translational folding curves and predicts how varying the translation rate at different codon positions along a transcript's coding sequence affects this self-assembly process.

  19. Accurate prediction of interfacial residues in two-domain proteins using evolutionary information: implications for three-dimensional modeling.

    PubMed

    Bhaskara, Ramachandra M; Padhi, Amrita; Srinivasan, Narayanaswamy

    2014-07-01

    With the preponderance of multidomain proteins in eukaryotic genomes, it is essential to recognize the constituent domains and their functions. Often function involves communications across the domain interfaces, and the knowledge of the interacting sites is essential to our understanding of the structure-function relationship. Using evolutionary information extracted from homologous domains in at least two diverse domain architectures (single and multidomain), we predict the interface residues corresponding to domains from the two-domain proteins. We also use information from the three-dimensional structures of individual domains of two-domain proteins to train naïve Bayes classifier model to predict the interfacial residues. Our predictions are highly accurate (∼85%) and specific (∼95%) to the domain-domain interfaces. This method is specific to multidomain proteins which contain domains in at least more than one protein architectural context. Using predicted residues to constrain domain-domain interaction, rigid-body docking was able to provide us with accurate full-length protein structures with correct orientation of domains. We believe that these results can be of considerable interest toward rational protein and interaction design, apart from providing us with valuable information on the nature of interactions. PMID:24375512

  20. Accurate ab initio prediction of NMR chemical shifts of nucleic acids and nucleic acids/protein complexes

    PubMed Central

    Victora, Andrea; Möller, Heiko M.; Exner, Thomas E.

    2014-01-01

    NMR chemical shift predictions based on empirical methods are nowadays indispensable tools during resonance assignment and 3D structure calculation of proteins. However, owing to the very limited statistical data basis, such methods are still in their infancy in the field of nucleic acids, especially when non-canonical structures and nucleic acid complexes are considered. Here, we present an ab initio approach for predicting proton chemical shifts of arbitrary nucleic acid structures based on state-of-the-art fragment-based quantum chemical calculations. We tested our prediction method on a diverse set of nucleic acid structures including double-stranded DNA, hairpins, DNA/protein complexes and chemically-modified DNA. Overall, our quantum chemical calculations yield highly/very accurate predictions with mean absolute deviations of 0.3–0.6 ppm and correlation coefficients (r2) usually above 0.9. This will allow for identifying misassignments and validating 3D structures. Furthermore, our calculations reveal that chemical shifts of protons involved in hydrogen bonding are predicted significantly less accurately. This is in part caused by insufficient inclusion of solvation effects. However, it also points toward shortcomings of current force fields used for structure determination of nucleic acids. Our quantum chemical calculations could therefore provide input for force field optimization. PMID:25404135

  1. Accurate ab initio prediction of NMR chemical shifts of nucleic acids and nucleic acids/protein complexes.

    PubMed

    Victora, Andrea; Möller, Heiko M; Exner, Thomas E

    2014-12-16

    NMR chemical shift predictions based on empirical methods are nowadays indispensable tools during resonance assignment and 3D structure calculation of proteins. However, owing to the very limited statistical data basis, such methods are still in their infancy in the field of nucleic acids, especially when non-canonical structures and nucleic acid complexes are considered. Here, we present an ab initio approach for predicting proton chemical shifts of arbitrary nucleic acid structures based on state-of-the-art fragment-based quantum chemical calculations. We tested our prediction method on a diverse set of nucleic acid structures including double-stranded DNA, hairpins, DNA/protein complexes and chemically-modified DNA. Overall, our quantum chemical calculations yield highly/very accurate predictions with mean absolute deviations of 0.3-0.6 ppm and correlation coefficients (r(2)) usually above 0.9. This will allow for identifying misassignments and validating 3D structures. Furthermore, our calculations reveal that chemical shifts of protons involved in hydrogen bonding are predicted significantly less accurately. This is in part caused by insufficient inclusion of solvation effects. However, it also points toward shortcomings of current force fields used for structure determination of nucleic acids. Our quantum chemical calculations could therefore provide input for force field optimization. PMID:25404135

  2. DisoMCS: Accurately Predicting Protein Intrinsically Disordered Regions Using a Multi-Class Conservative Score Approach

    PubMed Central

    Wang, Zhiheng; Yang, Qianqian; Li, Tonghua; Cong, Peisheng

    2015-01-01

    The precise prediction of protein intrinsically disordered regions, which play a crucial role in biological procedures, is a necessary prerequisite to further the understanding of the principles and mechanisms of protein function. Here, we propose a novel predictor, DisoMCS, which is a more accurate predictor of protein intrinsically disordered regions. The DisoMCS bases on an original multi-class conservative score (MCS) obtained by sequence-order/disorder alignment. Initially, near-disorder regions are defined on fragments located at both the terminus of an ordered region connecting a disordered region. Then the multi-class conservative score is generated by sequence alignment against a known structure database and represented as order, near-disorder and disorder conservative scores. The MCS of each amino acid has three elements: order, near-disorder and disorder profiles. Finally, the MCS is exploited as features to identify disordered regions in sequences. DisoMCS utilizes a non-redundant data set as the training set, MCS and predicted secondary structure as features, and a conditional random field as the classification algorithm. In predicted near-disorder regions a residue is determined as an order or a disorder according to the optimized decision threshold. DisoMCS was evaluated by cross-validation, large-scale prediction, independent tests and CASP (Critical Assessment of Techniques for Protein Structure Prediction) tests. All results confirmed that DisoMCS was very competitive in terms of accuracy of prediction when compared with well-established publicly available disordered region predictors. It also indicated our approach was more accurate when a query has higher homologous with the knowledge database. Availability The DisoMCS is available at http://cal.tongji.edu.cn/disorder/. PMID:26090958

  3. Accurate prediction of cellular co-translational folding indicates proteins can switch from post- to co-translational folding.

    PubMed

    Nissley, Daniel A; Sharma, Ajeet K; Ahmed, Nabeel; Friedrich, Ulrike A; Kramer, Günter; Bukau, Bernd; O'Brien, Edward P

    2016-01-01

    The rates at which domains fold and codons are translated are important factors in determining whether a nascent protein will co-translationally fold and function or misfold and malfunction. Here we develop a chemical kinetic model that calculates a protein domain's co-translational folding curve during synthesis using only the domain's bulk folding and unfolding rates and codon translation rates. We show that this model accurately predicts the course of co-translational folding measured in vivo for four different protein molecules. We then make predictions for a number of different proteins in yeast and find that synonymous codon substitutions, which change translation-elongation rates, can switch some protein domains from folding post-translationally to folding co-translationally--a result consistent with previous experimental studies. Our approach explains essential features of co-translational folding curves and predicts how varying the translation rate at different codon positions along a transcript's coding sequence affects this self-assembly process. PMID:26887592

  4. Accurate prediction of cellular co-translational folding indicates proteins can switch from post- to co-translational folding

    PubMed Central

    Nissley, Daniel A.; Sharma, Ajeet K.; Ahmed, Nabeel; Friedrich, Ulrike A.; Kramer, Günter; Bukau, Bernd; O'Brien, Edward P.

    2016-01-01

    The rates at which domains fold and codons are translated are important factors in determining whether a nascent protein will co-translationally fold and function or misfold and malfunction. Here we develop a chemical kinetic model that calculates a protein domain's co-translational folding curve during synthesis using only the domain's bulk folding and unfolding rates and codon translation rates. We show that this model accurately predicts the course of co-translational folding measured in vivo for four different protein molecules. We then make predictions for a number of different proteins in yeast and find that synonymous codon substitutions, which change translation-elongation rates, can switch some protein domains from folding post-translationally to folding co-translationally—a result consistent with previous experimental studies. Our approach explains essential features of co-translational folding curves and predicts how varying the translation rate at different codon positions along a transcript's coding sequence affects this self-assembly process. PMID:26887592

  5. Accurate prediction for atomic-level protein design and its application in diversifying the near-optimal sequence space.

    PubMed

    Fromer, Menachem; Yanover, Chen

    2009-05-15

    precisely. Examination of the predicted ensembles indicates that, for each structure, the amino acid identity at a majority of positions must be chosen extremely selectively so as to not incur significant energetic penalties. We investigate this high degree of similarity and demonstrate how more diverse near-optimal sequences can be predicted in order to systematically overcome this bottleneck for computational design. Finally, we exploit this in-depth analysis of a collection of the lowest energy sequences to suggest an explanation for previously observed experimental design results. The novel methodologies introduced here accurately portray the sequence space compatible with a protein structure and further supply a scheme to yield heterogeneous low-energy sequences, thus providing a powerful instrument for future work on protein design. PMID:19003998

  6. Mathematical model accurately predicts protein release from an affinity-based delivery system.

    PubMed

    Vulic, Katarina; Pakulska, Malgosia M; Sonthalia, Rohit; Ramachandran, Arun; Shoichet, Molly S

    2015-01-10

    Affinity-based controlled release modulates the delivery of protein or small molecule therapeutics through transient dissociation/association. To understand which parameters can be used to tune release, we used a mathematical model based on simple binding kinetics. A comprehensive asymptotic analysis revealed three characteristic regimes for therapeutic release from affinity-based systems. These regimes can be controlled by diffusion or unbinding kinetics, and can exhibit release over either a single stage or two stages. This analysis fundamentally changes the way we think of controlling release from affinity-based systems and thereby explains some of the discrepancies in the literature on which parameters influence affinity-based release. The rate of protein release from affinity-based systems is determined by the balance of diffusion of the therapeutic agent through the hydrogel and the dissociation kinetics of the affinity pair. Equations for tuning protein release rate by altering the strength (KD) of the affinity interaction, the concentration of binding ligand in the system, the rate of dissociation (koff) of the complex, and the hydrogel size and geometry, are provided. We validated our model by collapsing the model simulations and the experimental data from a recently described affinity release system, to a single master curve. Importantly, this mathematical analysis can be applied to any single species affinity-based system to determine the parameters required for a desired release profile. PMID:25449806

  7. Accurate, conformation-dependent predictions of solvent effects on protein ionization constants

    PubMed Central

    Barth, P.; Alber, T.; Harbury, P. B.

    2007-01-01

    Predicting how aqueous solvent modulates the conformational transitions and influences the pKa values that regulate the biological functions of biomolecules remains an unsolved challenge. To address this problem, we developed FDPB_MF, a rotamer repacking method that exhaustively samples side chain conformational space and rigorously calculates multibody protein–solvent interactions. FDPB_MF predicts the effects on pKa values of various solvent exposures, large ionic strength variations, strong energetic couplings, structural reorganizations and sequence mutations. The method achieves high accuracy, with root mean square deviations within 0.3 pH unit of the experimental values measured for turkey ovomucoid third domain, hen lysozyme, Bacillus circulans xylanase, and human and Escherichia coli thioredoxins. FDPB_MF provides a faithful, quantitative assessment of electrostatic interactions in biological macromolecules. PMID:17360348

  8. Accurate prediction of protein secondary structure and solvent accessibility by consensus combiners of sequence and structure information

    PubMed Central

    Pollastri, Gianluca; Martin, Alberto JM; Mooney, Catherine; Vullo, Alessandro

    2007-01-01

    Background Structural properties of proteins such as secondary structure and solvent accessibility contribute to three-dimensional structure prediction, not only in the ab initio case but also when homology information to known structures is available. Structural properties are also routinely used in protein analysis even when homology is available, largely because homology modelling is lower throughput than, say, secondary structure prediction. Nonetheless, predictors of secondary structure and solvent accessibility are virtually always ab initio. Results Here we develop high-throughput machine learning systems for the prediction of protein secondary structure and solvent accessibility that exploit homology to proteins of known structure, where available, in the form of simple structural frequency profiles extracted from sets of PDB templates. We compare these systems to their state-of-the-art ab initio counterparts, and with a number of baselines in which secondary structures and solvent accessibilities are extracted directly from the templates. We show that structural information from templates greatly improves secondary structure and solvent accessibility prediction quality, and that, on average, the systems significantly enrich the information contained in the templates. For sequence similarity exceeding 30%, secondary structure prediction quality is approximately 90%, close to its theoretical maximum, and 2-class solvent accessibility roughly 85%. Gains are robust with respect to template selection noise, and significant for marginal sequence similarity and for short alignments, supporting the claim that these improved predictions may prove beneficial beyond the case in which clear homology is available. Conclusion The predictive system are publicly available at the address . PMID:17570843

  9. PSSP-RFE: accurate prediction of protein structural class by recursive feature extraction from PSI-BLAST profile, physical-chemical property and functional annotations.

    PubMed

    Li, Liqi; Cui, Xiang; Yu, Sanjiu; Zhang, Yuan; Luo, Zhong; Yang, Hua; Zhou, Yue; Zheng, Xiaoqi

    2014-01-01

    Protein structure prediction is critical to functional annotation of the massively accumulated biological sequences, which prompts an imperative need for the development of high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction becomes an increasingly challenging task. Amongst most homological-based approaches, the accuracies of protein structural class prediction are sufficiently high for high similarity datasets, but still far from being satisfactory for low similarity datasets, i.e., below 40% in pairwise sequence similarity. Therefore, we present a novel method for accurate and reliable protein structural class prediction for both high and low similarity datasets. This method is based on Support Vector Machine (SVM) in conjunction with integrated features from position-specific score matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is also used to rank the integrated feature vectors through recursively removing the feature with the lowest ranking score. The definitive top features selected by SVM-RFE are input into the SVM engines to predict the structural class of a query protein. To validate our method, jackknife tests were applied to seven widely used benchmark datasets, reaching overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods in protein structural classification, especially for low similarity datasets. PMID:24675610

  10. aPPRove: An HMM-Based Method for Accurate Prediction of RNA-Pentatricopeptide Repeat Protein Binding Events.

    PubMed

    Harrison, Thomas; Ruiz, Jaime; Sloan, Daniel B; Ben-Hur, Asa; Boucher, Christina

    2016-01-01

    Pentatricopeptide repeat containing proteins (PPRs) bind to RNA transcripts originating from mitochondria and plastids. There are two classes of PPR proteins. The [Formula: see text] class contains tandem [Formula: see text]-type motif sequences, and the [Formula: see text] class contains alternating [Formula: see text], [Formula: see text] and [Formula: see text] type sequences. In this paper, we describe a novel tool that predicts PPR-RNA interaction; specifically, our method, which we call aPPRove, determines where and how a [Formula: see text]-class PPR protein will bind to RNA when given a PPR and one or more RNA transcripts by using a combinatorial binding code for site specificity proposed by Barkan et al. Our results demonstrate that aPPRove successfully locates how and where a PPR protein belonging to the [Formula: see text] class can bind to RNA. For each binding event it outputs the binding site, the amino-acid-nucleotide interaction, and its statistical significance. Furthermore, we show that our method can be used to predict binding events for [Formula: see text]-class proteins using a known edit site and the statistical significance of aligning the PPR protein to that site. In particular, we use our method to make a conjecture regarding an interaction between CLB19 and the second intronic region of ycf3. The aPPRove web server can be found at www.cs.colostate.edu/~approve. PMID:27560805

  11. aPPRove: An HMM-Based Method for Accurate Prediction of RNA-Pentatricopeptide Repeat Protein Binding Events

    PubMed Central

    Harrison, Thomas; Ruiz, Jaime; Sloan, Daniel B.; Ben-Hur, Asa; Boucher, Christina

    2016-01-01

    Pentatricopeptide repeat containing proteins (PPRs) bind to RNA transcripts originating from mitochondria and plastids. There are two classes of PPR proteins. The P class contains tandem P-type motif sequences, and the PLS class contains alternating P, L and S type sequences. In this paper, we describe a novel tool that predicts PPR-RNA interaction; specifically, our method, which we call aPPRove, determines where and how a PLS-class PPR protein will bind to RNA when given a PPR and one or more RNA transcripts by using a combinatorial binding code for site specificity proposed by Barkan et al. Our results demonstrate that aPPRove successfully locates how and where a PPR protein belonging to the PLS class can bind to RNA. For each binding event it outputs the binding site, the amino-acid-nucleotide interaction, and its statistical significance. Furthermore, we show that our method can be used to predict binding events for PLS-class proteins using a known edit site and the statistical significance of aligning the PPR protein to that site. In particular, we use our method to make a conjecture regarding an interaction between CLB19 and the second intronic region of ycf3. The aPPRove web server can be found at www.cs.colostate.edu/~approve. PMID:27560805

  12. A highly accurate protein structural class prediction approach using auto cross covariance transformation and recursive feature elimination.

    PubMed

    Li, Xiaowei; Liu, Taigang; Tao, Peiying; Wang, Chunhua; Chen, Lanming

    2015-12-01

    Structural class characterizes the overall folding type of a protein or its domain. Many methods have been proposed to improve the prediction accuracy of protein structural class in recent years, but it is still a challenge for the low-similarity sequences. In this study, we introduce a feature extraction technique based on auto cross covariance (ACC) transformation of position-specific score matrix (PSSM) to represent a protein sequence. Then support vector machine-recursive feature elimination (SVM-RFE) is adopted to select top K features according to their importance and these features are input to a support vector machine (SVM) to conduct the prediction. Performance evaluation of the proposed method is performed using the jackknife test on three low-similarity datasets, i.e., D640, 1189 and 25PDB. By means of this method, the overall accuracies of 97.2%, 96.2%, and 93.3% are achieved on these three datasets, which are higher than those of most existing methods. This suggests that the proposed method could serve as a very cost-effective tool for predicting protein structural class especially for low-similarity datasets. PMID:26460680

  13. Accurate prediction of protein structural classes by incorporating predicted secondary structure information into the general form of Chou's pseudo amino acid composition.

    PubMed

    Kong, Liang; Zhang, Lichao; Lv, Jinfeng

    2014-03-01

    Extracting good representation from protein sequence is fundamental for protein structural classes prediction tasks. In this paper, we propose a novel and powerful method to predict protein structural classes based on the predicted secondary structure information. At the feature extraction stage, a 13-dimensional feature vector is extracted to characterize general contents and spatial arrangements of the secondary structural elements of a given protein sequence. Specially, four segment-level features are designed to elevate discriminative ability for proteins from the α/β and α+β classes. After the features are extracted, a multi-class non-linear support vector machine classifier is used to implement protein structural classes prediction. We report extensive experiments comparing the proposed method to the state-of-the-art in protein structural classes prediction on three widely used low-similarity benchmark datasets: FC699, 1189 and 640. Our method achieves competitive performance on prediction accuracies, especially for the overall prediction accuracies which have exceeded the best reported results on all of the three datasets. PMID:24316044

  14. Microdosing of a Carbon-14 Labeled Protein in Healthy Volunteers Accurately Predicts Its Pharmacokinetics at Therapeutic Dosages.

    PubMed

    Vlaming, M L H; van Duijn, E; Dillingh, M R; Brands, R; Windhorst, A D; Hendrikse, N H; Bosgra, S; Burggraaf, J; de Koning, M C; Fidder, A; Mocking, J A J; Sandman, H; de Ligt, R A F; Fabriek, B O; Pasman, W J; Seinen, W; Alves, T; Carrondo, M; Peixoto, C; Peeters, P A M; Vaes, W H J

    2015-08-01

    Preclinical development of new biological entities (NBEs), such as human protein therapeutics, requires considerable expenditure of time and costs. Poor prediction of pharmacokinetics in humans further reduces net efficiency. In this study, we show for the first time that pharmacokinetic data of NBEs in humans can be successfully obtained early in the drug development process by the use of microdosing in a small group of healthy subjects combined with ultrasensitive accelerator mass spectrometry (AMS). After only minimal preclinical testing, we performed a first-in-human phase 0/phase 1 trial with a human recombinant therapeutic protein (RESCuing Alkaline Phosphatase, human recombinant placental alkaline phosphatase [hRESCAP]) to assess its safety and kinetics. Pharmacokinetic analysis showed dose linearity from microdose (53 μg) [(14) C]-hRESCAP to therapeutic doses (up to 5.3 mg) of the protein in healthy volunteers. This study demonstrates the value of a microdosing approach in a very small cohort for accelerating the clinical development of NBEs. PMID:25869840

  15. BgN-Score and BsN-Score: Bagging and boosting based ensemble neural networks scoring functions for accurate binding affinity prediction of protein-ligand complexes

    PubMed Central

    2015-01-01

    Background Accurately predicting the binding affinities of large sets of protein-ligand complexes is a key challenge in computational biomolecular science, with applications in drug discovery, chemical biology, and structural biology. Since a scoring function (SF) is used to score, rank, and identify drug leads, the fidelity with which it predicts the affinity of a ligand candidate for a protein's binding site has a significant bearing on the accuracy of virtual screening. Despite intense efforts in developing conventional SFs, which are either force-field based, knowledge-based, or empirical, their limited predictive power has been a major roadblock toward cost-effective drug discovery. Therefore, in this work, we present novel SFs employing a large ensemble of neural networks (NN) in conjunction with a diverse set of physicochemical and geometrical features characterizing protein-ligand complexes to predict binding affinity. Results We assess the scoring accuracies of two new ensemble NN SFs based on bagging (BgN-Score) and boosting (BsN-Score), as well as those of conventional SFs in the context of the 2007 PDBbind benchmark that encompasses a diverse set of high-quality protein families. We find that BgN-Score and BsN-Score have more than 25% better Pearson's correlation coefficient (0.804 and 0.816 vs. 0.644) between predicted and measured binding affinities compared to that achieved by a state-of-the-art conventional SF. In addition, these ensemble NN SFs are also at least 19% more accurate (0.804 and 0.816 vs. 0.675) than SFs based on a single neural network that has been traditionally used in drug discovery applications. We further find that ensemble models based on NNs surpass SFs based on the decision-tree ensemble technique Random Forests. Conclusions Ensemble neural networks SFs, BgN-Score and BsN-Score, are the most accurate in predicting binding affinity of protein-ligand complexes among the considered SFs. Moreover, their accuracies are even higher

  16. Predict amine solution properties accurately

    SciTech Connect

    Cheng, S.; Meisen, A.; Chakma, A.

    1996-02-01

    Improved process design begins with using accurate physical property data. Especially in the preliminary design stage, physical property data such as density viscosity, thermal conductivity and specific heat can affect the overall performance of absorbers, heat exchangers, reboilers and pump. These properties can also influence temperature profiles in heat transfer equipment and thus control or affect the rate of amine breakdown. Aqueous-amine solution physical property data are available in graphical form. However, it is not convenient to use with computer-based calculations. Developed equations allow improved correlations of derived physical property estimates with published data. Expressions are given which can be used to estimate physical properties of methyldiethanolamine (MDEA), monoethanolamine (MEA) and diglycolamine (DGA) solutions.

  17. Membrane Protein Prediction Methods

    PubMed Central

    Punta, Marco; Forrest, Lucy R.; Bigelow, Henry; Kernytsky, Andrew; Liu, Jinfeng; Rost, Burkhard

    2007-01-01

    We survey computational approaches that tackle membrane protein structure and function prediction. While describing the main ideas that have led to the development of the most relevant and novel methods, we also discuss pitfalls, provide practical hints and highlight the challenges that remain. The methods covered include: sequence alignment, motif search, functional residue identification, transmembrane segment and protein topology predictions, homology and ab initio modeling. Overall, predictions of functional and structural features of membrane proteins are improving, although progress is hampered by the limited amount of high-resolution experimental information available. While predictions of transmembrane segments and protein topology rank among the most accurate methods in computational biology, more attention and effort will be required in the future to ameliorate database search, homology and ab initio modeling. PMID:17367718

  18. New model accurately predicts reformate composition

    SciTech Connect

    Ancheyta-Juarez, J.; Aguilar-Rodriguez, E. )

    1994-01-31

    Although naphtha reforming is a well-known process, the evolution of catalyst formulation, as well as new trends in gasoline specifications, have led to rapid evolution of the process, including: reactor design, regeneration mode, and operating conditions. Mathematical modeling of the reforming process is an increasingly important tool. It is fundamental to the proper design of new reactors and revamp of existing ones. Modeling can be used to optimize operating conditions, analyze the effects of process variables, and enhance unit performance. Instituto Mexicano del Petroleo has developed a model of the catalytic reforming process that accurately predicts reformate composition at the higher-severity conditions at which new reformers are being designed. The new AA model is more accurate than previous proposals because it takes into account the effects of temperature and pressure on the rate constants of each chemical reaction.

  19. Quantitative Assessment of Protein Structural Models by Comparison of H/D Exchange MS Data with Exchange Behavior Accurately Predicted by DXCOREX

    NASA Astrophysics Data System (ADS)

    Liu, Tong; Pantazatos, Dennis; Li, Sheng; Hamuro, Yoshitomo; Hilser, Vincent J.; Woods, Virgil L.

    2012-01-01

    Peptide amide hydrogen/deuterium exchange mass spectrometry (DXMS) data are often used to qualitatively support models for protein structure. We have developed and validated a method (DXCOREX) by which exchange data can be used to quantitatively assess the accuracy of three-dimensional (3-D) models of protein structure. The method utilizes the COREX algorithm to predict a protein's amide hydrogen exchange rates by reference to a hypothesized structure, and these values are used to generate a virtual data set (deuteron incorporation per peptide) that can be quantitatively compared with the deuteration level of the peptide probes measured by hydrogen exchange experimentation. The accuracy of DXCOREX was established in studies performed with 13 proteins for which both high-resolution structures and experimental data were available. The DXCOREX-calculated and experimental data for each protein was highly correlated. We then employed correlation analysis of DXCOREX-calculated versus DXMS experimental data to assess the accuracy of a recently proposed structural model for the catalytic domain of a Ca2+-independent phospholipase A2. The model's calculated exchange behavior was highly correlated with the experimental exchange results available for the protein, supporting the accuracy of the proposed model. This method of analysis will substantially increase the precision with which experimental hydrogen exchange data can help decipher challenging questions regarding protein structure and dynamics.

  20. Predicting accurate probabilities with a ranking loss

    PubMed Central

    Menon, Aditya Krishna; Jiang, Xiaoqian J; Vembu, Shankar; Elkan, Charles; Ohno-Machado, Lucila

    2013-01-01

    In many real-world applications of machine learning classifiers, it is essential to predict the probability of an example belonging to a particular class. This paper proposes a simple technique for predicting probabilities based on optimizing a ranking loss, followed by isotonic regression. This semi-parametric technique offers both good ranking and regression performance, and models a richer set of probability distributions than statistical workhorses such as logistic regression. We provide experimental results that show the effectiveness of this technique on real-world applications of probability prediction. PMID:25285328

  1. Accurate prediction of the binding free energy and analysis of the mechanism of the interaction of replication protein A (RPA) with ssDNA.

    PubMed

    Carra, Claudio; Cucinotta, Francis A

    2012-06-01

    The eukaryotic replication protein A (RPA) has several pivotal functions in the cell metabolism, such as chromosomal replication, prevention of hairpin formation, DNA repair and recombination, and signaling after DNA damage. Moreover, RPA seems to have a crucial role in organizing the sequential assembly of DNA processing proteins along single stranded DNA (ssDNA). The strong RPA affinity for ssDNA, K(A) between 10(-9)-10(-10) M, is characterized by a low cooperativity with minor variation for changes on the nucleotide sequence. Recently, new data on RPA interactions was reported, including the binding free energy of the complex RPA70AB with dC(8) and dC(5), which has been estimated to be -10 ± 0.4 kcal mol(-1) and -7 ± 1 kcal mol(-1), respectively. In view of these results we performed a study based on molecular dynamics aimed to reproduce the absolute binding free energy of RPA70AB with the dC(5) and dC(8) oligonucleotides. We used several tools to analyze the binding free energy, rigidity, and time evolution of the complex. The results obtained by MM-PBSA method, with the use of ligand free geometry as a reference for the receptor in the separate trajectory approach, are in excellent agreement with the experimental data, with ±4 kcal mol(-1) error. This result shows that the MM-PB(GB)SA methods can provide accurate quantitative estimates of the binding free energy for interacting complexes when appropriate geometries are used for the receptor, ligand and complex. The decomposition of the MM-GBSA energy for each residue in the receptor allowed us to correlate the change of the affinity of the mutated protein with the ΔG(gas+sol) contribution of the residue considered in the mutation. The agreement with experiment is optimal and a strong change in the binding free energy can be considered as the dominant factor in the loss for the binding affinity resulting from mutation. PMID:22116609

  2. You Can Accurately Predict Land Acquisition Costs.

    ERIC Educational Resources Information Center

    Garrigan, Richard

    1967-01-01

    Land acquisition costs were tested for predictability based upon the 1962 assessed valuations of privately held land acquired for campus expansion by the University of Wisconsin from 1963-1965. By correlating the land acquisition costs of 108 properties acquired during the 3 year period with--(1) the assessed value of the land, (2) the assessed…

  3. Mouse models of human AML accurately predict chemotherapy response

    PubMed Central

    Zuber, Johannes; Radtke, Ina; Pardee, Timothy S.; Zhao, Zhen; Rappaport, Amy R.; Luo, Weijun; McCurrach, Mila E.; Yang, Miao-Miao; Dolan, M. Eileen; Kogan, Scott C.; Downing, James R.; Lowe, Scott W.

    2009-01-01

    The genetic heterogeneity of cancer influences the trajectory of tumor progression and may underlie clinical variation in therapy response. To model such heterogeneity, we produced genetically and pathologically accurate mouse models of common forms of human acute myeloid leukemia (AML) and developed methods to mimic standard induction chemotherapy and efficiently monitor therapy response. We see that murine AMLs harboring two common human AML genotypes show remarkably diverse responses to conventional therapy that mirror clinical experience. Specifically, murine leukemias expressing the AML1/ETO fusion oncoprotein, associated with a favorable prognosis in patients, show a dramatic response to induction chemotherapy owing to robust activation of the p53 tumor suppressor network. Conversely, murine leukemias expressing MLL fusion proteins, associated with a dismal prognosis in patients, are drug-resistant due to an attenuated p53 response. Our studies highlight the importance of genetic information in guiding the treatment of human AML, functionally establish the p53 network as a central determinant of chemotherapy response in AML, and demonstrate that genetically engineered mouse models of human cancer can accurately predict therapy response in patients. PMID:19339691

  4. Mouse models of human AML accurately predict chemotherapy response.

    PubMed

    Zuber, Johannes; Radtke, Ina; Pardee, Timothy S; Zhao, Zhen; Rappaport, Amy R; Luo, Weijun; McCurrach, Mila E; Yang, Miao-Miao; Dolan, M Eileen; Kogan, Scott C; Downing, James R; Lowe, Scott W

    2009-04-01

    The genetic heterogeneity of cancer influences the trajectory of tumor progression and may underlie clinical variation in therapy response. To model such heterogeneity, we produced genetically and pathologically accurate mouse models of common forms of human acute myeloid leukemia (AML) and developed methods to mimic standard induction chemotherapy and efficiently monitor therapy response. We see that murine AMLs harboring two common human AML genotypes show remarkably diverse responses to conventional therapy that mirror clinical experience. Specifically, murine leukemias expressing the AML1/ETO fusion oncoprotein, associated with a favorable prognosis in patients, show a dramatic response to induction chemotherapy owing to robust activation of the p53 tumor suppressor network. Conversely, murine leukemias expressing MLL fusion proteins, associated with a dismal prognosis in patients, are drug-resistant due to an attenuated p53 response. Our studies highlight the importance of genetic information in guiding the treatment of human AML, functionally establish the p53 network as a central determinant of chemotherapy response in AML, and demonstrate that genetically engineered mouse models of human cancer can accurately predict therapy response in patients. PMID:19339691

  5. A fast and accurate computational approach to protein ionization

    PubMed Central

    Spassov, Velin Z.; Yan, Lisa

    2008-01-01

    We report a very fast and accurate physics-based method to calculate pH-dependent electrostatic effects in protein molecules and to predict the pK values of individual sites of titration. In addition, a CHARMm-based algorithm is included to construct and refine the spatial coordinates of all hydrogen atoms at a given pH. The present method combines electrostatic energy calculations based on the Generalized Born approximation with an iterative mobile clustering approach to calculate the equilibria of proton binding to multiple titration sites in protein molecules. The use of the GBIM (Generalized Born with Implicit Membrane) CHARMm module makes it possible to model not only water-soluble proteins but membrane proteins as well. The method includes a novel algorithm for preliminary refinement of hydrogen coordinates. Another difference from existing approaches is that, instead of monopeptides, a set of relaxed pentapeptide structures are used as model compounds. Tests on a set of 24 proteins demonstrate the high accuracy of the method. On average, the RMSD between predicted and experimental pK values is close to 0.5 pK units on this data set, and the accuracy is achieved at very low computational cost. The pH-dependent assignment of hydrogen atoms also shows very good agreement with protonation states and hydrogen-bond network observed in neutron-diffraction structures. The method is implemented as a computational protocol in Accelrys Discovery Studio and provides a fast and easy way to study the effect of pH on many important mechanisms such as enzyme catalysis, ligand binding, protein–protein interactions, and protein stability. PMID:18714088

  6. Practical lessons from protein structure prediction

    PubMed Central

    Ginalski, Krzysztof; Grishin, Nick V.; Godzik, Adam; Rychlewski, Leszek

    2005-01-01

    Despite recent efforts to develop automated protein structure determination protocols, structural genomics projects are slow in generating fold assignments for complete proteomes, and spatial structures remain unknown for many protein families. Alternative cheap and fast methods to assign folds using prediction algorithms continue to provide valuable structural information for many proteins. The development of high-quality prediction methods has been boosted in the last years by objective community-wide assessment experiments. This paper gives an overview of the currently available practical approaches to protein structure prediction capable of generating accurate fold assignment. Recent advances in assessment of the prediction quality are also discussed. PMID:15805122

  7. PredictSNP: Robust and Accurate Consensus Classifier for Prediction of Disease-Related Mutations

    PubMed Central

    Bendl, Jaroslav; Stourac, Jan; Salanda, Ondrej; Pavelka, Antonin; Wieben, Eric D.; Zendulka, Jaroslav; Brezovsky, Jan; Damborsky, Jiri

    2014-01-01

    Single nucleotide variants represent a prevalent form of genetic variation. Mutations in the coding regions are frequently associated with the development of various genetic diseases. Computational tools for the prediction of the effects of mutations on protein function are very important for analysis of single nucleotide variants and their prioritization for experimental characterization. Many computational tools are already widely employed for this purpose. Unfortunately, their comparison and further improvement is hindered by large overlaps between the training datasets and benchmark datasets, which lead to biased and overly optimistic reported performances. In this study, we have constructed three independent datasets by removing all duplicities, inconsistencies and mutations previously used in the training of evaluated tools. The benchmark dataset containing over 43,000 mutations was employed for the unbiased evaluation of eight established prediction tools: MAPP, nsSNPAnalyzer, PANTHER, PhD-SNP, PolyPhen-1, PolyPhen-2, SIFT and SNAP. The six best performing tools were combined into a consensus classifier PredictSNP, resulting into significantly improved prediction performance, and at the same time returned results for all mutations, confirming that consensus prediction represents an accurate and robust alternative to the predictions delivered by individual tools. A user-friendly web interface enables easy access to all eight prediction tools, the consensus classifier PredictSNP and annotations from the Protein Mutant Database and the UniProt database. The web server and the datasets are freely available to the academic community at http://loschmidt.chemi.muni.cz/predictsnp. PMID:24453961

  8. Inverter Modeling For Accurate Energy Predictions Of Tracking HCPV Installations

    NASA Astrophysics Data System (ADS)

    Bowman, J.; Jensen, S.; McDonald, Mark

    2010-10-01

    High efficiency high concentration photovoltaic (HCPV) solar plants of megawatt scale are now operational, and opportunities for expanded adoption are plentiful. However, effective bidding for sites requires reliable prediction of energy production. HCPV module nameplate power is rated for specific test conditions; however, instantaneous HCPV power varies due to site specific irradiance and operating temperature, and is degraded by soiling, protective stowing, shading, and electrical connectivity. These factors interact with the selection of equipment typically supplied by third parties, e.g., wire gauge and inverters. We describe a time sequence model accurately accounting for these effects that predicts annual energy production, with specific reference to the impact of the inverter on energy output and interactions between system-level design decisions and the inverter. We will also show two examples, based on an actual field design, of inverter efficiency calculations and the interaction between string arrangements and inverter selection.

  9. Computational approaches for predicting mutant protein stability.

    PubMed

    Kulshreshtha, Shweta; Chaudhary, Vigi; Goswami, Girish K; Mathur, Nidhi

    2016-05-01

    Mutations in the protein affect not only the structure of protein, but also its function and stability. Prediction of mutant protein stability with accuracy is desired for uncovering the molecular aspects of diseases and design of novel proteins. Many advanced computational approaches have been developed over the years, to predict the stability and function of a mutated protein. These approaches based on structure, sequence features and combined features (both structure and sequence features) provide reasonably accurate estimation of the impact of amino acid substitution on stability and function of protein. Recently, consensus tools have been developed by incorporating many tools together, which provide single window results for comparison purpose. In this review, a useful guide for the selection of tools that can be employed in predicting mutated proteins' stability and disease causing capability is provided. PMID:27160393

  10. Basophile: Accurate Fragment Charge State Prediction Improves Peptide Identification Rates

    DOE PAGESBeta

    Wang, Dong; Dasari, Surendra; Chambers, Matthew C.; Holman, Jerry D.; Chen, Kan; Liebler, Daniel; Orton, Daniel J.; Purvine, Samuel O.; Monroe, Matthew E.; Chung, Chang Y.; et al

    2013-03-07

    In shotgun proteomics, database search algorithms rely on fragmentation models to predict fragment ions that should be observed for a given peptide sequence. The most widely used strategy (Naive model) is oversimplified, cleaving all peptide bonds with equal probability to produce fragments of all charges below that of the precursor ion. More accurate models, based on fragmentation simulation, are too computationally intensive for on-the-fly use in database search algorithms. We have created an ordinal-regression-based model called Basophile that takes fragment size and basic residue distribution into account when determining the charge retention during CID/higher-energy collision induced dissociation (HCD) of chargedmore » peptides. This model improves the accuracy of predictions by reducing the number of unnecessary fragments that are routinely predicted for highly-charged precursors. Basophile increased the identification rates by 26% (on average) over the Naive model, when analyzing triply-charged precursors from ion trap data. Basophile achieves simplicity and speed by solving the prediction problem with an ordinal regression equation, which can be incorporated into any database search software for shotgun proteomic identification.« less

  11. Basophile: Accurate Fragment Charge State Prediction Improves Peptide Identification Rates

    SciTech Connect

    Wang, Dong; Dasari, Surendra; Chambers, Matthew C.; Holman, Jerry D.; Chen, Kan; Liebler, Daniel; Orton, Daniel J.; Purvine, Samuel O.; Monroe, Matthew E.; Chung, Chang Y.; Rose, Kristie L.; Tabb, David L.

    2013-03-07

    In shotgun proteomics, database search algorithms rely on fragmentation models to predict fragment ions that should be observed for a given peptide sequence. The most widely used strategy (Naive model) is oversimplified, cleaving all peptide bonds with equal probability to produce fragments of all charges below that of the precursor ion. More accurate models, based on fragmentation simulation, are too computationally intensive for on-the-fly use in database search algorithms. We have created an ordinal-regression-based model called Basophile that takes fragment size and basic residue distribution into account when determining the charge retention during CID/higher-energy collision induced dissociation (HCD) of charged peptides. This model improves the accuracy of predictions by reducing the number of unnecessary fragments that are routinely predicted for highly-charged precursors. Basophile increased the identification rates by 26% (on average) over the Naive model, when analyzing triply-charged precursors from ion trap data. Basophile achieves simplicity and speed by solving the prediction problem with an ordinal regression equation, which can be incorporated into any database search software for shotgun proteomic identification.

  12. Passive samplers accurately predict PAH levels in resident crayfish.

    PubMed

    Paulik, L Blair; Smith, Brian W; Bergmann, Alan J; Sower, Greg J; Forsberg, Norman D; Teeguarden, Justin G; Anderson, Kim A

    2016-02-15

    Contamination of resident aquatic organisms is a major concern for environmental risk assessors. However, collecting organisms to estimate risk is often prohibitively time and resource-intensive. Passive sampling accurately estimates resident organism contamination, and it saves time and resources. This study used low density polyethylene (LDPE) passive water samplers to predict polycyclic aromatic hydrocarbon (PAH) levels in signal crayfish, Pacifastacus leniusculus. Resident crayfish were collected at 5 sites within and outside of the Portland Harbor Superfund Megasite (PHSM) in the Willamette River in Portland, Oregon. LDPE deployment was spatially and temporally paired with crayfish collection. Crayfish visceral and tail tissue, as well as water-deployed LDPE, were extracted and analyzed for 62 PAHs using GC-MS/MS. Freely-dissolved concentrations (Cfree) of PAHs in water were calculated from concentrations in LDPE. Carcinogenic risks were estimated for all crayfish tissues, using benzo[a]pyrene equivalent concentrations (BaPeq). ∑PAH were 5-20 times higher in viscera than in tails, and ∑BaPeq were 6-70 times higher in viscera than in tails. Eating only tail tissue of crayfish would therefore significantly reduce carcinogenic risk compared to also eating viscera. Additionally, PAH levels in crayfish were compared to levels in crayfish collected 10 years earlier. PAH levels in crayfish were higher upriver of the PHSM and unchanged within the PHSM after the 10-year period. Finally, a linear regression model predicted levels of 34 PAHs in crayfish viscera with an associated R-squared value of 0.52 (and a correlation coefficient of 0.72), using only the Cfree PAHs in water. On average, the model predicted PAH concentrations in crayfish tissue within a factor of 2.4 ± 1.8 of measured concentrations. This affirms that passive water sampling accurately estimates PAH contamination in crayfish. Furthermore, the strong predictive ability of this simple model suggests

  13. PROMALS web server for accurate multiple protein sequence alignments.

    PubMed

    Pei, Jimin; Kim, Bong-Hyun; Tang, Ming; Grishin, Nick V

    2007-07-01

    Multiple sequence alignments are essential in homology inference, structure modeling, functional prediction and phylogenetic analysis. We developed a web server that constructs multiple protein sequence alignments using PROMALS, a progressive method that improves alignment quality by using additional homologs from PSI-BLAST searches and secondary structure predictions from PSIPRED. PROMALS shows higher alignment accuracy than other advanced methods, such as MUMMALS, ProbCons, MAFFT and SPEM. The PROMALS web server takes FASTA format protein sequences as input. The output includes a colored alignment augmented with information about sequence grouping, predicted secondary structures and positional conservation. The PROMALS web server is available at: http://prodata.swmed.edu/promals/ PMID:17452345

  14. A new generalized correlation for accurate vapor pressure prediction

    NASA Astrophysics Data System (ADS)

    An, Hui; Yang, Wenming

    2012-08-01

    An accurate knowledge of the vapor pressure of organic liquids is very important for the oil and gas processing operations. In combustion modeling, the accuracy of numerical predictions is also highly dependent on the fuel properties such as vapor pressure. In this Letter, a new generalized correlation is proposed based on the Lee-Kesler's method where a fuel dependent parameter 'A' is introduced. The proposed method only requires the input parameters of critical temperature, normal boiling temperature and the acentric factor of the fluid. With this method, vapor pressures have been calculated and compared with the data reported in data compilation for 42 organic liquids over 1366 data points, and the overall average absolute percentage deviation is only 1.95%.

  15. A new protein structure representation for efficient protein function prediction.

    PubMed

    Maghawry, Huda A; Mostafa, Mostafa G M; Gharib, Tarek F

    2014-12-01

    One of the challenging problems in bioinformatics is the prediction of protein function. Protein function is the main key that can be used to classify different proteins. Protein function can be inferred experimentally with very small throughput or computationally with very high throughput. Computational methods are sequence based or structure based. Structure-based methods produce more accurate protein function prediction. In this article, we propose a new protein structure representation for efficient protein function prediction. The representation is based on three-dimensional patterns of protein residues. In the analysis, we used protein function based on enzyme activity through six mechanistically diverse enzyme superfamilies: amidohydrolase, crotonase, haloacid dehalogenase, isoprenoid synthase type I, and vicinal oxygen chelate. We applied three different classification methods, naïve Bayes, k-nearest neighbors, and random forest, to predict the enzyme superfamily of a given protein. The prediction accuracy using the proposed representation outperforms a recently introduced representation method that is based only on the distance patterns. The results show that the proposed representation achieved prediction accuracy up to 98%, with improvement of about 10% on average. PMID:25343279

  16. Turbulence Models for Accurate Aerothermal Prediction in Hypersonic Flows

    NASA Astrophysics Data System (ADS)

    Zhang, Xiang-Hong; Wu, Yi-Zao; Wang, Jiang-Feng

    Accurate description of the aerodynamic and aerothermal environment is crucial to the integrated design and optimization for high performance hypersonic vehicles. In the simulation of aerothermal environment, the effect of viscosity is crucial. The turbulence modeling remains a major source of uncertainty in the computational prediction of aerodynamic forces and heating. In this paper, three turbulent models were studied: the one-equation eddy viscosity transport model of Spalart-Allmaras, the Wilcox k-ω model and the Menter SST model. For the k-ω model and SST model, the compressibility correction, press dilatation and low Reynolds number correction were considered. The influence of these corrections for flow properties were discussed by comparing with the results without corrections. In this paper the emphasis is on the assessment and evaluation of the turbulence models in prediction of heat transfer as applied to a range of hypersonic flows with comparison to experimental data. This will enable establishing factor of safety for the design of thermal protection systems of hypersonic vehicle.

  17. Predicting protein dynamics from structural ensembles

    NASA Astrophysics Data System (ADS)

    Copperman, J.; Guenza, M. G.

    2015-12-01

    The biological properties of proteins are uniquely determined by their structure and dynamics. A protein in solution populates a structural ensemble of metastable configurations around the global fold. From overall rotation to local fluctuations, the dynamics of proteins can cover several orders of magnitude in time scales. We propose a simulation-free coarse-grained approach which utilizes knowledge of the important metastable folded states of the protein to predict the protein dynamics. This approach is based upon the Langevin Equation for Protein Dynamics (LE4PD), a Langevin formalism in the coordinates of the protein backbone. The linear modes of this Langevin formalism organize the fluctuations of the protein, so that more extended dynamical cooperativity relates to increasing energy barriers to mode diffusion. The accuracy of the LE4PD is verified by analyzing the predicted dynamics across a set of seven different proteins for which both relaxation data and NMR solution structures are available. Using experimental NMR conformers as the input structural ensembles, LE4PD predicts quantitatively accurate results, with correlation coefficient ρ = 0.93 to NMR backbone relaxation measurements for the seven proteins. The NMR solution structure derived ensemble and predicted dynamical relaxation is compared with molecular dynamics simulation-derived structural ensembles and LE4PD predictions and is consistent in the time scale of the simulations. The use of the experimental NMR conformers frees the approach from computationally demanding simulations.

  18. Accurate mass spectrometry based protein quantification via shared peptides.

    PubMed

    Dost, Banu; Bandeira, Nuno; Li, Xiangqian; Shen, Zhouxin; Briggs, Steven P; Bafna, Vineet

    2012-04-01

    In mass spectrometry-based protein quantification, peptides that are shared across different protein sequences are often discarded as being uninformative with respect to each of the parent proteins. We investigate the use of shared peptides which are ubiquitous (~50% of peptides) in mass spectrometric data-sets for accurate protein identification and quantification. Different from existing approaches, we show how shared peptides can help compute the relative amounts of the proteins that contain them. Also, proteins with no unique peptide in the sample can still be analyzed for relative abundance. Our article uses shared peptides in protein quantification and makes use of combinatorial optimization to reduce the error in relative abundance measurements. We describe the topological and numerical properties required for robust estimates, and use them to improve our estimates for ill-conditioned systems. Extensive simulations validate our approach even in the presence of experimental error. We apply our method to a model of Arabidopsis thaliana root knot nematode infection, and investigate the differential role of several protein family members in mediating host response to the pathogen. PMID:22414154

  19. Accurate Prediction of Binding Thermodynamics for DNA on Surfaces

    PubMed Central

    Vainrub, Arnold; Pettitt, B. Montgomery

    2011-01-01

    For DNA mounted on surfaces for microarrays, microbeads and nanoparticles, the nature of the random attachment of oligonucleotide probes to an amorphous surface gives rise to a locally inhomogeneous probe density. These fluctuations of the probe surface density are inherent to all common surface or bead platforms, regardless if they exploit either an attachment of pre-synthesized probes or probes synthesized in situ on the surface. Here, we demonstrate for the first time the crucial role of the probe surface density fluctuations in performance of DNA arrays. We account for the density fluctuations with a disordered two-dimensional surface model and derive the corresponding array hybridization isotherm that includes a counter-ion screened electrostatic repulsion between the assayed DNA and probe array. The calculated melting curves are in excellent agreement with published experimental results for arrays with both pre-synthesized and in-situ synthesized oligonucleotide probes. The approach developed allows one to accurately predict the melting curves of DNA arrays using only the known sequence dependent hybridization enthalpy and entropy in solution and the experimental macroscopic surface density of probes. This opens the way to high precision theoretical design and optimization of probes and primers in widely used DNA array-based high-throughput technologies for gene expression, genotyping, next-generation sequencing, and surface polymerase extension. PMID:21972932

  20. Accurate indel prediction using paired-end short reads

    PubMed Central

    2013-01-01

    Background One of the major open challenges in next generation sequencing (NGS) is the accurate identification of structural variants such as insertions and deletions (indels). Current methods for indel calling assign scores to different types of evidence or counter-evidence for the presence of an indel, such as the number of split read alignments spanning the boundaries of a deletion candidate or reads that map within a putative deletion. Candidates with a score above a manually defined threshold are then predicted to be true indels. As a consequence, structural variants detected in this manner contain many false positives. Results Here, we present a machine learning based method which is able to discover and distinguish true from false indel candidates in order to reduce the false positive rate. Our method identifies indel candidates using a discriminative classifier based on features of split read alignment profiles and trained on true and false indel candidates that were validated by Sanger sequencing. We demonstrate the usefulness of our method with paired-end Illumina reads from 80 genomes of the first phase of the 1001 Genomes Project ( http://www.1001genomes.org) in Arabidopsis thaliana. Conclusion In this work we show that indel classification is a necessary step to reduce the number of false positive candidates. We demonstrate that missing classification may lead to spurious biological interpretations. The software is available at: http://agkb.is.tuebingen.mpg.de/Forschung/SV-M/. PMID:23442375

  1. Fast and accurate predictions of covalent bonds in chemical space.

    PubMed

    Chang, K Y Samuel; Fias, Stijn; Ramakrishnan, Raghunathan; von Lilienfeld, O Anatole

    2016-05-01

    We assess the predictive accuracy of perturbation theory based estimates of changes in covalent bonding due to linear alchemical interpolations among molecules. We have investigated σ bonding to hydrogen, as well as σ and π bonding between main-group elements, occurring in small sets of iso-valence-electronic molecules with elements drawn from second to fourth rows in the p-block of the periodic table. Numerical evidence suggests that first order Taylor expansions of covalent bonding potentials can achieve high accuracy if (i) the alchemical interpolation is vertical (fixed geometry), (ii) it involves elements from the third and fourth rows of the periodic table, and (iii) an optimal reference geometry is used. This leads to near linear changes in the bonding potential, resulting in analytical predictions with chemical accuracy (∼1 kcal/mol). Second order estimates deteriorate the prediction. If initial and final molecules differ not only in composition but also in geometry, all estimates become substantially worse, with second order being slightly more accurate than first order. The independent particle approximation based second order perturbation theory performs poorly when compared to the coupled perturbed or finite difference approach. Taylor series expansions up to fourth order of the potential energy curve of highly symmetric systems indicate a finite radius of convergence, as illustrated for the alchemical stretching of H2 (+). Results are presented for (i) covalent bonds to hydrogen in 12 molecules with 8 valence electrons (CH4, NH3, H2O, HF, SiH4, PH3, H2S, HCl, GeH4, AsH3, H2Se, HBr); (ii) main-group single bonds in 9 molecules with 14 valence electrons (CH3F, CH3Cl, CH3Br, SiH3F, SiH3Cl, SiH3Br, GeH3F, GeH3Cl, GeH3Br); (iii) main-group double bonds in 9 molecules with 12 valence electrons (CH2O, CH2S, CH2Se, SiH2O, SiH2S, SiH2Se, GeH2O, GeH2S, GeH2Se); (iv) main-group triple bonds in 9 molecules with 10 valence electrons (HCN, HCP, HCAs, HSiN, HSi

  2. Fast and accurate predictions of covalent bonds in chemical space

    NASA Astrophysics Data System (ADS)

    Chang, K. Y. Samuel; Fias, Stijn; Ramakrishnan, Raghunathan; von Lilienfeld, O. Anatole

    2016-05-01

    We assess the predictive accuracy of perturbation theory based estimates of changes in covalent bonding due to linear alchemical interpolations among molecules. We have investigated σ bonding to hydrogen, as well as σ and π bonding between main-group elements, occurring in small sets of iso-valence-electronic molecules with elements drawn from second to fourth rows in the p-block of the periodic table. Numerical evidence suggests that first order Taylor expansions of covalent bonding potentials can achieve high accuracy if (i) the alchemical interpolation is vertical (fixed geometry), (ii) it involves elements from the third and fourth rows of the periodic table, and (iii) an optimal reference geometry is used. This leads to near linear changes in the bonding potential, resulting in analytical predictions with chemical accuracy (˜1 kcal/mol). Second order estimates deteriorate the prediction. If initial and final molecules differ not only in composition but also in geometry, all estimates become substantially worse, with second order being slightly more accurate than first order. The independent particle approximation based second order perturbation theory performs poorly when compared to the coupled perturbed or finite difference approach. Taylor series expansions up to fourth order of the potential energy curve of highly symmetric systems indicate a finite radius of convergence, as illustrated for the alchemical stretching of H 2+ . Results are presented for (i) covalent bonds to hydrogen in 12 molecules with 8 valence electrons (CH4, NH3, H2O, HF, SiH4, PH3, H2S, HCl, GeH4, AsH3, H2Se, HBr); (ii) main-group single bonds in 9 molecules with 14 valence electrons (CH3F, CH3Cl, CH3Br, SiH3F, SiH3Cl, SiH3Br, GeH3F, GeH3Cl, GeH3Br); (iii) main-group double bonds in 9 molecules with 12 valence electrons (CH2O, CH2S, CH2Se, SiH2O, SiH2S, SiH2Se, GeH2O, GeH2S, GeH2Se); (iv) main-group triple bonds in 9 molecules with 10 valence electrons (HCN, HCP, HCAs, HSiN, HSi

  3. Accurate Determination of Conformational Transitions in Oligomeric Membrane Proteins

    PubMed Central

    Sanz-Hernández, Máximo; Vostrikov, Vitaly V.; Veglia, Gianluigi; De Simone, Alfonso

    2016-01-01

    The structural dynamics governing collective motions in oligomeric membrane proteins play key roles in vital biomolecular processes at cellular membranes. In this study, we present a structural refinement approach that combines solid-state NMR experiments and molecular simulations to accurately describe concerted conformational transitions identifying the overall structural, dynamical, and topological states of oligomeric membrane proteins. The accuracy of the structural ensembles generated with this method is shown to reach the statistical error limit, and is further demonstrated by correctly reproducing orthogonal NMR data. We demonstrate the accuracy of this approach by characterising the pentameric state of phospholamban, a key player in the regulation of calcium uptake in the sarcoplasmic reticulum, and by probing its dynamical activation upon phosphorylation. Our results underline the importance of using an ensemble approach to characterise the conformational transitions that are often responsible for the biological function of oligomeric membrane protein states. PMID:26975211

  4. The SILAC Fly Allows for Accurate Protein Quantification in Vivo*

    PubMed Central

    Sury, Matthias D.; Chen, Jia-Xuan; Selbach, Matthias

    2010-01-01

    Stable isotope labeling by amino acids in cell culture (SILAC) is widely used to quantify protein abundance in tissue culture cells. Until now, the only multicellular organism completely labeled at the amino acid level was the laboratory mouse. The fruit fly Drosophila melanogaster is one of the most widely used small animal models in biology. Here, we show that feeding flies with SILAC-labeled yeast leads to almost complete labeling in the first filial generation. We used these “SILAC flies” to investigate sexual dimorphism of protein abundance in D. melanogaster. Quantitative proteome comparison of adult male and female flies revealed distinct biological processes specific for each sex. Using a tudor mutant that is defective for germ cell generation allowed us to differentiate between sex-specific protein expression in the germ line and somatic tissue. We identified many proteins with known sex-specific expression bias. In addition, several new proteins with a potential role in sexual dimorphism were identified. Collectively, our data show that the SILAC fly can be used to accurately quantify protein abundance in vivo. The approach is simple, fast, and cost-effective, making SILAC flies an attractive model system for the emerging field of in vivo quantitative proteomics. PMID:20525996

  5. IRIS: Towards an Accurate and Fast Stage Weight Prediction Method

    NASA Astrophysics Data System (ADS)

    Taponier, V.; Balu, A.

    2002-01-01

    The knowledge of the structural mass fraction (or the mass ratio) of a given stage, which affects the performance of a rocket, is essential for the analysis of new or upgraded launchers or stages, whose need is increased by the quick evolution of the space programs and by the necessity of their adaptation to the market needs. The availability of this highly scattered variable, ranging between 0.05 and 0.15, is of primary importance at the early steps of the preliminary design studies. At the start of the staging and performance studies, the lack of frozen weight data (to be obtained later on from propulsion, trajectory and sizing studies) leads to rely on rough estimates, generally derived from printed sources and adapted. When needed, a consolidation can be acquired trough a specific analysis activity involving several techniques and implying additional effort and time. The present empirical approach allows thus to get approximated values (i.e. not necessarily accurate or consistent), inducing some result inaccuracy as well as, consequently, difficulties of performance ranking for a multiple option analysis, and an increase of the processing duration. This forms a classical harsh fact of the preliminary design system studies, insufficiently discussed to date. It appears therefore highly desirable to have, for all the evaluation activities, a reliable, fast and easy-to-use weight or mass fraction prediction method. Additionally, the latter should allow for a pre selection of the alternative preliminary configurations, making possible a global system approach. For that purpose, an attempt at modeling has been undertaken, whose objective was the determination of a parametric formulation of the mass fraction, to be expressed from a limited number of parameters available at the early steps of the project. It is based on the innovative use of a statistical method applicable to a variable as a function of several independent parameters. A specific polynomial generator

  6. The PredictProtein server

    PubMed Central

    Rost, Burkhard; Yachdav, Guy; Liu, Jinfeng

    2004-01-01

    PredictProtein (http://www.predictprotein.org) is an Internet service for sequence analysis and the prediction of protein structure and function. Users submit protein sequences or alignments; PredictProtein returns multiple sequence alignments, PROSITE sequence motifs, low-complexity regions (SEG), nuclear localization signals, regions lacking regular structure (NORS) and predictions of secondary structure, solvent accessibility, globular regions, transmembrane helices, coiled-coil regions, structural switch regions, disulfide-bonds, sub-cellular localization and functional annotations. Upon request fold recognition by prediction-based threading, CHOP domain assignments, predictions of transmembrane strands and inter-residue contacts are also available. For all services, users can submit their query either by electronic mail or interactively via the World Wide Web. PMID:15215403

  7. BPROMPT: A consensus server for membrane protein prediction.

    PubMed

    Taylor, Paul D; Attwood, Teresa K; Flower, Darren R

    2003-07-01

    Protein structure prediction is a cornerstone of bioinformatics research. Membrane proteins require their own prediction methods due to their intrinsically different composition. A variety of tools exist for topology prediction of membrane proteins, many of them available on the Internet. The server described in this paper, BPROMPT (Bayesian PRediction Of Membrane Protein Topology), uses a Bayesian Belief Network to combine the results of other prediction methods, providing a more accurate consensus prediction. Topology predictions with accuracies of 70% for prokaryotes and 53% for eukaryotes were achieved. BPROMPT can be accessed at http://www.jenner.ac.uk/BPROMPT. PMID:12824397

  8. conSSert: Consensus SVM Model for Accurate Prediction of Ordered Secondary Structure.

    PubMed

    Kieslich, Chris A; Smadbeck, James; Khoury, George A; Floudas, Christodoulos A

    2016-03-28

    Accurate prediction of protein secondary structure remains a crucial step in most approaches to the protein-folding problem, yet the prediction of ordered secondary structure, specifically beta-strands, remains a challenge. We developed a consensus secondary structure prediction method, conSSert, which is based on support vector machines (SVM) and provides exceptional accuracy for the prediction of beta-strands with QE accuracy of over 0.82 and a Q2-EH of 0.86. conSSert uses as input probabilities for the three types of secondary structure (helix, strand, and coil) that are predicted by four top performing methods: PSSpred, PSIPRED, SPINE-X, and RAPTOR. conSSert was trained/tested using 4261 protein chains from PDBSelect25, and 8632 chains from PISCES. Further validation was performed using targets from CASP9, CASP10, and CASP11. Our data suggest that poor performance in strand prediction is likely a result of training bias and not solely due to the nonlocal nature of beta-sheet contacts. conSSert is freely available for noncommercial use as a webservice: http://ares.tamu.edu/conSSert/ . PMID:26928531

  9. Toolbox for Protein Structure Prediction.

    PubMed

    Roche, Daniel Barry; McGuffin, Liam James

    2016-01-01

    Protein tertiary structure prediction algorithms aim to predict, from amino acid sequence, the tertiary structure of a protein. In silico protein structure prediction methods have become extremely important, as in vitro-based structural elucidation is unable to keep pace with the current growth of sequence databases due to high-throughput next-generation sequencing, which has exacerbated the gaps in our knowledge between sequences and structures.Here we briefly discuss protein tertiary structure prediction, the biennial competition for the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and its role in shaping the field. We also discuss, in detail, our cutting-edge web-server method IntFOLD2-TS for tertiary structure prediction. Furthermore, we provide a step-by-step guide on using the IntFOLD2-TS web server, along with some real world examples, where the IntFOLD server can and has been used to improve protein tertiary structure prediction and aid in functional elucidation. PMID:26519323

  10. Accurate Structure Prediction and Conformational Analysis of Cyclic Peptides with Residue-Specific Force Fields.

    PubMed

    Geng, Hao; Jiang, Fan; Wu, Yun-Dong

    2016-05-19

    Cyclic peptides (CPs) are promising candidates for drugs, chemical biology tools, and self-assembling nanomaterials. However, the development of reliable and accurate computational methods for their structure prediction has been challenging. Here, 20 all-trans CPs of 5-12 residues selected from Cambridge Structure Database have been simulated using replica-exchange molecular dynamics with four different force fields. Our recently developed residue-specific force fields RSFF1 and RSFF2 can correctly identify the crystal-like conformations of more than half CPs as the most populated conformation. The RSFF2 performs the best, which consistently predicts the crystal structures of 17 out of 20 CPs with rmsd < 1.1 Å. We also compared the backbone (ϕ, ψ) sampling of residues in CPs with those in short linear peptides and in globular proteins. In general, unlike linear peptides, CPs have local conformational free energies and entropies quite similar to globular proteins. PMID:27128113

  11. PREFACE: Protein protein interactions: principles and predictions

    NASA Astrophysics Data System (ADS)

    Nussinov, Ruth; Tsai, Chung-Jung

    2005-06-01

    Proteins are the `workhorses' of the cell. Their roles span functions as diverse as being molecular machines and signalling. They carry out catalytic reactions, transport, form viral capsids, traverse membranes and form regulated channels, transmit information from DNA to RNA, making possible the synthesis of new proteins, and they are responsible for the degradation of unnecessary proteins and nucleic acids. They are the vehicles of the immune response and are responsible for viral entry into the cell. Given their importance, considerable effort has been centered on the prediction of protein function. A prime way to do this is through identification of binding partners. If the function of at least one of the components with which the protein interacts is known, that should let us assign its function(s) and the pathway(s) in which it plays a role. This holds since the vast majority of their chores in the living cell involve protein-protein interactions. Hence, through the intricate network of these interactions we can map cellular pathways, their interconnectivities and their dynamic regulation. Their identification is at the heart of functional genomics; their prediction is crucial for drug discovery. Knowledge of the pathway, its topology, length, and dynamics may provide useful information for forecasting side effects. The goal of predicting protein-protein interactions is daunting. Some associations are obligatory, others are continuously forming and dissociating. In principle, from the physical standpoint, any two proteins can interact, but under what conditions and at which strength? The principles of protein-protein interactions are general: the non-covalent interactions of two proteins are largely the outcome of the hydrophobic effect, which drives the interactions. In addition, hydrogen bonds and electrostatic interactions play important roles. Thus, many of the interactions observed in vitro are the outcome of experimental overexpression. Protein disorder

  12. Accurately Predicting Complex Reaction Kinetics from First Principles

    NASA Astrophysics Data System (ADS)

    Green, William

    Many important systems contain a multitude of reactive chemical species, some of which react on a timescale faster than collisional thermalization, i.e. they never achieve a Boltzmann energy distribution. Usually it is impossible to fully elucidate the processes by experiments alone. Here we report recent progress toward predicting the time-evolving composition of these systems a priori: how unexpected reactions can be discovered on the computer, how reaction rates are computed from first principles, and how the many individual reactions are efficiently combined into a predictive simulation for the whole system. Some experimental tests of the a priori predictions are also presented.

  13. The PredictProtein server

    PubMed Central

    Rost, Burkhard; Liu, Jinfeng

    2003-01-01

    PredictProtein (PP, http://cubic.bioc.columbia.edu/pp/) is an internet service for sequence analysis and the prediction of aspects of protein structure and function. Users submit protein sequence or alignments; the server returns a multiple sequence alignment, PROSITE sequence motifs, low-complexity regions (SEG), ProDom domain assignments, nuclear localisation signals, regions lacking regular structure and predictions of secondary structure, solvent accessibility, globular regions, transmembrane helices, coiled-coil regions, structural switch regions and disulfide-bonds. Upon request, fold recognition by prediction-based threading is available. For all services, users can submit their query either by electronic mail or interactively from World Wide Web. PMID:12824312

  14. Is Three-Dimensional Soft Tissue Prediction by Software Accurate?

    PubMed

    Nam, Ki-Uk; Hong, Jongrak

    2015-11-01

    The authors assessed whether virtual surgery, performed with a soft tissue prediction program, could correctly simulate the actual surgical outcome, focusing on soft tissue movement. Preoperative and postoperative computed tomography (CT) data for 29 patients, who had undergone orthognathic surgery, were obtained and analyzed using the Simplant Pro software. The program made a predicted soft tissue image (A) based on presurgical CT data. After the operation, we obtained actual postoperative CT data and an actual soft tissue image (B) was generated. Finally, the 2 images (A and B) were superimposed and analyzed differences between the A and B. Results were grouped in 2 classes: absolute values and vector values. In the absolute values, the left mouth corner was the most significant error point (2.36 mm). The right mouth corner (2.28 mm), labrale inferius (2.08 mm), and the pogonion (2.03 mm) also had significant errors. In vector values, prediction of the right-left side had a left-sided tendency, the superior-inferior had a superior tendency, and the anterior-posterior showed an anterior tendency. As a result, with this program, the position of points tended to be located more left, anterior, and superior than the "real" situation. There is a need to improve the prediction accuracy for soft tissue images. Such software is particularly valuable in predicting craniofacial soft tissues landmarks, such as the pronasale. With this software, landmark positions were most inaccurate in terms of anterior-posterior predictions. PMID:26594988

  15. Accurate perception of negative emotions predicts functional capacity in schizophrenia.

    PubMed

    Abram, Samantha V; Karpouzian, Tatiana M; Reilly, James L; Derntl, Birgit; Habel, Ute; Smith, Matthew J

    2014-04-30

    Several studies suggest facial affect perception (FAP) deficits in schizophrenia are linked to poorer social functioning. However, whether reduced functioning is associated with inaccurate perception of specific emotional valence or a global FAP impairment remains unclear. The present study examined whether impairment in the perception of specific emotional valences (positive, negative) and neutrality were uniquely associated with social functioning, using a multimodal social functioning battery. A sample of 59 individuals with schizophrenia and 41 controls completed a computerized FAP task, and measures of functional capacity, social competence, and social attainment. Participants also underwent neuropsychological testing and symptom assessment. Regression analyses revealed that only accurately perceiving negative emotions explained significant variance (7.9%) in functional capacity after accounting for neurocognitive function and symptoms. Partial correlations indicated that accurately perceiving anger, in particular, was positively correlated with functional capacity. FAP for positive, negative, or neutral emotions were not related to social competence or social attainment. Our findings were consistent with prior literature suggesting negative emotions are related to functional capacity in schizophrenia. Furthermore, the observed relationship between perceiving anger and performance of everyday living skills is novel and warrants further exploration. PMID:24524947

  16. Towards Accurate Ab Initio Predictions of the Spectrum of Methane

    NASA Technical Reports Server (NTRS)

    Schwenke, David W.; Kwak, Dochan (Technical Monitor)

    2001-01-01

    We have carried out extensive ab initio calculations of the electronic structure of methane, and these results are used to compute vibrational energy levels. We include basis set extrapolations, core-valence correlation, relativistic effects, and Born- Oppenheimer breakdown terms in our calculations. Our ab initio predictions of the lowest lying levels are superb.

  17. Standardized EEG interpretation accurately predicts prognosis after cardiac arrest

    PubMed Central

    Rossetti, Andrea O.; van Rootselaar, Anne-Fleur; Wesenberg Kjaer, Troels; Horn, Janneke; Ullén, Susann; Friberg, Hans; Nielsen, Niklas; Rosén, Ingmar; Åneman, Anders; Erlinge, David; Gasche, Yvan; Hassager, Christian; Hovdenes, Jan; Kjaergaard, Jesper; Kuiper, Michael; Pellis, Tommaso; Stammet, Pascal; Wanscher, Michael; Wetterslev, Jørn; Wise, Matt P.; Cronberg, Tobias

    2016-01-01

    Objective: To identify reliable predictors of outcome in comatose patients after cardiac arrest using a single routine EEG and standardized interpretation according to the terminology proposed by the American Clinical Neurophysiology Society. Methods: In this cohort study, 4 EEG specialists, blinded to outcome, evaluated prospectively recorded EEGs in the Target Temperature Management trial (TTM trial) that randomized patients to 33°C vs 36°C. Routine EEG was performed in patients still comatose after rewarming. EEGs were classified into highly malignant (suppression, suppression with periodic discharges, burst-suppression), malignant (periodic or rhythmic patterns, pathological or nonreactive background), and benign EEG (absence of malignant features). Poor outcome was defined as best Cerebral Performance Category score 3–5 until 180 days. Results: Eight TTM sites randomized 202 patients. EEGs were recorded in 103 patients at a median 77 hours after cardiac arrest; 37% had a highly malignant EEG and all had a poor outcome (specificity 100%, sensitivity 50%). Any malignant EEG feature had a low specificity to predict poor prognosis (48%) but if 2 malignant EEG features were present specificity increased to 96% (p < 0.001). Specificity and sensitivity were not significantly affected by targeted temperature or sedation. A benign EEG was found in 1% of the patients with a poor outcome. Conclusions: Highly malignant EEG after rewarming reliably predicted poor outcome in half of patients without false predictions. An isolated finding of a single malignant feature did not predict poor outcome whereas a benign EEG was highly predictive of a good outcome. PMID:26865516

  18. How Accurately Can We Predict Eclipses for Algol? (Poster abstract)

    NASA Astrophysics Data System (ADS)

    Turner, D.

    2016-06-01

    (Abstract only) beta Persei, or Algol, is a very well known eclipsing binary system consisting of a late B-type dwarf that is regularly eclipsed by a GK subgiant every 2.867 days. Eclipses, which last about 8 hours, are regular enough that predictions for times of minima are published in various places, Sky & Telescope magazine and The Observer's Handbook, for example. But eclipse minimum lasts for less than a half hour, whereas subtle mistakes in the current ephemeris for the star can result in predictions that are off by a few hours or more. The Algol system is fairly complex, with the Algol A and Algol B eclipsing system also orbited by Algol C with an orbital period of nearly 2 years. Added to that are complex long-term O-C variations with a periodicity of almost two centuries that, although suggested by Hoffmeister to be spurious, fit the type of light travel time variations expected for a fourth star also belonging to the system. The AB sub-system also undergoes mass transfer events that add complexities to its O-C behavior. Is it actually possible to predict precise times of eclipse minima for Algol months in advance given such complications, or is it better to encourage ongoing observations of the star so that O-C variations can be tracked in real time?

  19. DomHR: Accurately Identifying Domain Boundaries in Proteins Using a Hinge Region Strategy

    PubMed Central

    Zhang, Xiao-yan; Lu, Long-jian; Song, Qi; Yang, Qian-qian; Li, Da-peng; Sun, Jiang-ming; Li, Tong-hua; Cong, Pei-sheng

    2013-01-01

    Motivation The precise prediction of protein domains, which are the structural, functional and evolutionary units of proteins, has been a research focus in recent years. Although many methods have been presented for predicting protein domains and boundaries, the accuracy of predictions could be improved. Results In this study we present a novel approach, DomHR, which is an accurate predictor of protein domain boundaries based on a creative hinge region strategy. A hinge region was defined as a segment of amino acids that covers part of a domain region and a boundary region. We developed a strategy to construct profiles of domain-hinge-boundary (DHB) features generated by sequence-domain/hinge/boundary alignment against a database of known domain structures. The DHB features had three elements: normalized domain, hinge, and boundary probabilities. The DHB features were used as input to identify domain boundaries in a sequence. DomHR used a nonredundant dataset as the training set, the DHB and predicted shape string as features, and a conditional random field as the classification algorithm. In predicted hinge regions, a residue was determined to be a domain or a boundary according to a decision threshold. After decision thresholds were optimized, DomHR was evaluated by cross-validation, large-scale prediction, independent test and CASP (Critical Assessment of Techniques for Protein Structure Prediction) tests. All results confirmed that DomHR outperformed other well-established, publicly available domain boundary predictors for prediction accuracy. Availability The DomHR is available at http://cal.tongji.edu.cn/domain/. PMID:23593247

  20. Accurate and predictive antibody repertoire profiling by molecular amplification fingerprinting

    PubMed Central

    Khan, Tarik A.; Friedensohn, Simon; de Vries, Arthur R. Gorter; Straszewski, Jakub; Ruscheweyh, Hans-Joachim; Reddy, Sai T.

    2016-01-01

    High-throughput antibody repertoire sequencing (Ig-seq) provides quantitative molecular information on humoral immunity. However, Ig-seq is compromised by biases and errors introduced during library preparation and sequencing. By using synthetic antibody spike-in genes, we determined that primer bias from multiplex polymerase chain reaction (PCR) library preparation resulted in antibody frequencies with only 42 to 62% accuracy. Additionally, Ig-seq errors resulted in antibody diversity measurements being overestimated by up to 5000-fold. To rectify this, we developed molecular amplification fingerprinting (MAF), which uses unique molecular identifier (UID) tagging before and during multiplex PCR amplification, which enabled tagging of transcripts while accounting for PCR efficiency. Combined with a bioinformatic pipeline, MAF bias correction led to measurements of antibody frequencies with up to 99% accuracy. We also used MAF to correct PCR and sequencing errors, resulting in enhanced accuracy of full-length antibody diversity measurements, achieving 98 to 100% error correction. Using murine MAF-corrected data, we established a quantitative metric of recent clonal expansion—the intraclonal diversity index—which measures the number of unique transcripts associated with an antibody clone. We used this intraclonal diversity index along with antibody frequencies and somatic hypermutation to build a logistic regression model for prediction of the immunological status of clones. The model was able to predict clonal status with high confidence but only when using MAF error and bias corrected Ig-seq data. Improved accuracy by MAF provides the potential to greatly advance Ig-seq and its utility in immunology and biotechnology. PMID:26998518

  1. Accurate and predictive antibody repertoire profiling by molecular amplification fingerprinting.

    PubMed

    Khan, Tarik A; Friedensohn, Simon; Gorter de Vries, Arthur R; Straszewski, Jakub; Ruscheweyh, Hans-Joachim; Reddy, Sai T

    2016-03-01

    High-throughput antibody repertoire sequencing (Ig-seq) provides quantitative molecular information on humoral immunity. However, Ig-seq is compromised by biases and errors introduced during library preparation and sequencing. By using synthetic antibody spike-in genes, we determined that primer bias from multiplex polymerase chain reaction (PCR) library preparation resulted in antibody frequencies with only 42 to 62% accuracy. Additionally, Ig-seq errors resulted in antibody diversity measurements being overestimated by up to 5000-fold. To rectify this, we developed molecular amplification fingerprinting (MAF), which uses unique molecular identifier (UID) tagging before and during multiplex PCR amplification, which enabled tagging of transcripts while accounting for PCR efficiency. Combined with a bioinformatic pipeline, MAF bias correction led to measurements of antibody frequencies with up to 99% accuracy. We also used MAF to correct PCR and sequencing errors, resulting in enhanced accuracy of full-length antibody diversity measurements, achieving 98 to 100% error correction. Using murine MAF-corrected data, we established a quantitative metric of recent clonal expansion-the intraclonal diversity index-which measures the number of unique transcripts associated with an antibody clone. We used this intraclonal diversity index along with antibody frequencies and somatic hypermutation to build a logistic regression model for prediction of the immunological status of clones. The model was able to predict clonal status with high confidence but only when using MAF error and bias corrected Ig-seq data. Improved accuracy by MAF provides the potential to greatly advance Ig-seq and its utility in immunology and biotechnology. PMID:26998518

  2. Accurate predictions for the production of vaporized water

    SciTech Connect

    Morin, E.; Montel, F.

    1995-12-31

    The production of water vaporized in the gas phase is controlled by the local conditions around the wellbore. The pressure gradient applied to the formation creates a sharp increase of the molar water content in the hydrocarbon phase approaching the well; this leads to a drop in the pore water saturation around the wellbore. The extent of the dehydrated zone which is formed is the key controlling the bottom-hole content of vaporized water. The maximum water content in the hydrocarbon phase at a given pressure, temperature and salinity is corrected by capillarity or adsorption phenomena depending on the actual water saturation. Describing the mass transfer of the water between the hydrocarbon phases and the aqueous phase into the tubing gives a clear idea of vaporization effects on the formation of scales. Field example are presented for gas fields with temperatures ranging between 140{degrees}C and 180{degrees}C, where water vaporization effects are significant. Conditions for salt plugging in the tubing are predicted.

  3. Change in BMI Accurately Predicted by Social Exposure to Acquaintances

    PubMed Central

    Oloritun, Rahman O.; Ouarda, Taha B. M. J.; Moturu, Sai; Madan, Anmol; Pentland, Alex (Sandy); Khayal, Inas

    2013-01-01

    Research has mostly focused on obesity and not on processes of BMI change more generally, although these may be key factors that lead to obesity. Studies have suggested that obesity is affected by social ties. However these studies used survey based data collection techniques that may be biased toward select only close friends and relatives. In this study, mobile phone sensing techniques were used to routinely capture social interaction data in an undergraduate dorm. By automating the capture of social interaction data, the limitations of self-reported social exposure data are avoided. This study attempts to understand and develop a model that best describes the change in BMI using social interaction data. We evaluated a cohort of 42 college students in a co-located university dorm, automatically captured via mobile phones and survey based health-related information. We determined the most predictive variables for change in BMI using the least absolute shrinkage and selection operator (LASSO) method. The selected variables, with gender, healthy diet category, and ability to manage stress, were used to build multiple linear regression models that estimate the effect of exposure and individual factors on change in BMI. We identified the best model using Akaike Information Criterion (AIC) and R2. This study found a model that explains 68% (p<0.0001) of the variation in change in BMI. The model combined social interaction data, especially from acquaintances, and personal health-related information to explain change in BMI. This is the first study taking into account both interactions with different levels of social interaction and personal health-related information. Social interactions with acquaintances accounted for more than half the variation in change in BMI. This suggests the importance of not only individual health information but also the significance of social interactions with people we are exposed to, even people we may not consider as close friends. PMID

  4. Accurate refinement of docked protein complexes using evolutionary information and deep learning.

    PubMed

    Akbal-Delibas, Bahar; Farhoodi, Roshanak; Pomplun, Marc; Haspel, Nurit

    2016-06-01

    One of the major challenges for protein docking methods is to accurately discriminate native-like structures from false positives. Docking methods are often inaccurate and the results have to be refined and re-ranked to obtain native-like complexes and remove outliers. In a previous work, we introduced AccuRefiner, a machine learning based tool for refining protein-protein complexes. Given a docked complex, the refinement tool produces a small set of refined versions of the input complex, with lower root-mean-square-deviation (RMSD) of atomic positions with respect to the native structure. The method employs a unique ranking tool that accurately predicts the RMSD of docked complexes with respect to the native structure. In this work, we use a deep learning network with a similar set of features and five layers. We show that a properly trained deep learning network can accurately predict the RMSD of a docked complex with 1.40 Å error margin on average, by approximating the complex relationship between a wide set of scoring function terms and the RMSD of a docked structure. The network was trained on 35000 unbound docking complexes generated by RosettaDock. We tested our method on 25 different putative docked complexes produced also by RosettaDock for five proteins that were not included in the training data. The results demonstrate that the high accuracy of the ranking tool enables AccuRefiner to consistently choose the refinement candidates with lower RMSD values compared to the coarsely docked input structures. PMID:26846813

  5. A multi-objective optimization approach accurately resolves protein domain architectures

    PubMed Central

    Bernardes, J.S.; Vieira, F.R.J.; Zaverucha, G.; Carbone, A.

    2016-01-01

    Motivation: Given a protein sequence and a number of potential domains matching it, what are the domain content and the most likely domain architecture for the sequence? This problem is of fundamental importance in protein annotation, constituting one of the main steps of all predictive annotation strategies. On the other hand, when potential domains are several and in conflict because of overlapping domain boundaries, finding a solution for the problem might become difficult. An accurate prediction of the domain architecture of a multi-domain protein provides important information for function prediction, comparative genomics and molecular evolution. Results: We developed DAMA (Domain Annotation by a Multi-objective Approach), a novel approach that identifies architectures through a multi-objective optimization algorithm combining scores of domain matches, previously observed multi-domain co-occurrence and domain overlapping. DAMA has been validated on a known benchmark dataset based on CATH structural domain assignments and on the set of Plasmodium falciparum proteins. When compared with existing tools on both datasets, it outperforms all of them. Availability and implementation: DAMA software is implemented in C++ and the source code can be found at http://www.lcqb.upmc.fr/DAMA. Contact: juliana.silva_bernardes@upmc.fr or alessandra.carbone@lip6.fr Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26458889

  6. Structure based prediction of protein folding intermediates.

    PubMed

    Xie, D; Freire, E

    1994-09-01

    The complete unfolding of a protein involves the disruption of non-covalent intramolecular interactions within the protein and the subsequent hydration of the backbone and amino acid side-chains. The magnitude of the thermodynamic parameters associated with this process is known accurately for a growing number of globular proteins for which high-resolution structures are also available. The existence of this database of structural and thermodynamic information has facilitated the development of statistical procedures aimed at quantifying the relationships existing between protein structure and the thermodynamic parameters of folding/unfolding. Under some conditions proteins do not unfold completely, giving rise to states (commonly known as molten globules) in which the molecule retains some secondary structure and remains in a compact configuration after denaturation. This phenomenon is reflected in the thermodynamics of the process. Depending on the nature of the residual structure that exists after denaturation, the observed enthalpy, entropy and heat capacity changes will deviate in a particular and predictable way from the values expected for complete unfolding. For several proteins, these deviations have been shown to exhibit similar characteristics, suggesting that their equilibrium folding intermediates exhibit some common structural features. Employing empirically derived structure-energetic relationships, it is possible to identify in the native structure of the protein those regions with the higher probability of being structured in equilibrium partly folded states. In this work, a thermodynamic search algorithm aimed at identifying the structural determinants of the molten globule state has been applied to six globular proteins; alpha-lactalbumin, barnase, IIIGlc, interleukin-1 beta, phage T4 lysozyme and phage 434 repressor. Remarkably, the structural features of the predicted equilibrium intermediates coincide to a large extent with the known

  7. Scoring docking conformations using predicted protein interfaces

    PubMed Central

    2014-01-01

    Background Since proteins function by interacting with other molecules, analysis of protein-protein interactions is essential for comprehending biological processes. Whereas understanding of atomic interactions within a complex is especially useful for drug design, limitations of experimental techniques have restricted their practical use. Despite progress in docking predictions, there is still room for improvement. In this study, we contribute to this topic by proposing T-PioDock, a framework for detection of a native-like docked complex 3D structure. T-PioDock supports the identification of near-native conformations from 3D models that docking software produced by scoring those models using binding interfaces predicted by the interface predictor, Template based Protein Interface Prediction (T-PIP). Results First, exhaustive evaluation of interface predictors demonstrates that T-PIP, whose predictions are customised to target complexity, is a state-of-the-art method. Second, comparative study between T-PioDock and other state-of-the-art scoring methods establishes T-PioDock as the best performing approach. Moreover, there is good correlation between T-PioDock performance and quality of docking models, which suggests that progress in docking will lead to even better results at recognising near-native conformations. Conclusion Accurate identification of near-native conformations remains a challenging task. Although availability of 3D complexes will benefit from template-based methods such as T-PioDock, we have identified specific limitations which need to be addressed. First, docking software are still not able to produce native like models for every target. Second, current interface predictors do not explicitly consider pairwise residue interactions between proteins and their interacting partners which leaves ambiguity when assessing quality of complex conformations. PMID:24906633

  8. De Novo Protein Structure Prediction

    NASA Astrophysics Data System (ADS)

    Hung, Ling-Hong; Ngan, Shing-Chung; Samudrala, Ram

    An unparalleled amount of sequence data is being made available from large-scale genome sequencing efforts. The data provide a shortcut to the determination of the function of a gene of interest, as long as there is an existing sequenced gene with similar sequence and of known function. This has spurred structural genomic initiatives with the goal of determining as many protein folds as possible (Brenner and Levitt, 2000; Burley, 2000; Brenner, 2001; Heinemann et al., 2001). The purpose of this is twofold: First, the structure of a gene product can often lead to direct inference of its function. Second, since the function of a protein is dependent on its structure, direct comparison of the structures of gene products can be more sensitive than the comparison of sequences of genes for detecting homology. Presently, structural determination by crystallography and NMR techniques is still slow and expensive in terms of manpower and resources, despite attempts to automate the processes. Computer structure prediction algorithms, while not providing the accuracy of the traditional techniques, are extremely quick and inexpensive and can provide useful low-resolution data for structure comparisons (Bonneau and Baker, 2001). Given the immense number of structures which the structural genomic projects are attempting to solve, there would be a considerable gain even if the computer structure prediction approach were applicable to a subset of proteins.

  9. The DynaMine webserver: predicting protein dynamics from sequence.

    PubMed

    Cilia, Elisa; Pancsa, Rita; Tompa, Peter; Lenaerts, Tom; Vranken, Wim F

    2014-07-01

    Protein dynamics are important for understanding protein function. Unfortunately, accurate protein dynamics information is difficult to obtain: here we present the DynaMine webserver, which provides predictions for the fast backbone movements of proteins directly from their amino-acid sequence. DynaMine rapidly produces a profile describing the statistical potential for such movements at residue-level resolution. The predicted values have meaning on an absolute scale and go beyond the traditional binary classification of residues as ordered or disordered, thus allowing for direct dynamics comparisons between protein regions. Through this webserver, we provide molecular biologists with an efficient and easy to use tool for predicting the dynamical characteristics of any protein of interest, even in the absence of experimental observations. The prediction results are visualized and can be directly downloaded. The DynaMine webserver, including instructive examples describing the meaning of the profiles, is available at http://dynamine.ibsquare.be. PMID:24728994

  10. ChIP-seq Accurately Predicts Tissue-Specific Activity of Enhancers

    SciTech Connect

    Visel, Axel; Blow, Matthew J.; Li, Zirong; Zhang, Tao; Akiyama, Jennifer A.; Holt, Amy; Plajzer-Frick, Ingrid; Shoukry, Malak; Wright, Crystal; Chen, Feng; Afzal, Veena; Ren, Bing; Rubin, Edward M.; Pennacchio, Len A.

    2009-02-01

    A major yet unresolved quest in decoding the human genome is the identification of the regulatory sequences that control the spatial and temporal expression of genes. Distant-acting transcriptional enhancers are particularly challenging to uncover since they are scattered amongst the vast non-coding portion of the genome. Evolutionary sequence constraint can facilitate the discovery of enhancers, but fails to predict when and where they are active in vivo. Here, we performed chromatin immunoprecipitation with the enhancer-associated protein p300, followed by massively-parallel sequencing, to map several thousand in vivo binding sites of p300 in mouse embryonic forebrain, midbrain, and limb tissue. We tested 86 of these sequences in a transgenic mouse assay, which in nearly all cases revealed reproducible enhancer activity in those tissues predicted by p300 binding. Our results indicate that in vivo mapping of p300 binding is a highly accurate means for identifying enhancers and their associated activities and suggest that such datasets will be useful to study the role of tissue-specific enhancers in human biology and disease on a genome-wide scale.

  11. Protein structural domains: definition and prediction.

    PubMed

    Ezkurdia, Iakes; Tress, Michael L

    2011-11-01

    Recognition and prediction of structural domains in proteins is an important part of structure and function prediction. This unit lists the range of tools available for domain prediction, and describes sequence and structural analysis tools that complement domain prediction methods. Also detailed are the basic domain prediction steps, along with suggested strategies for different protein sequences and potential pitfalls in domain boundary prediction. The difficult problem of domain orientation prediction is also discussed. All the resources necessary for domain boundary prediction are accessible via publicly available Web servers and databases and do not require computational expertise. PMID:22045561

  12. Structure Prediction of Protein Complexes

    NASA Astrophysics Data System (ADS)

    Pierce, Brian; Weng, Zhiping

    Protein-protein interactions are critical for biological function. They directly and indirectly influence the biological systems of which they are a part. Antibodies bind with antigens to detect and stop viruses and other infectious agents. Cell signaling is performed in many cases through the interactions between proteins. Many diseases involve protein-protein interactions on some level, including cancer and prion diseases.

  13. DSP: a protein shape string and its profile prediction server

    PubMed Central

    Sun, Jiangming; Tang, Shengnan; Xiong, Wenwei; Cong, Peisheng; Li, Tonghua

    2012-01-01

    Many studies have demonstrated that shape string is an extremely important structure representation, since it is more complete than the classical secondary structure. The shape string provides detailed information also in the regions denoted random coil. But few services are provided for systematic analysis of protein shape string. To fill this gap, we have developed an accurate shape string predictor based on two innovative technologies: a knowledge-driven sequence alignment and a sequence shape string profile method. The performance on blind test data demonstrates that the proposed method can be used for accurate prediction of protein shape string. The DSP server provides both predicted shape string and sequence shape string profile for each query sequence. Using this information, the users can compare protein structure or display protein evolution in shape string space. The DSP server is available at both http://cheminfo.tongji.edu.cn/dsp/ and its main mirror http://chemcenter.tongji.edu.cn/dsp/. PMID:22553364

  14. INTEGRATING COMPUTATIONAL PROTEIN FUNCTION PREDICTION INTO DRUG DISCOVERY INITIATIVES

    PubMed Central

    Grant, Marianne A.

    2014-01-01

    Pharmaceutical researchers must evaluate vast numbers of protein sequences and formulate innovative strategies for identifying valid targets and discovering leads against them as a way of accelerating drug discovery. The ever increasing number and diversity of novel protein sequences identified by genomic sequencing projects and the success of worldwide structural genomics initiatives have spurred great interest and impetus in the development of methods for accurate, computationally empowered protein function prediction and active site identification. Previously, in the absence of direct experimental evidence, homology-based protein function annotation remained the gold-standard for in silico analysis and prediction of protein function. However, with the continued exponential expansion of sequence databases, this approach is not always applicable, as fewer query protein sequences demonstrate significant homology to protein gene products of known function. As a result, several non-homology based methods for protein function prediction that are based on sequence features, structure, evolution, biochemical and genetic knowledge have emerged. Herein, we review current bioinformatic programs and approaches for protein function prediction/annotation and discuss their integration into drug discovery initiatives. The development of such methods to annotate protein functional sites and their application to large protein functional families is crucial to successfully utilizing the vast amounts of genomic sequence information available to drug discovery and development processes. PMID:25530654

  15. Protein Residue Contacts and Prediction Methods

    PubMed Central

    Adhikari, Badri

    2016-01-01

    In the field of computational structural proteomics, contact predictions have shown new prospects of solving the longstanding problem of ab initio protein structure prediction. In the last few years, application of deep learning algorithms and availability of large protein sequence databases, combined with improvement in methods that derive contacts from multiple sequence alignments, have shown a huge increase in the precision of contact prediction. In addition, these predicted contacts have also been used to build three-dimensional models from scratch. In this chapter, we briefly discuss many elements of protein residue–residue contacts and the methods available for prediction, focusing on a state-of-the-art contact prediction tool, DNcon. Illustrating with a case study, we describe how DNcon can be used to make ab initio contact predictions for a given protein sequence and discuss how the predicted contacts may be analyzed and evaluated. PMID:27115648

  16. Protein Residue Contacts and Prediction Methods.

    PubMed

    Adhikari, Badri; Cheng, Jianlin

    2016-01-01

    In the field of computational structural proteomics, contact predictions have shown new prospects of solving the longstanding problem of ab initio protein structure prediction. In the last few years, application of deep learning algorithms and availability of large protein sequence databases, combined with improvement in methods that derive contacts from multiple sequence alignments, have shown a huge increase in the precision of contact prediction. In addition, these predicted contacts have also been used to build three-dimensional models from scratch.In this chapter, we briefly discuss many elements of protein residue-residue contacts and the methods available for prediction, focusing on a state-of-the-art contact prediction tool, DNcon. Illustrating with a case study, we describe how DNcon can be used to make ab initio contact predictions for a given protein sequence and discuss how the predicted contacts may be analyzed and evaluated. PMID:27115648

  17. GECluster: a novel protein complex prediction method

    PubMed Central

    Su, Lingtao; Liu, Guixia; Wang, Han; Tian, Yuan; Zhou, Zhihui; Han, Liang; Yan, Lun

    2014-01-01

    Identification of protein complexes is of great importance in the understanding of cellular organization and functions. Traditional computational protein complex prediction methods mainly rely on the topology of protein–protein interaction (PPI) networks but seldom take biological information of proteins (such as Gene Ontology (GO)) into consideration. Meanwhile, the environment relevant analysis of protein complex evolution has been poorly studied, partly due to the lack of high-precision protein complex datasets. In this paper, a combined PPI network is introduced to predict protein complexes which integrate both GO and expression value of relevant protein-coding genes. A novel protein complex prediction method GECluster (Gene Expression Cluster) was proposed based on a seed node expansion strategy, in which a combined PPI network was utilized. GECluster was applied to a training combined PPI network and it predicted more credible complexes than peer methods. The results indicate that using a combined PPI network can efficiently improve protein complex prediction accuracy. In order to study protein complex evolution within cells due to changes in the living environment surrounding cells, GECluster was applied to seven combined PPI networks constructed using the data of a test set including yeast response to stress throughout a wine fermentation process. Our results showed that with the rise of alcohol concentration, protein complexes within yeast cells gradually evolve from one state to another. Besides this, the number of core and attachment proteins within a protein complex both changed significantly. PMID:26019559

  18. Mass spectrometry-based protein identification with accurate statistical significance assignment

    PubMed Central

    Alves, Gelio; Yu, Yi-Kuo

    2015-01-01

    Motivation: Assigning statistical significance accurately has become increasingly important as metadata of many types, often assembled in hierarchies, are constructed and combined for further biological analyses. Statistical inaccuracy of metadata at any level may propagate to downstream analyses, undermining the validity of scientific conclusions thus drawn. From the perspective of mass spectrometry-based proteomics, even though accurate statistics for peptide identification can now be achieved, accurate protein level statistics remain challenging. Results: We have constructed a protein ID method that combines peptide evidences of a candidate protein based on a rigorous formula derived earlier; in this formula the database P-value of every peptide is weighted, prior to the final combination, according to the number of proteins it maps to. We have also shown that this protein ID method provides accurate protein level E-value, eliminating the need of using empirical post-processing methods for type-I error control. Using a known protein mixture, we find that this protein ID method, when combined with the Sorić formula, yields accurate values for the proportion of false discoveries. In terms of retrieval efficacy, the results from our method are comparable with other methods tested. Availability and implementation: The source code, implemented in C++ on a linux system, is available for download at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbp/qmbp_ms/RAId/RAId_Linux_64Bit. Contact: yyu@ncbi.nlm.nih.gov Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25362092

  19. Cas9-chromatin binding information enables more accurate CRISPR off-target prediction

    PubMed Central

    Singh, Ritambhara; Kuscu, Cem; Quinlan, Aaron; Qi, Yanjun; Adli, Mazhar

    2015-01-01

    The CRISPR system has become a powerful biological tool with a wide range of applications. However, improving targeting specificity and accurately predicting potential off-targets remains a significant goal. Here, we introduce a web-based CRISPR/Cas9 Off-target Prediction and Identification Tool (CROP-IT) that performs improved off-target binding and cleavage site predictions. Unlike existing prediction programs that solely use DNA sequence information; CROP-IT integrates whole genome level biological information from existing Cas9 binding and cleavage data sets. Utilizing whole-genome chromatin state information from 125 human cell types further enhances its computational prediction power. Comparative analyses on experimentally validated datasets show that CROP-IT outperforms existing computational algorithms in predicting both Cas9 binding as well as cleavage sites. With a user-friendly web-interface, CROP-IT outputs scored and ranked list of potential off-targets that enables improved guide RNA design and more accurate prediction of Cas9 binding or cleavage sites. PMID:26032770

  20. Modeling methodology for the accurate and prompt prediction of symptomatic events in chronic diseases.

    PubMed

    Pagán, Josué; Risco-Martín, José L; Moya, José M; Ayala, José L

    2016-08-01

    Prediction of symptomatic crises in chronic diseases allows to take decisions before the symptoms occur, such as the intake of drugs to avoid the symptoms or the activation of medical alarms. The prediction horizon is in this case an important parameter in order to fulfill the pharmacokinetics of medications, or the time response of medical services. This paper presents a study about the prediction limits of a chronic disease with symptomatic crises: the migraine. For that purpose, this work develops a methodology to build predictive migraine models and to improve these predictions beyond the limits of the initial models. The maximum prediction horizon is analyzed, and its dependency on the selected features is studied. A strategy for model selection is proposed to tackle the trade off between conservative but robust predictive models, with respect to less accurate predictions with higher horizons. The obtained results show a prediction horizon close to 40min, which is in the time range of the drug pharmacokinetics. Experiments have been performed in a realistic scenario where input data have been acquired in an ambulatory clinical study by the deployment of a non-intrusive Wireless Body Sensor Network. Our results provide an effective methodology for the selection of the future horizon in the development of prediction algorithms for diseases experiencing symptomatic crises. PMID:27260782

  1. Rapid and accurate calculation of protein 1H, 13C and 15N chemical shifts.

    PubMed

    Neal, Stephen; Nip, Alex M; Zhang, Haiyan; Wishart, David S

    2003-07-01

    A computer program (SHIFTX) is described which rapidly and accurately calculates the diamagnetic 1H, 13C and 15N chemical shifts of both backbone and sidechain atoms in proteins. The program uses a hybrid predictive approach that employs pre-calculated, empirically derived chemical shift hypersurfaces in combination with classical or semi-classical equations (for ring current, electric field, hydrogen bond and solvent effects) to calculate 1H, 13C and 15N chemical shifts from atomic coordinates. The chemical shift hypersurfaces capture dihedral angle, sidechain orientation, secondary structure and nearest neighbor effects that cannot easily be translated to analytical formulae or predicted via classical means. The chemical shift hypersurfaces were generated using a database of IUPAC-referenced protein chemical shifts--RefDB (Zhang et al., 2003), and a corresponding set of high resolution (<2.1 A) X-ray structures. Data mining techniques were used to extract the largest pairwise contributors (from a list of approximately 20 derived geometric, sequential and structural parameters) to generate the necessary hypersurfaces. SHIFTX is rapid (<1 CPU second for a complete shift calculation of 100 residues) and accurate. Overall, the program was able to attain a correlation coefficient (r) between observed and calculated shifts of 0.911 (1Halpha), 0.980 (13Calpha), 0.996 (13Cbeta), 0.863 (13CO), 0.909 (15N), 0.741 (1HN), and 0.907 (sidechain 1H) with RMS errors of 0.23, 0.98, 1.10, 1.16, 2.43, 0.49, and 0.30 ppm, respectively on test data sets. We further show that the agreement between observed and SHIFTX calculated chemical shifts can be an extremely sensitive measure of the quality of protein structures. Our results suggest that if NMR-derived structures could be refined using heteronuclear chemical shifts calculated by SHIFTX, their precision could approach that of the highest resolution X-ray structures. SHIFTX is freely available as a web server at http

  2. Protein function prediction based on data fusion and functional interrelationship.

    PubMed

    Meng, Jun; Wekesa, Jael-Sanyanda; Shi, Guan-Li; Luan, Yu-Shi

    2016-04-01

    One of the challenging tasks of bioinformatics is to predict more accurate and confident protein functions from genomics and proteomics datasets. Computational approaches use a variety of high throughput experimental data, such as protein-protein interaction (PPI), protein sequences and phylogenetic profiles, to predict protein functions. This paper presents a method that uses transductive multi-label learning algorithm by integrating multiple data sources for classification. Multiple proteomics datasets are integrated to make inferences about functions of unknown proteins and use a directed bi-relational graph to assign labels to unannotated proteins. Our method, bi-relational graph based transductive multi-label function annotation (Bi-TMF) uses functional correlation and topological PPI network properties on both the training and testing datasets to predict protein functions through data fusion of the individual kernel result. The main purpose of our proposed method is to enhance the performance of classifier integration for protein function prediction algorithms. Experimental results demonstrate the effectiveness and efficiency of Bi-TMF on multi-sources datasets in yeast, human and mouse benchmarks. Bi-TMF outperforms other recently proposed methods. PMID:26869536

  3. Mechanism for accurate, protein-assisted DNA annealing by Deinococcus radiodurans DdrB.

    PubMed

    Sugiman-Marangos, Seiji N; Weiss, Yoni M; Junop, Murray S

    2016-04-19

    Accurate pairing of DNA strands is essential for repair of DNA double-strand breaks (DSBs). How cells achieve accurate annealing when large regions of single-strand DNA are unpaired has remained unclear despite many efforts focused on understanding proteins, which mediate this process. Here we report the crystal structure of a single-strand annealing protein [DdrB (DNA damage response B)] in complex with a partially annealed DNA intermediate to 2.2 Å. This structure and supporting biochemical data reveal a mechanism for accurate annealing involving DdrB-mediated proofreading of strand complementarity. DdrB promotes high-fidelity annealing by constraining specific bases from unauthorized association and only releases annealed duplex when bound strands are fully complementary. To our knowledge, this mechanism provides the first understanding for how cells achieve accurate, protein-assisted strand annealing under biological conditions that would otherwise favor misannealing. PMID:27044084

  4. Structure Prediction of Membrane Proteins

    NASA Astrophysics Data System (ADS)

    Hu, Xiche

    Membrane proteins play a central role in many cellular and physiological processes. It is estimated that integral membrane proteins make up about 20-30% of the proteome (Krogh et al., 2001b; Stevens and Arkin, 2000; von Heijne, 1999). They are essential mediators of material and information transfer across cell membranes. Their functions include active and passive transport of molecules into and out of cells and organelles; transduction of energy among various forms (light, electrical, and chemical energy); as well as reception and transduction of chemical and electrical signals across membranes (Avdonin, 2005; Bockaert et al., 2002; Pahl, 1999; Rehling et al., 2004; Stack et al., 1995). Identifying these transmembrane (TM) proteins and deciphering their molecular mechanisms, then, is of great importance, particularly as applied to biomedicine. Membrane proteins are the targets of a large number of pharmacologically and toxicologically active substances, and are directly involved in their uptake, metabolism, and clearance (Bettler et al., 1998; Cohen, 2002; Heusser and Jardieu, 1997; Tibes et al., 2005; Xu et al., 2005). Despite the importance of membrane proteins, the knowledge of their high-resolution structures and mechanisms of action has lagged far behind in comparison to that of water-soluble proteins: less than 1% of all three-dimensional structures deposited in the Protein Data Bank are of membrane proteins. This unfortunate disparity stems from difficulties in overexpression and the crystallization of membrane proteins (Grisshammer and Tate, 1995; Michel, 1991).

  5. Accurate rotor loads prediction using the FLAP (Force and Loads Analysis Program) dynamics code

    SciTech Connect

    Wright, A.D.; Thresher, R.W.

    1987-10-01

    Accurately predicting wind turbine blade loads and response is very important in predicting the fatigue life of wind turbines. There is a clear need in the wind turbine community for validated and user-friendly structural dynamics codes for predicting blade loads and response. At the Solar Energy Research Institute (SERI), a Force and Loads Analysis Program (FLAP) has been refined and validated and is ready for general use. Currently, FLAP is operational on an IBM-PC compatible computer and can be used to analyze both rigid- and teetering-hub configurations. The results of this paper show that FLAP can be used to accurately predict the deterministic loads for rigid-hub rotors. This paper compares analytical predictions to field test measurements for a three-bladed, upwind turbine with a rigid-hub configuration. The deterministic loads predicted by FLAP are compared with 10-min azimuth averages of blade root flapwise bending moments for different wind speeds. 6 refs., 12 figs., 3 tabs.

  6. An accurate modeling, simulation, and analysis tool for predicting and estimating Raman LIDAR system performance

    NASA Astrophysics Data System (ADS)

    Grasso, Robert J.; Russo, Leonard P.; Barrett, John L.; Odhner, Jefferson E.; Egbert, Paul I.

    2007-09-01

    BAE Systems presents the results of a program to model the performance of Raman LIDAR systems for the remote detection of atmospheric gases, air polluting hydrocarbons, chemical and biological weapons, and other molecular species of interest. Our model, which integrates remote Raman spectroscopy, 2D and 3D LADAR, and USAF atmospheric propagation codes permits accurate determination of the performance of a Raman LIDAR system. The very high predictive performance accuracy of our model is due to the very accurate calculation of the differential scattering cross section for the specie of interest at user selected wavelengths. We show excellent correlation of our calculated cross section data, used in our model, with experimental data obtained from both laboratory measurements and the published literature. In addition, the use of standard USAF atmospheric models provides very accurate determination of the atmospheric extinction at both the excitation and Raman shifted wavelengths.

  7. Predicting the dynamics of protein abundance.

    PubMed

    Mehdi, Ahmed M; Patrick, Ralph; Bailey, Timothy L; Bodén, Mikael

    2014-05-01

    Protein synthesis is finely regulated across all organisms, from bacteria to humans, and its integrity underpins many important processes. Emerging evidence suggests that the dynamic range of protein abundance is greater than that observed at the transcript level. Technological breakthroughs now mean that sequencing-based measurement of mRNA levels is routine, but protocols for measuring protein abundance remain both complex and expensive. This paper introduces a Bayesian network that integrates transcriptomic and proteomic data to predict protein abundance and to model the effects of its determinants. We aim to use this model to follow a molecular response over time, from condition-specific data, in order to understand adaptation during processes such as the cell cycle. With microarray data now available for many conditions, the general utility of a protein abundance predictor is broad. Whereas most quantitative proteomics studies have focused on higher organisms, we developed a predictive model of protein abundance for both Saccharomyces cerevisiae and Schizosaccharomyces pombe to explore the latitude at the protein level. Our predictor primarily relies on mRNA level, mRNA-protein interaction, mRNA folding energy and half-life, and tRNA adaptation. The combination of key features, allowing for the low certainty and uneven coverage of experimental observations, gives comparatively minor but robust prediction accuracy. The model substantially improved the analysis of protein regulation during the cell cycle: predicted protein abundance identified twice as many cell-cycle-associated proteins as experimental mRNA levels. Predicted protein abundance was more dynamic than observed mRNA expression, agreeing with experimental protein abundance from a human cell line. We illustrate how the same model can be used to predict the folding energy of mRNA when protein abundance is available, lending credence to the emerging view that mRNA folding affects translation efficiency

  8. Predicting the Dynamics of Protein Abundance

    PubMed Central

    Mehdi, Ahmed M.; Patrick, Ralph; Bailey, Timothy L.; Bodén, Mikael

    2014-01-01

    Protein synthesis is finely regulated across all organisms, from bacteria to humans, and its integrity underpins many important processes. Emerging evidence suggests that the dynamic range of protein abundance is greater than that observed at the transcript level. Technological breakthroughs now mean that sequencing-based measurement of mRNA levels is routine, but protocols for measuring protein abundance remain both complex and expensive. This paper introduces a Bayesian network that integrates transcriptomic and proteomic data to predict protein abundance and to model the effects of its determinants. We aim to use this model to follow a molecular response over time, from condition-specific data, in order to understand adaptation during processes such as the cell cycle. With microarray data now available for many conditions, the general utility of a protein abundance predictor is broad. Whereas most quantitative proteomics studies have focused on higher organisms, we developed a predictive model of protein abundance for both Saccharomyces cerevisiae and Schizosaccharomyces pombe to explore the latitude at the protein level. Our predictor primarily relies on mRNA level, mRNA–protein interaction, mRNA folding energy and half-life, and tRNA adaptation. The combination of key features, allowing for the low certainty and uneven coverage of experimental observations, gives comparatively minor but robust prediction accuracy. The model substantially improved the analysis of protein regulation during the cell cycle: predicted protein abundance identified twice as many cell-cycle-associated proteins as experimental mRNA levels. Predicted protein abundance was more dynamic than observed mRNA expression, agreeing with experimental protein abundance from a human cell line. We illustrate how the same model can be used to predict the folding energy of mRNA when protein abundance is available, lending credence to the emerging view that mRNA folding affects translation

  9. Accurate Prediction of Ligand Affinities for a Proton-Dependent Oligopeptide Transporter.

    PubMed

    Samsudin, Firdaus; Parker, Joanne L; Sansom, Mark S P; Newstead, Simon; Fowler, Philip W

    2016-02-18

    Membrane transporters are critical modulators of drug pharmacokinetics, efficacy, and safety. One example is the proton-dependent oligopeptide transporter PepT1, also known as SLC15A1, which is responsible for the uptake of the ?-lactam antibiotics and various peptide-based prodrugs. In this study, we modeled the binding of various peptides to a bacterial homolog, PepTSt, and evaluated a range of computational methods for predicting the free energy of binding. Our results show that a hybrid approach (endpoint methods to classify peptides into good and poor binders and a theoretically exact method for refinement) is able to accurately predict affinities, which we validated using proteoliposome transport assays. Applying the method to a homology model of PepT1 suggests that the approach requires a high-quality structure to be accurate. Our study provides a blueprint for extending these computational methodologies to other pharmaceutically important transporter families. PMID:27028887

  10. Accurate Prediction of Ligand Affinities for a Proton-Dependent Oligopeptide Transporter

    PubMed Central

    Samsudin, Firdaus; Parker, Joanne L.; Sansom, Mark S.P.; Newstead, Simon; Fowler, Philip W.

    2016-01-01

    Summary Membrane transporters are critical modulators of drug pharmacokinetics, efficacy, and safety. One example is the proton-dependent oligopeptide transporter PepT1, also known as SLC15A1, which is responsible for the uptake of the β-lactam antibiotics and various peptide-based prodrugs. In this study, we modeled the binding of various peptides to a bacterial homolog, PepTSt, and evaluated a range of computational methods for predicting the free energy of binding. Our results show that a hybrid approach (endpoint methods to classify peptides into good and poor binders and a theoretically exact method for refinement) is able to accurately predict affinities, which we validated using proteoliposome transport assays. Applying the method to a homology model of PepT1 suggests that the approach requires a high-quality structure to be accurate. Our study provides a blueprint for extending these computational methodologies to other pharmaceutically important transporter families. PMID:27028887

  11. A Single Linear Prediction Filter that Accurately Predicts the AL Index

    NASA Astrophysics Data System (ADS)

    McPherron, R. L.; Chu, X.

    2015-12-01

    The AL index is a measure of the strength of the westward electrojet flowing along the auroral oval. It has two components: one from the global DP-2 current system and a second from the DP-1 current that is more localized near midnight. It is generally believed that the index a very poor measure of these currents because of its dependence on the distance of stations from the source of the two currents. In fact over season and solar cycle the coupling strength defined as the steady state ratio of the output AL to the input coupling function varies by a factor of four. There are four factors that lead to this variation. First is the equinoctial effect that modulates coupling strength with peaks (strongest coupling) at the equinoxes. Second is the saturation of the polar cap potential which decreases coupling strength as the strength of the driver increases. Since saturation occurs more frequently at solar maximum we obtain the result that maximum coupling strength occurs at equinox at solar minimum. A third factor is ionospheric conductivity with stronger coupling at summer solstice as compared to winter. The fourth factor is the definition of a solar wind coupling function appropriate to a given index. We have developed an optimum coupling function depending on solar wind speed, density, transverse magnetic field, and IMF clock angle which is better than previous functions. Using this we have determined the seasonal variation of coupling strength and developed an inverse function that modulates the optimum coupling function so that all seasonal variation is removed. In a similar manner we have determined the dependence of coupling strength on solar wind driver strength. The inverse of this function is used to scale a linear prediction filter thus eliminating the dependence on driver strength. Our result is a single linear filter that is adjusted in a nonlinear manner by driver strength and an optimum coupling function that is seasonal modulated. Together this

  12. A review of the kinetic detail required for accurate predictions of normal shock waves

    NASA Technical Reports Server (NTRS)

    Muntz, E. P.; Erwin, Daniel A.; Pham-Van-diep, Gerald C.

    1991-01-01

    Several aspects of the kinetic models used in the collision phase of Monte Carlo direct simulations have been studied. Accurate molecular velocity distribution function predictions require a significantly increased number of computational cells in one maximum slope shock thickness, compared to predictions of macroscopic properties. The shape of the highly repulsive portion of the interatomic potential for argon is not well modeled by conventional interatomic potentials; this portion of the potential controls high Mach number shock thickness predictions, indicating that the specification of the energetic repulsive portion of interatomic or intermolecular potentials must be chosen with care for correct modeling of nonequilibrium flows at high temperatures. It has been shown for inverse power potentials that the assumption of variable hard sphere scattering provides accurate predictions of the macroscopic properties in shock waves, by comparison with simulations in which differential scattering is employed in the collision phase. On the other hand, velocity distribution functions are not well predicted by the variable hard sphere scattering model for softer potentials at higher Mach numbers.

  13. Can phenological models predict tree phenology accurately under climate change conditions?

    NASA Astrophysics Data System (ADS)

    Chuine, Isabelle; Bonhomme, Marc; Legave, Jean Michel; García de Cortázar-Atauri, Inaki; Charrier, Guillaume; Lacointe, André; Améglio, Thierry

    2014-05-01

    The onset of the growing season of trees has been globally earlier by 2.3 days/decade during the last 50 years because of global warming and this trend is predicted to continue according to climate forecast. The effect of temperature on plant phenology is however not linear because temperature has a dual effect on bud development. On one hand, low temperatures are necessary to break bud dormancy, and on the other hand higher temperatures are necessary to promote bud cells growth afterwards. Increasing phenological changes in temperate woody species have strong impacts on forest trees distribution and productivity, as well as crops cultivation areas. Accurate predictions of trees phenology are therefore a prerequisite to understand and foresee the impacts of climate change on forests and agrosystems. Different process-based models have been developed in the last two decades to predict the date of budburst or flowering of woody species. They are two main families: (1) one-phase models which consider only the ecodormancy phase and make the assumption that endodormancy is always broken before adequate climatic conditions for cell growth occur; and (2) two-phase models which consider both the endodormancy and ecodormancy phases and predict a date of dormancy break which varies from year to year. So far, one-phase models have been able to predict accurately tree bud break and flowering under historical climate. However, because they do not consider what happens prior to ecodormancy, and especially the possible negative effect of winter temperature warming on dormancy break, it seems unlikely that they can provide accurate predictions in future climate conditions. It is indeed well known that a lack of low temperature results in abnormal pattern of bud break and development in temperate fruit trees. An accurate modelling of the dormancy break date has thus become a major issue in phenology modelling. Two-phases phenological models predict that global warming should delay

  14. Can phenological models predict tree phenology accurately in the future? The unrevealed hurdle of endodormancy break.

    PubMed

    Chuine, Isabelle; Bonhomme, Marc; Legave, Jean-Michel; García de Cortázar-Atauri, Iñaki; Charrier, Guillaume; Lacointe, André; Améglio, Thierry

    2016-10-01

    The onset of the growing season of trees has been earlier by 2.3 days per decade during the last 40 years in temperate Europe because of global warming. The effect of temperature on plant phenology is, however, not linear because temperature has a dual effect on bud development. On one hand, low temperatures are necessary to break bud endodormancy, and, on the other hand, higher temperatures are necessary to promote bud cell growth afterward. Different process-based models have been developed in the last decades to predict the date of budbreak of woody species. They predict that global warming should delay or compromise endodormancy break at the species equatorward range limits leading to a delay or even impossibility to flower or set new leaves. These models are classically parameterized with flowering or budbreak dates only, with no information on the endodormancy break date because this information is very scarce. Here, we evaluated the efficiency of a set of phenological models to accurately predict the endodormancy break dates of three fruit trees. Our results show that models calibrated solely with budbreak dates usually do not accurately predict the endodormancy break date. Providing endodormancy break date for the model parameterization results in much more accurate prediction of this latter, with, however, a higher error than that on budbreak dates. Most importantly, we show that models not calibrated with endodormancy break dates can generate large discrepancies in forecasted budbreak dates when using climate scenarios as compared to models calibrated with endodormancy break dates. This discrepancy increases with mean annual temperature and is therefore the strongest after 2050 in the southernmost regions. Our results claim for the urgent need of massive measurements of endodormancy break dates in forest and fruit trees to yield more robust projections of phenological changes in a near future. PMID:27272707

  15. Graph pyramids for protein function prediction

    PubMed Central

    2015-01-01

    Background Uncovering the hidden organizational characteristics and regularities among biological sequences is the key issue for detailed understanding of an underlying biological phenomenon. Thus pattern recognition from nucleic acid sequences is an important affair for protein function prediction. As proteins from the same family exhibit similar characteristics, homology based approaches predict protein functions via protein classification. But conventional classification approaches mostly rely on the global features by considering only strong protein similarity matches. This leads to significant loss of prediction accuracy. Methods Here we construct the Protein-Protein Similarity (PPS) network, which captures the subtle properties of protein families. The proposed method considers the local as well as the global features, by examining the interactions among 'weakly interacting proteins' in the PPS network and by using hierarchical graph analysis via the graph pyramid. Different underlying properties of the protein families are uncovered by operating the proposed graph based features at various pyramid levels. Results Experimental results on benchmark data sets show that the proposed hierarchical voting algorithm using graph pyramid helps to improve computational efficiency as well the protein classification accuracy. Quantitatively, among 14,086 test sequences, on an average the proposed method misclassified only 21.1 sequences whereas baseline BLAST score based global feature matching method misclassified 362.9 sequences. With each correctly classified test sequence, the fast incremental learning ability of the proposed method further enhances the training model. Thus it has achieved more than 96% protein classification accuracy using only 20% per class training data. PMID:26044522

  16. Local backbone structure prediction of proteins.

    PubMed

    de Brevern, Alexandre G; Benros, Cristina; Gautier, Romain; Valadié, Héléne; Hazout, Serge; Etchebest, Catherine

    2004-01-01

    A statistical analysis of the PDB structures has led us to define a new set of small 3D structural prototypes called Protein Blocks (PBs). This structural alphabet includes 16 PBs, each one is defined by the (phi, psi) dihedral angles of 5 consecutive residues. The amino acid distributions observed in sequence windows encompassing these PBs are used to predict by a Bayesian approach the local 3D structure of proteins from the sole knowledge of their sequences. LocPred is a software which allows the users to submit a protein sequence and performs a prediction in terms of PBs. The prediction results are given both textually and graphically. PMID:15724288

  17. Toward more accurate ancestral protein genotype-phenotype reconstructions with the use of species tree-aware gene trees.

    PubMed

    Groussin, Mathieu; Hobbs, Joanne K; Szöllősi, Gergely J; Gribaldo, Simonetta; Arcus, Vickery L; Gouy, Manolo

    2015-01-01

    The resurrection of ancestral proteins provides direct insight into how natural selection has shaped proteins found in nature. By tracing substitutions along a gene phylogeny, ancestral proteins can be reconstructed in silico and subsequently synthesized in vitro. This elegant strategy reveals the complex mechanisms responsible for the evolution of protein functions and structures. However, to date, all protein resurrection studies have used simplistic approaches for ancestral sequence reconstruction (ASR), including the assumption that a single sequence alignment alone is sufficient to accurately reconstruct the history of the gene family. The impact of such shortcuts on conclusions about ancestral functions has not been investigated. Here, we show with simulations that utilizing information on species history using a model that accounts for the duplication, horizontal transfer, and loss (DTL) of genes statistically increases ASR accuracy. This underscores the importance of the tree topology in the inference of putative ancestors. We validate our in silico predictions using in vitro resurrection of the LeuB enzyme for the ancestor of the Firmicutes, a major and ancient bacterial phylum. With this particular protein, our experimental results demonstrate that information on the species phylogeny results in a biochemically more realistic and kinetically more stable ancestral protein. Additional resurrection experiments with different proteins are necessary to statistically quantify the impact of using species tree-aware gene trees on ancestral protein phenotypes. Nonetheless, our results suggest the need for incorporating both sequence and DTL information in future studies of protein resurrections to accurately define the genotype-phenotype space in which proteins diversify. PMID:25371435

  18. Bicluster Sampled Coherence Metric (BSCM) provides an accurate environmental context for phenotype predictions

    PubMed Central

    2015-01-01

    Background Biclustering is a popular method for identifying under which experimental conditions biological signatures are co-expressed. However, the general biclustering problem is NP-hard, offering room to focus algorithms on specific biological tasks. We hypothesize that conditional co-regulation of genes is a key factor in determining cell phenotype and that accurately segregating conditions in biclusters will improve such predictions. Thus, we developed a bicluster sampled coherence metric (BSCM) for determining which conditions and signals should be included in a bicluster. Results Our BSCM calculates condition and cluster size specific p-values, and we incorporated these into the popular integrated biclustering algorithm cMonkey. We demonstrate that incorporation of our new algorithm significantly improves bicluster co-regulation scores (p-value = 0.009) and GO annotation scores (p-value = 0.004). Additionally, we used a bicluster based signal to predict whether a given experimental condition will result in yeast peroxisome induction. Using the new algorithm, the classifier accuracy improves from 41.9% to 76.1% correct. Conclusions We demonstrate that the proposed BSCM helps determine which signals ought to be co-clustered, resulting in more accurately assigned bicluster membership. Furthermore, we show that BSCM can be extended to more accurately detect under which experimental conditions the genes are co-clustered. Features derived from this more accurate analysis of conditional regulation results in a dramatic improvement in the ability to predict a cellular phenotype in yeast. The latest cMonkey is available for download at https://github.com/baliga-lab/cmonkey2. The experimental data and source code featured in this paper is available http://AitchisonLab.com/BSCM. BSCM has been incorporated in the official cMonkey release. PMID:25881257

  19. Highly Accurate Structure-Based Prediction of HIV-1 Coreceptor Usage Suggests Intermolecular Interactions Driving Tropism

    PubMed Central

    Kieslich, Chris A.; Tamamis, Phanourios; Guzman, Yannis A.; Onel, Melis; Floudas, Christodoulos A.

    2016-01-01

    HIV-1 entry into host cells is mediated by interactions between the V3-loop of viral glycoprotein gp120 and chemokine receptor CCR5 or CXCR4, collectively known as HIV-1 coreceptors. Accurate genotypic prediction of coreceptor usage is of significant clinical interest and determination of the factors driving tropism has been the focus of extensive study. We have developed a method based on nonlinear support vector machines to elucidate the interacting residue pairs driving coreceptor usage and provide highly accurate coreceptor usage predictions. Our models utilize centroid-centroid interaction energies from computationally derived structures of the V3-loop:coreceptor complexes as primary features, while additional features based on established rules regarding V3-loop sequences are also investigated. We tested our method on 2455 V3-loop sequences of various lengths and subtypes, and produce a median area under the receiver operator curve of 0.977 based on 500 runs of 10-fold cross validation. Our study is the first to elucidate a small set of specific interacting residue pairs between the V3-loop and coreceptors capable of predicting coreceptor usage with high accuracy across major HIV-1 subtypes. The developed method has been implemented as a web tool named CRUSH, CoReceptor USage prediction for HIV-1, which is available at http://ares.tamu.edu/CRUSH/. PMID:26859389

  20. Accurate similarity index based on activity and connectivity of node for link prediction

    NASA Astrophysics Data System (ADS)

    Li, Longjie; Qian, Lvjian; Wang, Xiaoping; Luo, Shishun; Chen, Xiaoyun

    2015-05-01

    Recent years have witnessed the increasing of available network data; however, much of those data is incomplete. Link prediction, which can find the missing links of a network, plays an important role in the research and analysis of complex networks. Based on the assumption that two unconnected nodes which are highly similar are very likely to have an interaction, most of the existing algorithms solve the link prediction problem by computing nodes' similarities. The fundamental requirement of those algorithms is accurate and effective similarity indices. In this paper, we propose a new similarity index, namely similarity based on activity and connectivity (SAC), which performs link prediction more accurately. To compute the similarity between two nodes, this index employs the average activity of these two nodes in their common neighborhood and the connectivities between them and their common neighbors. The higher the average activity is and the stronger the connectivities are, the more similar the two nodes are. The proposed index not only commendably distinguishes the contributions of paths but also incorporates the influence of endpoints. Therefore, it can achieve a better predicting result. To verify the performance of SAC, we conduct experiments on 10 real-world networks. Experimental results demonstrate that SAC outperforms the compared baselines.

  1. Highly accurate prediction of emotions surrounding the attacks of September 11, 2001 over 1-, 2-, and 7-year prediction intervals.

    PubMed

    Doré, Bruce P; Meksin, Robert; Mather, Mara; Hirst, William; Ochsner, Kevin N

    2016-06-01

    In the aftermath of a national tragedy, important decisions are predicated on judgments of the emotional significance of the tragedy in the present and future. Research in affective forecasting has largely focused on ways in which people fail to make accurate predictions about the nature and duration of feelings experienced in the aftermath of an event. Here we ask a related but understudied question: can people forecast how they will feel in the future about a tragic event that has already occurred? We found that people were strikingly accurate when predicting how they would feel about the September 11 attacks over 1-, 2-, and 7-year prediction intervals. Although people slightly under- or overestimated their future feelings at times, they nonetheless showed high accuracy in forecasting (a) the overall intensity of their future negative emotion, and (b) the relative degree of different types of negative emotion (i.e., sadness, fear, or anger). Using a path model, we found that the relationship between forecasted and actual future emotion was partially mediated by current emotion and remembered emotion. These results extend theories of affective forecasting by showing that emotional responses to an event of ongoing national significance can be predicted with high accuracy, and by identifying current and remembered feelings as independent sources of this accuracy. (PsycINFO Database Record PMID:27100309

  2. Signature Product Code for Predicting Protein-Protein Interactions

    SciTech Connect

    Martin, Shawn B.; Brown, William M.

    2004-09-25

    The SigProdV1.0 software consists of four programs which together allow the prediction of protein-protein interactions using only amino acid sequences and experimental data. The software is based on the use of tensor products of amino acid trimers coupled with classifiers known as support vector machines. Essentially the program looks for amino acid trimer pairs which occur more frequently in protein pairs which are known to interact. These trimer pairs are then used to make predictions about unknown protein pairs. A detailed description of the method can be found in the paper: S. Martin, D. Roe, J.L. Faulon. "Predicting protein-protein interactions using signature products," Bioinformatics, available online from Advance Access, Aug. 19, 2004.

  3. Protein short loop prediction in terms of a structural alphabet.

    PubMed

    Tyagi, Manoj; Bornot, Aurélie; Offmann, Bernard; de Brevern, Alexandre G

    2009-08-01

    Loops connect regular secondary structures. In many instances, they are known to play crucial biological roles. To bypass the limitation of secondary structure description, we previously defined a structural alphabet composed of 16 structural prototypes, called Protein Blocks (PBs). It leads to an accurate description of every region of 3D protein backbones and has been used in local structure prediction. In the present study, we used our structural alphabet to predict the loops connecting two repetitive structures. Thus, we showed interest to take into account the flanking regions, leading to prediction rate improvement up to 19.8%, but we also underline the sensitivity of such an approach. This research can be used to propose different structures for the loops and to probe and sample their flexibility. It is a useful tool for ab initio loop prediction and leads to insights into flexible docking approach. PMID:19625218

  4. Prediction of Accurate Thermochemistry of Medium and Large Sized Radicals Using Connectivity-Based Hierarchy (CBH).

    PubMed

    Sengupta, Arkajyoti; Raghavachari, Krishnan

    2014-10-14

    Accurate modeling of the chemical reactions in many diverse areas such as combustion, photochemistry, or atmospheric chemistry strongly depends on the availability of thermochemical information of the radicals involved. However, accurate thermochemical investigations of radical systems using state of the art composite methods have mostly been restricted to the study of hydrocarbon radicals of modest size. In an alternative approach, systematic error-canceling thermochemical hierarchy of reaction schemes can be applied to yield accurate results for such systems. In this work, we have extended our connectivity-based hierarchy (CBH) method to the investigation of radical systems. We have calibrated our method using a test set of 30 medium sized radicals to evaluate their heats of formation. The CBH-rad30 test set contains radicals containing diverse functional groups as well as cyclic systems. We demonstrate that the sophisticated error-canceling isoatomic scheme (CBH-2) with modest levels of theory is adequate to provide heats of formation accurate to ∼1.5 kcal/mol. Finally, we predict heats of formation of 19 other large and medium sized radicals for which the accuracy of available heats of formation are less well-known. PMID:26588131

  5. Genome-wide Membrane Protein Structure Prediction

    PubMed Central

    Piccoli, Stefano; Suku, Eda; Garonzi, Marianna; Giorgetti, Alejandro

    2013-01-01

    Transmembrane proteins allow cells to extensively communicate with the external world in a very accurate and specific way. They form principal nodes in several signaling pathways and attract large interest in therapeutic intervention, as the majority pharmaceutical compounds target membrane proteins. Thus, according to the current genome annotation methods, a detailed structural/functional characterization at the protein level of each of the elements codified in the genome is also required. The extreme difficulty in obtaining high-resolution three-dimensional structures, calls for computational approaches. Here we review to which extent the efforts made in the last few years, combining the structural characterization of membrane proteins with protein bioinformatics techniques, could help describing membrane proteins at a genome-wide scale. In particular we analyze the use of comparative modeling techniques as a way of overcoming the lack of high-resolution three-dimensional structures in the human membrane proteome. PMID:24403851

  6. Planar Near-Field Phase Retrieval Using GPUs for Accurate THz Far-Field Prediction

    NASA Astrophysics Data System (ADS)

    Junkin, Gary

    2013-04-01

    With a view to using Phase Retrieval to accurately predict Terahertz antenna far-field from near-field intensity measurements, this paper reports on three fundamental advances that achieve very low algorithmic error penalties. The first is a new Gaussian beam analysis that provides accurate initial complex aperture estimates including defocus and astigmatic phase errors, based only on first and second moment calculations. The second is a powerful noise tolerant near-field Phase Retrieval algorithm that combines Anderson's Plane-to-Plane (PTP) with Fienup's Hybrid-Input-Output (HIO) and Successive Over-Relaxation (SOR) to achieve increased accuracy at reduced scan separations. The third advance employs teraflop Graphical Processing Units (GPUs) to achieve practically real time near-field phase retrieval and to obtain the optimum aperture constraint without any a priori information.

  7. Predicting protein-protein interactions based only on sequences information.

    PubMed

    Shen, Juwen; Zhang, Jian; Luo, Xiaomin; Zhu, Weiliang; Yu, Kunqian; Chen, Kaixian; Li, Yixue; Jiang, Hualiang

    2007-03-13

    Protein-protein interactions (PPIs) are central to most biological processes. Although efforts have been devoted to the development of methodology for predicting PPIs and protein interaction networks, the application of most existing methods is limited because they need information about protein homology or the interaction marks of the protein partners. In the present work, we propose a method for PPI prediction using only the information of protein sequences. This method was developed based on a learning algorithm-support vector machine combined with a kernel function and a conjoint triad feature for describing amino acids. More than 16,000 diverse PPI pairs were used to construct the universal model. The prediction ability of our approach is better than that of other sequence-based PPI prediction methods because it is able to predict PPI networks. Different types of PPI networks have been effectively mapped with our method, suggesting that, even with only sequence information, this method could be applied to the exploration of networks for any newly discovered protein with unknown biological relativity. In addition, such supplementary experimental information can enhance the prediction ability of the method. PMID:17360525

  8. A Novel Method for Accurate Operon Predictions in All SequencedProkaryotes

    SciTech Connect

    Price, Morgan N.; Huang, Katherine H.; Alm, Eric J.; Arkin, Adam P.

    2004-12-01

    We combine comparative genomic measures and the distance separating adjacent genes to predict operons in 124 completely sequenced prokaryotic genomes. Our method automatically tailors itself to each genome using sequence information alone, and thus can be applied to any prokaryote. For Escherichia coli K12 and Bacillus subtilis, our method is 85 and 83% accurate, respectively, which is similar to the accuracy of methods that use the same features but are trained on experimentally characterized transcripts. In Halobacterium NRC-1 and in Helicobacterpylori, our method correctly infers that genes in operons are separated by shorter distances than they are in E.coli, and its predictions using distance alone are more accurate than distance-only predictions trained on a database of E.coli transcripts. We use microarray data from sixphylogenetically diverse prokaryotes to show that combining intergenic distance with comparative genomic measures further improves accuracy and that our method is broadly effective. Finally, we survey operon structure across 124 genomes, and find several surprises: H.pylori has many operons, contrary to previous reports; Bacillus anthracis has an unusual number of pseudogenes within conserved operons; and Synechocystis PCC6803 has many operons even though it has unusually wide spacings between conserved adjacent genes.

  9. Machine learning predictions of molecular properties: Accurate many-body potentials and nonlocality in chemical space

    DOE PAGESBeta

    Hansen, Katja; Biegler, Franziska; Ramakrishnan, Raghunathan; Pronobis, Wiktor; von Lilienfeld, O. Anatole; Müller, Klaus -Robert; Tkatchenko, Alexandre

    2015-06-04

    Simultaneously accurate and efficient prediction of molecular properties throughout chemical compound space is a critical ingredient toward rational compound design in chemical and pharmaceutical industries. Aiming toward this goal, we develop and apply a systematic hierarchy of efficient empirical methods to estimate atomization and total energies of molecules. These methods range from a simple sum over atoms, to addition of bond energies, to pairwise interatomic force fields, reaching to the more sophisticated machine learning approaches that are capable of describing collective interactions between many atoms or bonds. In the case of equilibrium molecular geometries, even simple pairwise force fields demonstratemore » prediction accuracy comparable to benchmark energies calculated using density functional theory with hybrid exchange-correlation functionals; however, accounting for the collective many-body interactions proves to be essential for approaching the “holy grail” of chemical accuracy of 1 kcal/mol for both equilibrium and out-of-equilibrium geometries. This remarkable accuracy is achieved by a vectorized representation of molecules (so-called Bag of Bonds model) that exhibits strong nonlocality in chemical space. The same representation allows us to predict accurate electronic properties of molecules, such as their polarizability and molecular frontier orbital energies.« less

  10. Machine learning predictions of molecular properties: Accurate many-body potentials and nonlocality in chemical space

    SciTech Connect

    Hansen, Katja; Biegler, Franziska; Ramakrishnan, Raghunathan; Pronobis, Wiktor; von Lilienfeld, O. Anatole; Müller, Klaus -Robert; Tkatchenko, Alexandre

    2015-06-04

    Simultaneously accurate and efficient prediction of molecular properties throughout chemical compound space is a critical ingredient toward rational compound design in chemical and pharmaceutical industries. Aiming toward this goal, we develop and apply a systematic hierarchy of efficient empirical methods to estimate atomization and total energies of molecules. These methods range from a simple sum over atoms, to addition of bond energies, to pairwise interatomic force fields, reaching to the more sophisticated machine learning approaches that are capable of describing collective interactions between many atoms or bonds. In the case of equilibrium molecular geometries, even simple pairwise force fields demonstrate prediction accuracy comparable to benchmark energies calculated using density functional theory with hybrid exchange-correlation functionals; however, accounting for the collective many-body interactions proves to be essential for approaching the “holy grail” of chemical accuracy of 1 kcal/mol for both equilibrium and out-of-equilibrium geometries. This remarkable accuracy is achieved by a vectorized representation of molecules (so-called Bag of Bonds model) that exhibits strong nonlocality in chemical space. The same representation allows us to predict accurate electronic properties of molecules, such as their polarizability and molecular frontier orbital energies.

  11. Fitmunk: improving protein structures by accurate, automatic modeling of side-chain conformations.

    PubMed

    Porebski, Przemyslaw Jerzy; Cymborowski, Marcin; Pasenkiewicz-Gierula, Marta; Minor, Wladek

    2016-02-01

    Improvements in crystallographic hardware and software have allowed automated structure-solution pipelines to approach a near-`one-click' experience for the initial determination of macromolecular structures. However, in many cases the resulting initial model requires a laborious, iterative process of refinement and validation. A new method has been developed for the automatic modeling of side-chain conformations that takes advantage of rotamer-prediction methods in a crystallographic context. The algorithm, which is based on deterministic dead-end elimination (DEE) theory, uses new dense conformer libraries and a hybrid energy function derived from experimental data and prior information about rotamer frequencies to find the optimal conformation of each side chain. In contrast to existing methods, which incorporate the electron-density term into protein-modeling frameworks, the proposed algorithm is designed to take advantage of the highly discriminatory nature of electron-density maps. This method has been implemented in the program Fitmunk, which uses extensive conformational sampling. This improves the accuracy of the modeling and makes it a versatile tool for crystallographic model building, refinement and validation. Fitmunk was extensively tested on over 115 new structures, as well as a subset of 1100 structures from the PDB. It is demonstrated that the ability of Fitmunk to model more than 95% of side chains accurately is beneficial for improving the quality of crystallographic protein models, especially at medium and low resolutions. Fitmunk can be used for model validation of existing structures and as a tool to assess whether side chains are modeled optimally or could be better fitted into electron density. Fitmunk is available as a web service at http://kniahini.med.virginia.edu/fitmunk/server/ or at http://fitmunk.bitbucket.org/. PMID:26894674

  12. Fitmunk: improving protein structures by accurate, automatic modeling of side-chain conformations

    PubMed Central

    Porebski, Przemyslaw Jerzy; Cymborowski, Marcin; Pasenkiewicz-Gierula, Marta; Minor, Wladek

    2016-01-01

    Improvements in crystallographic hardware and software have allowed automated structure-solution pipelines to approach a near-‘one-click’ experience for the initial determination of macromolecular structures. However, in many cases the resulting initial model requires a laborious, iterative process of refinement and validation. A new method has been developed for the automatic modeling of side-chain conformations that takes advantage of rotamer-prediction methods in a crystallographic context. The algorithm, which is based on deterministic dead-end elimination (DEE) theory, uses new dense conformer libraries and a hybrid energy function derived from experimental data and prior information about rotamer frequencies to find the optimal conformation of each side chain. In contrast to existing methods, which incorporate the electron-density term into protein-modeling frameworks, the proposed algorithm is designed to take advantage of the highly discriminatory nature of electron-density maps. This method has been implemented in the program Fitmunk, which uses extensive conformational sampling. This improves the accuracy of the modeling and makes it a versatile tool for crystallographic model building, refinement and validation. Fitmunk was extensively tested on over 115 new structures, as well as a subset of 1100 structures from the PDB. It is demonstrated that the ability of Fitmunk to model more than 95% of side chains accurately is beneficial for improving the quality of crystallographic protein models, especially at medium and low resolutions. Fitmunk can be used for model validation of existing structures and as a tool to assess whether side chains are modeled optimally or could be better fitted into electron density. Fitmunk is available as a web service at http://kniahini.med.virginia.edu/fitmunk/server/ or at http://fitmunk.bitbucket.org/. PMID:26894674

  13. Development and Validation of a Multidisciplinary Tool for Accurate and Efficient Rotorcraft Noise Prediction (MUTE)

    NASA Technical Reports Server (NTRS)

    Liu, Yi; Anusonti-Inthra, Phuriwat; Diskin, Boris

    2011-01-01

    A physics-based, systematically coupled, multidisciplinary prediction tool (MUTE) for rotorcraft noise was developed and validated with a wide range of flight configurations and conditions. MUTE is an aggregation of multidisciplinary computational tools that accurately and efficiently model the physics of the source of rotorcraft noise, and predict the noise at far-field observer locations. It uses systematic coupling approaches among multiple disciplines including Computational Fluid Dynamics (CFD), Computational Structural Dynamics (CSD), and high fidelity acoustics. Within MUTE, advanced high-order CFD tools are used around the rotor blade to predict the transonic flow (shock wave) effects, which generate the high-speed impulsive noise. Predictions of the blade-vortex interaction noise in low speed flight are also improved by using the Particle Vortex Transport Method (PVTM), which preserves the wake flow details required for blade/wake and fuselage/wake interactions. The accuracy of the source noise prediction is further improved by utilizing a coupling approach between CFD and CSD, so that the effects of key structural dynamics, elastic blade deformations, and trim solutions are correctly represented in the analysis. The blade loading information and/or the flow field parameters around the rotor blade predicted by the CFD/CSD coupling approach are used to predict the acoustic signatures at far-field observer locations with a high-fidelity noise propagation code (WOPWOP3). The predicted results from the MUTE tool for rotor blade aerodynamic loading and far-field acoustic signatures are compared and validated with a variation of experimental data sets, such as UH60-A data, DNW test data and HART II test data.

  14. Systematic computational prediction of protein interaction networks.

    PubMed

    Lees, J G; Heriche, J K; Morilla, I; Ranea, J A; Orengo, C A

    2011-06-01

    Determining the network of physical protein associations is an important first step in developing mechanistic evidence for elucidating biological pathways. Despite rapid advances in the field of high throughput experiments to determine protein interactions, the majority of associations remain unknown. Here we describe computational methods for significantly expanding protein association networks. We describe methods for integrating multiple independent sources of evidence to obtain higher quality predictions and we compare the major publicly available resources available for experimentalists to use. PMID:21572181

  15. Accurate Prediction of Severe Allergic Reactions by a Small Set of Environmental Parameters (NDVI, Temperature)

    PubMed Central

    Andrianaki, Maria; Azariadis, Kalliopi; Kampouri, Errika; Theodoropoulou, Katerina; Lavrentaki, Katerina; Kastrinakis, Stelios; Kampa, Marilena; Agouridakis, Panagiotis; Pirintsos, Stergios; Castanas, Elias

    2015-01-01

    Severe allergic reactions of unknown etiology,necessitating a hospital visit, have an important impact in the life of affected individuals and impose a major economic burden to societies. The prediction of clinically severe allergic reactions would be of great importance, but current attempts have been limited by the lack of a well-founded applicable methodology and the wide spatiotemporal distribution of allergic reactions. The valid prediction of severe allergies (and especially those needing hospital treatment) in a region, could alert health authorities and implicated individuals to take appropriate preemptive measures. In the present report we have collecterd visits for serious allergic reactions of unknown etiology from two major hospitals in the island of Crete, for two distinct time periods (validation and test sets). We have used the Normalized Difference Vegetation Index (NDVI), a satellite-based, freely available measurement, which is an indicator of live green vegetation at a given geographic area, and a set of meteorological data to develop a model capable of describing and predicting severe allergic reaction frequency. Our analysis has retained NDVI and temperature as accurate identifiers and predictors of increased hospital severe allergic reactions visits. Our approach may contribute towards the development of satellite-based modules, for the prediction of severe allergic reactions in specific, well-defined geographical areas. It could also probably be used for the prediction of other environment related diseases and conditions. PMID:25794106

  16. Microstructure-Dependent Gas Adsorption: Accurate Predictions of Methane Uptake in Nanoporous Carbons

    SciTech Connect

    Ihm, Yungok; Cooper, Valentino R; Gallego, Nidia C; Contescu, Cristian I; Morris, James R

    2014-01-01

    We demonstrate a successful, efficient framework for predicting gas adsorption properties in real materials based on first-principles calculations, with a specific comparison of experiment and theory for methane adsorption in activated carbons. These carbon materials have different pore size distributions, leading to a variety of uptake characteristics. Utilizing these distributions, we accurately predict experimental uptakes and heats of adsorption without empirical potentials or lengthy simulations. We demonstrate that materials with smaller pores have higher heats of adsorption, leading to a higher gas density in these pores. This pore-size dependence must be accounted for, in order to predict and understand the adsorption behavior. The theoretical approach combines: (1) ab initio calculations with a van der Waals density functional to determine adsorbent-adsorbate interactions, and (2) a thermodynamic method that predicts equilibrium adsorption densities by directly incorporating the calculated potential energy surface in a slit pore model. The predicted uptake at P=20 bar and T=298 K is in excellent agreement for all five activated carbon materials used. This approach uses only the pore-size distribution as an input, with no fitting parameters or empirical adsorbent-adsorbate interactions, and thus can be easily applied to other adsorbent-adsorbate combinations.

  17. Accurate bearing remaining useful life prediction based on Weibull distribution and artificial neural network

    NASA Astrophysics Data System (ADS)

    Ben Ali, Jaouher; Chebel-Morello, Brigitte; Saidi, Lotfi; Malinowski, Simon; Fnaiech, Farhat

    2015-05-01

    Accurate remaining useful life (RUL) prediction of critical assets is an important challenge in condition based maintenance to improve reliability and decrease machine's breakdown and maintenance's cost. Bearing is one of the most important components in industries which need to be monitored and the user should predict its RUL. The challenge of this study is to propose an original feature able to evaluate the health state of bearings and to estimate their RUL by Prognostics and Health Management (PHM) techniques. In this paper, the proposed method is based on the data-driven prognostic approach. The combination of Simplified Fuzzy Adaptive Resonance Theory Map (SFAM) neural network and Weibull distribution (WD) is explored. WD is used just in the training phase to fit measurement and to avoid areas of fluctuation in the time domain. SFAM training process is based on fitted measurements at present and previous inspection time points as input. However, the SFAM testing process is based on real measurements at present and previous inspections. Thanks to the fuzzy learning process, SFAM has an important ability and a good performance to learn nonlinear time series. As output, seven classes are defined; healthy bearing and six states for bearing degradation. In order to find the optimal RUL prediction, a smoothing phase is proposed in this paper. Experimental results show that the proposed method can reliably predict the RUL of rolling element bearings (REBs) based on vibration signals. The proposed prediction approach can be applied to prognostic other various mechanical assets.

  18. Special purpose hybrid transfinite elements and unified computational methodology for accurately predicting thermoelastic stress waves

    NASA Technical Reports Server (NTRS)

    Tamma, Kumar K.; Railkar, Sudhir B.

    1988-01-01

    This paper represents an attempt to apply extensions of a hybrid transfinite element computational approach for accurately predicting thermoelastic stress waves. The applicability of the present formulations for capturing the thermal stress waves induced by boundary heating for the well known Danilovskaya problems is demonstrated. A unique feature of the proposed formulations for applicability to the Danilovskaya problem of thermal stress waves in elastic solids lies in the hybrid nature of the unified formulations and the development of special purpose transfinite elements in conjunction with the classical Galerkin techniques and transformation concepts. Numerical test cases validate the applicability and superior capability to capture the thermal stress waves induced due to boundary heating.

  19. Accurate verification of the conserved-vector-current and standard-model predictions

    SciTech Connect

    Sirlin, A.; Zucchini, R.

    1986-10-20

    An approximate analytic calculation of O(Z..cap alpha../sup 2/) corrections to Fermi decays is presented. When the analysis of Koslowsky et al. is modified to take into account the new results, it is found that each of the eight accurately studied scrFt values differs from the average by approx. <1sigma, thus significantly improving the comparison of experiments with conserved-vector-current predictions. The new scrFt values are lower than before, which also brings experiments into very good agreement with the three-generation standard model, at the level of its quantum corrections.

  20. Effective protein conformational sampling based on predicted torsion angles.

    PubMed

    Yang, Yuedong; Zhou, Yaoqi

    2016-04-30

    Protein structure prediction is a long-standing problem in molecular biology. Due to lack of an accurate energy function, it is often difficult to know whether the sampling algorithm or the energy function is the most important factor for failure of locating near-native conformations of proteins. This article examines the size dependence of sampling effectiveness by using a perfect "energy function": the root-mean-squared distance from the target native structure. Using protein targets up to 460 residues from critical assessment of structure prediction techniques (CASP11, 2014), we show that the accuracy of near native structures sampled is relatively independent of protein sizes but strongly depends on the errors of predicted torsion angles. Even with 40% out-of-range angle prediction, 2 Å or less near-native conformation can be sampled. The result supports that the poor energy function is one of the bottlenecks of structure prediction and predicted torsion angles are useful for overcoming the bottleneck by restricting the sampling space in the absence of a perfect energy function. © 2015 Wiley Periodicals, Inc. PMID:26696379

  1. Year 2 Report: Protein Function Prediction Platform

    SciTech Connect

    Zhou, C E

    2012-04-27

    Upon completion of our second year of development in a 3-year development cycle, we have completed a prototype protein structure-function annotation and function prediction system: Protein Function Prediction (PFP) platform (v.0.5). We have met our milestones for Years 1 and 2 and are positioned to continue development in completion of our original statement of work, or a reasonable modification thereof, in service to DTRA Programs involved in diagnostics and medical countermeasures research and development. The PFP platform is a multi-scale computational modeling system for protein structure-function annotation and function prediction. As of this writing, PFP is the only existing fully automated, high-throughput, multi-scale modeling, whole-proteome annotation platform, and represents a significant advance in the field of genome annotation (Fig. 1). PFP modules perform protein functional annotations at the sequence, systems biology, protein structure, and atomistic levels of biological complexity (Fig. 2). Because these approaches provide orthogonal means of characterizing proteins and suggesting protein function, PFP processing maximizes the protein functional information that can currently be gained by computational means. Comprehensive annotation of pathogen genomes is essential for bio-defense applications in pathogen characterization, threat assessment, and medical countermeasure design and development in that it can short-cut the time and effort required to select and characterize protein biomarkers.

  2. Predicting protein-peptide interactions from scratch

    NASA Astrophysics Data System (ADS)

    Yan, Chengfei; Xu, Xianjin; Zou, Xiaoqin; Zou lab Team

    Protein-peptide interactions play an important role in many cellular processes. The ability to predict protein-peptide complex structures is valuable for mechanistic investigation and therapeutic development. Due to the high flexibility of peptides and lack of templates for homologous modeling, predicting protein-peptide complex structures is extremely challenging. Recently, we have developed a novel docking framework for protein-peptide structure prediction. Specifically, given the sequence of a peptide and a 3D structure of the protein, initial conformations of the peptide are built through protein threading. Then, the peptide is globally and flexibly docked onto the protein using a novel iterative approach. Finally, the sampled modes are scored and ranked by a statistical potential-based energy scoring function that was derived for protein-peptide interactions from statistical mechanics principles. Our docking methodology has been tested on the Peptidb database and compared with other protein-peptide docking methods. Systematic analysis shows significantly improved results compared to the performances of the existing methods. Our method is computationally efficient and suitable for large-scale applications. Nsf CAREER Award 0953839 (XZ) NIH R01GM109980 (XZ).

  3. Quantitative assessment of protein function prediction programs.

    PubMed

    Rodrigues, B N; Steffens, M B R; Raittz, R T; Santos-Weiss, I C R; Marchaukoski, J N

    2015-01-01

    Fast prediction of protein function is essential for high-throughput sequencing analysis. Bioinformatic resources provide cheaper and faster techniques for function prediction and have helped to accelerate the process of protein sequence characterization. In this study, we assessed protein function prediction programs that accept amino acid sequences as input. We analyzed the classification, equality, and similarity between programs, and, additionally, compared program performance. The following programs were selected for our assessment: Blast2GO, InterProScan, PANTHER, Pfam, and ScanProsite. This selection was based on the high number of citations (over 500), fully automatic analysis, and the possibility of returning a single best classification per sequence. We tested these programs using 12 gold standard datasets from four different sources. The gold standard classification of the databases was based on expert analysis, the Protein Data Bank, or the Structure-Function Linkage Database. We found that the miss rate among the programs is globally over 50%. Furthermore, we observed little overlap in the correct predictions from each program. Therefore, a combination of multiple types of sources and methods, including experimental data, protein-protein interaction, and data mining, may be the best way to generate more reliable predictions and decrease the miss rate. PMID:26782400

  4. Signature Product Code for Predicting Protein-Protein Interactions

    2004-09-25

    The SigProdV1.0 software consists of four programs which together allow the prediction of protein-protein interactions using only amino acid sequences and experimental data. The software is based on the use of tensor products of amino acid trimers coupled with classifiers known as support vector machines. Essentially the program looks for amino acid trimer pairs which occur more frequently in protein pairs which are known to interact. These trimer pairs are then used to make predictionsmore » about unknown protein pairs. A detailed description of the method can be found in the paper: S. Martin, D. Roe, J.L. Faulon. "Predicting protein-protein interactions using signature products," Bioinformatics, available online from Advance Access, Aug. 19, 2004.« less

  5. High Order Schemes in Bats-R-US for Faster and More Accurate Predictions

    NASA Astrophysics Data System (ADS)

    Chen, Y.; Toth, G.; Gombosi, T. I.

    2014-12-01

    BATS-R-US is a widely used global magnetohydrodynamics model that originally employed second order accurate TVD schemes combined with block based Adaptive Mesh Refinement (AMR) to achieve high resolution in the regions of interest. In the last years we have implemented fifth order accurate finite difference schemes CWENO5 and MP5 for uniform Cartesian grids. Now the high order schemes have been extended to generalized coordinates, including spherical grids and also to the non-uniform AMR grids including dynamic regridding. We present numerical tests that verify the preservation of free-stream solution and high-order accuracy as well as robust oscillation-free behavior near discontinuities. We apply the new high order accurate schemes to both heliospheric and magnetospheric simulations and show that it is robust and can achieve the same accuracy as the second order scheme with much less computational resources. This is especially important for space weather prediction that requires faster than real time code execution.

  6. Integrative subcellular proteomic analysis allows accurate prediction of human disease-causing genes.

    PubMed

    Zhao, Li; Chen, Yiyun; Bajaj, Amol Onkar; Eblimit, Aiden; Xu, Mingchu; Soens, Zachry T; Wang, Feng; Ge, Zhongqi; Jung, Sung Yun; He, Feng; Li, Yumei; Wensel, Theodore G; Qin, Jun; Chen, Rui

    2016-05-01

    Proteomic profiling on subcellular fractions provides invaluable information regarding both protein abundance and subcellular localization. When integrated with other data sets, it can greatly enhance our ability to predict gene function genome-wide. In this study, we performed a comprehensive proteomic analysis on the light-sensing compartment of photoreceptors called the outer segment (OS). By comparing with the protein profile obtained from the retina tissue depleted of OS, an enrichment score for each protein is calculated to quantify protein subcellular localization, and 84% accuracy is achieved compared with experimental data. By integrating the protein OS enrichment score, the protein abundance, and the retina transcriptome, the probability of a gene playing an essential function in photoreceptor cells is derived with high specificity and sensitivity. As a result, a list of genes that will likely result in human retinal disease when mutated was identified and validated by previous literature and/or animal model studies. Therefore, this new methodology demonstrates the synergy of combining subcellular fractionation proteomics with other omics data sets and is generally applicable to other tissues and diseases. PMID:26912414

  7. Base-resolution methylation patterns accurately predict transcription factor bindings in vivo

    PubMed Central

    Xu, Tianlei; Li, Ben; Zhao, Meng; Szulwach, Keith E.; Street, R. Craig; Lin, Li; Yao, Bing; Zhang, Feiran; Jin, Peng; Wu, Hao; Qin, Zhaohui S.

    2015-01-01

    Detecting in vivo transcription factor (TF) binding is important for understanding gene regulatory circuitries. ChIP-seq is a powerful technique to empirically define TF binding in vivo. However, the multitude of distinct TFs makes genome-wide profiling for them all labor-intensive and costly. Algorithms for in silico prediction of TF binding have been developed, based mostly on histone modification or DNase I hypersensitivity data in conjunction with DNA motif and other genomic features. However, technical limitations of these methods prevent them from being applied broadly, especially in clinical settings. We conducted a comprehensive survey involving multiple cell lines, TFs, and methylation types and found that there are intimate relationships between TF binding and methylation level changes around the binding sites. Exploiting the connection between DNA methylation and TF binding, we proposed a novel supervised learning approach to predict TF–DNA interaction using data from base-resolution whole-genome methylation sequencing experiments. We devised beta-binomial models to characterize methylation data around TF binding sites and the background. Along with other static genomic features, we adopted a random forest framework to predict TF–DNA interaction. After conducting comprehensive tests, we saw that the proposed method accurately predicts TF binding and performs favorably versus competing methods. PMID:25722376

  8. Defining and predicting structurally conserved regions in protein superfamilies

    PubMed Central

    Huang, Ivan K.; Grishin, Nick V.

    2013-01-01

    Motivation: The structures of homologous proteins are generally better conserved than their sequences. This phenomenon is demonstrated by the prevalence of structurally conserved regions (SCRs) even in highly divergent protein families. Defining SCRs requires the comparison of two or more homologous structures and is affected by their availability and divergence, and our ability to deduce structurally equivalent positions among them. In the absence of multiple homologous structures, it is necessary to predict SCRs of a protein using information from only a set of homologous sequences and (if available) a single structure. Accurate SCR predictions can benefit homology modelling and sequence alignment. Results: Using pairwise DaliLite alignments among a set of homologous structures, we devised a simple measure of structural conservation, termed structural conservation index (SCI). SCI was used to distinguish SCRs from non-SCRs. A database of SCRs was compiled from 386 SCOP superfamilies containing 6489 protein domains. Artificial neural networks were then trained to predict SCRs with various features deduced from a single structure and homologous sequences. Assessment of the predictions via a 5-fold cross-validation method revealed that predictions based on features derived from a single structure perform similarly to ones based on homologous sequences, while combining sequence and structural features was optimal in terms of accuracy (0.755) and Matthews correlation coefficient (0.476). These results suggest that even without information from multiple structures, it is still possible to effectively predict SCRs for a protein. Finally, inspection of the structures with the worst predictions pinpoints difficulties in SCR definitions. Availability: The SCR database and the prediction server can be found at http://prodata.swmed.edu/SCR. Contact: 91huangi@gmail.com or grishin@chop.swmed.edu Supplementary information: Supplementary data are available at Bioinformatics

  9. Prediction of protein structural classes and subcellular locations.

    PubMed

    Chou, K C

    2000-09-01

    The structural class and subcellular location are the two important features of proteins that are closely related to their biological functions. With the rapid increase in new protein sequences entering into data banks, it is highly desirable to develop a fast and accurate method for predicting the attributes of these features for them. This can expedite the functionality determination of new proteins and the process of prioritizing genes and proteins identified by genomics efforts as potential molecular targets for drug design. Various prediction methods have been developed during the last two decades. This review is devoted to presenting a systematic introduction and comparison of the existing methods in respect to the prediction algorithm and classification scheme. The attention is focused on the state-of-the-art, which is featured by the covarient-discriminant algorithm developed very recently, as well as some new classification schemes for protein structural classes and subcellular locations. Particularly, addressed are also the physical chemistry foundation of the existing prediction methods, and the essence why the covariant-discriminant algorithm is so powerful. PMID:12369916

  10. Multipass Membrane Protein Structure Prediction Using Rosetta

    PubMed Central

    Yarov-Yarovoy, Vladimir; Schonbrun, Jack; Baker, David

    2006-01-01

    We describe the adaptation of the Rosetta de novo structure prediction method for prediction of helical transmembrane protein structures. The membrane environment is modeled by embedding the protein chain into a model membrane represented by parallel planes defining hydrophobic, interface, and polar membrane layers for each energy evaluation. The optimal embedding is determined by maximizing the exposure of surface hydrophobic residues within the membrane and minimizing hydrophobic exposure outside of the membrane. Protein conformations are built up using the Rosetta fragment assembly method and evaluated using a new membrane-specific version of the Rosetta low-resolution energy function in which residue–residue and residue–environment interactions are functions of the membrane layer in addition to amino acid identity, distance, and density. We find that lower energy and more native-like structures are achieved by sequential addition of helices to a growing chain, which may mimic some aspects of helical protein biogenesis after translocation, rather than folding the whole chain simultaneously as in the Rosetta soluble protein prediction method. In tests on 12 membrane proteins for which the structure is known, between 51 and 145 residues were predicted with root-mean-square deviation <4Å from the native structure. PMID:16372357

  11. CoMOGrad and PHOG: From Computer Vision to Fast and Accurate Protein Tertiary Structure Retrieval

    PubMed Central

    Karim, Rezaul; Aziz, Mohd. Momin Al; Shatabda, Swakkhar; Rahman, M. Sohel; Mia, Md. Abul Kashem; Zaman, Farhana; Rakin, Salman

    2015-01-01

    The number of entries in a structural database of proteins is increasing day by day. Methods for retrieving protein tertiary structures from such a large database have turn out to be the key to comparative analysis of structures that plays an important role to understand proteins and their functions. In this paper, we present fast and accurate methods for the retrieval of proteins having tertiary structures similar to a query protein from a large database. Our proposed methods borrow ideas from the field of computer vision. The speed and accuracy of our methods come from the two newly introduced features- the co-occurrence matrix of the oriented gradient and pyramid histogram of oriented gradient- and the use of Euclidean distance as the distance measure. Experimental results clearly indicate the superiority of our approach in both running time and accuracy. Our method is readily available for use from this website: http://research.buet.ac.bd:8080/Comograd/. PMID:26293226

  12. CoMOGrad and PHOG: From Computer Vision to Fast and Accurate Protein Tertiary Structure Retrieval.

    PubMed

    Karim, Rezaul; Aziz, Mohd Momin Al; Shatabda, Swakkhar; Rahman, M Sohel; Mia, Md Abul Kashem; Zaman, Farhana; Rakin, Salman

    2015-01-01

    The number of entries in a structural database of proteins is increasing day by day. Methods for retrieving protein tertiary structures from such a large database have turn out to be the key to comparative analysis of structures that plays an important role to understand proteins and their functions. In this paper, we present fast and accurate methods for the retrieval of proteins having tertiary structures similar to a query protein from a large database. Our proposed methods borrow ideas from the field of computer vision. The speed and accuracy of our methods come from the two newly introduced features- the co-occurrence matrix of the oriented gradient and pyramid histogram of oriented gradient- and the use of Euclidean distance as the distance measure. Experimental results clearly indicate the superiority of our approach in both running time and accuracy. Our method is readily available for use from this website: http://research.buet.ac.bd:8080/Comograd/. PMID:26293226

  13. Reduced alphabet for protein folding prediction.

    PubMed

    Huang, Jitao T; Wang, Titi; Huang, Shanran R; Li, Xin

    2015-04-01

    What are the key building blocks that would have been needed to construct complex protein folds? This is an important issue for understanding protein folding mechanism and guiding de novo protein design. Twenty naturally occurring amino acids and eight secondary structures consist of a 28-letter alphabet to determine folding kinetics and mechanism. Here we predict folding kinetic rates of proteins from many reduced alphabets. We find that a reduced alphabet of 10 letters achieves good correlation with folding rates, close to the one achieved by full 28-letter alphabet. Many other reduced alphabets are not significantly correlated to folding rates. The finding suggests that not all amino acids and secondary structures are equally important for protein folding. The foldable sequence of a protein could be designed using at least 10 folding units, which can either promote or inhibit protein folding. Reducing alphabet cardinality without losing key folding kinetic information opens the door to potentially faster machine learning and data mining applications in protein structure prediction, sequence alignment and protein design. PMID:25641420

  14. JPred4: a protein secondary structure prediction server.

    PubMed

    Drozdetskiy, Alexey; Cole, Christian; Procter, James; Barton, Geoffrey J

    2015-07-01

    JPred4 (http://www.compbio.dundee.ac.uk/jpred4) is the latest version of the popular JPred protein secondary structure prediction server which provides predictions by the JNet algorithm, one of the most accurate methods for secondary structure prediction. In addition to protein secondary structure, JPred also makes predictions of solvent accessibility and coiled-coil regions. The JPred service runs up to 94 000 jobs per month and has carried out over 1.5 million predictions in total for users in 179 countries. The JPred4 web server has been re-implemented in the Bootstrap framework and JavaScript to improve its design, usability and accessibility from mobile devices. JPred4 features higher accuracy, with a blind three-state (α-helix, β-strand and coil) secondary structure prediction accuracy of 82.0% while solvent accessibility prediction accuracy has been raised to 90% for residues <5% accessible. Reporting of results is enhanced both on the website and through the optional email summaries and batch submission results. Predictions are now presented in SVG format with options to view full multiple sequence alignments with and without gaps and insertions. Finally, the help-pages have been updated and tool-tips added as well as step-by-step tutorials. PMID:25883141

  15. JPred4: a protein secondary structure prediction server

    PubMed Central

    Drozdetskiy, Alexey; Cole, Christian; Procter, James; Barton, Geoffrey J.

    2015-01-01

    JPred4 (http://www.compbio.dundee.ac.uk/jpred4) is the latest version of the popular JPred protein secondary structure prediction server which provides predictions by the JNet algorithm, one of the most accurate methods for secondary structure prediction. In addition to protein secondary structure, JPred also makes predictions of solvent accessibility and coiled-coil regions. The JPred service runs up to 94 000 jobs per month and has carried out over 1.5 million predictions in total for users in 179 countries. The JPred4 web server has been re-implemented in the Bootstrap framework and JavaScript to improve its design, usability and accessibility from mobile devices. JPred4 features higher accuracy, with a blind three-state (α-helix, β-strand and coil) secondary structure prediction accuracy of 82.0% while solvent accessibility prediction accuracy has been raised to 90% for residues <5% accessible. Reporting of results is enhanced both on the website and through the optional email summaries and batch submission results. Predictions are now presented in SVG format with options to view full multiple sequence alignments with and without gaps and insertions. Finally, the help-pages have been updated and tool-tips added as well as step-by-step tutorials. PMID:25883141

  16. VORFFIP-driven dock: V-D2OCK, a fast and accurate protein docking strategy.

    PubMed

    Segura, Joan; Marín-López, Manuel Alejandro; Jones, Pamela F; Oliva, Baldo; Fernandez-Fuentes, Narcis

    2015-01-01

    The experimental determination of the structure of protein complexes cannot keep pace with the generation of interactomic data, hence resulting in an ever-expanding gap. As the structural details of protein complexes are central to a full understanding of the function and dynamics of the cell machinery, alternative strategies are needed to circumvent the bottleneck in structure determination. Computational protein docking is a valid and valuable approach to model the structure of protein complexes. In this work, we describe a novel computational strategy to predict the structure of protein complexes based on data-driven docking: VORFFIP-driven dock (V-D2OCK). This new approach makes use of our newly described method to predict functional sites in protein structures, VORFFIP, to define the region to be sampled during docking and structural clustering to reduce the number of models to be examined by users. V-D2OCK has been benchmarked using a validated and diverse set of protein complexes and compared to a state-of-art docking method. The speed and accuracy compared to contemporary tools justifies the potential use of VD2OCK for high-throughput, genome-wide, protein docking. Finally, we have developed a web interface that allows users to browser and visualize V-D2OCK predictions from the convenience of their web-browsers. PMID:25763838

  17. VORFFIP-Driven Dock: V-D2OCK, a Fast and Accurate Protein Docking Strategy

    PubMed Central

    Segura, Joan; Marín-López, Manuel Alejandro; Jones, Pamela F.; Oliva, Baldo; Fernandez-Fuentes, Narcis

    2015-01-01

    The experimental determination of the structure of protein complexes cannot keep pace with the generation of interactomic data, hence resulting in an ever-expanding gap. As the structural details of protein complexes are central to a full understanding of the function and dynamics of the cell machinery, alternative strategies are needed to circumvent the bottleneck in structure determination. Computational protein docking is a valid and valuable approach to model the structure of protein complexes. In this work, we describe a novel computational strategy to predict the structure of protein complexes based on data-driven docking: VORFFIP-driven dock (V-D2OCK). This new approach makes use of our newly described method to predict functional sites in protein structures, VORFFIP, to define the region to be sampled during docking and structural clustering to reduce the number of models to be examined by users. V-D2OCK has been benchmarked using a validated and diverse set of protein complexes and compared to a state-of-art docking method. The speed and accuracy compared to contemporary tools justifies the potential use of VD2OCK for high-throughput, genome-wide, protein docking. Finally, we have developed a web interface that allows users to browser and visualize V-D2OCK predictions from the convenience of their web-browsers. PMID:25763838

  18. Multi-level machine learning prediction of protein-protein interactions in Saccharomyces cerevisiae.

    PubMed

    Zubek, Julian; Tatjewski, Marcin; Boniecki, Adam; Mnich, Maciej; Basu, Subhadip; Plewczynski, Dariusz

    2015-01-01

    Accurate identification of protein-protein interactions (PPI) is the key step in understanding proteins' biological functions, which are typically context-dependent. Many existing PPI predictors rely on aggregated features from protein sequences, however only a few methods exploit local information about specific residue contacts. In this work we present a two-stage machine learning approach for prediction of protein-protein interactions. We start with the carefully filtered data on protein complexes available for Saccharomyces cerevisiae in the Protein Data Bank (PDB) database. First, we build linear descriptions of interacting and non-interacting sequence segment pairs based on their inter-residue distances. Secondly, we train machine learning classifiers to predict binary segment interactions for any two short sequence fragments. The final prediction of the protein-protein interaction is done using the 2D matrix representation of all-against-all possible interacting sequence segments of both analysed proteins. The level-I predictor achieves 0.88 AUC for micro-scale, i.e., residue-level prediction. The level-II predictor improves the results further by a more complex learning paradigm. We perform 30-fold macro-scale, i.e., protein-level cross-validation experiment. The level-II predictor using PSIPRED-predicted secondary structure reaches 0.70 precision, 0.68 recall, and 0.70 AUC, whereas other popular methods provide results below 0.6 threshold (recall, precision, AUC). Our results demonstrate that multi-scale sequence features aggregation procedure is able to improve the machine learning results by more than 10% as compared to other sequence representations. Prepared datasets and source code for our experimental pipeline are freely available for download from: http://zubekj.github.io/mlppi/ (open source Python implementation, OS independent). PMID:26157620

  19. A hierarchical approach to accurate predictions of macroscopic thermodynamic behavior from quantum mechanics and molecular simulations

    NASA Astrophysics Data System (ADS)

    Garrison, Stephen L.

    2005-07-01

    The combination of molecular simulations and potentials obtained from quantum chemistry is shown to be able to provide reasonably accurate thermodynamic property predictions. Gibbs ensemble Monte Carlo simulations are used to understand the effects of small perturbations to various regions of the model Lennard-Jones 12-6 potential. However, when the phase behavior and second virial coefficient are scaled by the critical properties calculated for each potential, the results obey a corresponding states relation suggesting a non-uniqueness problem for interaction potentials fit to experimental phase behavior. Several variations of a procedure collectively referred to as quantum mechanical Hybrid Methods for Interaction Energies (HM-IE) are developed and used to accurately estimate interaction energies from CCSD(T) calculations with a large basis set in a computationally efficient manner for the neon-neon, acetylene-acetylene, and nitrogen-benzene systems. Using these results and methods, an ab initio, pairwise-additive, site-site potential for acetylene is determined and then improved using results from molecular simulations using this initial potential. The initial simulation results also indicate that a limited range of energies important for accurate phase behavior predictions. Second virial coefficients calculated from the improved potential indicate that one set of experimental data in the literature is likely erroneous. This prescription is then applied to methanethiol. Difficulties in modeling the effects of the lone pair electrons suggest that charges on the lone pair sites negatively impact the ability of the intermolecular potential to describe certain orientations, but that the lone pair sites may be necessary to reasonably duplicate the interaction energies for several orientations. Two possible methods for incorporating the effects of three-body interactions into simulations within the pairwise-additivity formulation are also developed. A low density

  20. Highly ordered protein nanorings designed by accurate control of glutathione S-transferase self-assembly.

    PubMed

    Bai, Yushi; Luo, Quan; Zhang, Wei; Miao, Lu; Xu, Jiayun; Li, Hongbin; Liu, Junqiu

    2013-07-31

    Protein self-assembly into exquisite, complex, yet highly ordered architectures represents the supreme wisdom of nature. However, precise manipulation of protein self-assembly behavior in vitro is a great challenge. Here we report that by taking advantage of the cooperation of metal-ion-chelating interactions and nonspecific protein-protein interactions, we achieved accurate control of the orientation of proteins and their self-assembly into protein nanorings. As a building block, we utilized the C2-symmetric protein sjGST-2His, a variant of glutathione S-transferase from Schistosoma japonicum having two properly oriented His metal-chelating sites on the surface. Through synergic metal-coordination and non-covalent interactions, sjGST-2His self-assembled in a fixed bending manner to form highly ordered protein nanorings. The diameters of the nanorings can be regulated by tuning the strength of the non-covalent interaction network between sjGST-2His interfaces through variation of the ionic strength of the solution. This work provides a de novo design strategy that can be applied in the construction of novel protein superstructures. PMID:23865524

  1. PSAQ™ standards for accurate MS-based quantification of proteins: from the concept to biomedical applications.

    PubMed

    Picard, Guillaume; Lebert, Dorothée; Louwagie, Mathilde; Adrait, Annie; Huillet, Céline; Vandenesch, François; Bruley, Christophe; Garin, Jérôme; Jaquinod, Michel; Brun, Virginie

    2012-10-01

    Absolute protein quantification, i.e. determining protein concentrations in biological samples, is essential to our understanding of biological and physiopathological phenomena. Protein quantification methods based on the use of antibodies are very effective and widely used. However, over the last ten years, absolute protein quantification by mass spectrometry has attracted considerable interest, particularly for the study of systems biology and as part of biomarker development. This interest is mainly linked to the high multiplexing capacity of MS analysis, and to the availability of stable-isotope-labelled standards for quantification. This article describes the details of how to produce, control the quality and use a specific type of standard: Protein Standard Absolute Quantification (PSAQ™) standards. These standards are whole isotopically labelled proteins, analogues of the proteins to be assayed. PSAQ standards can be added early during sample treatment, thus they can correct for protein losses during sample prefractionation and for incomplete sample digestion. Because of this, quantification of target proteins is very accurate and precise using these standards. To illustrate the advantages of the PSAQ method, and to contribute to the increase in its use, selected applications in the biomedical field are detailed here. PMID:23019168

  2. PSI: a comprehensive and integrative approach for accurate plant subcellular localization prediction.

    PubMed

    Liu, Lili; Zhang, Zijun; Mei, Qian; Chen, Ming

    2013-01-01

    Predicting the subcellular localization of proteins conquers the major drawbacks of high-throughput localization experiments that are costly and time-consuming. However, current subcellular localization predictors are limited in scope and accuracy. In particular, most predictors perform well on certain locations or with certain data sets while poorly on others. Here, we present PSI, a novel high accuracy web server for plant subcellular localization prediction. PSI derives the wisdom of multiple specialized predictors via a joint-approach of group decision making strategy and machine learning methods to give an integrated best result. The overall accuracy obtained (up to 93.4%) was higher than best individual (CELLO) by ~10.7%. The precision of each predicable subcellular location (more than 80%) far exceeds that of the individual predictors. It can also deal with multi-localization proteins. PSI is expected to be a powerful tool in protein location engineering as well as in plant sciences, while the strategy employed could be applied to other integrative problems. A user-friendly web server, PSI, has been developed for free access at http://bis.zju.edu.cn/psi/. PMID:24194827

  3. Intermolecular potentials and the accurate prediction of the thermodynamic properties of water.

    PubMed

    Shvab, I; Sadus, Richard J

    2013-11-21

    The ability of intermolecular potentials to correctly predict the thermodynamic properties of liquid water at a density of 0.998 g∕cm(3) for a wide range of temperatures (298-650 K) and pressures (0.1-700 MPa) is investigated. Molecular dynamics simulations are reported for the pressure, thermal pressure coefficient, thermal expansion coefficient, isothermal and adiabatic compressibilities, isobaric and isochoric heat capacities, and Joule-Thomson coefficient of liquid water using the non-polarizable SPC∕E and TIP4P∕2005 potentials. The results are compared with both experiment data and results obtained from the ab initio-based Matsuoka-Clementi-Yoshimine non-additive (MCYna) [J. Li, Z. Zhou, and R. J. Sadus, J. Chem. Phys. 127, 154509 (2007)] potential, which includes polarization contributions. The data clearly indicate that both the SPC∕E and TIP4P∕2005 potentials are only in qualitative agreement with experiment, whereas the polarizable MCYna potential predicts some properties within experimental uncertainty. This highlights the importance of polarizability for the accurate prediction of the thermodynamic properties of water, particularly at temperatures beyond 298 K. PMID:24320337

  4. Intermolecular potentials and the accurate prediction of the thermodynamic properties of water

    NASA Astrophysics Data System (ADS)

    Shvab, I.; Sadus, Richard J.

    2013-11-01

    The ability of intermolecular potentials to correctly predict the thermodynamic properties of liquid water at a density of 0.998 g/cm3 for a wide range of temperatures (298-650 K) and pressures (0.1-700 MPa) is investigated. Molecular dynamics simulations are reported for the pressure, thermal pressure coefficient, thermal expansion coefficient, isothermal and adiabatic compressibilities, isobaric and isochoric heat capacities, and Joule-Thomson coefficient of liquid water using the non-polarizable SPC/E and TIP4P/2005 potentials. The results are compared with both experiment data and results obtained from the ab initio-based Matsuoka-Clementi-Yoshimine non-additive (MCYna) [J. Li, Z. Zhou, and R. J. Sadus, J. Chem. Phys. 127, 154509 (2007)] potential, which includes polarization contributions. The data clearly indicate that both the SPC/E and TIP4P/2005 potentials are only in qualitative agreement with experiment, whereas the polarizable MCYna potential predicts some properties within experimental uncertainty. This highlights the importance of polarizability for the accurate prediction of the thermodynamic properties of water, particularly at temperatures beyond 298 K.

  5. Toward an Accurate Prediction of the Arrival Time of Geomagnetic-Effective Coronal Mass Ejections

    NASA Astrophysics Data System (ADS)

    Shi, T.; Wang, Y.; Wan, L.; Cheng, X.; Ding, M.; Zhang, J.

    2015-12-01

    Accurately predicting the arrival of coronal mass ejections (CMEs) to the Earth based on remote images is of critical significance for the study of space weather. Here we make a statistical study of 21 Earth-directed CMEs, specifically exploring the relationship between CME initial speeds and transit times. The initial speed of a CME is obtained by fitting the CME with the Graduated Cylindrical Shell model and is thus free of projection effects. We then use the drag force model to fit results of the transit time versus the initial speed. By adopting different drag regimes, i.e., the viscous, aerodynamics, and hybrid regimes, we get similar results, with a least mean estimation error of the hybrid model of 12.9 hr. CMEs with a propagation angle (the angle between the propagation direction and the Sun-Earth line) larger than their half-angular widths arrive at the Earth with an angular deviation caused by factors other than the radial solar wind drag. The drag force model cannot be reliably applied to such events. If we exclude these events in the sample, the prediction accuracy can be improved, i.e., the estimation error reduces to 6.8 hr. This work suggests that it is viable to predict the arrival time of CMEs to the Earth based on the initial parameters with fairly good accuracy. Thus, it provides a method of forecasting space weather 1-5 days following the occurrence of CMEs.

  6. Towards first-principles based prediction of highly accurate electrochemical Pourbiax diagrams

    NASA Astrophysics Data System (ADS)

    Zeng, Zhenhua; Chan, Maria; Greeley, Jeff

    2015-03-01

    Electrochemical Pourbaix diagrams lie at the heart of aqueous electrochemical processes and are central to the identification of stable phases of metals for processes ranging from electrocatalysis to corrosion. Even though standard DFT calculations are potentially powerful tools for the prediction of such Pourbaix diagrams, inherent errors in the description of strongly-correlated transition metal (hydr)oxides, together with neglect of weak van der Waals (vdW) interactions, has limited the reliability of the predictions for even the simplest bulk systems; corresponding predictions for more complex alloy or surface structures are even more challenging . Through introduction of a Hubbard U correction, employment of a state-of-the-art van der Waals functional, and use of pure water as a reference state for the calculations, these errors are systematically corrected. The strong performance is illustrated on a series of bulk transition metal (Mn, Fe, Co and Ni) hydroxide, oxyhydroxide, binary and ternary oxides where the corresponding thermodynamics of oxidation and reduction can be accurately described with standard errors of less than 0.04 eV in comparison with experiment.

  7. Intermolecular potentials and the accurate prediction of the thermodynamic properties of water

    SciTech Connect

    Shvab, I.; Sadus, Richard J.

    2013-11-21

    The ability of intermolecular potentials to correctly predict the thermodynamic properties of liquid water at a density of 0.998 g/cm{sup 3} for a wide range of temperatures (298–650 K) and pressures (0.1–700 MPa) is investigated. Molecular dynamics simulations are reported for the pressure, thermal pressure coefficient, thermal expansion coefficient, isothermal and adiabatic compressibilities, isobaric and isochoric heat capacities, and Joule-Thomson coefficient of liquid water using the non-polarizable SPC/E and TIP4P/2005 potentials. The results are compared with both experiment data and results obtained from the ab initio-based Matsuoka-Clementi-Yoshimine non-additive (MCYna) [J. Li, Z. Zhou, and R. J. Sadus, J. Chem. Phys. 127, 154509 (2007)] potential, which includes polarization contributions. The data clearly indicate that both the SPC/E and TIP4P/2005 potentials are only in qualitative agreement with experiment, whereas the polarizable MCYna potential predicts some properties within experimental uncertainty. This highlights the importance of polarizability for the accurate prediction of the thermodynamic properties of water, particularly at temperatures beyond 298 K.

  8. Accurate single-sequence prediction of solvent accessible surface area using local and global features

    PubMed Central

    Faraggi, Eshel; Zhou, Yaoqi; Kloczkowski, Andrzej

    2014-01-01

    We present a new approach for predicting the Accessible Surface Area (ASA) using a General Neural Network (GENN). The novelty of the new approach lies in not using residue mutation profiles generated by multiple sequence alignments as descriptive inputs. Instead we use solely sequential window information and global features such as single-residue and two-residue compositions of the chain. The resulting predictor is both highly more efficient than sequence alignment based predictors and of comparable accuracy to them. Introduction of the global inputs significantly helps achieve this comparable accuracy. The predictor, termed ASAquick, is tested on predicting the ASA of globular proteins and found to perform similarly well for so-called easy and hard cases indicating generalizability and possible usability for de-novo protein structure prediction. The source code and a Linux executables for GENN and ASAquick are available from Research and Information Systems at http://mamiris.com, from the SPARKS Lab at http://sparks-lab.org, and from the Battelle Center for Mathematical Medicine at http://mathmed.org. PMID:25204636

  9. Accurate single-sequence prediction of solvent accessible surface area using local and global features.

    PubMed

    Faraggi, Eshel; Zhou, Yaoqi; Kloczkowski, Andrzej

    2014-11-01

    We present a new approach for predicting the Accessible Surface Area (ASA) using a General Neural Network (GENN). The novelty of the new approach lies in not using residue mutation profiles generated by multiple sequence alignments as descriptive inputs. Instead we use solely sequential window information and global features such as single-residue and two-residue compositions of the chain. The resulting predictor is both highly more efficient than sequence alignment-based predictors and of comparable accuracy to them. Introduction of the global inputs significantly helps achieve this comparable accuracy. The predictor, termed ASAquick, is tested on predicting the ASA of globular proteins and found to perform similarly well for so-called easy and hard cases indicating generalizability and possible usability for de-novo protein structure prediction. The source code and a Linux executables for GENN and ASAquick are available from Research and Information Systems at http://mamiris.com, from the SPARKS Lab at http://sparks-lab.org, and from the Battelle Center for Mathematical Medicine at http://mathmed.org. PMID:25204636

  10. Protein-protein interface prediction based on hexagon structure similarity.

    PubMed

    Guo, Fei; Ding, Yijie; Li, Shuai Cheng; Shen, Chao; Wang, Lusheng

    2016-08-01

    Studies on protein-protein interaction are important in proteome research. How to build more effective models based on sequence information, structure information and physicochemical characteristics, is the key technology in protein-protein interface prediction. In this paper, we study the protein-protein interface prediction problem. We propose a novel method for identifying residues on interfaces from an input protein with both sequence and 3D structure information, based on hexagon structure similarity. Experiments show that our method achieves better results than some state-of-the-art methods for identifying protein-protein interface. Comparing to existing methods, our approach improves F-measure value by at least 0.03. On a common dataset consisting of 41 complexes, our method has overall precision and recall values of 63% and 57%. On Benchmark v4.0, our method has overall precision and recall values of 55% and 56%. On CAPRI targets, our method has overall precision and recall values of 52% and 55%. PMID:26936323

  11. Neural network definitions of highly predictable protein secondary structure classes

    SciTech Connect

    Lapedes, A. |; Steeg, E.; Farber, R.

    1994-02-01

    We use two co-evolving neural networks to determine new classes of protein secondary structure which are significantly more predictable from local amino sequence than the conventional secondary structure classification. Accurate prediction of the conventional secondary structure classes: alpha helix, beta strand, and coil, from primary sequence has long been an important problem in computational molecular biology. Neural networks have been a popular method to attempt to predict these conventional secondary structure classes. Accuracy has been disappointingly low. The algorithm presented here uses neural networks to similtaneously examine both sequence and structure data, and to evolve new classes of secondary structure that can be predicted from sequence with significantly higher accuracy than the conventional classes. These new classes have both similarities to, and differences with the conventional alpha helix, beta strand and coil.

  12. Cloud Prediction of Protein Structure and Function with PredictProtein for Debian

    PubMed Central

    Kaján, László; Yachdav, Guy; Vicedo, Esmeralda; Steinegger, Martin; Mirdita, Milot; Angermüller, Christof; Böhm, Ariane; Domke, Simon; Ertl, Julia; Mertes, Christian; Reisinger, Eva; Rost, Burkhard

    2013-01-01

    We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome. PMID:23971032

  13. Cloud prediction of protein structure and function with PredictProtein for Debian.

    PubMed

    Kaján, László; Yachdav, Guy; Vicedo, Esmeralda; Steinegger, Martin; Mirdita, Milot; Angermüller, Christof; Böhm, Ariane; Domke, Simon; Ertl, Julia; Mertes, Christian; Reisinger, Eva; Staniewski, Cedric; Rost, Burkhard

    2013-01-01

    We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome. PMID:23971032

  14. Direct Pressure Monitoring Accurately Predicts Pulmonary Vein Occlusion During Cryoballoon Ablation

    PubMed Central

    Kosmidou, Ioanna; Wooden, Shannnon; Jones, Brian; Deering, Thomas; Wickliffe, Andrew; Dan, Dan

    2013-01-01

    Cryoballoon ablation (CBA) is an established therapy for atrial fibrillation (AF). Pulmonary vein (PV) occlusion is essential for achieving antral contact and PV isolation and is typically assessed by contrast injection. We present a novel method of direct pressure monitoring for assessment of PV occlusion. Transcatheter pressure is monitored during balloon advancement to the PV antrum. Pressure is recorded via a single pressure transducer connected to the inner lumen of the cryoballoon. Pressure curve characteristics are used to assess occlusion in conjunction with fluoroscopic or intracardiac echocardiography (ICE) guidance. PV occlusion is confirmed when loss of typical left atrial (LA) pressure waveform is observed with recordings of PA pressure characteristics (no A wave and rapid V wave upstroke). Complete pulmonary vein occlusion as assessed with this technique has been confirmed with concurrent contrast utilization during the initial testing of the technique and has been shown to be highly accurate and readily reproducible. We evaluated the efficacy of this novel technique in 35 patients. A total of 128 veins were assessed for occlusion with the cryoballoon utilizing the pressure monitoring technique; occlusive pressure was demonstrated in 113 veins with resultant successful pulmonary vein isolation in 111 veins (98.2%). Occlusion was confirmed with subsequent contrast injection during the initial ten procedures, after which contrast utilization was rapidly reduced or eliminated given the highly accurate identification of occlusive pressure waveform with limited initial training. Verification of PV occlusive pressure during CBA is a novel approach to assessing effective PV occlusion and it accurately predicts electrical isolation. Utilization of this method results in significant decrease in fluoroscopy time and volume of contrast. PMID:23485956

  15. Direct pressure monitoring accurately predicts pulmonary vein occlusion during cryoballoon ablation.

    PubMed

    Kosmidou, Ioanna; Wooden, Shannnon; Jones, Brian; Deering, Thomas; Wickliffe, Andrew; Dan, Dan

    2013-01-01

    Cryoballoon ablation (CBA) is an established therapy for atrial fibrillation (AF). Pulmonary vein (PV) occlusion is essential for achieving antral contact and PV isolation and is typically assessed by contrast injection. We present a novel method of direct pressure monitoring for assessment of PV occlusion. Transcatheter pressure is monitored during balloon advancement to the PV antrum. Pressure is recorded via a single pressure transducer connected to the inner lumen of the cryoballoon. Pressure curve characteristics are used to assess occlusion in conjunction with fluoroscopic or intracardiac echocardiography (ICE) guidance. PV occlusion is confirmed when loss of typical left atrial (LA) pressure waveform is observed with recordings of PA pressure characteristics (no A wave and rapid V wave upstroke). Complete pulmonary vein occlusion as assessed with this technique has been confirmed with concurrent contrast utilization during the initial testing of the technique and has been shown to be highly accurate and readily reproducible. We evaluated the efficacy of this novel technique in 35 patients. A total of 128 veins were assessed for occlusion with the cryoballoon utilizing the pressure monitoring technique; occlusive pressure was demonstrated in 113 veins with resultant successful pulmonary vein isolation in 111 veins (98.2%). Occlusion was confirmed with subsequent contrast injection during the initial ten procedures, after which contrast utilization was rapidly reduced or eliminated given the highly accurate identification of occlusive pressure waveform with limited initial training. Verification of PV occlusive pressure during CBA is a novel approach to assessing effective PV occlusion and it accurately predicts electrical isolation. Utilization of this method results in significant decrease in fluoroscopy time and volume of contrast. PMID:23485956

  16. Hierarchical Ensemble Methods for Protein Function Prediction

    PubMed Central

    2014-01-01

    Protein function prediction is a complex multiclass multilabel classification problem, characterized by multiple issues such as the incompleteness of the available annotations, the integration of multiple sources of high dimensional biomolecular data, the unbalance of several functional classes, and the difficulty of univocally determining negative examples. Moreover, the hierarchical relationships between functional classes that characterize both the Gene Ontology and FunCat taxonomies motivate the development of hierarchy-aware prediction methods that showed significantly better performances than hierarchical-unaware “flat” prediction methods. In this paper, we provide a comprehensive review of hierarchical methods for protein function prediction based on ensembles of learning machines. According to this general approach, a separate learning machine is trained to learn a specific functional term and then the resulting predictions are assembled in a “consensus” ensemble decision, taking into account the hierarchical relationships between classes. The main hierarchical ensemble methods proposed in the literature are discussed in the context of existing computational methods for protein function prediction, highlighting their characteristics, advantages, and limitations. Open problems of this exciting research area of computational biology are finally considered, outlining novel perspectives for future research. PMID:25937954

  17. A fast and accurate method to predict 2D and 3D aerodynamic boundary layer flows

    NASA Astrophysics Data System (ADS)

    Bijleveld, H. A.; Veldman, A. E. P.

    2014-12-01

    A quasi-simultaneous interaction method is applied to predict 2D and 3D aerodynamic flows. This method is suitable for offshore wind turbine design software as it is a very accurate and computationally reasonably cheap method. This study shows the results for a NACA 0012 airfoil. The two applied solvers converge to the experimental values when the grid is refined. We also show that in separation the eigenvalues remain positive thus avoiding the Goldstein singularity at separation. In 3D we show a flow over a dent in which separation occurs. A rotating flat plat is used to show the applicability of the method for rotating flows. The shown capabilities of the method indicate that the quasi-simultaneous interaction method is suitable for design methods for offshore wind turbine blades.

  18. Fast and Accurate Prediction of Numerical Relativity Waveforms from Binary Black Hole Coalescences Using Surrogate Models

    NASA Astrophysics Data System (ADS)

    Blackman, Jonathan; Field, Scott E.; Galley, Chad R.; Szilágyi, Béla; Scheel, Mark A.; Tiglio, Manuel; Hemberger, Daniel A.

    2015-09-01

    Simulating a binary black hole coalescence by solving Einstein's equations is computationally expensive, requiring days to months of supercomputing time. Using reduced order modeling techniques, we construct an accurate surrogate model, which is evaluated in a millisecond to a second, for numerical relativity (NR) waveforms from nonspinning binary black hole coalescences with mass ratios in [1, 10] and durations corresponding to about 15 orbits before merger. We assess the model's uncertainty and show that our modeling strategy predicts NR waveforms not used for the surrogate's training with errors nearly as small as the numerical error of the NR code. Our model includes all spherical-harmonic -2Yℓm waveform modes resolved by the NR code up to ℓ=8 . We compare our surrogate model to effective one body waveforms from 50 M⊙ to 300 M⊙ for advanced LIGO detectors and find that the surrogate is always more faithful (by at least an order of magnitude in most cases).

  19. Distance scaling method for accurate prediction of slowly varying magnetic fields in satellite missions

    NASA Astrophysics Data System (ADS)

    Zacharias, Panagiotis P.; Chatzineofytou, Elpida G.; Spantideas, Sotirios T.; Capsalis, Christos N.

    2016-07-01

    In the present work, the determination of the magnetic behavior of localized magnetic sources from near-field measurements is examined. The distance power law of the magnetic field fall-off is used in various cases to accurately predict the magnetic signature of an equipment under test (EUT) consisting of multiple alternating current (AC) magnetic sources. Therefore, parameters concerning the location of the observation points (magnetometers) are studied towards this scope. The results clearly show that these parameters are independent of the EUT's size and layout. Additionally, the techniques developed in the present study enable the placing of the magnetometers close to the EUT, thus achieving high signal-to-noise ratio (SNR). Finally, the proposed method is verified by real measurements, using a mobile phone as an EUT.

  20. PROMALS3D web server for accurate multiple protein sequence and structure alignments.

    PubMed

    Pei, Jimin; Tang, Ming; Grishin, Nick V

    2008-07-01

    Multiple sequence alignments are essential in computational sequence and structural analysis, with applications in homology detection, structure modeling, function prediction and phylogenetic analysis. We report PROMALS3D web server for constructing alignments for multiple protein sequences and/or structures using information from available 3D structures, database homologs and predicted secondary structures. PROMALS3D shows higher alignment accuracy than a number of other advanced methods. Input of PROMALS3D web server can be FASTA format protein sequences, PDB format protein structures and/or user-defined alignment constraints. The output page provides alignments with several formats, including a colored alignment augmented with useful information about sequence grouping, predicted secondary structures and consensus sequences. Intermediate results of sequence and structural database searches are also available. The PROMALS3D web server is available at: http://prodata.swmed.edu/promals3d/. PMID:18503087

  1. Measuring solar reflectance Part I: Defining a metric that accurately predicts solar heat gain

    SciTech Connect

    Levinson, Ronnen; Akbari, Hashem; Berdahl, Paul

    2010-05-14

    Solar reflectance can vary with the spectral and angular distributions of incident sunlight, which in turn depend on surface orientation, solar position and atmospheric conditions. A widely used solar reflectance metric based on the ASTM Standard E891 beam-normal solar spectral irradiance underestimates the solar heat gain of a spectrally selective 'cool colored' surface because this irradiance contains a greater fraction of near-infrared light than typically found in ordinary (unconcentrated) global sunlight. At mainland U.S. latitudes, this metric RE891BN can underestimate the annual peak solar heat gain of a typical roof or pavement (slope {le} 5:12 [23{sup o}]) by as much as 89 W m{sup -2}, and underestimate its peak surface temperature by up to 5 K. Using R{sub E891BN} to characterize roofs in a building energy simulation can exaggerate the economic value N of annual cool-roof net energy savings by as much as 23%. We define clear-sky air mass one global horizontal ('AM1GH') solar reflectance R{sub g,0}, a simple and easily measured property that more accurately predicts solar heat gain. R{sub g,0} predicts the annual peak solar heat gain of a roof or pavement to within 2 W m{sup -2}, and overestimates N by no more than 3%. R{sub g,0} is well suited to rating the solar reflectances of roofs, pavements and walls. We show in Part II that R{sub g,0} can be easily and accurately measured with a pyranometer, a solar spectrophotometer or version 6 of the Solar Spectrum Reflectometer.

  2. Measuring solar reflectance - Part I: Defining a metric that accurately predicts solar heat gain

    SciTech Connect

    Levinson, Ronnen; Akbari, Hashem; Berdahl, Paul

    2010-09-15

    Solar reflectance can vary with the spectral and angular distributions of incident sunlight, which in turn depend on surface orientation, solar position and atmospheric conditions. A widely used solar reflectance metric based on the ASTM Standard E891 beam-normal solar spectral irradiance underestimates the solar heat gain of a spectrally selective ''cool colored'' surface because this irradiance contains a greater fraction of near-infrared light than typically found in ordinary (unconcentrated) global sunlight. At mainland US latitudes, this metric R{sub E891BN} can underestimate the annual peak solar heat gain of a typical roof or pavement (slope {<=} 5:12 [23 ]) by as much as 89 W m{sup -2}, and underestimate its peak surface temperature by up to 5 K. Using R{sub E891BN} to characterize roofs in a building energy simulation can exaggerate the economic value N of annual cool roof net energy savings by as much as 23%. We define clear sky air mass one global horizontal (''AM1GH'') solar reflectance R{sub g,0}, a simple and easily measured property that more accurately predicts solar heat gain. R{sub g,0} predicts the annual peak solar heat gain of a roof or pavement to within 2 W m{sup -2}, and overestimates N by no more than 3%. R{sub g,0} is well suited to rating the solar reflectances of roofs, pavements and walls. We show in Part II that R{sub g,0} can be easily and accurately measured with a pyranometer, a solar spectrophotometer or version 6 of the Solar Spectrum Reflectometer. (author)

  3. A novel approach for accurate prediction of spontaneous passage of ureteral stones: support vector machines.

    PubMed

    Dal Moro, F; Abate, A; Lanckriet, G R G; Arandjelovic, G; Gasparella, P; Bassi, P; Mancini, M; Pagano, F

    2006-01-01

    The objective of this study was to optimally predict the spontaneous passage of ureteral stones in patients with renal colic by applying for the first time support vector machines (SVM), an instance of kernel methods, for classification. After reviewing the results found in the literature, we compared the performances obtained with logistic regression (LR) and accurately trained artificial neural networks (ANN) to those obtained with SVM, that is, the standard SVM, and the linear programming SVM (LP-SVM); the latter techniques show an improved performance. Moreover, we rank the prediction factors according to their importance using Fisher scores and the LP-SVM feature weights. A data set of 1163 patients affected by renal colic has been analyzed and restricted to single out a statistically coherent subset of 402 patients. Nine clinical factors are used as inputs for the classification algorithms, to predict one binary output. The algorithms are cross-validated by training and testing on randomly selected train- and test-set partitions of the data and reporting the average performance on the test sets. The SVM-based approaches obtained a sensitivity of 84.5% and a specificity of 86.9%. The feature ranking based on LP-SVM gives the highest importance to stone size, stone position and symptom duration before check-up. We propose a statistically correct way of employing LR, ANN and SVM for the prediction of spontaneous passage of ureteral stones in patients with renal colic. SVM outperformed ANN, as well as LR. This study will soon be translated into a practical software toolbox for actual clinical usage. PMID:16374437

  4. Predicting Physical Interactions between Protein Complexes*

    PubMed Central

    Clancy, Trevor; Rødland, Einar Andreas; Nygard, Ståle; Hovig, Eivind

    2013-01-01

    Protein complexes enact most biochemical functions in the cell. Dynamic interactions between protein complexes are frequent in many cellular processes. As they are often of a transient nature, they may be difficult to detect using current genome-wide screens. Here, we describe a method to computationally predict physical interactions between protein complexes, applied to both humans and yeast. We integrated manually curated protein complexes and physical protein interaction networks, and we designed a statistical method to identify pairs of protein complexes where the number of protein interactions between a complex pair is due to an actual physical interaction between the complexes. An evaluation against manually curated physical complex-complex interactions in yeast revealed that 50% of these interactions could be predicted in this manner. A community network analysis of the highest scoring pairs revealed a biologically sensible organization of physical complex-complex interactions in the cell. Such analyses of proteomes may serve as a guide to the discovery of novel functional cellular relationships. PMID:23438732

  5. Including Ligand Induced Protein Flexibility into Protein Tunnel Prediction

    PubMed Central

    Kingsley, Laura J.; Lill, Markus A.

    2014-01-01

    In proteins with buried active sites, understanding how ligands migrate through the tunnels that connect the exterior of the protein to the active site can shed light on substrate specificity and enzyme function. A growing body of evidence highlights the importance of protein flexibility in the binding site upon ligand binding; however, the influence of protein flexibility throughout the body of the protein during ligand entry and egress is much less characterized. We have developed a novel tunnel prediction and evaluation method named IterTunnel, which includes the influence of ligand-induced protein flexibility, guarantees ligand egress, and provides detailed free energy information as the ligand proceeds along the egress route. IterTunnel combines geometric tunnel prediction with steered MD in an iterative process to identify tunnels that open as a result of ligand migration and calculates the potential of mean force (PMF) of ligand egress through a given tunnel. Applying this new method to cytochrome P450 2B6 (CYP2B6), we demonstrate the influence of protein flexibility on the shape and accessibility of tunnels. More importantly, we demonstrate that the ligand itself, while traversing through a tunnel, can reshape tunnels due to its interaction with the protein. This process results in the exposure of new tunnels and the closure of pre-existing tunnels as the ligand migrates from the active site. PMID:25043499

  6. Bioinformatic Prediction of WSSV-Host Protein-Protein Interaction

    PubMed Central

    Sun, Zheng; Xiang, Jianhai

    2014-01-01

    WSSV is one of the most dangerous pathogens in shrimp aquaculture. However, the molecular mechanism of how WSSV interacts with shrimp is still not very clear. In the present study, bioinformatic approaches were used to predict interactions between proteins from WSSV and shrimp. The genome data of WSSV (NC_003225.1) and the constructed transcriptome data of F. chinensis were used to screen potentially interacting proteins by searching in protein interaction databases, including STRING, Reactome, and DIP. Forty-four pairs of proteins were suggested to have interactions between WSSV and the shrimp. Gene ontology analysis revealed that 6 pairs of these interacting proteins were classified into “extracellular region” or “receptor complex” GO-terms. KEGG pathway analysis showed that they were involved in the “ECM-receptor interaction pathway.” In the 6 pairs of interacting proteins, an envelope protein called “collagen-like protein” (WSSV-CLP) encoded by an early virus gene “wsv001” in WSSV interacted with 6 deduced proteins from the shrimp, including three integrin alpha (ITGA), two integrin beta (ITGB), and one syndecan (SDC). Sequence analysis on WSSV-CLP, ITGA, ITGB, and SDC revealed that they possessed the sequence features for protein-protein interactions. This study might provide new insights into the interaction mechanisms between WSSV and shrimp. PMID:24982879

  7. Accurate protein structure modeling using sparse NMR data and homologous structure information

    PubMed Central

    Thompson, James M.; Sgourakis, Nikolaos G.; Liu, Gaohua; Rossi, Paolo; Tang, Yuefeng; Mills, Jeffrey L.; Szyperski, Thomas; Montelione, Gaetano T.; Baker, David

    2012-01-01

    While information from homologous structures plays a central role in X-ray structure determination by molecular replacement, such information is rarely used in NMR structure determination because it can be incorrect, both locally and globally, when evolutionary relationships are inferred incorrectly or there has been considerable evolutionary structural divergence. Here we describe a method that allows robust modeling of protein structures of up to 225 residues by combining , 13C, and 15N backbone and 13Cβ chemical shift data, distance restraints derived from homologous structures, and a physically realistic all-atom energy function. Accurate models are distinguished from inaccurate models generated using incorrect sequence alignments by requiring that (i) the all-atom energies of models generated using the restraints are lower than models generated in unrestrained calculations and (ii) the low-energy structures converge to within 2.0 Å backbone rmsd over 75% of the protein. Benchmark calculations on known structures and blind targets show that the method can accurately model protein structures, even with very remote homology information, to a backbone rmsd of 1.2–1.9 Å relative to the conventional determined NMR ensembles and of 0.9–1.6 Å relative to X-ray structures for well-defined regions of the protein structures. This approach facilitates the accurate modeling of protein structures using backbone chemical shift data without need for side-chain resonance assignments and extensive analysis of NOESY cross-peak assignments. PMID:22665781

  8. A Simple and Accurate Model to Predict Responses to Multi-electrode Stimulation in the Retina

    PubMed Central

    Maturana, Matias I.; Apollo, Nicholas V.; Hadjinicolaou, Alex E.; Garrett, David J.; Cloherty, Shaun L.; Kameneva, Tatiana; Grayden, David B.; Ibbotson, Michael R.; Meffin, Hamish

    2016-01-01

    Implantable electrode arrays are widely used in therapeutic stimulation of the nervous system (e.g. cochlear, retinal, and cortical implants). Currently, most neural prostheses use serial stimulation (i.e. one electrode at a time) despite this severely limiting the repertoire of stimuli that can be applied. Methods to reliably predict the outcome of multi-electrode stimulation have not been available. Here, we demonstrate that a linear-nonlinear model accurately predicts neural responses to arbitrary patterns of stimulation using in vitro recordings from single retinal ganglion cells (RGCs) stimulated with a subretinal multi-electrode array. In the model, the stimulus is projected onto a low-dimensional subspace and then undergoes a nonlinear transformation to produce an estimate of spiking probability. The low-dimensional subspace is estimated using principal components analysis, which gives the neuron’s electrical receptive field (ERF), i.e. the electrodes to which the neuron is most sensitive. Our model suggests that stimulation proportional to the ERF yields a higher efficacy given a fixed amount of power when compared to equal amplitude stimulation on up to three electrodes. We find that the model captures the responses of all the cells recorded in the study, suggesting that it will generalize to most cell types in the retina. The model is computationally efficient to evaluate and, therefore, appropriate for future real-time applications including stimulation strategies that make use of recorded neural activity to improve the stimulation strategy. PMID:27035143

  9. Can CO2 assimilation in maize leaves be predicted accurately from chlorophyll fluorescence analysis?

    PubMed

    Edwards, G E; Baker, N R

    1993-08-01

    Analysis is made of the energetics of CO2 fixation, the photochemical quantum requirement per CO2 fixed, and sinks for utilising reductive power in the C4 plant maize. CO2 assimilation is the primary sink for energy derived from photochemistry, whereas photorespiration and nitrogen assimilation are relatively small sinks, particularly in developed leaves. Measurement of O2 exchange by mass spectrometry and CO2 exchange by infrared gas analysis under varying levels of CO2 indicate that there is a very close relationship between the true rate of O2 evolution from PS II and the net rate of CO2 fixation. Consideration is given to measurements of the quantum yields of PS II (φ PS II) from fluorescence analysis and of CO2 assimilation ([Formula: see text]) in maize over a wide range of conditions. The[Formula: see text] ratio was found to remain reasonably constant (ca. 12) over a range of physiological conditions in developed leaves, with varying temperature, CO2 concentrations, light intensities (from 5% to 100% of full sunlight), and following photoinhibition under high light and low temperature. A simple model for predicting CO2 assimilation from fluorescence parameters is presented and evaluated. It is concluded that under a wide range of conditions fluorescence parameters can be used to predict accurately and rapidly CO2 assimilation rates in maize. PMID:24317706

  10. A Simple and Accurate Model to Predict Responses to Multi-electrode Stimulation in the Retina.

    PubMed

    Maturana, Matias I; Apollo, Nicholas V; Hadjinicolaou, Alex E; Garrett, David J; Cloherty, Shaun L; Kameneva, Tatiana; Grayden, David B; Ibbotson, Michael R; Meffin, Hamish

    2016-04-01

    Implantable electrode arrays are widely used in therapeutic stimulation of the nervous system (e.g. cochlear, retinal, and cortical implants). Currently, most neural prostheses use serial stimulation (i.e. one electrode at a time) despite this severely limiting the repertoire of stimuli that can be applied. Methods to reliably predict the outcome of multi-electrode stimulation have not been available. Here, we demonstrate that a linear-nonlinear model accurately predicts neural responses to arbitrary patterns of stimulation using in vitro recordings from single retinal ganglion cells (RGCs) stimulated with a subretinal multi-electrode array. In the model, the stimulus is projected onto a low-dimensional subspace and then undergoes a nonlinear transformation to produce an estimate of spiking probability. The low-dimensional subspace is estimated using principal components analysis, which gives the neuron's electrical receptive field (ERF), i.e. the electrodes to which the neuron is most sensitive. Our model suggests that stimulation proportional to the ERF yields a higher efficacy given a fixed amount of power when compared to equal amplitude stimulation on up to three electrodes. We find that the model captures the responses of all the cells recorded in the study, suggesting that it will generalize to most cell types in the retina. The model is computationally efficient to evaluate and, therefore, appropriate for future real-time applications including stimulation strategies that make use of recorded neural activity to improve the stimulation strategy. PMID:27035143

  11. Accurate multimodal probabilistic prediction of conversion to Alzheimer's disease in patients with mild cognitive impairment☆

    PubMed Central

    Young, Jonathan; Modat, Marc; Cardoso, Manuel J.; Mendelson, Alex; Cash, Dave; Ourselin, Sebastien

    2013-01-01

    Accurately identifying the patients that have mild cognitive impairment (MCI) who will go on to develop Alzheimer's disease (AD) will become essential as new treatments will require identification of AD patients at earlier stages in the disease process. Most previous work in this area has centred around the same automated techniques used to diagnose AD patients from healthy controls, by coupling high dimensional brain image data or other relevant biomarker data to modern machine learning techniques. Such studies can now distinguish between AD patients and controls as accurately as an experienced clinician. Models trained on patients with AD and control subjects can also distinguish between MCI patients that will convert to AD within a given timeframe (MCI-c) and those that remain stable (MCI-s), although differences between these groups are smaller and thus, the corresponding accuracy is lower. The most common type of classifier used in these studies is the support vector machine, which gives categorical class decisions. In this paper, we introduce Gaussian process (GP) classification to the problem. This fully Bayesian method produces naturally probabilistic predictions, which we show correlate well with the actual chances of converting to AD within 3 years in a population of 96 MCI-s and 47 MCI-c subjects. Furthermore, we show that GPs can integrate multimodal data (in this study volumetric MRI, FDG-PET, cerebrospinal fluid, and APOE genotype with the classification process through the use of a mixed kernel). The GP approach aids combination of different data sources by learning parameters automatically from training data via type-II maximum likelihood, which we compare to a more conventional method based on cross validation and an SVM classifier. When the resulting probabilities from the GP are dichotomised to produce a binary classification, the results for predicting MCI conversion based on the combination of all three types of data show a balanced accuracy

  12. NanoLuc Complementation Reporter Optimized for Accurate Measurement of Protein Interactions in Cells.

    PubMed

    Dixon, Andrew S; Schwinn, Marie K; Hall, Mary P; Zimmerman, Kris; Otto, Paul; Lubben, Thomas H; Butler, Braeden L; Binkowski, Brock F; Machleidt, Thomas; Kirkland, Thomas A; Wood, Monika G; Eggers, Christopher T; Encell, Lance P; Wood, Keith V

    2016-02-19

    Protein-fragment complementation assays (PCAs) are widely used for investigating protein interactions. However, the fragments used are structurally compromised and have not been optimized nor thoroughly characterized for accurately assessing these interactions. We took advantage of the small size and bright luminescence of NanoLuc to engineer a new complementation reporter (NanoBiT). By design, the NanoBiT subunits (i.e., 1.3 kDa peptide, 18 kDa polypeptide) weakly associate so that their assembly into a luminescent complex is dictated by the interaction characteristics of the target proteins onto which they are appended. To ascertain their general suitability for measuring interaction affinities and kinetics, we determined that their intrinsic affinity (KD = 190 μM) and association constants (kon = 500 M(-1) s(-1), koff = 0.2 s(-1)) are outside of the ranges typical for protein interactions. The accuracy of NanoBiT was verified under defined biochemical conditions using the previously characterized interaction between SME-1 β-lactamase and a set of inhibitor binding proteins. In cells, NanoBiT fusions to FRB/FKBP produced luminescence consistent with the linear characteristics of NanoLuc. Response dynamics, evaluated using both protein kinase A and β-arrestin-2, were rapid, reversible, and robust to temperature (21-37 °C). Finally, NanoBiT provided a means to measure pharmacology of kinase inhibitors known to induce the interaction between BRAF and CRAF. Our results demonstrate that the intrinsic properties of NanoBiT allow accurate representation of protein interactions and that the reporter responds reliably and dynamically in cells. PMID:26569370

  13. Accurate First-Principles Spectra Predictions for Planetological and Astrophysical Applications at Various T-Conditions

    NASA Astrophysics Data System (ADS)

    Rey, M.; Nikitin, A. V.; Tyuterev, V.

    2014-06-01

    Knowledge of near infrared intensities of rovibrational transitions of polyatomic molecules is essential for the modeling of various planetary atmospheres, brown dwarfs and for other astrophysical applications 1,2,3. For example, to analyze exoplanets, atmospheric models have been developed, thus making the need to provide accurate spectroscopic data. Consequently, the spectral characterization of such planetary objects relies on the necessity of having adequate and reliable molecular data in extreme conditions (temperature, optical path length, pressure). On the other hand, in the modeling of astrophysical opacities, millions of lines are generally involved and the line-by-line extraction is clearly not feasible in laboratory measurements. It is thus suggested that this large amount of data could be interpreted only by reliable theoretical predictions. There exists essentially two theoretical approaches for the computation and prediction of spectra. The first one is based on empirically-fitted effective spectroscopic models. Another way for computing energies, line positions and intensities is based on global variational calculations using ab initio surfaces. They do not yet reach the spectroscopic accuracy stricto sensu but implicitly account for all intramolecular interactions including resonance couplings in a wide spectral range. The final aim of this work is to provide reliable predictions which could be quantitatively accurate with respect to the precision of available observations and as complete as possible. All this thus requires extensive first-principles quantum mechanical calculations essentially based on three necessary ingredients which are (i) accurate intramolecular potential energy surface and dipole moment surface components well-defined in a large range of vibrational displacements and (ii) efficient computational methods combined with suitable choices of coordinates to account for molecular symmetry properties and to achieve a good numerical

  14. Accurate and Robust Genomic Prediction of Celiac Disease Using Statistical Learning

    PubMed Central

    Abraham, Gad; Tye-Din, Jason A.; Bhalala, Oneil G.; Kowalczyk, Adam; Zobel, Justin; Inouye, Michael

    2014-01-01

    Practical application of genomic-based risk stratification to clinical diagnosis is appealing yet performance varies widely depending on the disease and genomic risk score (GRS) method. Celiac disease (CD), a common immune-mediated illness, is strongly genetically determined and requires specific HLA haplotypes. HLA testing can exclude diagnosis but has low specificity, providing little information suitable for clinical risk stratification. Using six European cohorts, we provide a proof-of-concept that statistical learning approaches which simultaneously model all SNPs can generate robust and highly accurate predictive models of CD based on genome-wide SNP profiles. The high predictive capacity replicated both in cross-validation within each cohort (AUC of 0.87–0.89) and in independent replication across cohorts (AUC of 0.86–0.9), despite differences in ethnicity. The models explained 30–35% of disease variance and up to ∼43% of heritability. The GRS's utility was assessed in different clinically relevant settings. Comparable to HLA typing, the GRS can be used to identify individuals without CD with ≥99.6% negative predictive value however, unlike HLA typing, fine-scale stratification of individuals into categories of higher-risk for CD can identify those that would benefit from more invasive and costly definitive testing. The GRS is flexible and its performance can be adapted to the clinical situation by adjusting the threshold cut-off. Despite explaining a minority of disease heritability, our findings indicate a genomic risk score provides clinically relevant information to improve upon current diagnostic pathways for CD and support further studies evaluating the clinical utility of this approach in CD and other complex diseases. PMID:24550740

  15. Bifunctional Spin Labeling of Muscle Proteins: Accurate Rotational Dynamics, Orientation, and Distance by EPR.

    PubMed

    Thompson, Andrew R; Binder, Benjamin P; McCaffrey, Jesse E; Svensson, Bengt; Thomas, David D

    2015-01-01

    While EPR allows for the characterization of protein structure and function due to its exquisite sensitivity to spin label dynamics, orientation, and distance, these measurements are often limited in sensitivity due to the use of labels that are attached via flexible monofunctional bonds, incurring additional disorder and nanosecond dynamics. In this chapter, we present methods for using a bifunctional spin label (BSL) to measure muscle protein structure and dynamics. We demonstrate that bifunctional attachment eliminates nanosecond internal rotation of the spin label, thereby allowing the accurate measurement of protein backbone rotational dynamics, including microsecond-to-millisecond motions by saturation transfer EPR. BSL also allows for accurate determination of helix orientation and disorder in mechanically and magnetically aligned systems, due to the label's stereospecific attachment. Similarly, labeling with a pair of BSL greatly enhances the resolution and accuracy of distance measurements measured by double electron-electron resonance (DEER). Finally, when BSL is applied to a protein with high helical content in an assembly with high orientational order (e.g., muscle fiber or membrane), two-probe DEER experiments can be combined with single-probe EPR experiments on an oriented sample in a process we call BEER, which has the potential for ab initio high-resolution structure determination. PMID:26477249

  16. Energy expenditure during level human walking: seeking a simple and accurate predictive solution.

    PubMed

    Ludlow, Lindsay W; Weyand, Peter G

    2016-03-01

    Accurate prediction of the metabolic energy that walking requires can inform numerous health, bodily status, and fitness outcomes. We adopted a two-step approach to identifying a concise, generalized equation for predicting level human walking metabolism. Using literature-aggregated values we compared 1) the predictive accuracy of three literature equations: American College of Sports Medicine (ACSM), Pandolf et al., and Height-Weight-Speed (HWS); and 2) the goodness-of-fit possible from one- vs. two-component descriptions of walking metabolism. Literature metabolic rate values (n = 127; speed range = 0.4 to 1.9 m/s) were aggregated from 25 subject populations (n = 5-42) whose means spanned a 1.8-fold range of heights and a 4.2-fold range of weights. Population-specific resting metabolic rates (V̇o2 rest) were determined using standardized equations. Our first finding was that the ACSM and Pandolf et al. equations underpredicted nearly all 127 literature-aggregated values. Consequently, their standard errors of estimate (SEE) were nearly four times greater than those of the HWS equation (4.51 and 4.39 vs. 1.13 ml O2·kg(-1)·min(-1), respectively). For our second comparison, empirical best-fit relationships for walking metabolism were derived from the data set in one- and two-component forms for three V̇o2-speed model types: linear (∝V(1.0)), exponential (∝V(2.0)), and exponential/height (∝V(2.0)/Ht). We found that the proportion of variance (R(2)) accounted for, when averaged across the three model types, was substantially lower for one- vs. two-component versions (0.63 ± 0.1 vs. 0.90 ± 0.03) and the predictive errors were nearly twice as great (SEE = 2.22 vs. 1.21 ml O2·kg(-1)·min(-1)). Our final analysis identified the following concise, generalized equation for predicting level human walking metabolism: V̇o2 total = V̇o2 rest + 3.85 + 5.97·V(2)/Ht (where V is measured in m/s, Ht in meters, and V̇o2 in ml O2·kg(-1)·min(-1)). PMID:26679617

  17. Effects of Protein Conformation in Docking: Improved Pose Prediction through Protein Pocket Adaptation

    PubMed Central

    Jain, Ajay N.

    2009-01-01

    Computational methods for docking ligands have been shown to be remarkably dependent on precise protein conformation, where acceptable results in pose prediction have been generally possible only in the artificial case of re-docking a ligand into a protein binding site whose conformation was determined in the presence of the same ligand (the “cognate” docking problem). In such cases, on well curated protein/ligand complexes, accurate dockings can be returned as top-scoring over 75% of the time using tools such as Surflex-Dock. A critical application of docking in modeling for lead optimization requires accurate pose prediction for novel ligands, ranging from simple synthetic analogs to very different molecular scaffolds. Typical results for widely used programs in the “cross-docking case” (making use of a single fixed protein conformation) have rates closer to 20% success. By making use of protein conformations from multiple complexes, Surflex-Dock yields an average success rate of 61% across eight pharmaceutically relevant targets. Following docking, protein pocket adaptation and rescoring identifies single pose families that are correct an average of 67% of the time. Consideration of the best of two pose families (from alternate scoring regimes) yields a 75% mean success rate. PMID:19340588

  18. Protein-protein interactions and prediction: a comprehensive overview.

    PubMed

    Sowmya, Gopichandran; Ranganathan, Shoba

    2014-01-01

    Molecular function in cellular processes is governed by protein-protein interactions (PPIs) within biological networks. Selective yet specific association of these protein partners contributes to diverse functionality such as catalysis, regulation, assembly, immunity, and inhibition in a cell. Therefore, understanding the principles of protein-protein association has been of immense interest for several decades. We provide an overview of the experimental methods used to determine PPIs and the key databases archiving this information. Structural and functional information of existing protein complexes confers knowledge on the principles of PPI, based on which a classification scheme for PPIs is then introduced. Obtaining high-quality non-redundant datasets of protein complexes for interaction characterisation is an essential step towards deciphering their underlying binding principles. Analysis of physicochemical features and their documentation has enhanced our understanding of the molecular basis of protein-protein association. We describe the diverse datasets created/collected by various groups and their key findings inferring distinguishing features. The currently available interface databases and prediction servers have also been compiled. PMID:23855658

  19. TIMP2•IGFBP7 biomarker panel accurately predicts acute kidney injury in high-risk surgical patients

    PubMed Central

    Gunnerson, Kyle J.; Shaw, Andrew D.; Chawla, Lakhmir S.; Bihorac, Azra; Al-Khafaji, Ali; Kashani, Kianoush; Lissauer, Matthew; Shi, Jing; Walker, Michael G.; Kellum, John A.

    2016-01-01

    BACKGROUND Acute kidney injury (AKI) is an important complication in surgical patients. Existing biomarkers and clinical prediction models underestimate the risk for developing AKI. We recently reported data from two trials of 728 and 408 critically ill adult patients in whom urinary TIMP2•IGFBP7 (NephroCheck, Astute Medical) was used to identify patients at risk of developing AKI. Here we report a preplanned analysis of surgical patients from both trials to assess whether urinary tissue inhibitor of metalloproteinase 2 (TIMP-2) and insulin-like growth factor–binding protein 7 (IGFBP7) accurately identify surgical patients at risk of developing AKI. STUDY DESIGN We enrolled adult surgical patients at risk for AKI who were admitted to one of 39 intensive care units across Europe and North America. The primary end point was moderate-severe AKI (equivalent to KDIGO [Kidney Disease Improving Global Outcomes] stages 2–3) within 12 hours of enrollment. Biomarker performance was assessed using the area under the receiver operating characteristic curve, integrated discrimination improvement, and category-free net reclassification improvement. RESULTS A total of 375 patients were included in the final analysis of whom 35 (9%) developed moderate-severe AKI within 12 hours. The area under the receiver operating characteristic curve for [TIMP-2]•[IGFBP7] alone was 0.84 (95% confidence interval, 0.76–0.90; p < 0.0001). Biomarker performance was robust in sensitivity analysis across predefined subgroups (urgency and type of surgery). CONCLUSION For postoperative surgical intensive care unit patients, a single urinary TIMP2•IGFBP7 test accurately identified patients at risk for developing AKI within the ensuing 12 hours and its inclusion in clinical risk prediction models significantly enhances their performance. LEVEL OF EVIDENCE Prognostic study, level I. PMID:26816218

  20. Can radiation therapy treatment planning system accurately predict surface doses in postmastectomy radiation therapy patients?

    SciTech Connect

    Wong, Sharon; Back, Michael; Tan, Poh Wee; Lee, Khai Mun; Baggarley, Shaun; Lu, Jaide Jay

    2012-07-01

    Skin doses have been an important factor in the dose prescription for breast radiotherapy. Recent advances in radiotherapy treatment techniques, such as intensity-modulated radiation therapy (IMRT) and new treatment schemes such as hypofractionated breast therapy have made the precise determination of the surface dose necessary. Detailed information of the dose at various depths of the skin is also critical in designing new treatment strategies. The purpose of this work was to assess the accuracy of surface dose calculation by a clinically used treatment planning system and those measured by thermoluminescence dosimeters (TLDs) in a customized chest wall phantom. This study involved the construction of a chest wall phantom for skin dose assessment. Seven TLDs were distributed throughout each right chest wall phantom to give adequate representation of measured radiation doses. Point doses from the CMS Xio Registered-Sign treatment planning system (TPS) were calculated for each relevant TLD positions and results correlated. There were no significant difference between measured absorbed dose by TLD and calculated doses by the TPS (p > 0.05 (1-tailed). Dose accuracy of up to 2.21% was found. The deviations from the calculated absorbed doses were overall larger (3.4%) when wedges and bolus were used. 3D radiotherapy TPS is a useful and accurate tool to assess the accuracy of surface dose. Our studies have shown that radiation treatment accuracy expressed as a comparison between calculated doses (by TPS) and measured doses (by TLD dosimetry) can be accurately predicted for tangential treatment of the chest wall after mastectomy.

  1. Prediction of protein-protein interactions based on protein-protein correlation using least squares regression.

    PubMed

    Huang, De-Shuang; Zhang, Lei; Han, Kyungsook; Deng, Suping; Yang, Kai; Zhang, Hongbo

    2014-01-01

    In order to transform protein sequences into the feature vectors, several works have been done, such as computing auto covariance (AC), conjoint triad (CT), local descriptor (LD), moran autocorrelation (MA), normalized moreaubroto autocorrelation (NMB) and so on. In this paper, we shall adopt these transformation methods to encode the proteins, respectively, where AC, CT, LD, MA and NMB are all represented by '+' in a unified manner. A new method, i.e. the combination of least squares regression with '+' (abbreviated as LSR(+)), will be introduced for encoding a protein-protein correlation-based feature representation and an interacting protein pair. Thus there are totally five different combinations for LSR(+), i.e. LSRAC, LSRCT, LSRLD, LSRMA and LSRNMB. As a result, we combined a support vector machine (SVM) approach with LSR(+) to predict protein-protein interactions (PPI) and PPI networks. The proposed method has been applied on four datasets, i.e. Saaccharomyces cerevisiae, Escherichia coli, Homo sapiens and Caenorhabditis elegans. The experimental results demonstrate that all LSR(+) methods outperform many existing representative algorithms. Therefore, LSR(+) is a powerful tool to characterize the protein-protein correlations and to infer PPI, whilst keeping high performance on prediction of PPI networks. PMID:25059329

  2. Continuum descriptions of membranes and their interaction with proteins: Towards chemically accurate models.

    PubMed

    Argudo, David; Bethel, Neville P; Marcoline, Frank V; Grabe, Michael

    2016-07-01

    Biological membranes deform in response to resident proteins leading to a coupling between membrane shape and protein localization. Additionally, the membrane influences the function of membrane proteins. Here we review contributions to this field from continuum elastic membrane models focusing on the class of models that couple the protein to the membrane. While it has been argued that continuum models cannot reproduce the distortions observed in fully-atomistic molecular dynamics simulations, we suggest that this failure can be overcome by using chemically accurate representations of the protein. We outline our recent advances along these lines with our hybrid continuum-atomistic model, and we show the model is in excellent agreement with fully-atomistic simulations of the nhTMEM16 lipid scramblase. We believe that the speed and accuracy of continuum-atomistic methodologies will make it possible to simulate large scale, slow biological processes, such as membrane morphological changes, that are currently beyond the scope of other computational approaches. This article is part of a Special Issue entitled: Membrane Proteins edited by J.C. Gumbart and Sergei Noskov. PMID:26853937

  3. Predicting protein-ligand and protein-peptide interfaces

    NASA Astrophysics Data System (ADS)

    Bertolazzi, Paola; Guerra, Concettina; Liuzzi, Giampaolo

    2014-06-01

    The paper deals with the identification of binding sites and concentrates on interactions involving small interfaces. In particular we focus our attention on two major interface types, namely protein-ligand and protein-peptide interfaces. As concerns protein-ligand binding site prediction, we classify the most interesting methods and approaches into four main categories: (a) shape-based methods, (b) alignment-based methods, (c) graph-theoretic approaches and (d) machine learning methods. Class (a) encompasses those methods which employ, in some way, geometric information about the protein surface. Methods falling into class (b) address the prediction problem as an alignment problem, i.e. finding protein-ligand atom pairs that occupy spatially equivalent positions. Graph theoretic approaches, class (c), are mainly based on the definition of a particular graph, known as the protein contact graph, and then apply some sophisticated methods from graph theory to discover subgraphs or score similarities for uncovering functional sites. The last class (d) contains those methods that are based on the learn-from-examples paradigm and that are able to take advantage of the large amount of data available on known protein-ligand pairs. As for protein-peptide interfaces, due to the often disordered nature of the regions involved in binding, shape similarity is no longer a determining factor. Then, in geometry-based methods, geometry is accounted for by providing the relative position of the atoms surrounding the peptide residues in known structures. Finally, also for protein-peptide interfaces, we present a classification of some successful machine learning methods. Indeed, they can be categorized in the way adopted to construct the learning examples. In particular, we envisage three main methods: distance functions, structure and potentials and structure alignment.

  4. Predictive and comparative analysis of Ebolavirus proteins.

    PubMed

    Cong, Qian; Pei, Jimin; Grishin, Nick V

    2015-01-01

    Ebolavirus is the pathogen for Ebola Hemorrhagic Fever (EHF). This disease exhibits a high fatality rate and has recently reached a historically epidemic proportion in West Africa. Out of the 5 known Ebolavirus species, only Reston ebolavirus has lost human pathogenicity, while retaining the ability to cause EHF in long-tailed macaque. Significant efforts have been spent to determine the three-dimensional (3D) structures of Ebolavirus proteins, to study their interaction with host proteins, and to identify the functional motifs in these viral proteins. Here, in light of these experimental results, we apply computational analysis to predict the 3D structures and functional sites for Ebolavirus protein domains with unknown structure, including a zinc-finger domain of VP30, the RNA-dependent RNA polymerase catalytic domain and a methyltransferase domain of protein L. In addition, we compare sequences of proteins that interact with Ebolavirus proteins from RESTV-resistant primates with those from RESTV-susceptible monkeys. The host proteins that interact with GP and VP35 show an elevated level of sequence divergence between the RESTV-resistant and RESTV-susceptible species, suggesting that they may be responsible for host specificity. Meanwhile, we detect variable positions in protein sequences that are likely associated with the loss of human pathogenicity in RESTV, map them onto the 3D structures and compare their positions to known functional sites. VP35 and VP30 are significantly enriched in these potential pathogenicity determinants and the clustering of such positions on the surfaces of VP35 and GP suggests possible uncharacterized interaction sites with host proteins that contribute to the virulence of Ebolavirus. PMID:26158395

  5. Predicting Resistance Mutations Using Protein Design Algorithms

    SciTech Connect

    Frey, K.; Georgiev, I; Donald, B; Anderson, A

    2010-01-01

    Drug resistance resulting from mutations to the target is an unfortunate common phenomenon that limits the lifetime of many of the most successful drugs. In contrast to the investigation of mutations after clinical exposure, it would be powerful to be able to incorporate strategies early in the development process to predict and overcome the effects of possible resistance mutations. Here we present a unique prospective application of an ensemble-based protein design algorithm, K*, to predict potential resistance mutations in dihydrofolate reductase from Staphylococcus aureus using positive design to maintain catalytic function and negative design to interfere with binding of a lead inhibitor. Enzyme inhibition assays show that three of the four highly-ranked predicted mutants are active yet display lower affinity (18-, 9-, and 13-fold) for the inhibitor. A crystal structure of the top-ranked mutant enzyme validates the predicted conformations of the mutated residues and the structural basis of the loss of potency. The use of protein design algorithms to predict resistance mutations could be incorporated in a lead design strategy against any target that is susceptible to mutational resistance.

  6. Sequence features accurately predict genome-wide MeCP2 binding in vivo.

    PubMed

    Rube, H Tomas; Lee, Wooje; Hejna, Miroslav; Chen, Huaiyang; Yasui, Dag H; Hess, John F; LaSalle, Janine M; Song, Jun S; Gong, Qizhi

    2016-01-01

    Methyl-CpG binding protein 2 (MeCP2) is critical for proper brain development and expressed at near-histone levels in neurons, but the mechanism of its genomic localization remains poorly understood. Using high-resolution MeCP2-binding data, we show that DNA sequence features alone can predict binding with 88% accuracy. Integrating MeCP2 binding and DNA methylation in a probabilistic graphical model, we demonstrate that previously reported genome-wide association with methylation is in part due to MeCP2's affinity to GC-rich chromatin, a result replicated using published data. Furthermore, MeCP2 co-localizes with nucleosomes. Finally, MeCP2 binding downstream of promoters correlates with increased expression in Mecp2-deficient neurons. PMID:27008915

  7. Sequence features accurately predict genome-wide MeCP2 binding in vivo

    PubMed Central

    Rube, H. Tomas; Lee, Wooje; Hejna, Miroslav; Chen, Huaiyang; Yasui, Dag H.; Hess, John F.; LaSalle, Janine M.; Song, Jun S.; Gong, Qizhi

    2016-01-01

    Methyl-CpG binding protein 2 (MeCP2) is critical for proper brain development and expressed at near-histone levels in neurons, but the mechanism of its genomic localization remains poorly understood. Using high-resolution MeCP2-binding data, we show that DNA sequence features alone can predict binding with 88% accuracy. Integrating MeCP2 binding and DNA methylation in a probabilistic graphical model, we demonstrate that previously reported genome-wide association with methylation is in part due to MeCP2's affinity to GC-rich chromatin, a result replicated using published data. Furthermore, MeCP2 co-localizes with nucleosomes. Finally, MeCP2 binding downstream of promoters correlates with increased expression in Mecp2-deficient neurons. PMID:27008915

  8. Protein Structure Prediction with Evolutionary Algorithms

    SciTech Connect

    Hart, W.E.; Krasnogor, N.; Pelta, D.A.; Smith, J.

    1999-02-08

    Evolutionary algorithms have been successfully applied to a variety of molecular structure prediction problems. In this paper we reconsider the design of genetic algorithms that have been applied to a simple protein structure prediction problem. Our analysis considers the impact of several algorithmic factors for this problem: the confirmational representation, the energy formulation and the way in which infeasible conformations are penalized, Further we empirically evaluated the impact of these factors on a small set of polymer sequences. Our analysis leads to specific recommendations for both GAs as well as other heuristic methods for solving PSP on the HP model.

  9. New consensus definition for acute kidney injury accurately predicts 30-day mortality in cirrhosis with infection

    PubMed Central

    Wong, Florence; O’Leary, Jacqueline G; Reddy, K Rajender; Patton, Heather; Kamath, Patrick S; Fallon, Michael B; Garcia-Tsao, Guadalupe; Subramanian, Ram M.; Malik, Raza; Maliakkal, Benedict; Thacker, Leroy R; Bajaj, Jasmohan S

    2015-01-01

    Background & Aims A consensus conference proposed that cirrhosis-associated acute kidney injury (AKI) be defined as an increase in serum creatinine by >50% from the stable baseline value in <6 months or by ≥0.3mg/dL in <48 hrs. We prospectively evaluated the ability of these criteria to predict mortality within 30 days among hospitalized patients with cirrhosis and infection. Methods 337 patients with cirrhosis admitted with or developed an infection in hospital (56% men; 56±10 y old; model for end-stage liver disease score, 20±8) were followed. We compared data on 30-day mortality, hospital length-of-stay, and organ failure between patients with and without AKI. Results 166 (49%) developed AKI during hospitalization, based on the consensus criteria. Patients who developed AKI had higher admission Child-Pugh (11.0±2.1 vs 9.6±2.1; P<.0001), and MELD scores (23±8 vs17±7; P<.0001), and lower mean arterial pressure (81±16mmHg vs 85±15mmHg; P<.01) than those who did not. Also higher amongst patients with AKI were mortality in ≤30 days (34% vs 7%), intensive care unit transfer (46% vs 20%), ventilation requirement (27% vs 6%), and shock (31% vs 8%); AKI patients also had longer hospital stays (17.8±19.8 days vs 13.3±31.8 days) (all P<.001). 56% of AKI episodes were transient, 28% persistent, and 16% resulted in dialysis. Mortality was 80% among those without renal recovery, higher compared to partial (40%) or complete recovery (15%), or AKI-free patients (7%; P<.0001). Conclusions 30-day mortality is 10-fold higher among infected hospitalized cirrhotic patients with irreversible AKI than those without AKI. The consensus definition of AKI accurately predicts 30-day mortality, length of hospital stay, and organ failure. PMID:23999172

  10. Predicting Disease-Related Proteins Based on Clique Backbone in Protein-Protein Interaction Network

    PubMed Central

    Yang, Lei; Zhao, Xudong; Tang, Xianglong

    2014-01-01

    Network biology integrates different kinds of data, including physical or functional networks and disease gene sets, to interpret human disease. A clique (maximal complete subgraph) in a protein-protein interaction network is a topological module and possesses inherently biological significance. A disease-related clique possibly associates with complex diseases. Fully identifying disease components in a clique is conductive to uncovering disease mechanisms. This paper proposes an approach of predicting disease proteins based on cliques in a protein-protein interaction network. To tolerate false positive and negative interactions in protein networks, extending cliques and scoring predicted disease proteins with gene ontology terms are introduced to the clique-based method. Precisions of predicted disease proteins are verified by disease phenotypes and steadily keep to more than 95%. The predicted disease proteins associated with cliques can partly complement mapping between genotype and phenotype, and provide clues for understanding the pathogenesis of serious diseases. PMID:25013377

  11. Scoring functions for prediction of protein-ligand interactions.

    PubMed

    Wang, Jui-Chih; Lin, Jung-Hsin

    2013-01-01

    The scoring functions for protein-ligand interactions plays central roles in computational drug design, virtual screening of chemical libraries for new lead identification, and prediction of possible binding targets of small chemical molecules. An ideal scoring function for protein-ligand interactions is expected to be able to recognize the native binding pose of a ligand on the protein surface among decoy poses, and to accurately predict the binding affinity (or binding free energy) so that the active molecules can be discriminated from the non-active ones. Due to the empirical nature of most, if not all, scoring functions for protein-ligand interactions, the general applicability of empirical scoring functions, especially to domains far outside training sets, is a major concern. In this review article, we will explore the foundations of different classes of scoring functions, their possible limitations, and their suitable application domains. We also provide assessments of several scoring functions on weakly-interacting protein-ligand complexes, which will be useful information in computational fragment-based drug design or virtual screening. PMID:23016847

  12. PRISM: a web server and repository for prediction of protein-protein interactions and modeling their 3D complexes.

    PubMed

    Baspinar, Alper; Cukuroglu, Engin; Nussinov, Ruth; Keskin, Ozlem; Gursoy, Attila

    2014-07-01

    The PRISM web server enables fast and accurate prediction of protein-protein interactions (PPIs). The prediction algorithm is knowledge-based. It combines structural similarity and accounts for evolutionary conservation in the template interfaces. The predicted models are stored in its repository. Given two protein structures, PRISM will provide a structural model of their complex if a matching template interface is available. Users can download the complex structure, retrieve the interface residues and visualize the complex model. The PRISM web server is user friendly, free and open to all users at http://cosbi.ku.edu.tr/prism. PMID:24829450

  13. A high order accurate finite element algorithm for high Reynolds number flow prediction

    NASA Technical Reports Server (NTRS)

    Baker, A. J.

    1978-01-01

    A Galerkin-weighted residuals formulation is employed to establish an implicit finite element solution algorithm for generally nonlinear initial-boundary value problems. Solution accuracy, and convergence rate with discretization refinement, are quantized in several error norms, by a systematic study of numerical solutions to several nonlinear parabolic and a hyperbolic partial differential equation characteristic of the equations governing fluid flows. Solutions are generated using selective linear, quadratic and cubic basis functions. Richardson extrapolation is employed to generate a higher-order accurate solution to facilitate isolation of truncation error in all norms. Extension of the mathematical theory underlying accuracy and convergence concepts for linear elliptic equations is predicted for equations characteristic of laminar and turbulent fluid flows at nonmodest Reynolds number. The nondiagonal initial-value matrix structure introduced by the finite element theory is determined intrinsic to improved solution accuracy and convergence. A factored Jacobian iteration algorithm is derived and evaluated to yield a consequential reduction in both computer storage and execution CPU requirements while retaining solution accuracy.

  14. Fast and Accurate Prediction of Numerical Relativity Waveforms from Binary Black Hole Coalescences Using Surrogate Models.

    PubMed

    Blackman, Jonathan; Field, Scott E; Galley, Chad R; Szilágyi, Béla; Scheel, Mark A; Tiglio, Manuel; Hemberger, Daniel A

    2015-09-18

    Simulating a binary black hole coalescence by solving Einstein's equations is computationally expensive, requiring days to months of supercomputing time. Using reduced order modeling techniques, we construct an accurate surrogate model, which is evaluated in a millisecond to a second, for numerical relativity (NR) waveforms from nonspinning binary black hole coalescences with mass ratios in [1, 10] and durations corresponding to about 15 orbits before merger. We assess the model's uncertainty and show that our modeling strategy predicts NR waveforms not used for the surrogate's training with errors nearly as small as the numerical error of the NR code. Our model includes all spherical-harmonic _{-2}Y_{ℓm} waveform modes resolved by the NR code up to ℓ=8. We compare our surrogate model to effective one body waveforms from 50M_{⊙} to 300M_{⊙} for advanced LIGO detectors and find that the surrogate is always more faithful (by at least an order of magnitude in most cases). PMID:26430979

  15. How accurately can we predict the melting points of drug-like compounds?

    PubMed

    Tetko, Igor V; Sushko, Yurii; Novotarskyi, Sergii; Patiny, Luc; Kondratov, Ivan; Petrenko, Alexander E; Charochkina, Larisa; Asiri, Abdullah M

    2014-12-22

    This article contributes a highly accurate model for predicting the melting points (MPs) of medicinal chemistry compounds. The model was developed using the largest published data set, comprising more than 47k compounds. The distributions of MPs in drug-like and drug lead sets showed that >90% of molecules melt within [50,250]°C. The final model calculated an RMSE of less than 33 °C for molecules from this temperature interval, which is the most important for medicinal chemistry users. This performance was achieved using a consensus model that performed calculations to a significantly higher accuracy than the individual models. We found that compounds with reactive and unstable groups were overrepresented among outlying compounds. These compounds could decompose during storage or measurement, thus introducing experimental errors. While filtering the data by removing outliers generally increased the accuracy of individual models, it did not significantly affect the results of the consensus models. Three analyzed distance to models did not allow us to flag molecules, which had MP values fell outside the applicability domain of the model. We believe that this negative result and the public availability of data from this article will encourage future studies to develop better approaches to define the applicability domain of models. The final model, MP data, and identified reactive groups are available online at http://ochem.eu/article/55638. PMID:25489863

  16. How Accurately Can We Predict the Melting Points of Drug-like Compounds?

    PubMed Central

    2014-01-01

    This article contributes a highly accurate model for predicting the melting points (MPs) of medicinal chemistry compounds. The model was developed using the largest published data set, comprising more than 47k compounds. The distributions of MPs in drug-like and drug lead sets showed that >90% of molecules melt within [50,250]°C. The final model calculated an RMSE of less than 33 °C for molecules from this temperature interval, which is the most important for medicinal chemistry users. This performance was achieved using a consensus model that performed calculations to a significantly higher accuracy than the individual models. We found that compounds with reactive and unstable groups were overrepresented among outlying compounds. These compounds could decompose during storage or measurement, thus introducing experimental errors. While filtering the data by removing outliers generally increased the accuracy of individual models, it did not significantly affect the results of the consensus models. Three analyzed distance to models did not allow us to flag molecules, which had MP values fell outside the applicability domain of the model. We believe that this negative result and the public availability of data from this article will encourage future studies to develop better approaches to define the applicability domain of models. The final model, MP data, and identified reactive groups are available online at http://ochem.eu/article/55638. PMID:25489863

  17. Algorithmic approaches to protein-protein interaction site prediction.

    PubMed

    Aumentado-Armstrong, Tristan T; Istrate, Bogdan; Murgita, Robert A

    2015-01-01

    Interaction sites on protein surfaces mediate virtually all biological activities, and their identification holds promise for disease treatment and drug design. Novel algorithmic approaches for the prediction of these sites have been produced at a rapid rate, and the field has seen significant advancement over the past decade. However, the most current methods have not yet been reviewed in a systematic and comprehensive fashion. Herein, we describe the intricacies of the biological theory, datasets, and features required for modern protein-protein interaction site (PPIS) prediction, and present an integrative analysis of the state-of-the-art algorithms and their performance. First, the major sources of data used by predictors are reviewed, including training sets, evaluation sets, and methods for their procurement. Then, the features employed and their importance in the biological characterization of PPISs are explored. This is followed by a discussion of the methodologies adopted in contemporary prediction programs, as well as their relative performance on the datasets most recently used for evaluation. In addition, the potential utility that PPIS identification holds for rational drug design, hotspot prediction, and computational molecular docking is described. Finally, an analysis of the most promising areas for future development of the field is presented. PMID:25713596

  18. CASP11 - An Evaluation of a Modular BCL::Fold-Based Protein Structure Prediction Pipeline.

    PubMed

    Fischer, Axel W; Heinze, Sten; Putnam, Daniel K; Li, Bian; Pino, James C; Xia, Yan; Lopez, Carlos F; Meiler, Jens

    2016-01-01

    In silico prediction of a protein's tertiary structure remains an unsolved problem. The community-wide Critical Assessment of Protein Structure Prediction (CASP) experiment provides a double-blind study to evaluate improvements in protein structure prediction algorithms. We developed a protein structure prediction pipeline employing a three-stage approach, consisting of low-resolution topology search, high-resolution refinement, and molecular dynamics simulation to predict the tertiary structure of proteins from the primary structure alone or including distance restraints either from predicted residue-residue contacts, nuclear magnetic resonance (NMR) nuclear overhauser effect (NOE) experiments, or mass spectroscopy (MS) cross-linking (XL) data. The protein structure prediction pipeline was evaluated in the CASP11 experiment on twenty regular protein targets as well as thirty-three 'assisted' protein targets, which also had distance restraints available. Although the low-resolution topology search module was able to sample models with a global distance test total score (GDT_TS) value greater than 30% for twelve out of twenty proteins, frequently it was not possible to select the most accurate models for refinement, resulting in a general decay of model quality over the course of the prediction pipeline. In this study, we provide a detailed overall analysis, study one target protein in more detail as it travels through the protein structure prediction pipeline, and evaluate the impact of limited experimental data. PMID:27046050

  19. Computational prediction of human salivary proteins from blood circulation and application to diagnostic biomarker identification.

    PubMed

    Wang, Jiaxin; Liang, Yanchun; Wang, Yan; Cui, Juan; Liu, Ming; Du, Wei; Xu, Ying

    2013-01-01

    Proteins can move from blood circulation into salivary glands through active transportation, passive diffusion or ultrafiltration, some of which are then released into saliva and hence can potentially serve as biomarkers for diseases if accurately identified. We present a novel computational method for predicting salivary proteins that come from circulation. The basis for the prediction is a set of physiochemical and sequence features we found to be discerning between human proteins known to be movable from circulation to saliva and proteins deemed to be not in saliva. A classifier was trained based on these features using a support-vector machine to predict protein secretion into saliva. The classifier achieved 88.56% average recall and 90.76% average precision in 10-fold cross-validation on the training data, indicating that the selected features are informative. Considering the possibility that our negative training data may not be highly reliable (i.e., proteins predicted to be not in saliva), we have also trained a ranking method, aiming to rank the known salivary proteins from circulation as the highest among the proteins in the general background, based on the same features. This prediction capability can be used to predict potential biomarker proteins for specific human diseases when coupled with the information of differentially expressed proteins in diseased versus healthy control tissues and a prediction capability for blood-secretory proteins. Using such integrated information, we predicted 31 candidate biomarker proteins in saliva for breast cancer. PMID:24324552

  20. Accurate prediction of V1 location from cortical folds in a surface coordinate system

    PubMed Central

    Hinds, Oliver P.; Rajendran, Niranjini; Polimeni, Jonathan R.; Augustinack, Jean C.; Wiggins, Graham; Wald, Lawrence L.; Rosas, H. Diana; Potthast, Andreas; Schwartz, Eric L.; Fischl, Bruce

    2008-01-01

    Previous studies demonstrated substantial variability of the location of primary visual cortex (V1) in stereotaxic coordinates when linear volume-based registration is used to match volumetric image intensities (Amunts et al., 2000). However, other qualitative reports of V1 location (Smith, 1904; Stensaas et al., 1974; Rademacher et al., 1993) suggested a consistent relationship between V1 and the surrounding cortical folds. Here, the relationship between folds and the location of V1 is quantified using surface-based analysis to generate a probabilistic atlas of human V1. High-resolution (about 200 μm) magnetic resonance imaging (MRI) at 7 T of ex vivo human cerebral hemispheres allowed identification of the full area via the stria of Gennari: a myeloarchitectonic feature specific to V1. Separate, whole-brain scans were acquired using MRI at 1.5 T to allow segmentation and mesh reconstruction of the cortical gray matter. For each individual, V1 was manually identified in the high-resolution volume and projected onto the cortical surface. Surface-based intersubject registration (Fischl et al., 1999b) was performed to align the primary cortical folds of individual hemispheres to those of a reference template representing the average folding pattern. An atlas of V1 location was constructed by computing the probability of V1 inclusion for each cortical location in the template space. This probabilistic atlas of V1 exhibits low prediction error compared to previous V1 probabilistic atlases built in volumetric coordinates. The increased predictability observed under surface-based registration suggests that the location of V1 is more accurately predicted by the cortical folds than by the shape of the brain embedded in the volume of the skull. In addition, the high quality of this atlas provides direct evidence that surface-based intersubject registration methods are superior to volume-based methods at superimposing functional areas of cortex, and therefore are better

  1. Protein Markers Predict Survival in Glioma Patients.

    PubMed

    Stetson, Lindsay C; Dazard, Jean-Eudes; Barnholtz-Sloan, Jill S

    2016-07-01

    Glioblastoma multiforme (GBM) is a genomically complex and aggressive primary adult brain tumor, with a median survival time of 12-14 months. The heterogeneous nature of this disease has made the identification and validation of prognostic biomarkers difficult. Using reverse phase protein array data from 203 primary untreated GBM patients, we have identified a set of 13 proteins with prognostic significance. Our protein signature predictive of glioblastoma (PROTGLIO) patient survival model was constructed and validated on independent data sets and was shown to significantly predict survival in GBM patients (log-rank test: p = 0.0009). Using a multivariate Cox proportional hazards, we have shown that our PROTGLIO model is distinct from other known GBM prognostic factors (age at diagnosis, extent of surgical resection, postoperative Karnofsky performance score (KPS), treatment with temozolomide (TMZ) chemoradiation, and methylation of the MGMT gene). Tenfold cross-validation repetition of our model generation procedure confirmed validation of PROTGLIO. The model was further validated on an independent set of isocitrate dehydrogenase wild-type (IDHwt) lower grade gliomas (LGG)-a portion of these tumors progress rapidly to GBM. The PROTGLIO model contains proteins, such as Cox-2 and Annexin 1, involved in inflammatory response, pointing to potential therapeutic interventions. The PROTGLIO model is a simple and effective predictor of overall survival in glioblastoma patients, making it potentially useful in clinical practice of glioblastoma multiforme. PMID:27143410

  2. Use B-factor related features for accurate classification between protein binding interfaces and crystal packing contacts

    PubMed Central

    2014-01-01

    Background Distinction between true protein interactions and crystal packing contacts is important for structural bioinformatics studies to respond to the need of accurate classification of the rapidly increasing protein structures. There are many unannotated crystal contacts and there also exist false annotations in this rapidly expanding volume of data. Previous tools have been proposed to address this problem. However, challenging issues still remain, such as low performance when the training and test data contain mixed interfaces having diverse sizes of contact areas. Methods and results B factor is a measure to quantify the vibrational motion of an atom, a more relevant feature than interface size to characterize protein binding. We propose to use three features related to B factor for the classification between biological interfaces and crystal packing contacts. The first feature is the sum of the normalized B factors of the interfacial atoms in the contact area, the second is the average of the interfacial B factor per residue in the chain, and the third is the average number of interfacial atoms with a negative normalized B factor per residue in the chain. We investigate the distribution properties of these basic features and a compound feature on four datasets of biological binding and crystal packing, and on a protein binding-only dataset with known binding affinity. We also compare the cross-dataset classification performance of these features with existing methods and with a widely-used and the most effective feature interface area. The results demonstrate that our features outperform the interface area approach and the existing prediction methods remarkably for many tests on all of these datasets. Conclusions The proposed B factor related features are more effective than interface area to distinguish crystal packing from biological binding interfaces. Our computational methods have a potential for large-scale and accurate identification of biological

  3. CASP11 – An Evaluation of a Modular BCL::Fold-Based Protein Structure Prediction Pipeline

    PubMed Central

    Fischer, Axel W.; Heinze, Sten; Putnam, Daniel K.; Li, Bian; Pino, James C.; Xia, Yan; Lopez, Carlos F.; Meiler, Jens

    2016-01-01

    In silico prediction of a protein’s tertiary structure remains an unsolved problem. The community-wide Critical Assessment of Protein Structure Prediction (CASP) experiment provides a double-blind study to evaluate improvements in protein structure prediction algorithms. We developed a protein structure prediction pipeline employing a three-stage approach, consisting of low-resolution topology search, high-resolution refinement, and molecular dynamics simulation to predict the tertiary structure of proteins from the primary structure alone or including distance restraints either from predicted residue-residue contacts, nuclear magnetic resonance (NMR) nuclear overhauser effect (NOE) experiments, or mass spectroscopy (MS) cross-linking (XL) data. The protein structure prediction pipeline was evaluated in the CASP11 experiment on twenty regular protein targets as well as thirty-three ‘assisted’ protein targets, which also had distance restraints available. Although the low-resolution topology search module was able to sample models with a global distance test total score (GDT_TS) value greater than 30% for twelve out of twenty proteins, frequently it was not possible to select the most accurate models for refinement, resulting in a general decay of model quality over the course of the prediction pipeline. In this study, we provide a detailed overall analysis, study one target protein in more detail as it travels through the protein structure prediction pipeline, and evaluate the impact of limited experimental data. PMID:27046050

  4. Accurate Ab Initio and Template-Based Prediction of Short Intrinsically-Disordered Regions by Bidirectional Recurrent Neural Networks Trained on Large-Scale Datasets

    PubMed Central

    Volpato, Viola; Alshomrani, Badr; Pollastri, Gianluca

    2015-01-01

    Intrinsically-disordered regions lack a well-defined 3D structure, but play key roles in determining the function of many proteins. Although predictors of disorder have been shown to achieve relatively high rates of correct classification of these segments, improvements over the the years have been slow, and accurate methods are needed that are capable of accommodating the ever-increasing amount of structurally-determined protein sequences to try to boost predictive performances. In this paper, we propose a predictor for short disordered regions based on bidirectional recurrent neural networks and tested by rigorous five-fold cross-validation on a large, non-redundant dataset collected from MobiDB, a new comprehensive source of protein disorder annotations. The system exploits sequence and structural information in the forms of frequency profiles, predicted secondary structure and solvent accessibility and direct disorder annotations from homologous protein structures (templates) deposited in the Protein Data Bank. The contributions of sequence, structure and homology information result in large improvements in predictive accuracy. Additionally, the large scale of the training set leads to low false positive rates, making our systems a robust and efficient way to address high-throughput disorder prediction. PMID:26307973

  5. A Method for Predicting Protein-Protein Interaction Types

    PubMed Central

    Silberberg, Yael

    2014-01-01

    Protein-protein interactions (PPIs) govern basic cellular processes through signal transduction and complex formation. The diversity of those processes gives rise to a remarkable diversity of interactions types, ranging from transient phosphorylation interactions to stable covalent bonding. Despite our increasing knowledge on PPIs in humans and other species, their types remain relatively unexplored and few annotations of types exist in public databases. Here, we propose the first method for systematic prediction of PPI type based solely on the techniques by which the interaction was detected. We show that different detection methods are better suited for detecting specific types. We apply our method to ten interaction types on a large scale human PPI dataset. We evaluate the performance of the method using both internal cross validation and external data sources. In cross validation, we obtain an area under receiver operating characteristic (ROC) curve ranging from 0.65 to 0.97 with an average of 0.84 across the predicted types. Comparing the predicted interaction types to external data sources, we obtained significant agreements for phosphorylation and ubiquitination interactions, with hypergeometric p-value = 2.3e−54 and 5.6e−28 respectively. We examine the biological relevance of our predictions using known signaling pathways and chart the abundance of interaction types in cell processes. Finally, we investigate the cross-relations between different interaction types within the network and characterize the discovered patterns, or motifs. We expect the resulting annotated network to facilitate the reconstruction of process-specific subnetworks and assist in predicting protein function or interaction. PMID:24625764

  6. High IFIT1 expression predicts improved clinical outcome, and IFIT1 along with MGMT more accurately predicts prognosis in newly diagnosed glioblastoma.

    PubMed

    Zhang, Jin-Feng; Chen, Yao; Lin, Guo-Shi; Zhang, Jian-Dong; Tang, Wen-Long; Huang, Jian-Huang; Chen, Jin-Shou; Wang, Xing-Fu; Lin, Zhi-Xiong

    2016-06-01

    Interferon-induced protein with tetratricopeptide repeat 1 (IFIT1) plays a key role in growth suppression and apoptosis promotion in cancer cells. Interferon was reported to induce the expression of IFIT1 and inhibit the expression of O-6-methylguanine-DNA methyltransferase (MGMT).This study aimed to investigate the expression of IFIT1, the correlation between IFIT1 and MGMT, and their impact on the clinical outcome in newly diagnosed glioblastoma. The expression of IFIT1 and MGMT and their correlation were investigated in the tumor tissues from 70 patients with newly diagnosed glioblastoma. The effects on progression-free survival and overall survival were evaluated. Of 70 cases, 57 (81.4%) tissue samples showed high expression of IFIT1 by immunostaining. The χ(2) test indicated that the expression of IFIT1 and MGMT was negatively correlated (r = -0.288, P = .016). Univariate and multivariate analyses confirmed high IFIT1 expression as a favorable prognostic indicator for progression-free survival (P = .005 and .017) and overall survival (P = .001 and .001), respectively. Patients with 2 favorable factors (high IFIT1 and low MGMT) had an improved prognosis as compared with others. The results demonstrated significantly increased expression of IFIT1 in newly diagnosed glioblastoma tissue. The negative correlation between IFIT1 and MGMT expression may be triggered by interferon. High IFIT1 can be a predictive biomarker of favorable clinical outcome, and IFIT1 along with MGMT more accurately predicts prognosis in newly diagnosed glioblastoma. PMID:26980050

  7. PQuad: Visualization of Predicted Peptides and Proteins

    SciTech Connect

    Havre, Susan L.; Singhal, Mudita; Payne, Deborah A.; Webb-Robertson, Bobbie-Jo M.

    2004-10-10

    New high-throughput proteomic techniques generate data faster than biologist and bioinformaticists can analyze it. Yet, hidden within this massive and complex data are answers to basic questions about how cells function to support life or respond to disease. Now biologists can take a global or systems approach studying not one or two proteins at a time but whole proteomes comprising all the proteins in a cell. However, the tremendous size and complexity of the high-throughput experiment data make it difficult to process and interpret. Visualization provides powerful analysis capabilities for such enormous and complex data. In this paper, we introduce a novel interactive visualization, PQuad (Peptide Permutation and Protein Prediction), designed for the visual analysis of peptides (protein fragments) identified from high-throughput data. PQuad depicts the experiment peptides in the context of their parent protein and DNA, thereby integrating proteomic and genomic information. A wrapped line metaphor is applied across key resolutions of the data, from a compressed view of an entire chromosome to the actual nucleotide sequence. PQuad provides a difference visualization for comparing peptides from different experimental conditions. We describe the requirements for such a visual analysis tool, the design decisions, and the novel aspects of PQuad.

  8. Unilateral Prostate Cancer Cannot be Accurately Predicted in Low-Risk Patients

    SciTech Connect

    Isbarn, Hendrik; Karakiewicz, Pierre I.; Vogel, Susanne

    2010-07-01

    Purpose: Hemiablative therapy (HAT) is increasing in popularity for treatment of patients with low-risk prostate cancer (PCa). The validity of this therapeutic modality, which exclusively treats PCa within a single prostate lobe, rests on accurate staging. We tested the accuracy of unilaterally unremarkable biopsy findings in cases of low-risk PCa patients who are potential candidates for HAT. Methods and Materials: The study population consisted of 243 men with clinical stage {<=}T2a, a prostate-specific antigen (PSA) concentration of <10 ng/ml, a biopsy-proven Gleason sum of {<=}6, and a maximum of 2 ipsilateral positive biopsy results out of 10 or more cores. All men underwent a radical prostatectomy, and pathology stage was used as the gold standard. Univariable and multivariable logistic regression models were tested for significant predictors of unilateral, organ-confined PCa. These predictors consisted of PSA, %fPSA (defined as the quotient of free [uncomplexed] PSA divided by the total PSA), clinical stage (T2a vs. T1c), gland volume, and number of positive biopsy cores (2 vs. 1). Results: Despite unilateral stage at biopsy, bilateral or even non-organ-confined PCa was reported in 64% of all patients. In multivariable analyses, no variable could clearly and independently predict the presence of unilateral PCa. This was reflected in an overall accuracy of 58% (95% confidence interval, 50.6-65.8%). Conclusions: Two-thirds of patients with unilateral low-risk PCa, confirmed by clinical stage and biopsy findings, have bilateral or non-organ-confined PCa at radical prostatectomy. This alarming finding questions the safety and validity of HAT.

  9. Bioinformatics Approaches for Predicting Disordered Protein Motifs.

    PubMed

    Bhowmick, Pallab; Guharoy, Mainak; Tompa, Peter

    2015-01-01

    Short, linear motifs (SLiMs) in proteins are functional microdomains consisting of contiguous residue segments along the protein sequence, typically not more than 10 consecutive amino acids in length with less than 5 defined positions. Many positions are 'degenerate' thus offering flexibility in terms of the amino acid types allowed at those positions. Their short length and degenerate nature confers evolutionary plasticity meaning that SLiMs often evolve convergently. Further, SLiMs have a propensity to occur within intrinsically unstructured protein segments and this confers versatile functionality to unstructured regions of the proteome. SLiMs mediate multiple types of protein interactions based on domain-peptide recognition and guide functions including posttranslational modifications, subcellular localization of proteins, and ligand binding. SLiMs thus behave as modular interaction units that confer versatility to protein function and SLiM-mediated interactions are increasingly being recognized as therapeutic targets. In this chapter we start with a brief description about the properties of SLiMs and their interactions and then move on to discuss algorithms and tools including several web-based methods that enable the discovery of novel SLiMs (de novo motif discovery) as well as the prediction of novel occurrences of known SLiMs. Both individual amino acid sequences as well as sets of protein sequences can be scanned using these methods to obtain statistically overrepresented sequence patterns. Lists of putatively functional SLiMs are then assembled based on parameters such as evolutionary sequence conservation, disorder scores, structural data, gene ontology terms and other contextual information that helps to assess the functional credibility or significance of these motifs. These bioinformatics methods should certainly guide experiments aimed at motif discovery. PMID:26387106

  10. Predicting Ca(2+)-binding sites in proteins.

    PubMed Central

    Nayal, M; Di Cera, E

    1994-01-01

    The coordination shell of Ca2+ ions in proteins contains almost exclusively oxygen atoms supported by an outer shell of carbon atoms. The bond-strength contribution of each ligating oxygen in the inner shell can be evaluated by using an empirical expression successfully applied in the analysis of crystals of metal oxides. The sum of such contributions closely approximates the valence of the bound cation. When a protein is embedded in a very fine grid of points and an algorithm is used to calculate the valence of each point representing a potential Ca(2+)-binding site, a typical distribution of valence values peaked around 0.4 is obtained. In 32 documented Ca(2+)-binding proteins, containing a total of 62 Ca(2+)-binding sites, a very small fraction of points in the distribution has a valence close to that of Ca2+. Only 0.06% of the points have a valence > or = 1.4. These points share the remarkable tendency to cluster around documented Ca2+ ions. A high enough value of the valence is both necessary (58 out of 62 Ca(2+)-binding sites have a valence > or = 1.4) and sufficient (87% of the grid points with a valence > or = 1.4 are within 1.0 A from a documented Ca2+ ion) to predict the location of bound Ca2+ ions. The algorithm can also be used for the analysis of other cations and predicts the location of Mg(2+)- and Na(+)-binding sites in a number of proteins. The valence is, therefore, a tool of pinpoint accuracy for locating cation-binding sites, which can also be exploited in engineering high-affinity binding sites and characterizing the linkage between structural components and functional energetics for molecular recognition of metal ions by proteins. Images Fig. 4 PMID:8290605

  11. Structural class prediction of protein using novel feature extraction method from chaos game representation of predicted secondary structure.

    PubMed

    Zhang, Lichao; Kong, Liang; Han, Xiaodong; Lv, Jinfeng

    2016-07-01

    Protein structural class prediction plays an important role in protein structure and function analysis, drug design and many other biological applications. Extracting good representation from protein sequence is fundamental for this prediction task. In recent years, although several secondary structure based feature extraction strategies have been specially proposed for low-similarity protein sequences, the prediction accuracy still remains limited. To explore the potential of secondary structure information, this study proposed a novel feature extraction method from the chaos game representation of predicted secondary structure to mainly capture sequence order information and secondary structure segments distribution information in a given protein sequence. Several kinds of prediction accuracies obtained by the jackknife test are reported on three widely used low-similarity benchmark datasets (25PDB, 1189 and 640). Compared with the state-of-the-art prediction methods, the proposed method achieves the highest overall accuracies on all the three datasets. The experimental results confirm that the proposed feature extraction method is effective for accurate prediction of protein structural class. Moreover, it is anticipated that the proposed method could be extended to other graphical representations of protein sequence and be helpful in future research. PMID:27084358

  12. Accurate and Efficient Resolution of Overlapping Isotopic Envelopes in Protein Tandem Mass Spectra

    PubMed Central

    Xiao, Kaijie; Yu, Fan; Fang, Houqin; Xue, Bingbing; Liu, Yan; Tian, Zhixin

    2015-01-01

    It has long been an analytical challenge to accurately and efficiently resolve extremely dense overlapping isotopic envelopes (OIEs) in protein tandem mass spectra to confidently identify proteins. Here, we report a computationally efficient method, called OIE_CARE, to resolve OIEs by calculating the relative deviation between the ideal and observed experimental abundance. In the OIE_CARE method, the ideal experimental abundance of a particular overlapping isotopic peak (OIP) is first calculated for all the OIEs sharing this OIP. The relative deviation (RD) of the overall observed experimental abundance of this OIP relative to the summed ideal value is then calculated. The final individual abundance of the OIP for each OIE is the individual ideal experimental abundance multiplied by 1 + RD. Initial studies were performed using higher-energy collisional dissociation tandem mass spectra on myoglobin (with direct infusion) and the intact E. coli proteome (with liquid chromatographic separation). Comprehensive data at the protein and proteome levels, high confidence and good reproducibility were achieved. The resolving method reported here can, in principle, be extended to resolve any envelope-type overlapping data for which the corresponding theoretical reference values are available. PMID:26439836

  13. Protein inter-domain linker prediction using Random Forest and amino acid physiochemical properties

    PubMed Central

    2014-01-01

    Background Protein chains are generally long and consist of multiple domains. Domains are distinct structural units of a protein that can evolve and function independently. The accurate prediction of protein domain linkers and boundaries is often regarded as the initial step of protein tertiary structure and function predictions. Such information not only enhances protein-targeted drug development but also reduces the experimental cost of protein analysis by allowing researchers to work on a set of smaller and independent units. In this study, we propose a novel and accurate domain-linker prediction approach based on protein primary structure information only. We utilize a nature-inspired machine-learning model called Random Forest along with a novel domain-linker profile that contains physiochemical and domain-linker information of amino acid sequences. Results The proposed approach was tested on two well-known benchmark protein datasets and achieved 68% sensitivity and 99% precision, which is better than any existing protein domain-linker predictor. Without applying any data balancing technique such as class weighting and data re-sampling, the proposed approach is able to accurately classify inter-domain linkers from highly imbalanced datasets. Conclusion Our experimental results prove that the proposed approach is useful for domain-linker identification in highly imbalanced single- and multi-domain proteins. PMID:25521329

  14. SHIFTX2: significantly improved protein chemical shift prediction.

    PubMed

    Han, Beomsoo; Liu, Yifeng; Ginzinger, Simon W; Wishart, David S

    2011-05-01

    A new computer program, called SHIFTX2, is described which is capable of rapidly and accurately calculating diamagnetic (1)H, (13)C and (15)N chemical shifts from protein coordinate data. Compared to its predecessor (SHIFTX) and to other existing protein chemical shift prediction programs, SHIFTX2 is substantially more accurate (up to 26% better by correlation coefficient with an RMS error that is up to 3.3× smaller) than the next best performing program. It also provides significantly more coverage (up to 10% more), is significantly faster (up to 8.5×) and capable of calculating a wider variety of backbone and side chain chemical shifts (up to 6×) than many other shift predictors. In particular, SHIFTX2 is able to attain correlation coefficients between experimentally observed and predicted backbone chemical shifts of 0.9800 ((15)N), 0.9959 ((13)Cα), 0.9992 ((13)Cβ), 0.9676 ((13)C'), 0.9714 ((1)HN), 0.9744 ((1)Hα) and RMS errors of 1.1169, 0.4412, 0.5163, 0.5330, 0.1711, and 0.1231 ppm, respectively. The correlation between SHIFTX2's predicted and observed side chain chemical shifts is 0.9787 ((13)C) and 0.9482 ((1)H) with RMS errors of 0.9754 and 0.1723 ppm, respectively. SHIFTX2 is able to achieve such a high level of accuracy by using a large, high quality database of training proteins (>190), by utilizing advanced machine learning techniques, by incorporating many more features (χ(2) and χ(3) angles, solvent accessibility, H-bond geometry, pH, temperature), and by combining sequence-based with structure-based chemical shift prediction techniques. With this substantial improvement in accuracy we believe that SHIFTX2 will open the door to many long-anticipated applications of chemical shift prediction to protein structure determination, refinement and validation. SHIFTX2 is available both as a standalone program and as a web server ( http://www.shiftx2.ca ). PMID:21448735

  15. Neural Wiskott-Aldrich Syndrome Protein Is Required for Accurate Chromosome Congression and Segregation

    PubMed Central

    Park, Sun Joo; Takenawa, Tadaomi

    2011-01-01

    The accurate distribution and segregation of replicated chromosomes through mitosis is crucial for cellular viability and development of organisms. Kinetochores are responsible for the proper congression and segregation of chromosomes. Here, we show that neural Wiskott-Aldrich syndrome protein (N-WASP) localizes to and forms a complex with kinetochores in mitotic cells. Depletion of NWASP by RNA interference causes chromosome misalignment, prolonged mitosis, and abnormal chromosomal segregation, which is associated with decreased proliferation of N-WASP-deficient cells. N-WASP-deficient cells display defects in the kinetochores recruitment of inner and outer kinetochore components, CENP-A, CENP-E, and Mad2. Live-cell imaging analysis of GFP-α-tubulin revealed that depletion of N-WASP impairs microtubule attachment to chromosomes in mitotic cells. All these results indicate that N-WASP plays a role in efficient assembly of kinetochores and attachment of microtubules to chromosomes, which is essential for accurate chromosome congression and segregation. PMID:21533546

  16. Automated Selected Reaction Monitoring Software for Accurate Label-Free Protein Quantification

    PubMed Central

    2012-01-01

    Selected reaction monitoring (SRM) is a mass spectrometry method with documented ability to quantify proteins accurately and reproducibly using labeled reference peptides. However, the use of labeled reference peptides becomes impractical if large numbers of peptides are targeted and when high flexibility is desired when selecting peptides. We have developed a label-free quantitative SRM workflow that relies on a new automated algorithm, Anubis, for accurate peak detection. Anubis efficiently removes interfering signals from contaminating peptides to estimate the true signal of the targeted peptides. We evaluated the algorithm on a published multisite data set and achieved results in line with manual data analysis. In complex peptide mixtures from whole proteome digests of Streptococcus pyogenes we achieved a technical variability across the entire proteome abundance range of 6.5–19.2%, which was considerably below the total variation across biological samples. Our results show that the label-free SRM workflow with automated data analysis is feasible for large-scale biological studies, opening up new possibilities for quantitative proteomics and systems biology. PMID:22658081

  17. Electrostatics of proteins in dielectric solvent continua. I. An accurate and efficient reaction field description

    SciTech Connect

    Bauer, Sebastian; Mathias, Gerald; Tavan, Paul

    2014-03-14

    We present a reaction field (RF) method which accurately solves the Poisson equation for proteins embedded in dielectric solvent continua at a computational effort comparable to that of an electrostatics calculation with polarizable molecular mechanics (MM) force fields. The method combines an approach originally suggested by Egwolf and Tavan [J. Chem. Phys. 118, 2039 (2003)] with concepts generalizing the Born solution [Z. Phys. 1, 45 (1920)] for a solvated ion. First, we derive an exact representation according to which the sources of the RF potential and energy are inducible atomic anti-polarization densities and atomic shielding charge distributions. Modeling these atomic densities by Gaussians leads to an approximate representation. Here, the strengths of the Gaussian shielding charge distributions are directly given in terms of the static partial charges as defined, e.g., by standard MM force fields for the various atom types, whereas the strengths of the Gaussian anti-polarization densities are calculated by a self-consistency iteration. The atomic volumes are also described by Gaussians. To account for covalently overlapping atoms, their effective volumes are calculated by another self-consistency procedure, which guarantees that the dielectric function ε(r) is close to one everywhere inside the protein. The Gaussian widths σ{sub i} of the atoms i are parameters of the RF approximation. The remarkable accuracy of the method is demonstrated by comparison with Kirkwood's analytical solution for a spherical protein [J. Chem. Phys. 2, 351 (1934)] and with computationally expensive grid-based numerical solutions for simple model systems in dielectric continua including a di-peptide (Ac-Ala-NHMe) as modeled by a standard MM force field. The latter example shows how weakly the RF conformational free energy landscape depends on the parameters σ{sub i}. A summarizing discussion highlights the achievements of the new theory and of its approximate solution

  18. Electrostatics of proteins in dielectric solvent continua. I. An accurate and efficient reaction field description

    NASA Astrophysics Data System (ADS)

    Bauer, Sebastian; Mathias, Gerald; Tavan, Paul

    2014-03-01

    We present a reaction field (RF) method which accurately solves the Poisson equation for proteins embedded in dielectric solvent continua at a computational effort comparable to that of an electrostatics calculation with polarizable molecular mechanics (MM) force fields. The method combines an approach originally suggested by Egwolf and Tavan [J. Chem. Phys. 118, 2039 (2003)] with concepts generalizing the Born solution [Z. Phys. 1, 45 (1920)] for a solvated ion. First, we derive an exact representation according to which the sources of the RF potential and energy are inducible atomic anti-polarization densities and atomic shielding charge distributions. Modeling these atomic densities by Gaussians leads to an approximate representation. Here, the strengths of the Gaussian shielding charge distributions are directly given in terms of the static partial charges as defined, e.g., by standard MM force fields for the various atom types, whereas the strengths of the Gaussian anti-polarization densities are calculated by a self-consistency iteration. The atomic volumes are also described by Gaussians. To account for covalently overlapping atoms, their effective volumes are calculated by another self-consistency procedure, which guarantees that the dielectric function ɛ(r) is close to one everywhere inside the protein. The Gaussian widths σi of the atoms i are parameters of the RF approximation. The remarkable accuracy of the method is demonstrated by comparison with Kirkwood's analytical solution for a spherical protein [J. Chem. Phys. 2, 351 (1934)] and with computationally expensive grid-based numerical solutions for simple model systems in dielectric continua including a di-peptide (Ac-Ala-NHMe) as modeled by a standard MM force field. The latter example shows how weakly the RF conformational free energy landscape depends on the parameters σi. A summarizing discussion highlights the achievements of the new theory and of its approximate solution particularly by

  19. Electrostatics of proteins in dielectric solvent continua. I. An accurate and efficient reaction field description.

    PubMed

    Bauer, Sebastian; Mathias, Gerald; Tavan, Paul

    2014-03-14

    We present a reaction field (RF) method which accurately solves the Poisson equation for proteins embedded in dielectric solvent continua at a computational effort comparable to that of an electrostatics calculation with polarizable molecular mechanics (MM) force fields. The method combines an approach originally suggested by Egwolf and Tavan [J. Chem. Phys. 118, 2039 (2003)] with concepts generalizing the Born solution [Z. Phys. 1, 45 (1920)] for a solvated ion. First, we derive an exact representation according to which the sources of the RF potential and energy are inducible atomic anti-polarization densities and atomic shielding charge distributions. Modeling these atomic densities by Gaussians leads to an approximate representation. Here, the strengths of the Gaussian shielding charge distributions are directly given in terms of the static partial charges as defined, e.g., by standard MM force fields for the various atom types, whereas the strengths of the Gaussian anti-polarization densities are calculated by a self-consistency iteration. The atomic volumes are also described by Gaussians. To account for covalently overlapping atoms, their effective volumes are calculated by another self-consistency procedure, which guarantees that the dielectric function ε(r) is close to one everywhere inside the protein. The Gaussian widths σ(i) of the atoms i are parameters of the RF approximation. The remarkable accuracy of the method is demonstrated by comparison with Kirkwood's analytical solution for a spherical protein [J. Chem. Phys. 2, 351 (1934)] and with computationally expensive grid-based numerical solutions for simple model systems in dielectric continua including a di-peptide (Ac-Ala-NHMe) as modeled by a standard MM force field. The latter example shows how weakly the RF conformational free energy landscape depends on the parameters σ(i). A summarizing discussion highlights the achievements of the new theory and of its approximate solution particularly by

  20. Accurate in silico identification of protein succinylation sites using an iterative semi-supervised learning technique.

    PubMed

    Zhao, Xiaowei; Ning, Qiao; Chai, Haiting; Ma, Zhiqiang

    2015-06-01

    As a widespread type of protein post-translational modifications (PTMs), succinylation plays an important role in regulating protein conformation, function and physicochemical properties. Compared with the labor-intensive and time-consuming experimental approaches, computational predictions of succinylation sites are much desirable due to their convenient and fast speed. Currently, numerous computational models have been developed to identify PTMs sites through various types of two-class machine learning algorithms. These methods require both positive and negative samples for training. However, designation of the negative samples of PTMs was difficult and if it is not properly done can affect the performance of computational models dramatically. So that in this work, we implemented the first application of positive samples only learning (PSoL) algorithm to succinylation sites prediction problem, which was a special class of semi-supervised machine learning that used positive samples and unlabeled samples to train the model. Meanwhile, we proposed a novel succinylation sites computational predictor called SucPred (succinylation site predictor) by using multiple feature encoding schemes. Promising results were obtained by the SucPred predictor with an accuracy of 88.65% using 5-fold cross validation on the training dataset and an accuracy of 84.40% on the independent testing dataset, which demonstrated that the positive samples only learning algorithm presented here was particularly useful for identification of protein succinylation sites. Besides, the positive samples only learning algorithm can be applied to build predictors for other types of PTMs sites with ease. A web server for predicting succinylation sites was developed and was freely accessible at http://59.73.198.144:8088/SucPred/. PMID:25843215

  1. Prediction of Peptide and Protein Propensity for Amyloid Formation

    PubMed Central

    Família, Carlos; Dennison, Sarah R.; Quintas, Alexandre; Phoenix, David A.

    2015-01-01

    Understanding which peptides and proteins have the potential to undergo amyloid formation and what driving forces are responsible for amyloid-like fiber formation and stabilization remains limited. This is mainly because proteins that can undergo structural changes, which lead to amyloid formation, are quite diverse and share no obvious sequence or structural homology, despite the structural similarity found in the fibrils. To address these issues, a novel approach based on recursive feature selection and feed-forward neural networks was undertaken to identify key features highly correlated with the self-assembly problem. This approach allowed the identification of seven physicochemical and biochemical properties of the amino acids highly associated with the self-assembly of peptides and proteins into amyloid-like fibrils (normalized frequency of β-sheet, normalized frequency of β-sheet from LG, weights for β-sheet at the window position of 1, isoelectric point, atom-based hydrophobic moment, helix termination parameter at position j+1 and ΔG° values for peptides extrapolated in 0 M urea). Moreover, these features enabled the development of a new predictor (available at http://cran.r-project.org/web/packages/appnn/index.html) capable of accurately and reliably predicting the amyloidogenic propensity from the polypeptide sequence alone with a prediction accuracy of 84.9 % against an external validation dataset of sequences with experimental in vitro, evidence of amyloid formation. PMID:26241652

  2. Accurate prediction model of bead geometry in crimping butt of the laser brazing using generalized regression neural network

    NASA Astrophysics Data System (ADS)

    Rong, Y. M.; Chang, Y.; Huang, Y.; Zhang, G. J.; Shao, X. Y.

    2015-12-01

    There are few researches that concentrate on the prediction of the bead geometry for laser brazing with crimping butt. This paper addressed the accurate prediction of the bead profile by developing a generalized regression neural network (GRNN) algorithm. Firstly GRNN model was developed and trained to decrease the prediction error that may be influenced by the sample size. Then the prediction accuracy was demonstrated by comparing with other articles and back propagation artificial neural network (BPNN) algorithm. Eventually the reliability and stability of GRNN model were discussed from the points of average relative error (ARE), mean square error (MSE) and root mean square error (RMSE), while the maximum ARE and MSE were 6.94% and 0.0303 that were clearly less than those (14.28% and 0.0832) predicted by BPNN. Obviously, it was proved that the prediction accuracy was improved at least 2 times, and the stability was also increased much more.

  3. Fast and Accurate Discovery of Degenerate Linear Motifs in Protein Sequences

    PubMed Central

    Levy, Emmanuel D.; Michnick, Stephen W.

    2014-01-01

    Linear motifs mediate a wide variety of cellular functions, which makes their characterization in protein sequences crucial to understanding cellular systems. However, the short length and degenerate nature of linear motifs make their discovery a difficult problem. Here, we introduce MotifHound, an algorithm particularly suited for the discovery of small and degenerate linear motifs. MotifHound performs an exact and exhaustive enumeration of all motifs present in proteins of interest, including all of their degenerate forms, and scores the overrepresentation of each motif based on its occurrence in proteins of interest relative to a background (e.g., proteome) using the hypergeometric distribution. To assess MotifHound, we benchmarked it together with state-of-the-art algorithms. The benchmark consists of 11,880 sets of proteins from S. cerevisiae; in each set, we artificially spiked-in one motif varying in terms of three key parameters, (i) number of occurrences, (ii) length and (iii) the number of degenerate or “wildcard” positions. The benchmark enabled the evaluation of the impact of these three properties on the performance of the different algorithms. The results showed that MotifHound and SLiMFinder were the most accurate in detecting degenerate linear motifs. Interestingly, MotifHound was 15 to 20 times faster at comparable accuracy and performed best in the discovery of highly degenerate motifs. We complemented the benchmark by an analysis of proteins experimentally shown to bind the FUS1 SH3 domain from S. cerevisiae. Using the full-length protein partners as sole information, MotifHound recapitulated most experimentally determined motifs binding to the FUS1 SH3 domain. Moreover, these motifs exhibited properties typical of SH3 binding peptides, e.g., high intrinsic disorder and evolutionary conservation, despite the fact that none of these properties were used as prior information. MotifHound is available (http://michnick.bcm.umontreal.ca or http

  4. SnowyOwl: accurate prediction of fungal genes by using RNA-Seq and homology information to select among ab initio models

    PubMed Central

    2014-01-01

    Background Locating the protein-coding genes in novel genomes is essential to understanding and exploiting the genomic information but it is still difficult to accurately predict all the genes. The recent availability of detailed information about transcript structure from high-throughput sequencing of messenger RNA (RNA-Seq) delineates many expressed genes and promises increased accuracy in gene prediction. Computational gene predictors have been intensively developed for and tested in well-studied animal genomes. Hundreds of fungal genomes are now or will soon be sequenced. The differences of fungal genomes from animal genomes and the phylogenetic sparsity of well-studied fungi call for gene-prediction tools tailored to them. Results SnowyOwl is a new gene prediction pipeline that uses RNA-Seq data to train and provide hints for the generation of Hidden Markov Model (HMM)-based gene predictions and to evaluate the resulting models. The pipeline has been developed and streamlined by comparing its predictions to manually curated gene models in three fungal genomes and validated against the high-quality gene annotation of Neurospora crassa; SnowyOwl predicted N. crassa genes with 83% sensitivity and 65% specificity. SnowyOwl gains sensitivity by repeatedly running the HMM gene predictor Augustus with varied input parameters and selectivity by choosing the models with best homology to known proteins and best agreement with the RNA-Seq data. Conclusions SnowyOwl efficiently uses RNA-Seq data to produce accurate gene models in both well-studied and novel fungal genomes. The source code for the SnowyOwl pipeline (in Python) and a web interface (in PHP) is freely available from http://sourceforge.net/projects/snowyowl/. PMID:24980894

  5. 3D protein structure prediction using Imperialist Competitive algorithm and half sphere exposure prediction.

    PubMed

    Khaji, Erfan; Karami, Masoumeh; Garkani-Nejad, Zahra

    2016-02-21

    Predicting the native structure of proteins based on half-sphere exposure and contact numbers has been studied deeply within recent years. Online predictors of these vectors and secondary structures of amino acids sequences have made it possible to design a function for the folding process. By choosing variant structures and directs for each secondary structure, a random conformation can be generated, and a potential function can then be assigned. Minimizing the potential function utilizing meta-heuristic algorithms is the final step of finding the native structure of a given amino acid sequence. In this work, Imperialist Competitive algorithm was used in order to accelerate the process of minimization. Moreover, we applied an adaptive procedure to apply revolutionary changes. Finally, we considered a more accurate tool for prediction of secondary structure. The results of the computational experiments on standard benchmark show the superiority of the new algorithm over the previous methods with similar potential function. PMID:26718864

  6. Modeling and fitting protein-protein complexes to predict change of binding energy

    PubMed Central

    Dourado, Daniel F.A.R.; Flores, Samuel Coulbourn

    2016-01-01

    It is possible to accurately and economically predict change in protein-protein interaction energy upon mutation (ΔΔG), when a high-resolution structure of the complex is available. This is of growing usefulness for design of high-affinity or otherwise modified binding proteins for therapeutic, diagnostic, industrial, and basic science applications. Recently the field has begun to pursue ΔΔG prediction for homology modeled complexes, but so far this has worked mostly for cases of high sequence identity. If the interacting proteins have been crystallized in free (uncomplexed) form, in a majority of cases it is possible to find a structurally similar complex which can be used as the basis for template-based modeling. We describe how to use MMB to create such models, and then use them to predict ΔΔG, using a dataset consisting of free target structures, co-crystallized template complexes with sequence identify with respect to the targets as low as 44%, and experimental ΔΔG measurements. We obtain similar results by fitting to a low-resolution Cryo-EM density map. Results suggest that other structural constraints may lead to a similar outcome, making the method even more broadly applicable. PMID:27173910

  7. Prediction of protein-protein interactions with clustered amino acids and weighted sparse representation.

    PubMed

    Huang, Qiaoying; You, Zhuhong; Zhang, Xiaofeng; Zhou, Yong

    2015-01-01

    With the completion of the Human Genome Project, bioscience has entered into the era of the genome and proteome. Therefore, protein-protein interactions (PPIs) research is becoming more and more important. Life activities and the protein-protein interactions are inseparable, such as DNA synthesis, gene transcription activation, protein translation, etc. Though many methods based on biological experiments and machine learning have been proposed, they all spent a long time to learn and obtained an imprecise accuracy. How to efficiently and accurately predict PPIs is still a big challenge. To take up such a challenge, we developed a new predictor by incorporating the reduced amino acid alphabet (RAAA) information into the general form of pseudo-amino acid composition (PseAAC) and with the weighted sparse representation-based classification (WSRC). The remarkable advantages of introducing the reduced amino acid alphabet is being able to avoid the notorious dimensionality disaster or overfitting problem in statistical prediction. Additionally, experiments have proven that our method achieved good performance in both a low- and high-dimensional feature space. Among all of the experiments performed on the PPIs data of Saccharomyces cerevisiae, the best one achieved 90.91% accuracy, 94.17% sensitivity, 87.22% precision and a 83.43% Matthews correlation coefficient (MCC) value. In order to evaluate the prediction ability of our method, extensive experiments are performed to compare with the state-of-the-art technique, support vector machine (SVM). The achieved results show that the proposed approach is very promising for predicting PPIs, and it can be a helpful supplement for PPIs prediction. PMID:25984606

  8. A Cascade Random Forests Algorithm for Predicting Protein-Protein Interaction Sites.

    PubMed

    Wei, Zhi-Sen; Yang, Jing-Yu; Shen, Hong-Bin; Yu, Dong-Jun

    2015-10-01

    Protein-protein interactions exist ubiquitously and play important roles in the life cycles of living cells. The interaction sites (residues) are essential to understanding the underlying mechanisms of protein-protein interactions. Previous research has demonstrated that the accurate identification of protein-protein interaction sites (PPIs) is helpful for developing new therapeutic drugs because many drugs will interact directly with those residues. Because of its significant potential in biological research and drug development, the prediction of PPIs has become an important topic in computational biology. However, a severe data imbalance exists in the PPIs prediction problem, where the number of the majority class samples (non-interacting residues) is far larger than that of the minority class samples (interacting residues). Thus, we developed a novel cascade random forests algorithm (CRF) to address the serious data imbalance that exists in the PPIs prediction problem. The proposed CRF resolves the negative effect of data imbalance by connecting multiple random forests in a cascade-like manner, each of which is trained with a balanced training subset that includes all minority samples and a subset of majority samples using an effective ensemble protocol. Based on the proposed CRF, we implemented a new sequence-based PPIs predictor, called CRF-PPI, which takes the combined features of position-specific scoring matrices, averaged cumulative hydropathy, and predicted relative solvent accessibility as model inputs. Benchmark experiments on both the cross validation and independent validation datasets demonstrated that the proposed CRF-PPI outperformed the state-of-the-art sequence-based PPIs predictors. The source code for CRF-PPI and the benchmark datasets are available online at http://csbio.njust.edu.cn/bioinf/CRF-PPI for free academic use. PMID:26441427

  9. Accurate quantification of diffusion and binding kinetics of non-integral membrane proteins by FRAP.

    PubMed

    Berkovich, Ronen; Wolfenson, Haguy; Eisenberg, Sharon; Ehrlich, Marcelo; Weiss, Matthias; Klafter, Joseph; Henis, Yoav I; Urbakh, Michael

    2011-11-01

    Non-integral membrane proteins frequently act as transduction hubs in vital signaling pathways initiated at the plasma membrane (PM). Their biological activity depends on dynamic interactions with the PM, which are governed by their lateral and cytoplasmic diffusion and membrane binding/unbinding kinetics. Accurate quantification of the multiple kinetic parameters characterizing their membrane interaction dynamics has been challenging. Despite a fair number of approximate fitting functions for analyzing fluorescence recovery after photobleaching (FRAP) data, no approach was able to cope with the full diffusion-exchange problem. Here, we present an exact solution and matlab fitting programs for FRAP with a stationary Gaussian laser beam, allowing simultaneous determination of the membrane (un)binding rates and the diffusion coefficients. To reduce the number of fitting parameters, the cytoplasmic diffusion coefficient is determined separately. Notably, our equations include the dependence of the exchange kinetics on the distribution of the measured protein between the PM and the cytoplasm, enabling the derivation of both k(on) and k(off) without prior assumptions. After validating the fitting function by computer simulations, we confirm the applicability of our approach to live-cell data by monitoring the dynamics of GFP-N-Ras mutants under conditions with different contributions of lateral diffusion and exchange to the FRAP kinetics. PMID:21810156

  10. Towards more accurate wind and solar power prediction by improving NWP model physics

    NASA Astrophysics Data System (ADS)

    Steiner, Andrea; Köhler, Carmen; von Schumann, Jonas; Ritter, Bodo

    2014-05-01

    The growing importance and successive expansion of renewable energies raise new challenges for decision makers, economists, transmission system operators, scientists and many more. In this interdisciplinary field, the role of Numerical Weather Prediction (NWP) is to reduce the errors and provide an a priori estimate of remaining uncertainties associated with the large share of weather-dependent power sources. For this purpose it is essential to optimize NWP model forecasts with respect to those prognostic variables which are relevant for wind and solar power plants. An improved weather forecast serves as the basis for a sophisticated power forecasts. Consequently, a well-timed energy trading on the stock market, and electrical grid stability can be maintained. The German Weather Service (DWD) currently is involved with two projects concerning research in the field of renewable energy, namely ORKA*) and EWeLiNE**). Whereas the latter is in collaboration with the Fraunhofer Institute (IWES), the project ORKA is led by energy & meteo systems (emsys). Both cooperate with German transmission system operators. The goal of the projects is to improve wind and photovoltaic (PV) power forecasts by combining optimized NWP and enhanced power forecast models. In this context, the German Weather Service aims to improve its model system, including the ensemble forecasting system, by working on data assimilation, model physics and statistical post processing. This presentation is focused on the identification of critical weather situations and the associated errors in the German regional NWP model COSMO-DE. First steps leading to improved physical parameterization schemes within the NWP-model are presented. Wind mast measurements reaching up to 200 m height above ground are used for the estimation of the (NWP) wind forecast error at heights relevant for wind energy plants. One particular problem is the daily cycle in wind speed. The transition from stable stratification during

  11. TOPPER: Topology Prediction of Transmembrane Protein Based on Evidential Reasoning

    PubMed Central

    Deng, Xinyang; Liu, Qi; Hu, Yong; Deng, Yong

    2013-01-01

    The topology prediction of transmembrane protein is a hot research field in bioinformatics and molecular biology. It is a typical pattern recognition problem. Various prediction algorithms are developed to predict the transmembrane protein topology since the experimental techniques have been restricted by many stringent conditions. Usually, these individual prediction algorithms depend on various principles such as the hydrophobicity or charges of residues. In this paper, an evidential topology prediction method for transmembrane protein is proposed based on evidential reasoning, which is called TOPPER (topology prediction of transmembrane protein based on evidential reasoning). In the proposed method, the prediction results of multiple individual prediction algorithms can be transformed into BPAs (basic probability assignments) according to the confusion matrix. Then, the final prediction result can be obtained by the combination of each individual prediction base on Dempster's rule of combination. The experimental results show that the proposed method is superior to the individual prediction algorithms, which illustrates the effectiveness of the proposed method. PMID:23401665

  12. Developing algorithms for predicting protein-protein interactions of homology modeled proteins.

    SciTech Connect

    Martin, Shawn Bryan; Sale, Kenneth L.; Faulon, Jean-Loup Michel; Roe, Diana C.

    2006-01-01

    The goal of this project was to examine the protein-protein docking problem, especially as it relates to homology-based structures, identify the key bottlenecks in current software tools, and evaluate and prototype new algorithms that may be developed to improve these bottlenecks. This report describes the current challenges in the protein-protein docking problem: correctly predicting the binding site for the protein-protein interaction and correctly placing the sidechains. Two different and complementary approaches are taken that can help with the protein-protein docking problem. The first approach is to predict interaction sites prior to docking, and uses bioinformatics studies of protein-protein interactions to predict theses interaction site. The second approach is to improve validation of predicted complexes after docking, and uses an improved scoring function for evaluating proposed docked poses, incorporating a solvation term. This scoring function demonstrates significant improvement over current state-of-the art functions. Initial studies on both these approaches are promising, and argue for full development of these algorithms.

  13. A machine learning approach to the accurate prediction of multi-leaf collimator positional errors

    NASA Astrophysics Data System (ADS)

    Carlson, Joel N. K.; Park, Jong Min; Park, So-Yeon; In Park, Jong; Choi, Yunseok; Ye, Sung-Joon

    2016-03-01

    Discrepancies between planned and delivered movements of multi-leaf collimators (MLCs) are an important source of errors in dose distributions during radiotherapy. In this work we used machine learning techniques to train models to predict these discrepancies, assessed the accuracy of the model predictions, and examined the impact these errors have on quality assurance (QA) procedures and dosimetry. Predictive leaf motion parameters for the models were calculated from the plan files, such as leaf position and velocity, whether the leaf was moving towards or away from the isocenter of the MLC, and many others. Differences in positions between synchronized DICOM-RT planning files and DynaLog files reported during QA delivery were used as a target response for training of the models. The final model is capable of predicting MLC positions during delivery to a high degree of accuracy. For moving MLC leaves, predicted positions were shown to be significantly closer to delivered positions than were planned positions. By incorporating predicted positions into dose calculations in the TPS, increases were shown in gamma passing rates against measured dose distributions recorded during QA delivery. For instance, head and neck plans with 1%/2 mm gamma criteria had an average increase in passing rate of 4.17% (SD  =  1.54%). This indicates that the inclusion of predictions during dose calculation leads to a more realistic representation of plan delivery. To assess impact on the patient, dose volumetric histograms (DVH) using delivered positions were calculated for comparison with planned and predicted DVHs. In all cases, predicted dose volumetric parameters were in closer agreement to the delivered parameters than were the planned parameters, particularly for organs at risk on the periphery of the treatment area. By incorporating the predicted positions into the TPS, the treatment planner is given a more realistic view of the dose distribution as it will truly be

  14. A machine learning approach to the accurate prediction of multi-leaf collimator positional errors.

    PubMed

    Carlson, Joel N K; Park, Jong Min; Park, So-Yeon; Park, Jong In; Choi, Yunseok; Ye, Sung-Joon

    2016-03-21

    Discrepancies between planned and delivered movements of multi-leaf collimators (MLCs) are an important source of errors in dose distributions during radiotherapy. In this work we used machine learning techniques to train models to predict these discrepancies, assessed the accuracy of the model predictions, and examined the impact these errors have on quality assurance (QA) procedures and dosimetry. Predictive leaf motion parameters for the models were calculated from the plan files, such as leaf position and velocity, whether the leaf was moving towards or away from the isocenter of the MLC, and many others. Differences in positions between synchronized DICOM-RT planning files and DynaLog files reported during QA delivery were used as a target response for training of the models. The final model is capable of predicting MLC positions during delivery to a high degree of accuracy. For moving MLC leaves, predicted positions were shown to be significantly closer to delivered positions than were planned positions. By incorporating predicted positions into dose calculations in the TPS, increases were shown in gamma passing rates against measured dose distributions recorded during QA delivery. For instance, head and neck plans with 1%/2 mm gamma criteria had an average increase in passing rate of 4.17% (SD  =  1.54%). This indicates that the inclusion of predictions during dose calculation leads to a more realistic representation of plan delivery. To assess impact on the patient, dose volumetric histograms (DVH) using delivered positions were calculated for comparison with planned and predicted DVHs. In all cases, predicted dose volumetric parameters were in closer agreement to the delivered parameters than were the planned parameters, particularly for organs at risk on the periphery of the treatment area. By incorporating the predicted positions into the TPS, the treatment planner is given a more realistic view of the dose distribution as it will truly be

  15. Structure prediction of magnetosome-associated proteins.

    PubMed

    Nudelman, Hila; Zarivach, Raz

    2014-01-01

    Magnetotactic bacteria (MTB) are Gram-negative bacteria that can navigate along geomagnetic fields. This ability is a result of a unique intracellular organelle, the magnetosome. These organelles are composed of membrane-enclosed magnetite (Fe3O4) or greigite (Fe3S4) crystals ordered into chains along the cell. Magnetosome formation, assembly, and magnetic nano-crystal biomineralization are controlled by magnetosome-associated proteins (MAPs). Most MAP-encoding genes are located in a conserved genomic region - the magnetosome island (MAI). The MAI appears to be conserved in all MTB that were analyzed so far, although the MAI size and organization differs between species. It was shown that MAI deletion leads to a non-magnetic phenotype, further highlighting its important role in magnetosome formation. Today, about 28 proteins are known to be involved in magnetosome formation, but the structures and functions of most MAPs are unknown. To reveal the structure-function relationship of MAPs we used bioinformatics tools in order to build homology models as a way to understand their possible role in magnetosome formation. Here we present a predicted 3D structural models' overview for all known Magnetospirillum gryphiswaldense strain MSR-1 MAPs. PMID:24523717

  16. Structure prediction of magnetosome-associated proteins

    PubMed Central

    Nudelman, Hila; Zarivach, Raz

    2014-01-01

    Magnetotactic bacteria (MTB) are Gram-negative bacteria that can navigate along geomagnetic fields. This ability is a result of a unique intracellular organelle, the magnetosome. These organelles are composed of membrane-enclosed magnetite (Fe3O4) or greigite (Fe3S4) crystals ordered into chains along the cell. Magnetosome formation, assembly, and magnetic nano-crystal biomineralization are controlled by magnetosome-associated proteins (MAPs). Most MAP-encoding genes are located in a conserved genomic region – the magnetosome island (MAI). The MAI appears to be conserved in all MTB that were analyzed so far, although the MAI size and organization differs between species. It was shown that MAI deletion leads to a non-magnetic phenotype, further highlighting its important role in magnetosome formation. Today, about 28 proteins are known to be involved in magnetosome formation, but the structures and functions of most MAPs are unknown. To reveal the structure–function relationship of MAPs we used bioinformatics tools in order to build homology models as a way to understand their possible role in magnetosome formation. Here we present a predicted 3D structural models’ overview for all known Magnetospirillum gryphiswaldense strain MSR-1 MAPs. PMID:24523717

  17. Prediction of protein-protein interaction sites from weakly homologous template structures using meta-threading and machine learning.

    PubMed

    Maheshwari, Surabhi; Brylinski, Michal

    2015-01-01

    The identification of protein-protein interactions is vital for understanding protein function, elucidating interaction mechanisms, and for practical applications in drug discovery. With the exponentially growing protein sequence data, fully automated computational methods that predict interactions between proteins are becoming essential components of system-level function inference. A thorough analysis of protein complex structures demonstrated that binding site locations as well as the interfacial geometry are highly conserved across evolutionarily related proteins. Because the conformational space of protein-protein interactions is highly covered by experimental structures, sensitive protein threading techniques can be used to identify suitable templates for the accurate prediction of interfacial residues. Toward this goal, we developed eFindSite(PPI) , an algorithm that uses the three-dimensional structure of a target protein, evolutionarily remotely related templates and machine learning techniques to predict binding residues. Using crystal structures, the average sensitivity (specificity) of eFindSite(PPI) in interfacial residue prediction is 0.46 (0.92). For weakly homologous protein models, these values only slightly decrease to 0.40-0.43 (0.91-0.92) demonstrating that eFindSite(PPI) performs well not only using experimental data but also tolerates structural imperfections in computer-generated structures. In addition, eFindSite(PPI) detects specific molecular interactions at the interface; for instance, it correctly predicts approximately one half of hydrogen bonds and aromatic interactions, as well as one third of salt bridges and hydrophobic contacts. Comparative benchmarks against several dimer datasets show that eFindSite(PPI) outperforms other methods for protein-binding residue prediction. It also features a carefully tuned confidence estimation system, which is particularly useful in large-scale applications using raw genomic data. eFindSite(PPI) is

  18. On lattice protein structure prediction revisited.

    PubMed

    Dotu, Ivan; Cebrián, Manuel; Van Hentenryck, Pascal; Clote, Peter

    2011-01-01

    Protein structure prediction is regarded as a highly challenging problem both for the biology and for the computational communities. In recent years, many approaches have been developed, moving to increasingly complex lattice models and off-lattice models. This paper presents a Large Neighborhood Search (LNS) to find the native state for the Hydrophobic-Polar (HP) model on the Face-Centered Cubic (FCC) lattice or, in other words, a self-avoiding walk on the FCC lattice having a maximum number of H-H contacts. The algorithm starts with a tabu-search algorithm, whose solution is then improved by a combination of constraint programming and LNS. The flexible framework of this hybrid algorithm allows an adaptation to the Miyazawa-Jernigan contact potential, in place of the HP model, thus suggesting its potential for tertiary structure prediction. Benchmarking statistics are given for our method against the hydrophobic core threading program HPstruct, an exact method which can be viewed as complementary to our method. PMID:21358007

  19. Sensor Data Fusion for Accurate Cloud Presence Prediction Using Dempster-Shafer Evidence Theory

    PubMed Central

    Li, Jiaming; Luo, Suhuai; Jin, Jesse S.

    2010-01-01

    Sensor data fusion technology can be used to best extract useful information from multiple sensor observations. It has been widely applied in various applications such as target tracking, surveillance, robot navigation, signal and image processing. This paper introduces a novel data fusion approach in a multiple radiation sensor environment using Dempster-Shafer evidence theory. The methodology is used to predict cloud presence based on the inputs of radiation sensors. Different radiation data have been used for the cloud prediction. The potential application areas of the algorithm include renewable power for virtual power station where the prediction of cloud presence is the most challenging issue for its photovoltaic output. The algorithm is validated by comparing the predicted cloud presence with the corresponding sunshine occurrence data that were recorded as the benchmark. Our experiments have indicated that comparing to the approaches using individual sensors, the proposed data fusion approach can increase correct rate of cloud prediction by ten percent, and decrease unknown rate of cloud prediction by twenty three percent. PMID:22163414

  20. Structure-Based Prediction of Unstable Regions in Proteins: Applications to Protein Misfolding Diseases

    NASA Astrophysics Data System (ADS)

    Guest, Will; Cashman, Neil; Plotkin, Steven

    2009-03-01

    Protein misfolding is a necessary step in the pathogenesis of many diseases, including Creutzfeldt-Jakob disease (CJD) and familial amyotrophic lateral sclerosis (fALS). Identifying unstable structural elements in their causative proteins elucidates the early events of misfolding and presents targets for inhibition of the disease process. An algorithm was developed to calculate the Gibbs free energy of unfolding for all sequence-contiguous regions of a protein using three methods to parameterize energy changes: a modified G=o model, changes in solvent-accessible surface area, and solution of the Poisson-Boltzmann equation. The entropic effects of disulfide bonds and post-translational modifications are treated analytically. It incorporates a novel method for finding local dielectric constants inside a protein to accurately handle charge effects. We have predicted the unstable parts of prion protein and superoxide dismutase 1, the proteins involved in CJD and fALS respectively, and have used these regions as epitopes to prepare antibodies that are specific to the misfolded conformation and show promise as therapeutic agents.

  1. Building a Better Fragment Library for De Novo Protein Structure Prediction

    PubMed Central

    de Oliveira, Saulo H. P.; Shi, Jiye; Deane, Charlotte M.

    2015-01-01

    Fragment-based approaches are the current standard for de novo protein structure prediction. These approaches rely on accurate and reliable fragment libraries to generate good structural models. In this work, we describe a novel method for structure fragment library generation and its application in fragment-based de novo protein structure prediction. The importance of correct testing procedures in assessing the quality of fragment libraries is demonstrated. In particular, the exclusion of homologs to the target from the libraries to correctly simulate a de novo protein structure prediction scenario, something which surprisingly is not always done. We demonstrate that fragments presenting different predominant predicted secondary structures should be treated differently during the fragment library generation step and that exhaustive and random search strategies should both be used. This information was used to develop a novel method, Flib. On a validation set of 41 structurally diverse proteins, Flib libraries presents both a higher precision and coverage than two of the state-of-the-art methods, NNMake and HHFrag. Flib also achieves better precision and coverage on the set of 275 protein domains used in the two previous experiments of the the Critical Assessment of Structure Prediction (CASP9 and CASP10). We compared Flib libraries against NNMake libraries in a structure prediction context. Of the 13 cases in which a correct answer was generated, Flib models were more accurate than NNMake models for 10. “Flib is available for download at: http://www.stats.ox.ac.uk/research/proteins/resources”. PMID:25901595

  2. Using complete genome comparisons to identify sequences whose presence accurately predicts clinically important phenotypes.

    PubMed

    Hall, Barry G; Cardenas, Heliodoro; Barlow, Miriam

    2013-01-01

    In clinical settings it is often important to know not just the identity of a microorganism, but also the danger posed by that particular strain. For instance, Escherichia coli can range from being a harmless commensal to being a very dangerous enterohemorrhagic (EHEC) strain. Determining pathogenic phenotypes can be both time consuming and expensive. Here we propose a simple, rapid, and inexpensive method of predicting pathogenic phenotypes on the basis of the presence or absence of short homologous DNA segments in an isolate. Our method compares completely sequenced genomes without the necessity of genome alignments in order to identify the presence or absence of the segments to produce an automatic alignment of the binary string that describes each genome. Analysis of the segment alignment allows identification of those segments whose presence strongly predicts a phenotype. Clinical application of the method requires nothing more that PCR amplification of each of the set of predictive segments. Here we apply the method to identifying EHEC strains of E. coli and to distinguishing E. coli from Shigella. We show in silico that with as few as 8 predictive sequences, if even three of those predictive sequences are amplified the probability of being EHEC or Shigella is >0.99. The method is thus very robust to the occasional amplification failure for spurious reasons. Experimentally, we apply the method to screening a set of 98 isolates to distinguishing E. coli from Shigella, and EHEC from non-EHEC E. coli strains and show that all isolates are correctly identified. PMID:23935901

  3. Using search engine technology for protein function prediction.

    PubMed

    Chen, Ziyang; Cai, Zhao; Li, Min; Liu, Binbin

    2011-01-01

    Prediction of protein function is one of the most challenging problems in the post-genomic era. In this paper, we propose a novel algorithm Improved ProteinRank (IPR) for protein function prediction, which is based on the search engine technology and the preferential attachment criteria. In addition, an improved algorithm IPRW is developed from IPR to be used in the weighted protein?protein interaction (PPI) network. The proposed algorithms IPR and IPRW are applied to the PPI network of S.cerevisiae. The experimental results show that both IPR and IPRW outweigh the previous methods for the prediction of protein functions. PMID:21441099

  4. Assessment of Protein Side-Chain Conformation Prediction Methods in Different Residue Environments

    PubMed Central

    Peterson, Lenna X.; Kang, Xuejiao; Kihara, Daisuke

    2016-01-01

    Computational prediction of side-chain conformation is an important component of protein structure prediction. Accurate side-chain prediction is crucial for practical applications of protein structure models that need atomic detailed resolution such as protein and ligand design. We evaluated the accuracy of eight side-chain prediction methods in reproducing the side-chain conformations of experimentally solved structures deposited to the Protein Data Bank. Prediction accuracy was evaluated for a total of four different structural environments (buried, surface, interface, and membrane-spanning) in three different protein types (monomeric, multimeric, and membrane). Overall, the highest accuracy was observed for buried residues in monomeric and multimeric proteins. Notably, side-chains at protein interfaces and membrane-spanning regions were better predicted than surface residues even though the methods did not all use multimeric and membrane proteins for training. Thus, we conclude that the current methods are as practically useful for modeling protein docking interfaces and membrane-spanning regions as for modeling monomers. PMID:24619909

  5. Collective prediction of protein functions from protein-protein interaction networks

    PubMed Central

    2014-01-01

    Background Automated assignment of functions to unknown proteins is one of the most important task in computational biology. The development of experimental methods for genome scale analysis of molecular interaction networks offers new ways to infer protein function from protein-protein interaction (PPI) network data. Existing techniques for collective classification (CC) usually increase accuracy for network data, wherein instances are interlinked with each other, using a large amount of labeled data for training. However, the labeled data are time-consuming and expensive to obtain. On the other hand, one can easily obtain large amount of unlabeled data. Thus, more sophisticated methods are needed to exploit the unlabeled data to increase prediction accuracy for protein function prediction. Results In this paper, we propose an effective Markov chain based CC algorithm (ICAM) to tackle the label deficiency problem in CC for interrelated proteins from PPI networks. Our idea is to model the problem using two distinct Markov chain classifiers to make separate predictions with regard to attribute features from protein data and relational features from relational information. The ICAM learning algorithm combines the results of the two classifiers to compute the ranks of labels to indicate the importance of a set of labels to an instance, and uses an ICA framework to iteratively refine the learning models for improving performance of protein function prediction from PPI networks in the paucity of labeled data. Conclusion Experimental results on the real-world Yeast protein-protein interaction datasets show that our proposed ICAM method is better than the other ICA-type methods given limited labeled training data. This approach can serve as a valuable tool for the study of protein function prediction from PPI networks. PMID:24564855

  6. Empirical approaches to more accurately predict benthic-pelagic coupling in biogeochemical ocean models

    NASA Astrophysics Data System (ADS)

    Dale, Andy; Stolpovsky, Konstantin; Wallmann, Klaus

    2016-04-01

    The recycling and burial of biogenic material in the sea floor plays a key role in the regulation of ocean chemistry. Proper consideration of these processes in ocean biogeochemical models is becoming increasingly recognized as an important step in model validation and prediction. However, the rate of organic matter remineralization in sediments and the benthic flux of redox-sensitive elements are difficult to predict a priori. In this communication, examples of empirical benthic flux models that can be coupled to earth system models to predict sediment-water exchange in the open ocean are presented. Large uncertainties hindering further progress in this field include knowledge of the reactivity of organic carbon reaching the sediment, the importance of episodic variability in bottom water chemistry and particle rain rates (for both the deep-sea and margins) and the role of benthic fauna. How do we meet the challenge?

  7. An endometrial gene expression signature accurately predicts recurrent implantation failure after IVF

    PubMed Central

    Koot, Yvonne E. M.; van Hooff, Sander R.; Boomsma, Carolien M.; van Leenen, Dik; Groot Koerkamp, Marian J. A.; Goddijn, Mariëtte; Eijkemans, Marinus J. C.; Fauser, Bart C. J. M.; Holstege, Frank C. P.; Macklon, Nick S.

    2016-01-01

    The primary limiting factor for effective IVF treatment is successful embryo implantation. Recurrent implantation failure (RIF) is a condition whereby couples fail to achieve pregnancy despite consecutive embryo transfers. Here we describe the collection of gene expression profiles from mid-luteal phase endometrial biopsies (n = 115) from women experiencing RIF and healthy controls. Using a signature discovery set (n = 81) we identify a signature containing 303 genes predictive of RIF. Independent validation in 34 samples shows that the gene signature predicts RIF with 100% positive predictive value (PPV). The strength of the RIF associated expression signature also stratifies RIF patients into distinct groups with different subsequent implantation success rates. Exploration of the expression changes suggests that RIF is primarily associated with reduced cellular proliferation. The gene signature will be of value in counselling and guiding further treatment of women who fail to conceive upon IVF and suggests new avenues for developing intervention. PMID:26797113

  8. Accurate and inexpensive prediction of the color optical properties of anthocyanins in solution.

    PubMed

    Ge, Xiaochuan; Timrov, Iurii; Binnie, Simon; Biancardi, Alessandro; Calzolari, Arrigo; Baroni, Stefano

    2015-04-23

    The simulation of the color optical properties of molecular dyes in liquid solution requires the calculation of time evolution of the solute absorption spectra fluctuating in the solvent at finite temperature. Time-averaged spectra can be directly evaluated by combining ab initio Car-Parrinello molecular dynamics and time-dependent density functional theory calculations. The inclusion of hybrid exchange-correlation functionals, necessary for the prediction of the correct transition frequencies, prevents one from using these techniques for the simulation of the optical properties of large realistic systems. Here we present an alternative approach for the prediction of the color of natural dyes in solution with a low computational cost. We applied this approach to representative anthocyanin dyes: the excellent agreement between the simulated and the experimental colors makes this method a straightforward and inexpensive tool for the high-throughput prediction of colors of molecules in liquid solvents. PMID:25830823

  9. RepurposeVS: A Drug Repurposing-Focused Computational Method for Accurate Drug-Target Signature Predictions.

    PubMed

    Issa, Naiem T; Peters, Oakland J; Byers, Stephen W; Dakshanamurthy, Sivanesan

    2015-01-01

    We describe here RepurposeVS for the reliable prediction of drug-target signatures using X-ray protein crystal structures. RepurposeVS is a virtual screening method that incorporates docking, drug-centric and protein-centric 2D/3D fingerprints with a rigorous mathematical normalization procedure to account for the variability in units and provide high-resolution contextual information for drug-target binding. Validity was confirmed by the following: (1) providing the greatest enrichment of known drug binders for multiple protein targets in virtual screening experiments, (2) determining that similarly shaped protein target pockets are predicted to bind drugs of similar 3D shapes when RepurposeVS is applied to 2,335 human protein targets, and (3) determining true biological associations in vitro for mebendazole (MBZ) across many predicted kinase targets for potential cancer repurposing. Since RepurposeVS is a drug repurposing-focused method, benchmarking was conducted on a set of 3,671 FDA approved and experimental drugs rather than the Database of Useful Decoys (DUDE) so as to streamline downstream repurposing experiments. We further apply RepurposeVS to explore the overall potential drug repurposing space for currently approved drugs. RepurposeVS is not computationally intensive and increases performance accuracy, thus serving as an efficient and powerful in silico tool to predict drug-target associations in drug repurposing. PMID:26234515

  10. An Accurate, Clinically Feasible Multi-Gene Expression Assay for Predicting Metastasis in Uveal Melanoma

    PubMed Central

    Onken, Michael D.; Worley, Lori A.; Tuscan, Meghan D.; Harbour, J. William

    2010-01-01

    Uveal (ocular) melanoma is an aggressive cancer that often forms undetectable micrometastases before diagnosis of the primary tumor. These micrometastases later multiply to generate metastatic tumors that are resistant to therapy and are uniformly fatal. We have previously identified a gene expression profile derived from the primary tumor that is extremely accurate for identifying patients at high risk of metastatic disease. Development of a practical clinically feasible platform for analyzing this expression profile would benefit high-risk patients through intensified metastatic surveillance, earlier intervention for metastasis, and stratification for entry into clinical trials of adjuvant therapy. Here, we migrate the expression profile from a hybridization-based microarray platform to a robust, clinically practical, PCR-based 15-gene assay comprising 12 discriminating genes and three endogenous control genes. We analyze the technical performance of the assay in a prospective study of 609 tumor samples, including 421 samples sent from distant locations. We show that the assay can be performed accurately on fine needle aspirate biopsy samples, even when the quantity of RNA is below detectable limits. Preliminary outcome data from the prospective study affirm the prognostic accuracy of the assay. This prognostic assay provides an important addition to the armamentarium for managing patients with uveal melanoma, and it provides a proof of principle for the development of similar assays for other cancers. PMID:20413675

  11. Accurate Prediction of Transposon-Derived piRNAs by Integrating Various Sequential and Physicochemical Features

    PubMed Central

    Luo, Longqiang; Li, Dingfang; Zhang, Wen; Tu, Shikui; Zhu, Xiaopeng; Tian, Gang

    2016-01-01

    Background Piwi-interacting RNA (piRNA) is the largest class of small non-coding RNA molecules. The transposon-derived piRNA prediction can enrich the research contents of small ncRNAs as well as help to further understand generation mechanism of gamete. Methods In this paper, we attempt to differentiate transposon-derived piRNAs from non-piRNAs based on their sequential and physicochemical features by using machine learning methods. We explore six sequence-derived features, i.e. spectrum profile, mismatch profile, subsequence profile, position-specific scoring matrix, pseudo dinucleotide composition and local structure-sequence triplet elements, and systematically evaluate their performances for transposon-derived piRNA prediction. Finally, we consider two approaches: direct combination and ensemble learning to integrate useful features and achieve high-accuracy prediction models. Results We construct three datasets, covering three species: Human, Mouse and Drosophila, and evaluate the performances of prediction models by 10-fold cross validation. In the computational experiments, direct combination models achieve AUC of 0.917, 0.922 and 0.992 on Human, Mouse and Drosophila, respectively; ensemble learning models achieve AUC of 0.922, 0.926 and 0.994 on the three datasets. Conclusions Compared with other state-of-the-art methods, our methods can lead to better performances. In conclusion, the proposed methods are promising for the transposon-derived piRNA prediction. The source codes and datasets are available in S1 File. PMID:27074043

  12. Viewing men's faces does not lead to accurate predictions of trustworthiness

    PubMed Central

    Efferson, Charles; Vogt, Sonja

    2013-01-01

    The evolution of cooperation requires some mechanism that reduces the risk of exploitation for cooperative individuals. Recent studies have shown that men with wide faces are anti-social, and they are perceived that way by others. This suggests that people could use facial width to identify anti-social men and thus limit the risk of exploitation. To see if people can make accurate inferences like this, we conducted a two-part experiment. First, males played a sequential social dilemma, and we took photographs of their faces. Second, raters then viewed these photographs and guessed how second movers behaved. Raters achieved significant accuracy by guessing that second movers exhibited reciprocal behaviour. Raters were not able to use the photographs to further improve accuracy. Indeed, some raters used the photographs to their detriment; they could have potentially achieved greater accuracy and earned more money by ignoring the photographs and assuming all second movers reciprocate. PMID:23308340

  13. Accurate prediction of the ammonia probes of a variable proton-to-electron mass ratio

    NASA Astrophysics Data System (ADS)

    Owens, A.; Yurchenko, S. N.; Thiel, W.; Špirko, V.

    2015-07-01

    A comprehensive study of the mass sensitivity of the vibration-rotation-inversion transitions of 14NH3, 15NH3, 14ND3 and 15ND3 is carried out variationally using the TROVE approach. Variational calculations are robust and accurate, offering a new way to compute sensitivity coefficients. Particular attention is paid to the Δk = ±3 transitions between the accidentally coinciding rotation-inversion energy levels of the ν2 = 0+, 0-, 1+ and 1- states, and the inversion transitions in the ν4 = 1 state affected by the `giant' l-type doubling effect. These transitions exhibit highly anomalous sensitivities, thus appearing as promising probes of a possible cosmological variation of the proton-to-electron mass ratio μ. Moreover, a simultaneous comparison of the calculated sensitivities reveals a sizeable isotopic dependence which could aid an exclusive ammonia detection.

  14. The Intrinsic Geometric Structure of Protein-Protein Interaction Networks for Protein Interaction Prediction.

    PubMed

    Fang, Yi; Sun, Mengtian; Dai, Guoxian; Ramain, Karthik

    2016-01-01

    Recent developments in high-throughput technologies for measuring protein-protein interaction (PPI) have profoundly advanced our ability to systematically infer protein function and regulation. However, inherently high false positive and false negative rates in measurement have posed great challenges in computational approaches for the prediction of PPI. A good PPI predictor should be 1) resistant to high rate of missing and spurious PPIs, and 2) robust against incompleteness of observed PPI networks. To predict PPI in a network, we developed an intrinsic geometry structure (IGS) for network, which exploits the intrinsic and hidden relationship among proteins in network through a heat diffusion process. In this process, all explicit PPIs participate simultaneously to glue local infinitesimal and noisy experimental interaction data to generate a global macroscopic descriptions about relationships among proteins. The revealed implicit relationship can be interpreted as the probability of two proteins interacting with each other. The revealed relationship is intrinsic and robust against individual, local and explicit protein interactions in the original network. We apply our approach to publicly available PPI network data for the evaluation of the performance of PPI prediction. Experimental results indicate that, under different levels of the missing and spurious PPIs, IGS is able to robustly exploit the intrinsic and hidden relationship for PPI prediction with a higher sensitivity and specificity compared to that of recently proposed methods. PMID:26886733

  15. Accurate Fault Prediction of BlueGene/P RAS Logs Via Geometric Reduction

    SciTech Connect

    Jones, Terry R; Kirby, Michael; Ladd, Joshua S; Dreisigmeyer, David; Thompson, Joshua

    2010-01-01

    The authors are building two algorithms for fault prediction using raw system-log data. This work is preliminary, and has only been applied to a limited dataset, however the results seem promising. The conclusions are that: (1) obtaining useful data from RAS-logs is challenging; (2) extracting concentrated information improves efficiency and accuracy; and (3) function evaluation algorithms are fast and lend well to scaling.

  16. Protein-Based Urine Test Predicts Kidney Transplant Outcomes

    MedlinePlus

    ... News Releases News Release Thursday, August 22, 2013 Protein-based urine test predicts kidney transplant outcomes NIH- ... supporting development of noninvasive tests. Levels of a protein in the urine of kidney transplant recipients can ...

  17. Robust and Accurate Modeling Approaches for Migraine Per-Patient Prediction from Ambulatory Data.

    PubMed

    Pagán, Josué; De Orbe, M Irene; Gago, Ana; Sobrado, Mónica; Risco-Martín, José L; Mora, J Vivancos; Moya, José M; Ayala, José L

    2015-01-01

    Migraine is one of the most wide-spread neurological disorders, and its medical treatment represents a high percentage of the costs of health systems. In some patients, characteristic symptoms that precede the headache appear. However, they are nonspecific, and their prediction horizon is unknown and pretty variable; hence, these symptoms are almost useless for prediction, and they are not useful to advance the intake of drugs to be effective and neutralize the pain. To solve this problem, this paper sets up a realistic monitoring scenario where hemodynamic variables from real patients are monitored in ambulatory conditions with a wireless body sensor network (WBSN). The acquired data are used to evaluate the predictive capabilities and robustness against noise and failures in sensors of several modeling approaches. The obtained results encourage the development of per-patient models based on state-space models (N4SID) that are capable of providing average forecast windows of 47 min and a low rate of false positives. PMID:26134103

  18. Revisiting the blind tests in crystal structure prediction: accurate energy ranking of molecular crystals.

    PubMed

    Asmadi, Aldi; Neumann, Marcus A; Kendrick, John; Girard, Pascale; Perrin, Marc-Antoine; Leusen, Frank J J

    2009-12-24

    In the 2007 blind test of crystal structure prediction hosted by the Cambridge Crystallographic Data Centre (CCDC), a hybrid DFT/MM method correctly ranked each of the four experimental structures as having the lowest lattice energy of all the crystal structures predicted for each molecule. The work presented here further validates this hybrid method by optimizing the crystal structures (experimental and submitted) of the first three CCDC blind tests held in 1999, 2001, and 2004. Except for the crystal structures of compound IX, all structures were reminimized and ranked according to their lattice energies. The hybrid method computes the lattice energy of a crystal structure as the sum of the DFT total energy and a van der Waals (dispersion) energy correction. Considering all four blind tests, the crystal structure with the lowest lattice energy corresponds to the experimentally observed structure for 12 out of 14 molecules. Moreover, good geometrical agreement is observed between the structures determined by the hybrid method and those measured experimentally. In comparison with the correct submissions made by the blind test participants, all hybrid optimized crystal structures (apart from compound II) have the smallest calculated root mean squared deviations from the experimentally observed structures. It is predicted that a new polymorph of compound V exists under pressure. PMID:19950907

  19. Accurate Prediction of Drug-Induced Liver Injury Using Stem Cell-Derived Populations

    PubMed Central

    Szkolnicka, Dagmara; Farnworth, Sarah L.; Lucendo-Villarin, Baltasar; Storck, Christopher; Zhou, Wenli; Iredale, John P.; Flint, Oliver

    2014-01-01

    Despite major progress in the knowledge and management of human liver injury, there are millions of people suffering from chronic liver disease. Currently, the only cure for end-stage liver disease is orthotopic liver transplantation; however, this approach is severely limited by organ donation. Alternative approaches to restoring liver function have therefore been pursued, including the use of somatic and stem cell populations. Although such approaches are essential in developing scalable treatments, there is also an imperative to develop predictive human systems that more effectively study and/or prevent the onset of liver disease and decompensated organ function. We used a renewable human stem cell resource, from defined genetic backgrounds, and drove them through developmental intermediates to yield highly active, drug-inducible, and predictive human hepatocyte populations. Most importantly, stem cell-derived hepatocytes displayed equivalence to primary adult hepatocytes, following incubation with known hepatotoxins. In summary, we have developed a serum-free, scalable, and shippable cell-based model that faithfully predicts the potential for human liver injury. Such a resource has direct application in human modeling and, in the future, could play an important role in developing renewable cell-based therapies. PMID:24375539

  20. Robust and Accurate Modeling Approaches for Migraine Per-Patient Prediction from Ambulatory Data

    PubMed Central

    Pagán, Josué; Irene De Orbe, M.; Gago, Ana; Sobrado, Mónica; Risco-Martín, José L.; Vivancos Mora, J.; Moya, José M.; Ayala, José L.

    2015-01-01

    Migraine is one of the most wide-spread neurological disorders, and its medical treatment represents a high percentage of the costs of health systems. In some patients, characteristic symptoms that precede the headache appear. However, they are nonspecific, and their prediction horizon is unknown and pretty variable; hence, these symptoms are almost useless for prediction, and they are not useful to advance the intake of drugs to be effective and neutralize the pain. To solve this problem, this paper sets up a realistic monitoring scenario where hemodynamic variables from real patients are monitored in ambulatory conditions with a wireless body sensor network (WBSN). The acquired data are used to evaluate the predictive capabilities and robustness against noise and failures in sensors of several modeling approaches. The obtained results encourage the development of per-patient models based on state-space models (N4SID) that are capable of providing average forecast windows of 47 min and a low rate of false positives. PMID:26134103

  1. Accurate structure prediction of peptide–MHC complexes for identifying highly immunogenic antigens

    SciTech Connect

    Park, Min-Sun; Park, Sung Yong; Miller, Keith R.; Collins, Edward J.; Lee, Ha Youn

    2013-11-01

    Designing an optimal HIV-1 vaccine faces the challenge of identifying antigens that induce a broad immune capacity. One factor to control the breadth of T cell responses is the surface morphology of a peptide–MHC complex. Here, we present an in silico protocol for predicting peptide–MHC structure. A robust signature of a conformational transition was identified during all-atom molecular dynamics, which results in a model with high accuracy. A large test set was used in constructing our protocol and we went another step further using a blind test with a wild-type peptide and two highly immunogenic mutants, which predicted substantial conformational changes in both mutants. The center residues at position five of the analogs were configured to be accessible to solvent, forming a prominent surface, while the residue of the wild-type peptide was to point laterally toward the side of the binding cleft. We then experimentally determined the structures of the blind test set, using high resolution of X-ray crystallography, which verified predicted conformational changes. Our observation strongly supports a positive association of the surface morphology of a peptide–MHC complex to its immunogenicity. Our study offers the prospect of enhancing immunogenicity of vaccines by identifying MHC binding immunogens.

  2. HKC: an algorithm to predict protein complexes in protein-protein interaction networks.

    PubMed

    Wang, Xiaomin; Wang, Zhengzhi; Ye, Jun

    2011-01-01

    With the availability of more and more genome-scale protein-protein interaction (PPI) networks, research interests gradually shift to Systematic Analysis on these large data sets. A key topic is to predict protein complexes in PPI networks by identifying clusters that are densely connected within themselves but sparsely connected with the rest of the network. In this paper, we present a new topology-based algorithm, HKC, to detect protein complexes in genome-scale PPI networks. HKC mainly uses the concepts of highest k-core and cohesion to predict protein complexes by identifying overlapping clusters. The experiments on two data sets and two benchmarks show that our algorithm has relatively high F-measure and exhibits better performance compared with some other methods. PMID:22174556

  3. Essential protein identification based on essential protein-protein interaction prediction by Integrated Edge Weights.

    PubMed

    Jiang, Yuexu; Wang, Yan; Pang, Wei; Chen, Liang; Sun, Huiyan; Liang, Yanchun; Blanzieri, Enrico

    2015-07-15

    Essential proteins play a crucial role in cellular survival and development process. Experimentally, essential proteins are identified by gene knockouts or RNA interference, which are expensive and often fatal to the target organisms. Regarding this, an alternative yet important approach to essential protein identification is through computational prediction. Existing computational methods predict essential proteins based on their relative densities in a protein-protein interaction (PPI) network. Degree, betweenness, and other appropriate criteria are often used to measure the relative density. However, no matter what criterion is used, a protein is actually ordered by the attributes of this protein per se. In this research, we presented a novel computational method, Integrated Edge Weights (IEW), to first rank protein-protein interactions by integrating their edge weights, and then identified sub PPI networks consisting of those highly-ranked edges, and finally regarded the nodes in these sub networks as essential proteins. We evaluated IEW on three model organisms: Saccharomyces cerevisiae (S. cerevisiae), Escherichia coli (E. coli), and Caenorhabditis elegans (C. elegans). The experimental results showed that IEW achieved better performance than the state-of-the-art methods in terms of precision-recall and Jackknife measures. We had also demonstrated that IEW is a robust and effective method, which can retrieve biologically significant modules by its highly-ranked protein-protein interactions for S. cerevisiae, E. coli, and C. elegans. We believe that, with sufficient data provided, IEW can be used to any other organisms' essential protein identification. A website about IEW can be accessed from http://digbio.missouri.edu/IEW/index.html. PMID:25892709

  4. More accurate predictions with transonic Navier-Stokes methods through improved turbulence modeling

    NASA Technical Reports Server (NTRS)

    Johnson, Dennis A.

    1989-01-01

    Significant improvements in predictive accuracies for off-design conditions are achievable through better turbulence modeling; and, without necessarily adding any significant complication to the numerics. One well established fact about turbulence is it is slow to respond to changes in the mean strain field. With the 'equilibrium' algebraic turbulence models no attempt is made to model this characteristic and as a consequence these turbulence models exaggerate the turbulent boundary layer's ability to produce turbulent Reynolds shear stresses in regions of adverse pressure gradient. As a consequence, too little momentum loss within the boundary layer is predicted in the region of the shock wave and along the aft part of the airfoil where the surface pressure undergoes further increases. Recently, a 'nonequilibrium' algebraic turbulence model was formulated which attempts to capture this important characteristic of turbulence. This 'nonequilibrium' algebraic model employs an ordinary differential equation to model the slow response of the turbulence to changes in local flow conditions. In its original form, there was some question as to whether this 'nonequilibrium' model performed as well as the 'equilibrium' models for weak interaction cases. However, this turbulence model has since been further improved wherein it now appears that this turbulence model performs at least as well as the 'equilibrium' models for weak interaction cases and for strong interaction cases represents a very significant improvement. The performance of this turbulence model relative to popular 'equilibrium' models is illustrated for three airfoil test cases of the 1987 AIAA Viscous Transonic Airfoil Workshop, Reno, Nevada. A form of this 'nonequilibrium' turbulence model is currently being applied to wing flows for which similar improvements in predictive accuracy are being realized.

  5. Network pattern of residue packing in helical membrane proteins and its application in membrane protein structure prediction.

    PubMed

    Pabuwal, Vagmita; Li, Zhijun

    2008-01-01

    De novo protein structure prediction plays an important role in studies of helical membrane proteins as well as structure-based drug design efforts. Developing an accurate scoring function for protein structure discrimination and validation remains a current challenge. Network approaches based on overall network patterns of residue packing have proven useful in soluble protein structure discrimination. It is thus of interest to apply similar approaches to the studies of residue packing in membrane proteins. In this work, we first carried out such analysis on a set of diverse, non-redundant and high-resolution membrane protein structures. Next, we applied the same approach to three test sets. The first set includes nine structures of membrane proteins with the resolution worse than 2.5 A; the other two sets include a total of 101 G-protein coupled receptor models, constructed using either de novo or homology modeling techniques. Results of analyses indicate the two criteria derived from studying high-resolution membrane protein structures are good indicators of a high-quality native fold and the approach is very effective for discriminating native membrane protein folds from less-native ones. These findings should be of help for the investigation of the fundamental problem of membrane protein structure prediction. PMID:18178566

  6. Accurate prediction of lattice energies and structures of molecular crystals with molecular quantum chemistry methods.

    PubMed

    Fang, Tao; Li, Wei; Gu, Fangwei; Li, Shuhua

    2015-01-13

    We extend the generalized energy-based fragmentation (GEBF) approach to molecular crystals under periodic boundary conditions (PBC), and we demonstrate the performance of the method for a variety of molecular crystals. With this approach, the lattice energy of a molecular crystal can be obtained from the energies of a series of embedded subsystems, which can be computed with existing advanced molecular quantum chemistry methods. The use of the field compensation method allows the method to take long-range electrostatic interaction of the infinite crystal environment into account and make the method almost translationally invariant. The computational cost of the present method scales linearly with the number of molecules in the unit cell. Illustrative applications demonstrate that the PBC-GEBF method with explicitly correlated quantum chemistry methods is capable of providing accurate descriptions on the lattice energies and structures for various types of molecular crystals. In addition, this approach can be employed to quantify the contributions of various intermolecular interactions to the theoretical lattice energy. Such qualitative understanding is very useful for rational design of molecular crystals. PMID:26574207

  7. Size-extensivity-corrected multireference configuration interaction schemes to accurately predict bond dissociation energies of oxygenated hydrocarbons

    SciTech Connect

    Oyeyemi, Victor B.; Krisiloff, David B.; Keith, John A.; Libisch, Florian; Pavone, Michele; Carter, Emily A.

    2014-01-28

    Oxygenated hydrocarbons play important roles in combustion science as renewable fuels and additives, but many details about their combustion chemistry remain poorly understood. Although many methods exist for computing accurate electronic energies of molecules at equilibrium geometries, a consistent description of entire combustion reaction potential energy surfaces (PESs) requires multireference correlated wavefunction theories. Here we use bond dissociation energies (BDEs) as a foundational metric to benchmark methods based on multireference configuration interaction (MRCI) for several classes of oxygenated compounds (alcohols, aldehydes, carboxylic acids, and methyl esters). We compare results from multireference singles and doubles configuration interaction to those utilizing a posteriori and a priori size-extensivity corrections, benchmarked against experiment and coupled cluster theory. We demonstrate that size-extensivity corrections are necessary for chemically accurate BDE predictions even in relatively small molecules and furnish examples of unphysical BDE predictions resulting from using too-small orbital active spaces. We also outline the specific challenges in using MRCI methods for carbonyl-containing compounds. The resulting complete basis set extrapolated, size-extensivity-corrected MRCI scheme produces BDEs generally accurate to within 1 kcal/mol, laying the foundation for this scheme's use on larger molecules and for more complex regions of combustion PESs.

  8. Accurate predictions of dielectrophoretic force and torque on particles with strong mutual field, particle, and wall interactions

    NASA Astrophysics Data System (ADS)

    Liu, Qianlong; Reifsnider, Kenneth

    2012-11-01

    The basis of dielectrophoresis (DEP) is the prediction of the force and torque on particles. The classical approach to the prediction is based on the effective moment method, which, however, is an approximate approach, assumes infinitesimal particles. Therefore, it is well-known that for finite-sized particles, the DEP approximation is inaccurate as the mutual field, particle, wall interactions become strong, a situation presently attracting extensive research for practical significant applications. In the present talk, we provide accurate calculations of the force and torque on the particles from first principles, by directly resolving the local geometry and properties and accurately accounting for the mutual interactions for finite-sized particles with both dielectric polarization and conduction in a sinusoidally steady-state electric field. Since the approach has a significant advantage, compared to other numerical methods, to efficiently simulate many closely packed particles, it provides an important, unique, and accurate technique to investigate complex DEP phenomena, for example heterogeneous mixtures containing particle chains, nanoparticle assembly, biological cells, non-spherical effects, etc. This study was supported by the Department of Energy under funding for an EFRC (the HeteroFoaM Center), grant no. DE-SC0001061.

  9. Size-extensivity-corrected multireference configuration interaction schemes to accurately predict bond dissociation energies of oxygenated hydrocarbons.

    PubMed

    Oyeyemi, Victor B; Krisiloff, David B; Keith, John A; Libisch, Florian; Pavone, Michele; Carter, Emily A

    2014-01-28

    Oxygenated hydrocarbons play important roles in combustion science as renewable fuels and additives, but many details about their combustion chemistry remain poorly understood. Although many methods exist for computing accurate electronic energies of molecules at equilibrium geometries, a consistent description of entire combustion reaction potential energy surfaces (PESs) requires multireference correlated wavefunction theories. Here we use bond dissociation energies (BDEs) as a foundational metric to benchmark methods based on multireference configuration interaction (MRCI) for several classes of oxygenated compounds (alcohols, aldehydes, carboxylic acids, and methyl esters). We compare results from multireference singles and doubles configuration interaction to those utilizing a posteriori and a priori size-extensivity corrections, benchmarked against experiment and coupled cluster theory. We demonstrate that size-extensivity corrections are necessary for chemically accurate BDE predictions even in relatively small molecules and furnish examples of unphysical BDE predictions resulting from using too-small orbital active spaces. We also outline the specific challenges in using MRCI methods for carbonyl-containing compounds. The resulting complete basis set extrapolated, size-extensivity-corrected MRCI scheme produces BDEs generally accurate to within 1 kcal/mol, laying the foundation for this scheme's use on larger molecules and for more complex regions of combustion PESs. PMID:25669533

  10. Size-extensivity-corrected multireference configuration interaction schemes to accurately predict bond dissociation energies of oxygenated hydrocarbons

    NASA Astrophysics Data System (ADS)

    Oyeyemi, Victor B.; Krisiloff, David B.; Keith, John A.; Libisch, Florian; Pavone, Michele; Carter, Emily A.

    2014-01-01

    Oxygenated hydrocarbons play important roles in combustion science as renewable fuels and additives, but many details about their combustion chemistry remain poorly understood. Although many methods exist for computing accurate electronic energies of molecules at equilibrium geometries, a consistent description of entire combustion reaction potential energy surfaces (PESs) requires multireference correlated wavefunction theories. Here we use bond dissociation energies (BDEs) as a foundational metric to benchmark methods based on multireference configuration interaction (MRCI) for several classes of oxygenated compounds (alcohols, aldehydes, carboxylic acids, and methyl esters). We compare results from multireference singles and doubles configuration interaction to those utilizing a posteriori and a priori size-extensivity corrections, benchmarked against experiment and coupled cluster theory. We demonstrate that size-extensivity corrections are necessary for chemically accurate BDE predictions even in relatively small molecules and furnish examples of unphysical BDE predictions resulting from using too-small orbital active spaces. We also outline the specific challenges in using MRCI methods for carbonyl-containing compounds. The resulting complete basis set extrapolated, size-extensivity-corrected MRCI scheme produces BDEs generally accurate to within 1 kcal/mol, laying the foundation for this scheme's use on larger molecules and for more complex regions of combustion PESs.

  11. Predicting host tropism of influenza A virus proteins using random forest

    PubMed Central

    2014-01-01

    Background Majority of influenza A viruses reside and circulate among animal populations, seldom infecting humans due to host range restriction. Yet when some avian strains do acquire the ability to overcome species barrier, they might become adapted to humans, replicating efficiently and causing diseases, leading to potential pandemic. With the huge influenza A virus reservoir in wild birds, it is a cause for concern when a new influenza strain emerges with the ability to cross host species barrier, as shown in light of the recent H7N9 outbreak in China. Several influenza proteins have been shown to be major determinants in host tropism. Further understanding and determining host tropism would be important in identifying zoonotic influenza virus strains capable of crossing species barrier and infecting humans. Results In this study, computational models for 11 influenza proteins have been constructed using the machine learning algorithm random forest for prediction of host tropism. The prediction models were trained on influenza protein sequences isolated from both avian and human samples, which were transformed into amino acid physicochemical properties feature vectors. The results were highly accurate prediction models (ACC>96.57; AUC>0.980; MCC>0.916) capable of determining host tropism of individual influenza proteins. In addition, features from all 11 proteins were used to construct a combined model to predict host tropism of influenza virus strains. This would help assess a novel influenza strain's host range capability. Conclusions From the prediction models constructed, all achieved high prediction performance, indicating clear distinctions in both avian and human proteins. When used together as a host tropism prediction system, zoonotic strains could potentially be identified based on different protein prediction results. Understanding and predicting host tropism of influenza proteins lay an important foundation for future work in constructing computation

  12. The Compensatory Reserve For Early and Accurate Prediction Of Hemodynamic Compromise: A Review of the Underlying Physiology.

    PubMed

    Convertino, Victor A; Wirt, Michael D; Glenn, John F; Lein, Brian C

    2016-06-01

    Shock is deadly and unpredictable if it is not recognized and treated in early stages of hemorrhage. Unfortunately, measurements of standard vital signs that are displayed on current medical monitors fail to provide accurate or early indicators of shock because of physiological mechanisms that effectively compensate for blood loss. As a result of new insights provided by the latest research on the physiology of shock using human experimental models of controlled hemorrhage, it is now recognized that measurement of the body's reserve to compensate for reduced circulating blood volume is the single most important indicator for early and accurate assessment of shock. We have called this function the "compensatory reserve," which can be accurately assessed by real-time measurements of changes in the features of the arterial waveform. In this paper, the physiology underlying the development and evaluation of a new noninvasive technology that allows for real-time measurement of the compensatory reserve will be reviewed, with its clinical implications for earlier and more accurate prediction of shock. PMID:26950588

  13. A novel method to predict visual field progression more accurately, using intraocular pressure measurements in glaucoma patients.

    PubMed

    2016-01-01

    Visual field (VF) data were retrospectively obtained from 491 eyes in 317 patients with open angle glaucoma who had undergone ten VF tests (Humphrey Field Analyzer, 24-2, SITA standard). First, mean of total deviation values (mTD) in the tenth VF was predicted using standard linear regression of the first five VFs (VF1-5) through to using all nine preceding VFs (VF1-9). Then an 'intraocular pressure (IOP)-integrated VF trend analysis' was carried out by simply using time multiplied by IOP as the independent term in the linear regression model. Prediction errors (absolute prediction error or root mean squared error: RMSE) for predicting mTD and also point wise TD values of the tenth VF were obtained from both approaches. The mTD absolute prediction errors associated with the IOP-integrated VF trend analysis were significantly smaller than those from the standard trend analysis when VF1-6 through to VF1-8 were used (p < 0.05). The point wise RMSEs from the IOP-integrated trend analysis were significantly smaller than those from the standard trend analysis when VF1-5 through to VF1-9 were used (p < 0.05). This was especially the case when IOP was measured more frequently. Thus a significantly more accurate prediction of VF progression is possible using a simple trend analysis that incorporates IOP measurements. PMID:27562553

  14. A novel method to predict visual field progression more accurately, using intraocular pressure measurements in glaucoma patients

    PubMed Central

    Asaoka, Ryo; Fujino, Yuri; Murata, Hiroshi; Miki, Atsuya; Tanito, Masaki; Mizoue, Shiro; Mori, Kazuhiko; Suzuki, Katsuyoshi; Yamashita, Takehiro; Kashiwagi, Kenji; Shoji, Nobuyuki

    2016-01-01

    Visual field (VF) data were retrospectively obtained from 491 eyes in 317 patients with open angle glaucoma who had undergone ten VF tests (Humphrey Field Analyzer, 24-2, SITA standard). First, mean of total deviation values (mTD) in the tenth VF was predicted using standard linear regression of the first five VFs (VF1-5) through to using all nine preceding VFs (VF1-9). Then an ‘intraocular pressure (IOP)-integrated VF trend analysis’ was carried out by simply using time multiplied by IOP as the independent term in the linear regression model. Prediction errors (absolute prediction error or root mean squared error: RMSE) for predicting mTD and also point wise TD values of the tenth VF were obtained from both approaches. The mTD absolute prediction errors associated with the IOP-integrated VF trend analysis were significantly smaller than those from the standard trend analysis when VF1-6 through to VF1-8 were used (p < 0.05). The point wise RMSEs from the IOP-integrated trend analysis were significantly smaller than those from the standard trend analysis when VF1-5 through to VF1-9 were used (p < 0.05). This was especially the case when IOP was measured more frequently. Thus a significantly more accurate prediction of VF progression is possible using a simple trend analysis that incorporates IOP measurements. PMID:27562553

  15. Combining multiple regression and principal component analysis for accurate predictions for column ozone in Peninsular Malaysia

    NASA Astrophysics Data System (ADS)

    Rajab, Jasim M.; MatJafri, M. Z.; Lim, H. S.

    2013-06-01

    This study encompasses columnar ozone modelling in the peninsular Malaysia. Data of eight atmospheric parameters [air surface temperature (AST), carbon monoxide (CO), methane (CH4), water vapour (H2Ovapour), skin surface temperature (SSKT), atmosphere temperature (AT), relative humidity (RH), and mean surface pressure (MSP)] data set, retrieved from NASA's Atmospheric Infrared Sounder (AIRS), for the entire period (2003-2008) was employed to develop models to predict the value of columnar ozone (O3) in study area. The combined method, which is based on using both multiple regressions combined with principal component analysis (PCA) modelling, was used to predict columnar ozone. This combined approach was utilized to improve the prediction accuracy of columnar ozone. Separate analysis was carried out for north east monsoon (NEM) and south west monsoon (SWM) seasons. The O3 was negatively correlated with CH4, H2Ovapour, RH, and MSP, whereas it was positively correlated with CO, AST, SSKT, and AT during both the NEM and SWM season periods. Multiple regression analysis was used to fit the columnar ozone data using the atmospheric parameter's variables as predictors. A variable selection method based on high loading of varimax rotated principal components was used to acquire subsets of the predictor variables to be comprised in the linear regression model of the atmospheric parameter's variables. It was found that the increase in columnar O3 value is associated with an increase in the values of AST, SSKT, AT, and CO and with a drop in the levels of CH4, H2Ovapour, RH, and MSP. The result of fitting the best models for the columnar O3 value using eight of the independent variables gave about the same values of the R (≈0.93) and R2 (≈0.86) for both the NEM and SWM seasons. The common variables that appeared in both regression equations were SSKT, CH4 and RH, and the principal precursor of the columnar O3 value in both the NEM and SWM seasons was SSKT.

  16. Prognostic breast cancer signature identified from 3D culture model accurately predicts clinical outcome across independent datasets

    SciTech Connect

    Martin, Katherine J.; Patrick, Denis R.; Bissell, Mina J.; Fournier, Marcia V.

    2008-10-20

    One of the major tenets in breast cancer research is that early detection is vital for patient survival by increasing treatment options. To that end, we have previously used a novel unsupervised approach to identify a set of genes whose expression predicts prognosis of breast cancer patients. The predictive genes were selected in a well-defined three dimensional (3D) cell culture model of non-malignant human mammary epithelial cell morphogenesis as down-regulated during breast epithelial cell acinar formation and cell cycle arrest. Here we examine the ability of this gene signature (3D-signature) to predict prognosis in three independent breast cancer microarray datasets having 295, 286, and 118 samples, respectively. Our results show that the 3D-signature accurately predicts prognosis in three unrelated patient datasets. At 10 years, the probability of positive outcome was 52, 51, and 47 percent in the group with a poor-prognosis signature and 91, 75, and 71 percent in the group with a good-prognosis signature for the three datasets, respectively (Kaplan-Meier survival analysis, p<0.05). Hazard ratios for poor outcome were 5.5 (95% CI 3.0 to 12.2, p<0.0001), 2.4 (95% CI 1.6 to 3.6, p<0.0001) and 1.9 (95% CI 1.1 to 3.2, p = 0.016) and remained significant for the two larger datasets when corrected for estrogen receptor (ER) status. Hence the 3D-signature accurately predicts breast cancer outcome in both ER-positive and ER-negative tumors, though individual genes differed in their prognostic ability in the two subtypes. Genes that were prognostic in ER+ patients are AURKA, CEP55, RRM2, EPHA2, FGFBP1, and VRK1, while genes prognostic in ER patients include ACTB, FOXM1 and SERPINE2 (Kaplan-Meier p<0.05). Multivariable Cox regression analysis in the largest dataset showed that the 3D-signature was a strong independent factor in predicting breast cancer outcome. The 3D-signature accurately predicts breast cancer outcome across multiple datasets and holds prognostic

  17. A Foundation for the Accurate Prediction of the Soft Error Vulnerability of Scientific Applications

    SciTech Connect

    Bronevetsky, G; de Supinski, B; Schulz, M

    2009-02-13

    Understanding the soft error vulnerability of supercomputer applications is critical as these systems are using ever larger numbers of devices that have decreasing feature sizes and, thus, increasing frequency of soft errors. As many large scale parallel scientific applications use BLAS and LAPACK linear algebra routines, the soft error vulnerability of these methods constitutes a large fraction of the applications overall vulnerability. This paper analyzes the vulnerability of these routines to soft errors by characterizing how their outputs are affected by injected errors and by evaluating several techniques for predicting how errors propagate from the input to the output of each routine. The resulting error profiles can be used to understand the fault vulnerability of full applications that use these routines.

  18. nuMap: a web platform for accurate prediction of nucleosome positioning.

    PubMed

    Alharbi, Bader A; Alshammari, Thamir H; Felton, Nathan L; Zhurkin, Victor B; Cui, Feng

    2014-10-01

    Nucleosome positioning is critical for gene expression and of major biological interest. The high cost of experimentally mapping nucleosomal arrangement signifies the need for computational approaches to predict nucleosome positions at high resolution. Here, we present a web-based application to fulfill this need by implementing two models, YR and W/S schemes, for the translational and rotational positioning of nucleosomes, respectively. Our methods are based on sequence-dependent anisotropic bending that dictates how DNA is wrapped around a histone octamer. This application allows users to specify a number of options such as schemes and parameters for threading calculation and provides multiple layout formats. The nuMap is implemented in Java/Perl/MySQL and is freely available for public use at http://numap.rit.edu. The user manual, implementation notes, description of the methodology and examples are available at the site. PMID:25220945

  19. Simplified versus geometrically accurate models of forefoot anatomy to predict plantar pressures: A finite element study.

    PubMed

    Telfer, Scott; Erdemir, Ahmet; Woodburn, James; Cavanagh, Peter R

    2016-01-25

    Integration of patient-specific biomechanical measurements into the design of therapeutic footwear has been shown to improve clinical outcomes in patients with diabetic foot disease. The addition of numerical simulations intended to optimise intervention design may help to build on these advances, however at present the time and labour required to generate and run personalised models of foot anatomy restrict their routine clinical utility. In this study we developed second-generation personalised simple finite element (FE) models of the forefoot with varying geometric fidelities. Plantar pressure predictions from barefoot, shod, and shod with insole simulations using simplified models were compared to those obtained from CT-based FE models incorporating more detailed representations of bone and tissue geometry. A simplified model including representations of metatarsals based on simple geometric shapes, embedded within a contoured soft tissue block with outer geometry acquired from a 3D surface scan was found to provide pressure predictions closest to the more complex model, with mean differences of 13.3kPa (SD 13.4), 12.52kPa (SD 11.9) and 9.6kPa (SD 9.3) for barefoot, shod, and insole conditions respectively. The simplified model design could be produced in <1h compared to >3h in the case of the more detailed model, and solved on average 24% faster. FE models of the forefoot based on simplified geometric representations of the metatarsal bones and soft tissue surface geometry from 3D surface scans may potentially provide a simulation approach with improved clinical utility, however further validity testing around a range of therapeutic footwear types is required. PMID:26708965

  20. An accurate and efficient method for prediction of the long-term evolution of space debris in the geosynchronous region

    NASA Astrophysics Data System (ADS)

    McNamara, Roger P.; Eagle, C. D.

    1992-08-01

    Planetary Observer High Accuracy Orbit Prediction Program (POHOP), an existing numerical integrator, was modified with the solar and lunar formulae developed by T.C. Van Flandern and K.F. Pulkkinen to provide the accuracy required to evaluate long-term orbit characteristics of objects on the geosynchronous region. The orbit of a 1000 kg class spacecraft is numerically integrated over 50 years using both the original and the more accurate solar and lunar ephemerides methods. Results of this study demonstrate that, over the long term, for an object located in the geosynchronous region, the more accurate solar and lunar ephemerides effects on the objects's position are significantly different than using the current POHOP ephemeris.

  1. RBO Aleph: leveraging novel information sources for protein structure prediction

    PubMed Central

    Mabrouk, Mahmoud; Putz, Ines; Werner, Tim; Schneider, Michael; Neeb, Moritz; Bartels, Philipp; Brock, Oliver

    2015-01-01

    RBO Aleph is a novel protein structure prediction web server for template-based modeling, protein contact prediction and ab initio structure prediction. The server has a strong emphasis on modeling difficult protein targets for which templates cannot be detected. RBO Aleph's unique features are (i) the use of combined evolutionary and physicochemical information to perform residue–residue contact prediction and (ii) leveraging this contact information effectively in conformational space search. RBO Aleph emerged as one of the leading approaches to ab initio protein structure prediction and contact prediction during the most recent Critical Assessment of Protein Structure Prediction experiment (CASP11, 2014). In addition to RBO Aleph's main focus on ab initio modeling, the server also provides state-of-the-art template-based modeling services. Based on template availability, RBO Aleph switches automatically between template-based modeling and ab initio prediction based on the target protein sequence, facilitating use especially for non-expert users. The RBO Aleph web server offers a range of tools for visualization and data analysis, such as the visualization of predicted models, predicted contacts and the estimated prediction error along the model's backbone. The server is accessible at http://compbio.robotics.tu-berlin.de/rbo_aleph/. PMID:25897112

  2. RBO Aleph: leveraging novel information sources for protein structure prediction.

    PubMed

    Mabrouk, Mahmoud; Putz, Ines; Werner, Tim; Schneider, Michael; Neeb, Moritz; Bartels, Philipp; Brock, Oliver

    2015-07-01

    RBO Aleph is a novel protein structure prediction web server for template-based modeling, protein contact prediction and ab initio structure prediction. The server has a strong emphasis on modeling difficult protein targets for which templates cannot be detected. RBO Aleph's unique features are (i) the use of combined evolutionary and physicochemical information to perform residue-residue contact prediction and (ii) leveraging this contact information effectively in conformational space search. RBO Aleph emerged as one of the leading approaches to ab initio protein structure prediction and contact prediction during the most recent Critical Assessment of Protein Structure Prediction experiment (CASP11, 2014). In addition to RBO Aleph's main focus on ab initio modeling, the server also provides state-of-the-art template-based modeling services. Based on template availability, RBO Aleph switches automatically between template-based modeling and ab initio prediction based on the target protein sequence, facilitating use especially for non-expert users. The RBO Aleph web server offers a range of tools for visualization and data analysis, such as the visualization of predicted models, predicted contacts and the estimated prediction error along the model's backbone. The server is accessible at http://compbio.robotics.tu-berlin.de/rbo_aleph/. PMID:25897112

  3. Accurate single-day titration of adenovirus vectors based on equivalence of protein VII nuclear dots and infectious particles

    PubMed Central

    Walkiewicz, Marcin P.; Morral, Nuria; Engel, Daniel A.

    2009-01-01

    Summary Protein VII is an abundant component of adenovirus particles and is tightly associated with the viral DNA. It enters the nucleus along with the infecting viral genome and remains bound throughout early phase. Protein VII can be visualized by immunofluorescent staining as discrete dots in the infected cell nucleus. Comparison between protein VII staining and expression of the 72 kDa DNA binding protein revealed a one-to-one correspondence between protein VII dots and infectious viral genomes. A similar relationship was observed for a helper-dependent adenovirus vector expressing green fluorescent protein. This relationship allowed accurate titration of adenovirus preparations, including wild-type and helper-dependent vectors, using a one-day immunofluorescence method. The method can be applied to any adenovirus vector and gives results equivalent to the standard plaque assay. PMID:19406166

  4. Protein location prediction using atomic composition and global features of the amino acid sequence

    SciTech Connect

    Cherian, Betsy Sheena; Nair, Achuthsankar S.

    2010-01-22

    Subcellular location of protein is constructive information in determining its function, screening for drug candidates, vaccine design, annotation of gene products and in selecting relevant proteins for further studies. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. For a computational localization prediction method to be more accurate, it should exploit all possible relevant biological features that contribute to the subcellular localization. In this work, we extracted the biological features from the full length protein sequence to incorporate more biological information. A new biological feature, distribution of atomic composition is effectively used with, multiple physiochemical properties, amino acid composition, three part amino acid composition, and sequence similarity for predicting the subcellular location of the protein. Support Vector Machines are designed for four modules and prediction is made by a weighted voting system. Our system makes prediction with an accuracy of 100, 82.47, 88.81 for self-consistency test, jackknife test and independent data test respectively. Our results provide evidence that the prediction based on the biological features derived from the full length amino acid sequence gives better accuracy than those derived from N-terminal alone. Considering the features as a distribution within the entire sequence will bring out underlying property distribution to a greater detail to enhance the prediction accuracy.

  5. Neural network approach to quantum-chemistry data: Accurate prediction of density functional theory energies

    NASA Astrophysics Data System (ADS)

    Balabin, Roman M.; Lomakina, Ekaterina I.

    2009-08-01

    Artificial neural network (ANN) approach has been applied to estimate the density functional theory (DFT) energy with large basis set using lower-level energy values and molecular descriptors. A total of 208 different molecules were used for the ANN training, cross validation, and testing by applying BLYP, B3LYP, and BMK density functionals. Hartree-Fock results were reported for comparison. Furthermore, constitutional molecular descriptor (CD) and quantum-chemical molecular descriptor (QD) were used for building the calibration model. The neural network structure optimization, leading to four to five hidden neurons, was also carried out. The usage of several low-level energy values was found to greatly reduce the prediction error. An expected error, mean absolute deviation, for ANN approximation to DFT energies was 0.6±0.2 kcal mol-1. In addition, the comparison of the different density functionals with the basis sets and the comparison of multiple linear regression results were also provided. The CDs were found to overcome limitation of the QD. Furthermore, the effective ANN model for DFT/6-311G(3df,3pd) and DFT/6-311G(2df,2pd) energy estimation was developed, and the benchmark results were provided.

  6. Towards Accurate Prediction of Turbulent, Three-Dimensional, Recirculating Flows with the NCC

    NASA Technical Reports Server (NTRS)

    Iannetti, A.; Tacina, R.; Jeng, S.-M.; Cai, J.

    2001-01-01

    The National Combustion Code (NCC) was used to calculate the steady state, nonreacting flow field of a prototype Lean Direct Injection (LDI) swirler. This configuration used nine groups of eight holes drilled at a thirty-five degree angle to induce swirl. These nine groups created swirl in the same direction, or a corotating pattern. The static pressure drop across the holes was fixed at approximately four percent. Computations were performed on one quarter of the geometry, because the geometry is considered rotationally periodic every ninety degrees. The final computational grid used was approximately 2.26 million tetrahedral cells, and a cubic nonlinear k - epsilon model was used to model turbulence. The NCC results were then compared to time averaged Laser Doppler Velocimetry (LDV) data. The LDV measurements were performed on the full geometry, but four ninths of the geometry was measured. One-, two-, and three-dimensional representations of both flow fields are presented. The NCC computations compare both qualitatively and quantitatively well to the LDV data, but differences exist downstream. The comparison is encouraging, and shows that NCC can be used for future injector design studies. To improve the flow prediction accuracy of turbulent, three-dimensional, recirculating flow fields with the NCC, recommendations are given.

  7. The human skin/chick chorioallantoic membrane model accurately predicts the potency of cosmetic allergens.

    PubMed

    Slodownik, Dan; Grinberg, Igor; Spira, Ram M; Skornik, Yehuda; Goldstein, Ronald S

    2009-04-01

    The current standard method for predicting contact allergenicity is the murine local lymph node assay (LLNA). Public objection to the use of animals in testing of cosmetics makes the development of a system that does not use sentient animals highly desirable. The chorioallantoic membrane (CAM) of the chick egg has been extensively used for the growth of normal and transformed mammalian tissues. The CAM is not innervated, and embryos are sacrificed before the development of pain perception. The aim of this study was to determine whether the sensitization phase of contact dermatitis to known cosmetic allergens can be quantified using CAM-engrafted human skin and how these results compare with published EC3 data obtained with the LLNA. We studied six common molecules used in allergen testing and quantified migration of epidermal Langerhans cells (LC) as a measure of their allergic potency. All agents with known allergic potential induced statistically significant migration of LC. The data obtained correlated well with published data for these allergens generated using the LLNA test. The human-skin CAM model therefore has great potential as an inexpensive, non-radioactive, in vivo alternative to the LLNA, which does not require the use of sentient animals. In addition, this system has the advantage of testing the allergic response of human, rather than animal skin. PMID:19054059

  8. Accurate prediction of the refractive index of polymers using first principles and data modeling

    NASA Astrophysics Data System (ADS)

    Afzal, Mohammad Atif Faiz; Cheng, Chong; Hachmann, Johannes

    Organic polymers with a high refractive index (RI) have recently attracted considerable interest due to their potential application in optical and optoelectronic devices. The ability to tailor the molecular structure of polymers is the key to increasing the accessible RI values. Our work concerns the creation of predictive in silico models for the optical properties of organic polymers, the screening of large-scale candidate libraries, and the mining of the resulting data to extract the underlying design principles that govern their performance. This work was set up to guide our experimentalist partners and allow them to target the most promising candidates. Our model is based on the Lorentz-Lorenz equation and thus includes the polarizability and number density values for each candidate. For the former, we performed a detailed benchmark study of different density functionals, basis sets, and the extrapolation scheme towards the polymer limit. For the number density we devised an exceedingly efficient machine learning approach to correlate the polymer structure and the packing fraction in the bulk material. We validated the proposed RI model against the experimentally known RI values of 112 polymers. We could show that the proposed combination of physical and data modeling is both successful and highly economical to characterize a wide range of organic polymers, which is a prerequisite for virtual high-throughput screening.

  9. System and methods for predicting transmembrane domains in membrane proteins and mining the genome for recognizing G-protein coupled receptors

    DOEpatents

    Trabanino, Rene J; Vaidehi, Nagarajan; Hall, Spencer E; Goddard, William A; Floriano, Wely

    2013-02-05

    The invention provides computer-implemented methods and apparatus implementing a hierarchical protocol using multiscale molecular dynamics and molecular modeling methods to predict the presence of transmembrane regions in proteins, such as G-Protein Coupled Receptors (GPCR), and protein structural models generated according to the protocol. The protocol features a coarse grain sampling method, such as hydrophobicity analysis, to provide a fast and accurate procedure for predicting transmembrane regions. Methods and apparatus of the invention are useful to screen protein or polynucleotide databases for encoded proteins with transmembrane regions, such as GPCRs.

  10. PSI/TM-Coffee: a web server for fast and accurate multiple sequence alignments of regular and transmembrane proteins using homology extension on reduced databases.

    PubMed

    Floden, Evan W; Tommaso, Paolo D; Chatzou, Maria; Magis, Cedrik; Notredame, Cedric; Chang, Jia-Ming

    2016-07-01

    The PSI/TM-Coffee web server performs multiple sequence alignment (MSA) of proteins by combining homology extension with a consistency based alignment approach. Homology extension is performed with Position Specific Iterative (PSI) BLAST searches against a choice of redundant and non-redundant databases. The main novelty of this server is to allow databases of reduced complexity to rapidly perform homology extension. This server also gives the possibility to use transmembrane proteins (TMPs) reference databases to allow even faster homology extension on this important category of proteins. Aside from an MSA, the server also outputs topological prediction of TMPs using the HMMTOP algorithm. Previous benchmarking of the method has shown this approach outperforms the most accurate alignment methods such as MSAProbs, Kalign, PROMALS, MAFFT, ProbCons and PRALINE™. The web server is available at http://tcoffee.crg.cat/tmcoffee. PMID:27106060

  11. Protein corona fingerprinting predicts the cellular interaction of gold and silver nanoparticles.

    PubMed

    Walkey, Carl D; Olsen, Jonathan B; Song, Fayi; Liu, Rong; Guo, Hongbo; Olsen, D Wesley H; Cohen, Yoram; Emili, Andrew; Chan, Warren C W

    2014-03-25

    Using quantitative models to predict the biological interactions of nanoparticles will accelerate the translation of nanotechnology. Here, we characterized the serum protein corona 'fingerprint' formed around a library of 105 surface-modified gold nanoparticles. Applying a bioinformatics-inspired approach, we developed a multivariate model that uses the protein corona fingerprint to predict cell association 50% more accurately than a model that uses parameters describing nanoparticle size, aggregation state, and surface charge. Our model implicates a set of hyaluronan-binding proteins as mediators of nanoparticle-cell interactions. This study establishes a framework for developing a comprehensive database of protein corona fingerprints and biological responses for multiple nanoparticle types. Such a database can be used to develop quantitative relationships that predict the biological responses to nanoparticles and will aid in uncovering the fundamental mechanisms of nano-bio interactions. PMID:24517450

  12. Industrial Compositional Streamline Simulation for Efficient and Accurate Prediction of Gas Injection and WAG Processes

    SciTech Connect

    Margot Gerritsen

    2008-10-31

    Gas-injection processes are widely and increasingly used for enhanced oil recovery (EOR). In the United States, for example, EOR production by gas injection accounts for approximately 45% of total EOR production and has tripled since 1986. The understanding of the multiphase, multicomponent flow taking place in any displacement process is essential for successful design of gas-injection projects. Due to complex reservoir geometry, reservoir fluid properties and phase behavior, the design of accurate and efficient numerical simulations for the multiphase, multicomponent flow governing these processes is nontrivial. In this work, we developed, implemented and tested a streamline based solver for gas injection processes that is computationally very attractive: as compared to traditional Eulerian solvers in use by industry it computes solutions with a computational speed orders of magnitude higher and a comparable accuracy provided that cross-flow effects do not dominate. We contributed to the development of compositional streamline solvers in three significant ways: improvement of the overall framework allowing improved streamline coverage and partial streamline tracing, amongst others; parallelization of the streamline code, which significantly improves wall clock time; and development of new compositional solvers that can be implemented along streamlines as well as in existing Eulerian codes used by industry. We designed several novel ideas in the streamline framework. First, we developed an adaptive streamline coverage algorithm. Adding streamlines locally can reduce computational costs by concentrating computational efforts where needed, and reduce mapping errors. Adapting streamline coverage effectively controls mass balance errors that mostly result from the mapping from streamlines to pressure grid. We also introduced the concept of partial streamlines: streamlines that do not necessarily start and/or end at wells. This allows more efficient coverage and avoids

  13. The development and verification of a highly accurate collision prediction model for automated noncoplanar plan delivery

    SciTech Connect

    Yu, Victoria Y.; Tran, Angelia; Nguyen, Dan; Cao, Minsong; Ruan, Dan; Low, Daniel A.; Sheng, Ke

    2015-11-15

    attributed to phantom setup errors due to the slightly deformable and flexible phantom extremities. The estimated site-specific safety buffer distance with 0.001% probability of collision for (gantry-to-couch, gantry-to-phantom) was (1.23 cm, 3.35 cm), (1.01 cm, 3.99 cm), and (2.19 cm, 5.73 cm) for treatment to the head, lung, and prostate, respectively. Automated delivery to all three treatment sites was completed in 15 min and collision free using a digital Linac. Conclusions: An individualized collision prediction model for the purpose of noncoplanar beam delivery was developed and verified. With the model, the study has demonstrated the feasibility of predicting deliverable beams for an individual patient and then guiding fully automated noncoplanar treatment delivery. This work motivates development of clinical workflows and quality assurance procedures to allow more extensive use and automation of noncoplanar beam geometries.

  14. A Survey of Computational Intelligence Techniques in Protein Function Prediction

    PubMed Central

    Tiwari, Arvind Kumar; Srivastava, Rajeev

    2014-01-01

    During the past, there was a massive growth of knowledge of unknown proteins with the advancement of high throughput microarray technologies. Protein function prediction is the most challenging problem in bioinformatics. In the past, the homology based approaches were used to predict the protein function, but they failed when a new protein was different from the previous one. Therefore, to alleviate the problems associated with homology based traditional approaches, numerous computational intelligence techniques have been proposed in the recent past. This paper presents a state-of-the-art comprehensive review of various computational intelligence techniques for protein function predictions using sequence, structure, protein-protein interaction network, and gene expression data used in wide areas of applications such as prediction of DNA and RNA binding sites, subcellular localization, enzyme functions, signal peptides, catalytic residues, nuclear/G-protein coupled receptors, membrane proteins, and pathway analysis from gene expression datasets. This paper also summarizes the result obtained by many researchers to solve these problems by using computational intelligence techniques with appropriate datasets to improve the prediction performance. The summary shows that ensemble classifiers and integration of multiple heterogeneous data are useful for protein function prediction. PMID:25574395

  15. How Accurate Are the Anthropometry Equations in in Iranian Military Men in Predicting Body Composition?

    PubMed Central

    Shakibaee, Abolfazl; Faghihzadeh, Soghrat; Alishiri, Gholam Hossein; Ebrahimpour, Zeynab; Faradjzadeh, Shahram; Sobhani, Vahid; Asgari, Alireza

    2015-01-01

    Background: The body composition varies according to different life styles (i.e. intake calories and caloric expenditure). Therefore, it is wise to record military personnel’s body composition periodically and encourage those who abide to the regulations. Different methods have been introduced for body composition assessment: invasive and non-invasive. Amongst them, the Jackson and Pollock equation is most popular. Objectives: The recommended anthropometric prediction equations for assessing men’s body composition were compared with dual-energy X-ray absorptiometry (DEXA) gold standard to develop a modified equation to assess body composition and obesity quantitatively among Iranian military men. Patients and Methods: A total of 101 military men aged 23 - 52 years old with a mean age of 35.5 years were recruited and evaluated in the present study (average height, 173.9 cm and weight, 81.5 kg). The body-fat percentages of subjects were assessed both with anthropometric assessment and DEXA scan. The data obtained from these two methods were then compared using multiple regression analysis. Results: The mean and standard deviation of body fat percentage of the DEXA assessment was 21.2 ± 4.3 and body fat percentage obtained from three Jackson and Pollock 3-, 4- and 7-site equations were 21.1 ± 5.8, 22.2 ± 6.0 and 20.9 ± 5.7, respectively. There was a strong correlation between these three equations and DEXA (R² = 0.98). Conclusions: The mean percentage of body fat obtained from the three equations of Jackson and Pollock was very close to that of body fat obtained from DEXA; however, we suggest using a modified Jackson-Pollock 3-site equation for volunteer military men because the 3-site equation analysis method is simpler and faster than other methods. PMID:26715964

  16. Predictable tuning of protein expression in bacteria.

    PubMed

    Bonde, Mads T; Pedersen, Margit; Klausen, Michael S; Jensen, Sheila I; Wulff, Tune; Harrison, Scott; Nielsen, Alex T; Herrgård, Markus J; Sommer, Morten O A

    2016-03-01

    We comprehensively assessed the contribution of the Shine-Dalgarno sequence to protein expression and used the data to develop EMOPEC (Empirical Model and Oligos for Protein Expression Changes; http://emopec.biosustain.dtu.dk). EMOPEC is a free tool that makes it possible to modulate the expression level of any Escherichia coli gene by changing only a few bases. Measured protein levels for 91% of our designed sequences were within twofold of the desired target level. PMID:26752768

  17. Ensemble learning prediction of protein-protein interactions using proteins functional annotations.

    PubMed

    Saha, Indrajit; Zubek, Julian; Klingström, Tomas; Forsberg, Simon; Wikander, Johan; Kierczak, Marcin; Maulik, Ujjwal; Plewczynski, Dariusz

    2014-04-01

    Protein-protein interactions are important for the majority of biological processes. A significant number of computational methods have been developed to predict protein-protein interactions using protein sequence, structural and genomic data. Vast experimental data is publicly available on the Internet, but it is scattered across numerous databases. This fact motivated us to create and evaluate new high-throughput datasets of interacting proteins. We extracted interaction data from DIP, MINT, BioGRID and IntAct databases. Then we constructed descriptive features for machine learning purposes based on data from Gene Ontology and DOMINE. Thereafter, four well-established machine learning methods: Support Vector Machine, Random Forest, Decision Tree and Naïve Bayes, were used on these datasets to build an Ensemble Learning method based on majority voting. In cross-validation experiment, sensitivity exceeded 80% and classification/prediction accuracy reached 90% for the Ensemble Learning method. We extended the experiment to a bigger and more realistic dataset maintaining sensitivity over 70%. These results confirmed that our datasets are suitable for performing PPI prediction and Ensemble Learning method is well suited for this task. Both the processed PPI datasets and the software are available at . PMID:24469380

  18. Protein structure prediction and analysis using the Robetta server

    PubMed Central

    Kim, David E.; Chivian, Dylan; Baker, David

    2004-01-01

    The Robetta server (http://robetta.bakerlab.org) provides automated tools for protein structure prediction and analysis. For structure prediction, sequences submitted to the server are parsed into putative domains and structural models are generated using either comparative modeling or de novo structure prediction methods. If a confident match to a protein of known structure is found using BLAST, PSI-BLAST, FFAS03 or 3D-Jury, it is used as a template for comparative modeling. If no match is found, structure predictions are made using the de novo Rosetta fragment insertion method. Experimental nuclear magnetic resonance (NMR) constraints data can also be submitted with a query sequence for RosettaNMR de novo structure determination. Other current capabilities include the prediction of the effects of mutations on protein–protein interactions using computational interface alanine scanning. The Rosetta protein design and protein–protein docking methodologies will soon be available through the server as well. PMID:15215442

  19. Accurate prediction of interference minima in linear molecular harmonic spectra by a modified two-center model

    NASA Astrophysics Data System (ADS)

    Xin, Cui; Di-Yu, Zhang; Gao, Chen; Ji-Gen, Chen; Si-Liang, Zeng; Fu-Ming, Guo; Yu-Jun, Yang

    2016-03-01

    We demonstrate that the interference minima in the linear molecular harmonic spectra can be accurately predicted by a modified two-center model. Based on systematically investigating the interference minima in the linear molecular harmonic spectra by the strong-field approximation (SFA), it is found that the locations of the harmonic minima are related not only to the nuclear distance between the two main atoms contributing to the harmonic generation, but also to the symmetry of the molecular orbital. Therefore, we modify the initial phase difference between the double wave sources in the two-center model, and predict the harmonic minimum positions consistent with those simulated by SFA. Project supported by the National Basic Research Program of China (Grant No. 2013CB922200) and the National Natural Science Foundation of China (Grant Nos. 11274001, 11274141, 11304116, 11247024, and 11034003), and the Jilin Provincial Research Foundation for Basic Research, China (Grant Nos. 20130101012JC and 20140101168JC).

  20. Deformation, Failure, and Fatigue Life of SiC/Ti-15-3 Laminates Accurately Predicted by MAC/GMC

    NASA Technical Reports Server (NTRS)

    Bednarcyk, Brett A.; Arnold, Steven M.

    2002-01-01

    NASA Glenn Research Center's Micromechanics Analysis Code with Generalized Method of Cells (MAC/GMC) (ref.1) has been extended to enable fully coupled macro-micro deformation, failure, and fatigue life predictions for advanced metal matrix, ceramic matrix, and polymer matrix composites. Because of the multiaxial nature of the code's underlying micromechanics model, GMC--which allows the incorporation of complex local inelastic constitutive models--MAC/GMC finds its most important application in metal matrix composites, like the SiC/Ti-15-3 composite examined here. Furthermore, since GMC predicts the microscale fields within each constituent of the composite material, submodels for local effects such as fiber breakage, interfacial debonding, and matrix fatigue damage can and have been built into MAC/GMC. The present application of MAC/GMC highlights the combination of these features, which has enabled the accurate modeling of the deformation, failure, and life of titanium matrix composites.

  1. Evaluating Mesoscale Numerical Weather Predictions and Spatially Distributed Meteorologic Forcing Data for Developing Accurate SWE Forecasts over Large Mountain Basins

    NASA Astrophysics Data System (ADS)

    Hedrick, A. R.; Marks, D. G.; Winstral, A. H.; Marshall, H. P.

    2014-12-01

    The ability to forecast snow water equivalent, or SWE, in mountain catchments would benefit many different communities ranging from avalanche hazard mitigation to water resource management. Historical model runs of Isnobal, the physically based energy balance snow model, have been produced over the 2150 km2 Boise River Basin for water years 2012 - 2014 at 100-meter resolution. Spatially distributed forcing parameters such as precipitation, wind, and relative humidity are generated from automated weather stations located throughout the watershed, and are supplied to Isnobal at hourly timesteps. Similarly, the Weather Research & Forecasting (WRF) Model provides hourly predictions of the same forcing parameters from an atmospheric physics perspective. This work aims to quantitatively compare WRF model output to the spatial meteorologic fields developed to force Isnobal, with the hopes of eventually using WRF predictions to create accurate hourly forecasts of SWE over a large mountainous basin.

  2. All-atom 3D structure prediction of transmembrane β-barrel proteins from sequences

    PubMed Central

    Hayat, Sikander; Sander, Chris; Marks, Debora S.

    2015-01-01

    Transmembrane β-barrels (TMBs) carry out major functions in substrate transport and protein biogenesis but experimental determination of their 3D structure is challenging. Encouraged by successful de novo 3D structure prediction of globular and α-helical membrane proteins from sequence alignments alone, we developed an approach to predict the 3D structure of TMBs. The approach combines the maximum-entropy evolutionary coupling method for predicting residue contacts (EVfold) with a machine-learning approach (boctopus2) for predicting β-strands in the barrel. In a blinded test for 19 TMB proteins of known structure that have a sufficient number of diverse homologous sequences available, this combined method (EVfold_bb) predicts hydrogen-bonded residue pairs between adjacent β-strands at an accuracy of ∼70%. This accuracy is sufficient for the generation of all-atom 3D models. In the transmembrane barrel region, the average 3D structure accuracy [template-modeling (TM) score] of top-ranked models is 0.54 (ranging from 0.36 to 0.85), with a higher (44%) number of residue pairs in correct strand–strand registration than in earlier methods (18%). Although the nonbarrel regions are predicted less accurately overall, the evolutionary couplings identify some highly constrained loop residues and, for FecA protein, the barrel including the structure of a plug domain can be accurately modeled (TM score = 0.68). Lower prediction accuracy tends to be associated with insufficient sequence information and we therefore expect increasing numbers of β-barrel families to become accessible to accurate 3D structure prediction as the number of available sequences increases. PMID:25858953

  3. Enhanced Inter-helical Residue Contact Prediction in Transmembrane Proteins

    PubMed Central

    Wei, Y.; Floudas, C. A.

    2011-01-01

    In this paper, based on a recent work by McAllister and Floudas who developed a mathematical optimization model to predict the contacts in transmembrane alpha-helical proteins from a limited protein data set [1], we have enhanced this method by 1) building a more comprehensive data set for transmembrane alpha-helical proteins and this enhanced data set is then used to construct the probability sets, MIN-1N and MIN-2N, for residue contact prediction, 2) enhancing the mathematical model via modifications of several important physical constraints and 3) applying a new blind contact prediction scheme on different protein sets proposed from analyzing the contact prediction on 65 proteins from Fuchs et al. [2]. The blind contact prediction scheme has been tested on two different membrane protein sets. Firstly it is applied to five carefully selected proteins from the training set. The contact prediction of these five proteins uses probability sets built by excluding the target protein from the training set, and an average accuracy of 56% was obtained. Secondly, it is applied to six independent membrane proteins with complicated topologies, and the prediction accuracies are 73% for 2ZY9A, 21% for 3KCUA, 46% for 2W1PA, 64% for 3CN5A, 77% for 3IXZA and 83% for 3K3FA. The average prediction accuracy for the six proteins is 60.7%. The proposed approach is also compared with a support vector machine method (TMhit [3]) and it is shown that it exhibits better prediction accuracy. PMID:21892227

  4. Accurate First-Principles Spectra Predictions for Ethylene and its Isotopologues from Full 12D AB Initio Surfaces

    NASA Astrophysics Data System (ADS)

    Delahaye, Thibault; Rey, Michael; Tyuterev, Vladimir; Nikitin, Andrei V.; Szalay, Peter

    2015-06-01

    Hydrocarbons such as ethylene (C_2H_4) and methane (CH_4) are of considerable interest for the modeling of planetary atmospheres and other astrophysical applications. Knowledge of rovibrational transitions of hydrocarbons is of primary importance in many fields but remains a formidable challenge for the theory and spectral analysis. Essentially two theoretical approaches for the computation and prediction of spectra exist. The first one is based on empirically-fitted effective spectroscopic models. Several databases aim at collecting the corresponding data but the information about C_2H_4 spectrum present in these databases remains limited, only some spectral ranges around 1000, 3000 and 6000 cm-1 being available. Another way for computing energies, line positions and intensities is based on global variational calculations using ab initio surfaces. Although they do not yet reach the spectroscopic accuracy, they could provide reliable predictions which could be quantitatively accurate with respect to the precision of available observations and as complete as possible. All this thus requires extensive first-principles quantum mechanical calculations essentially based on two necessary ingredients: (i) accurate intramolecular potential energy surface and dipole moment surface components and (ii) efficient computational methods to achieve a good numerical convergence. We report predictions of vibrational and rovibrational energy levels of C_2H_4 using our new ground state potential energy surface obtained from extended ab initio calculations. Additionally we will introduce line positions and line intensities predictions based on a new dipole moment surface for ethylene. These results will be compared with previous works on ethylene and its isotopologues.

  5. Does a More Precise Chemical Description of Protein–Ligand Complexes Lead to More Accurate Prediction of Binding Affinity?

    PubMed Central

    2014-01-01

    Predicting the binding affinities of large sets of diverse molecules against a range of macromolecular targets is an extremely challenging task. The scoring functions that attempt such computational prediction are essential for exploiting and analyzing the outputs of docking, which is in turn an important tool in problems such as structure-based drug design. Classical scoring functions assume a predetermined theory-inspired functional form for the relationship between the variables that describe an experimentally determined or modeled structure of a protein–ligand complex and its binding affinity. The inherent problem of this approach is in the difficulty of explicitly modeling the various contributions of intermolecular interactions to binding affinity. New scoring functions based on machine-learning regression models, which are able to exploit effectively much larger amounts of experimental data and circumvent the need for a predetermined functional form, have already been shown to outperform a broad range of state-of-the-art scoring functions in a widely used benchmark. Here, we investigate the impact of the chemical description of the complex on the predictive power of the resulting scoring function using a systematic battery of numerical experiments. The latter resulted in the most accurate scoring function to date on the benchmark. Strikingly, we also found that a more precise chemical description of the protein–ligand complex does not generally lead to a more accurate prediction of binding affinity. We discuss four factors that may contribute to this result: modeling assumptions, codependence of representation and regression, data restricted to the bound state, and conformational heterogeneity in data. PMID:24528282

  6. Functional prediction of hypothetical proteins in human adenoviruses.

    PubMed

    Dorden, Shane; Mahadevan, Padmanabhan

    2015-01-01

    Assigning functional information to hypothetical proteins in virus genomes is crucial for gaining insight into their proteomes. Human adenoviruses are medium sized viruses that cause a range of diseases. Their genomes possess proteins with uncharacterized function known as hypothetical proteins. Using a wide range of protein function prediction servers, functional information was obtained about these hypothetical proteins. A comparison of functional information obtained from these servers revealed that some of them produced functional information, while others provided little functional information about these human adenovirus hypothetical proteins. The PFP, ESG, PSIPRED, 3d2GO, and ProtFun servers produced the most functional information regarding these hypothetical proteins. PMID:26664031

  7. The 82-plex plasma protein signature that predicts increasing inflammation

    PubMed Central

    Tepel, Martin; Beck, Hans C.; Tan, Qihua; Borst, Christoffer; Rasmussen, Lars M.

    2015-01-01

    The objective of the study was to define the specific plasma protein signature that predicts the increase of the inflammation marker C-reactive protein from index day to next-day using proteome analysis and novel bioinformatics tools. We performed a prospective study of 91 incident kidney transplant recipients and quantified 359 plasma proteins simultaneously using nano-Liquid-Chromatography-Tandem Mass-Spectrometry in individual samples and plasma C-reactive protein on the index day and the next day. Next-day C-reactive protein increased in 59 patients whereas it decreased in 32 patients. The prediction model selected and validated 82 plasma proteins which determined increased next-day C-reactive protein (area under receiver-operator-characteristics curve, 0.772; 95% confidence interval, 0.669 to 0.876; P < 0.0001). Multivariable logistic regression showed that 82-plex protein signature (P < 0.001) was associated with observed increased next-day C-reactive protein. The 82-plex protein signature outperformed routine clinical procedures. The category-free net reclassification index improved with 82-plex plasma protein signature (total net reclassification index, 88.3%). Using the 82-plex plasma protein signature increased net reclassification index with a clinical meaningful 10% increase of risk mainly by the improvement of reclassification of subjects in the event group. An 82-plex plasma protein signature predicts an increase of the inflammatory marker C-reactive protein. PMID:26445912

  8. Binding affinity prediction for protein-ligand complexes based on β contacts and B factor.

    PubMed

    Liu, Qian; Kwoh, Chee Keong; Li, Jinyan

    2013-11-25

    Accurate determination of protein-ligand binding affinity is a fundamental problem in biochemistry useful for many applications including drug design and protein-ligand docking. A number of scoring functions have been proposed for the prediction of protein-ligand binding affinity. However, accurate prediction is still a challenging problem because poor performance is often seen in the evaluation under the leave-one-cluster-out cross-validation (LCOCV). We introduce a new scoring function named B2BScore to improve the prediction performance. B2BScore integrates two physicochemical properties for protein-ligand binding affinity prediction. One is the property of β contacts. A β contact between two atoms requires no other atoms to interrupt the atomic contact and assumes that the two atoms should have enough direct contact area. The other is the property of B factor to capture the atomic mobility in the dynamic protein-ligand binding process. Tested on the PDBBind2009 data set, B2BScore shows superior prediction performance to existing methods on independent test data as well as under the LCOCV evaluation framework. In particular, B2BScore achieves a significant LCOCV improvement across 26 protein clusters-a big increase of the averaged Pearson's correlation coefficients from 0.418 to 0.518 and a significant decrease of standard deviation of the coefficients from 0.352 to 0.196. We also identified several important and intuitive contact descriptors of protein-ligand binding through the random forest learning in B2BScore. Some of these descriptors are closely related to contacts between carbon atoms without covalent-bond oxygen/nitrogen, preferred contacts of metal ions, interfacial backbone atoms from proteins, or π rings. Some others are negative descriptors relating to those contacts with nitrogen atoms without covalent-bond hydrogens or nonpreferred contacts of metal ions. These descriptors can be directly used to guide protein-ligand docking. PMID:24191692

  9. Protein structure prediction from sequence variation

    PubMed Central

    Marks, Debora S; Hopf, Thomas A; Sander, Chris

    2015-01-01

    Genomic sequences contain rich evolutionary information about functional constraints on macromolecules such as proteins. This information can be efficiently mined to detect evolutionary couplings between residues in proteins and address the long-standing challenge to compute protein three-dimensional structures from amino acid sequences. Substantial progress has recently been made on this problem owing to the explosive growth in available sequences and the application of global statistical methods. In addition to three-dimensional structure, the improved understanding of covariation may help identify functional residues involved in ligand binding, protein-complex formation and conformational changes. We expect computation of covariation patterns to complement experimental structural biology in elucidating the full spectrum of protein structures, their functional interactions and evolutionary dynamics. PMID:23138306

  10. A hyperspectral imaging system for an accurate prediction of the above-ground biomass of individual rice plants.

    PubMed

    Feng, Hui; Jiang, Ni; Huang, Chenglong; Fang, Wei; Yang, Wanneng; Chen, Guoxing; Xiong, Lizhong; Liu, Qian

    2013-09-01

    Biomass is an important component of the plant phenomics, and the existing methods for biomass estimation for individual plants are either destructive or lack accuracy. In this study, a hyperspectral imaging system was developed for the accurate prediction of the above-ground biomass of individual rice plants in the visible and near-infrared spectral region. First, the structure of the system and the influence of various parameters on the camera acquisition speed were established. Then the system was used to image 152 rice plants, which selected from the rice mini-core collection, in two stages, the tillering to elongation (T-E) stage and the booting to heading (B-H) stage. Several variables were extracted from the images. Following, linear stepwise regression analysis and 5-fold cross-validation were used to select effective variables for model construction and test the stability of the model, respectively. For the T-E stage, the R(2) value was 0.940 for the fresh weight (FW) and 0.935 for the dry weight (DW). For the B-H stage, the R(2) value was 0.891 for the FW and 0.783 for the DW. Moreover, estimations of the biomass using visible light images were also calculated. These comparisons showed that hyperspectral imaging performed better than the visible light imaging. Therefore, this study provides not only a stable hyperspectral imaging platform but also an accurate and nondestructive method for the prediction of biomass for individual rice plants. PMID:24089866

  11. A Weibull statistics-based lignocellulose saccharification model and a built-in parameter accurately predict lignocellulose hydrolysis performance.

    PubMed

    Wang, Mingyu; Han, Lijuan; Liu, Shasha; Zhao, Xuebing; Yang, Jinghua; Loh, Soh Kheang; Sun, Xiaomin; Zhang, Chenxi; Fang, Xu

    2015-09-01

    Renewable energy from lignocellulosic biomass has been deemed an alternative to depleting fossil fuels. In order to improve this technology, we aim to develop robust mathematical models for the enzymatic lignocellulose degradation process. By analyzing 96 groups of previously published and newly obtained lignocellulose saccharification results and fitting them to Weibull distribution, we discovered Weibull statistics can accurately predict lignocellulose saccharification data, regardless of the type of substrates, enzymes and saccharification conditions. A mathematical model for enzymatic lignocellulose degradation was subsequently constructed based on Weibull statistics. Further analysis of the mathematical structure of the model and experimental saccharification data showed the significance of the two parameters in this model. In particular, the λ value, defined the characteristic time, represents the overall performance of the saccharification system. This suggestion was further supported by statistical analysis of experimental saccharification data and analysis of the glucose production levels when λ and n values change. In conclusion, the constructed Weibull statistics-based model can accurately predict lignocellulose hydrolysis behavior and we can use the λ parameter to assess the overall performance of enzymatic lignocellulose degradation. Advantages and potential applications of the model and the λ value in saccharification performance assessment were discussed. PMID:26121186

  12. Accurate design of co-assembling multi-component protein nanomaterials

    NASA Astrophysics Data System (ADS)

    King, Neil P.; Bale, Jacob B.; Sheffler, William; McNamara, Dan E.; Gonen, Shane; Gonen, Tamir; Yeates, Todd O.; Baker, David

    2014-06-01

    The self-assembly of proteins into highly ordered nanoscale architectures is a hallmark of biological systems. The sophisticated functions of these molecular machines have inspired the development of methods to engineer self-assembling protein nanostructures; however, the design of multi-component protein nanomaterials with high accuracy remains an outstanding challenge. Here we report a computational method for designing protein nanomaterials in which multiple copies of two distinct subunits co-assemble into a specific architecture. We use the method to design five 24-subunit cage-like protein nanomaterials in two distinct symmetric architectures and experimentally demonstrate that their structures are in close agreement with the computational design models. The accuracy of the method and the number and variety of two-component materials that it makes accessible suggest a route to the construction of functional protein nanomaterials tailored to specific applications.

  13. Text Mining Improves Prediction of Protein Functional Sites

    PubMed Central

    Cohn, Judith D.; Ravikumar, Komandur E.

    2012-01-01

    We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions. PMID:22393388

  14. Blind predictions of protein interfaces by docking calculations in CAPRI.

    PubMed

    Lensink, Marc F; Wodak, Shoshana J

    2010-11-15

    Reliable prediction of the amino acid residues involved in protein-protein interfaces can provide valuable insight into protein function, and inform mutagenesis studies, and drug design applications. A fast-growing number of methods are being proposed for predicting protein interfaces, using structural information, energetic criteria, or sequence conservation or by integrating multiple criteria and approaches. Overall however, their performance remains limited, especially when applied to nonobligate protein complexes, where the individual components are also stable on their own. Here, we evaluate interface predictions derived from protein-protein docking calculations. To this end we measure the overlap between the interfaces in models of protein complexes submitted by 76 participants in CAPRI (Critical Assessment of Predicted Interactions) and those of 46 observed interfaces in 20 CAPRI targets corresponding to nonobligate complexes. Our evaluation considers multiple models for each target interface, submitted by different participants, using a variety of docking methods. Although this results in a substantial variability in the prediction performance across participants and targets, clear trends emerge. Docking methods that perform best in our evaluation predict interfaces with average recall and precision levels of about 60%, for a small majority (60%) of the analyzed interfaces. These levels are significantly higher than those obtained for nonobligate complexes by most extant interface prediction methods. We find furthermore that a sizable fraction (24%) of the interfaces in models ranked as incorrect in the CAPRI assessment are actually correctly predicted (recall and precision ≥50%), and that these models contribute to 70% of the correct docking-based interface predictions overall. Our analysis proves that docking methods are much more successful in identifying interfaces than in predicting complexes, and suggests that these methods have an excellent

  15. Prediction of Protein-Protein Interaction Sites Based on Naive Bayes Classifier

    PubMed Central

    Geng, Haijiang; Lu, Tao; Lin, Xiao; Liu, Yu; Yan, Fangrong

    2015-01-01

    Protein functions through interactions with other proteins and biomolecules and these interactions occur on the so-called interface residues of the protein sequences. Identifying interface residues makes us better understand the biological mechanism of protein interaction. Meanwhile, information about the interface residues contributes to the understanding of metabolic, signal transduction networks and indicates directions in drug designing. In recent years, researchers have focused on developing new computational methods for predicting protein interface residues. Here we creatively used a 181-dimension protein sequence feature vector as input to the Naive Bayes Classifier- (NBC-) based method to predict interaction sites in protein-protein complexes interaction. The prediction of interaction sites in protein interactions is regarded as an amino acid residue binary classification problem by applying NBC with protein sequence features. Independent test results suggested that Naive Bayes Classifier-based method with the protein sequence features as input vectors performed well. PMID:26697220

  16. Probabilistic Protein Function Prediction from Heterogeneous Genome-Wide Data

    PubMed Central

    Nariai, Naoki; Kolaczyk, Eric D.; Kasif, Simon

    2007-01-01

    Dramatic improvements in high throughput sequencing technologies have led to a staggering growth in the number of predicted genes. However, a large fraction of these newly discovered genes do not have a functional assignment. Fortunately, a variety of novel high-throughput genome-wide functional screening technologies provide important clues that shed light on gene function. The integration of heterogeneous data to predict protein function has been shown to improve the accuracy of automated gene annotation systems. In this paper, we propose and evaluate a probabilistic approach for protein function prediction that integrates protein-protein interaction (PPI) data, gene expression data, protein motif information, mutant phenotype data, and protein localization data. First, functional linkage graphs are constructed from PPI data and gene expression data, in which an edge between nodes (proteins) represents evidence for functional similarity. The assumption here is that graph neighbors are more likely to share protein function, compared to proteins that are not neighbors. The functional linkage graph model is then used in concert with protein domain, mutant phenotype and protein localization data to produce a functional prediction. Our method is applied to the functional prediction of Saccharomyces cerevisiae genes, using Gene Ontology (GO) terms as the basis of our annotation. In a cross validation study we show that the integrated model increases recall by 18%, compared to using PPI data alone at the 50% precision. We also show that the integrated predictor is significantly better than each individual predictor. However, the observed improvement vs. PPI depends on both the new source of data and the functional category to be predicted. Surprisingly, in some contexts integration hurts overall prediction accuracy. Lastly, we provide a comprehensive assignment of putative GO terms to 463 proteins that currently have no assigned function. PMID:17396164

  17. A novel method for protein-protein interaction site prediction using phylogenetic substitution models

    PubMed Central

    La, David; Kihara, Daisuke

    2011-01-01

    Protein-protein binding events mediate many critical biological functions in the cell. Typically, functionally important sites in proteins can be well identified by considering sequence conservation. However, protein-protein interaction sites exhibit higher sequence variation than other functional regions, such as catalytic sites of enzymes. Consequently, the mutational behavior leading to weak sequence conservation poses significant challenges to the protein-protein interaction site prediction. Here, we present a phylogenetic framework to capture critical sequence variations that favor the selection of residues essential for protein-protein binding. Through the comprehensive analysis of diverse protein families, we show that protein binding interfaces exhibit distinct amino acid substitution as compared with other surface residues. Based on this analysis, we have developed a novel method, BindML, which utilizes the substitution models to predict protein-protein binding sites of protein with unknown interacting partners. BindML estimates the likelihood that a phylogenetic tree of a local surface region in a query protein structure follows the substitution patterns of protein binding interface and non-binding surfaces. BindML is shown to perform well compared to alternative methods for protein binding interface prediction. The methodology developed in this study is very versatile in the sense that it can be generally applied for predicting other types of functional sites, such as DNA, RNA, and membrane binding sites in proteins. PMID:21989996

  18. Accurate prediction of unsteady and time-averaged pressure loads using a hybrid Reynolds-Averaged/large-eddy simulation technique

    NASA Astrophysics Data System (ADS)

    Bozinoski, Radoslav

    Significant research has been performed over the last several years on understanding the unsteady aerodynamics of various fluid flows. Much of this work has focused on quantifying the unsteady, three-dimensional flow field effects which have proven vital to the accurate prediction of many fluid and aerodynamic problems. Up until recently, engineers have predominantly relied on steady-state simulations to analyze the inherently three-dimensional ow structures that are prevalent in many of today's "real-world" problems. Increases in computational capacity and the development of efficient numerical methods can change this and allow for the solution of the unsteady Reynolds-Averaged Navier-Stokes (RANS) equations for practical three-dimensional aerodynamic applications. An integral part of this capability has been the performance and accuracy of the turbulence models coupled with advanced parallel computing techniques. This report begins with a brief literature survey of the role fully three-dimensional, unsteady, Navier-Stokes solvers have on the current state of numerical analysis. Next, the process of creating a baseline three-dimensional Multi-Block FLOw procedure called MBFLO3 is presented. Solutions for an inviscid circular arc bump, laminar at plate, laminar cylinder, and turbulent at plate are then presented. Results show good agreement with available experimental, numerical, and theoretical data. Scalability data for the parallel version of MBFLO3 is presented and shows efficiencies of 90% and higher for processes of no less than 100,000 computational grid points. Next, the description and implementation techniques used for several turbulence models are presented. Following the successful implementation of the URANS and DES procedures, the validation data for separated, non-reattaching flows over a NACA 0012 airfoil, wall-mounted hump, and a wing-body junction geometry are presented. Results for the NACA 0012 showed significant improvement in flow predictions

  19. A physical approach to protein structure prediction: CASP4 results

    SciTech Connect

    Crivelli, Silvia; Eskow, Elizabeth; Bader, Brett; Lamberti, Vincent; Byrd, Richard; Schnabel, Robert; Head-Gordon, Teresa

    2001-02-27

    We describe our global optimization method called Stochastic Perturbation with Soft Constraints (SPSC), which uses information from known proteins to predict secondary structure, but not in the tertiary structure predictions or in generating the terms of the physics-based energy function. Our approach is also characterized by the use of an all atom energy function that includes a novel hydrophobic solvation function derived from experiments that shows promising ability for energy discrimination against misfolded structures. We present the results obtained using our SPSC method and energy function for blind prediction in the 4th Critical Assessment of Techniques for Protein Structure Prediction (CASP4) competition, and show that our approach is more effective on targets for which less information from known proteins is available. In fact our SPSC method produced the best prediction for one of the most difficult targets of the competition, a new fold protein of 240 amino acids.

  20. Prediction and redesign of protein-protein interactions.

    PubMed

    Lua, Rhonald C; Marciano, David C; Katsonis, Panagiotis; Adikesavan, Anbu K; Wilkins, Angela D; Lichtarge, Olivier

    2014-01-01

    Understanding the molecular basis of protein function remains a central goal of biology, with the hope to elucidate the role of human genes in health and in disease, and to rationally design therapies through targeted molecular perturbations. We review here some of the computational techniques and resources available for characterizing a critical aspect of protein function - those mediated by protein-protein interactions (PPI). We describe several applications and recent successes of the Evolutionary Trace (ET) in identifying molecular events and shapes that underlie protein function and specificity in both eukaryotes and prokaryotes. ET is a part of analytical approaches based on the successes and failures of evolution that enable the rational control of PPI. PMID:24878423

  1. DNA-binding protein prediction using plant specific support vector machines: validation and application of a new genome annotation tool

    PubMed Central

    Motion, Graham B.; Howden, Andrew J. M.; Huitema, Edgar; Jones, Susan

    2015-01-01

    There are currently 151 plants with draft genomes available but levels of functional annotation for putative protein products are low. Therefore, accurate computational predictions are essential to annotate genomes in the first instance, and to provide focus for the more costly and time consuming functional assays that follow. DNA-binding proteins are an important class of proteins that require annotation, but current computational methods are not applicable for genome wide predictions in plant species. Here, we explore the use of species and lineage specific models for the prediction of DNA-binding proteins in plants. We show that a species specific support vector machine model based on Arabidopsis sequence data is more accurate (accuracy 81%) than a generic model (74%), and based on this we develop a plant specific model for predicting DNA-binding proteins. We apply this model to the tomato proteome and demonstrate its ability to perform accurate high-throughput prediction of DNA-binding proteins. In doing so, we have annotated 36 currently uncharacterised proteins by assigning a putative DNA-binding function. Our model is publically available and we propose it be used in combination with existing tools to help increase annotation levels of DNA-binding proteins encoded in plant genomes. PMID:26304539

  2. A large-scale evaluation of computational protein function prediction

    PubMed Central

    Radivojac, Predrag; Clark, Wyatt T; Ronnen Oron, Tal; Schnoes, Alexandra M; Wittkop, Tobias; Sokolov, Artem; Graim, Kiley; Funk, Christopher; Verspoor, Karin; Ben-Hur, Asa; Pandey, Gaurav; Yunes, Jeffrey M; Talwalkar, Ameet S; Repo, Susanna; Souza, Michael L; Piovesan, Damiano; Casadio, Rita; Wang, Zheng; Cheng, Jianlin; Fang, Hai; Gough, Julian; Koskinen, Patrik; Törönen, Petri; Nokso-Koivisto, Jussi; Holm, Liisa; Cozzetto, Domenico; Buchan, Daniel W A; Bryson, Kevin; Jones, David T; Limaye, Bhakti; Inamdar, Harshal; Datta, Avik; Manjari, Sunitha K; Joshi, Rajendra; Chitale, Meghana; Kihara, Daisuke; Lisewski, Andreas M; Erdin, Serkan; Venner, Eric; Lichtarge, Olivier; Rentzsch, Robert; Yang, Haixuan; Romero, Alfonso E; Bhat, Prajwal; Paccanaro, Alberto; Hamp, Tobias; Kassner, Rebecca; Seemayer, Stefan; Vicedo, Esmeralda; Schaefer, Christian; Achten, Dominik; Auer, Florian; Böhm, Ariane; Braun, Tatjana; Hecht, Maximilian; Heron, Mark; Hönigschmid, Peter; Hopf, Thomas; Kaufmann, Stefanie; Kiening, Michael; Krompass, Denis; Landerer, Cedric; Mahlich, Yannick; Roos, Manfred; Björne, Jari; Salakoski, Tapio; Wong, Andrew; Shatkay, Hagit; Gatzmann, Fanny; Sommer, Ingolf; Wass, Mark N; Sternberg, Michael J E; Škunca, Nives; Supek, Fran; Bošnjak, Matko; Panov, Panče; Džeroski, Sašo; Šmuc, Tomislav; Kourmpetis, Yiannis A I; van Dijk, Aalt D J; ter Braak, Cajo J F; Zhou, Yuanpeng; Gong, Qingtian; Dong, Xinran; Tian, Weidong; Falda, Marco; Fontana, Paolo; Lavezzo, Enrico; Di Camillo, Barbara; Toppo, Stefano; Lan, Liang; Djuric, Nemanja; Guo, Yuhong; Vucetic, Slobodan; Bairoch, Amos; Linial, Michal; Babbitt, Patricia C; Brenner, Steven E; Orengo, Christine; Rost, Burkhard; Mooney, Sean D; Friedberg, Iddo

    2013-01-01

    Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based Critical Assessment of protein Function Annotation (CAFA) experiment. Fifty-four methods representing the state-of-the-art for protein function prediction were evaluated on a target set of 866 proteins from eleven organisms. Two findings stand out: (i) today’s best protein function prediction algorithms significantly outperformed widely-used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is significant need for improvement of currently available tools. PMID:23353650

  3. A large-scale evaluation of computational protein function prediction.

    PubMed

    Radivojac, Predrag; Clark, Wyatt T; Oron, Tal Ronnen; Schnoes, Alexandra M; Wittkop, Tobias; Sokolov, Artem; Graim, Kiley; Funk, Christopher; Verspoor, Karin; Ben-Hur, Asa; Pandey, Gaurav; Yunes, Jeffrey M; Talwalkar, Ameet S; Repo, Susanna; Souza, Michael L; Piovesan, Damiano; Casadio, Rita; Wang, Zheng; Cheng, Jianlin; Fang, Hai; Gough, Julian; Koskinen, Patrik; Törönen, Petri; Nokso-Koivisto, Jussi; Holm, Liisa; Cozzetto, Domenico; Buchan, Daniel W A; Bryson, Kevin; Jones, David T; Limaye, Bhakti; Inamdar, Harshal; Datta, Avik; Manjari, Sunitha K; Joshi, Rajendra; Chitale, Meghana; Kihara, Daisuke; Lisewski, Andreas M; Erdin, Serkan; Venner, Eric; Lichtarge, Olivier; Rentzsch, Robert; Yang, Haixuan; Romero, Alfonso E; Bhat, Prajwal; Paccanaro, Alberto; Hamp, Tobias; Kaßner, Rebecca; Seemayer, Stefan; Vicedo, Esmeralda; Schaefer, Christian; Achten, Dominik; Auer, Florian; Boehm, Ariane; Braun, Tatjana; Hecht, Maximilian; Heron, Mark; Hönigschmid, Peter; Hopf, Thomas A; Kaufmann, Stefanie; Kiening, Michael; Krompass, Denis; Landerer, Cedric; Mahlich, Yannick; Roos, Manfred; Björne, Jari; Salakoski, Tapio; Wong, Andrew; Shatkay, Hagit; Gatzmann, Fanny; Sommer, Ingolf; Wass, Mark N; Sternberg, Michael J E; Škunca, Nives; Supek, Fran; Bošnjak, Matko; Panov, Panče; Džeroski, Sašo; Šmuc, Tomislav; Kourmpetis, Yiannis A I; van Dijk, Aalt D J; ter Braak, Cajo J F; Zhou, Yuanpeng; Gong, Qingtian; Dong, Xinran; Tian, Weidong; Falda, Marco; Fontana, Paolo; Lavezzo, Enrico; Di Camillo, Barbara; Toppo, Stefano; Lan, Liang; Djuric, Nemanja; Guo, Yuhong; Vucetic, Slobodan; Bairoch, Amos; Linial, Michal; Babbitt, Patricia C; Brenner, Steven E; Orengo, Christine; Rost, Burkhard; Mooney, Sean D; Friedberg, Iddo

    2013-03-01

    Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools. PMID:23353650

  4. 4D prediction of protein (1)H chemical shifts.

    PubMed

    Lehtivarjo, Juuso; Hassinen, Tommi; Korhonen, Samuli-Petrus; Peräkylä, Mikael; Laatikainen, Reino

    2009-12-01

    A 4D approach for protein (1)H chemical shift prediction was explored. The 4th dimension is the molecular flexibility, mapped using molecular dynamics simulations. The chemical shifts were predicted with a principal component model based on atom coordinates from a database of 40 protein structures. When compared to the corresponding non-dynamic (3D) model, the 4th dimension improved prediction by 6-7%. The prediction method achieved RMS errors of 0.29 and 0.50 ppm for Halpha and HN shifts, respectively. However, for individual proteins the RMS errors were 0.17-0.34 and 0.34-0.65 ppm for the Halpha and HN shifts, respectively. X-ray structures gave better predictions than the corresponding NMR structures, indicating that chemical shifts contain invaluable information about local structures. The (1)H chemical shift prediction tool 4DSPOT is available from http://www.uku.fi/kemia/4dspot . PMID:19876601

  5. Are current atomistic force fields accurate enough to study proteins in crowded environments?

    PubMed

    Petrov, Drazen; Zagrovic, Bojan

    2014-05-01

    The high concentration of macromolecules in the crowded cellular interior influences different thermodynamic and kinetic properties of proteins, including their structural stabilities, intermolecular binding affinities and enzymatic rates. Moreover, various structural biology methods, such as NMR or different spectroscopies, typically involve samples with relatively high protein concentration. Due to large sampling requirements, however, the accuracy of classical molecular dynamics (MD) simulations in capturing protein behavior at high concentration still remains largely untested. Here, we use explicit-solvent MD simulations and a total of 6.4 µs of simulated time to study wild-type (folded) and oxidatively damaged (unfolded) forms of villin headpiece at 6 mM and 9.2 mM protein concentration. We first perform an exhaustive set of simulations with multiple protein molecules in the simulation box using GROMOS 45a3 and 54a7 force fields together with different types of electrostatics treatment and solution ionic strengths. Surprisingly, the two villin headpiece variants exhibit similar aggregation behavior, despite the fact that their estimated aggregation propensities markedly differ. Importantly, regardless of the simulation protocol applied, wild-type villin headpiece consistently aggregates even under conditions at which it is experimentally known to be soluble. We demonstrate that aggregation is accompanied by a large decrease in the total potential energy, with not only hydrophobic, but also polar residues and backbone contributing substantially. The same effect is directly observed for two other major atomistic force fields (AMBER99SB-ILDN and CHARMM22-CMAP) as well as indirectly shown for additional two (AMBER94, OPLS-AAL), and is possibly due to a general overestimation of the potential energy of protein-protein interactions at the expense of water-water and water-protein interactions. Overall, our results suggest that current MD force fields may distort the

  6. Are Current Atomistic Force Fields Accurate Enough to Study Proteins in Crowded Environments?

    PubMed Central

    Petrov, Drazen; Zagrovic, Bojan

    2014-01-01

    The high concentration of macromolecules in the crowded cellular interior influences different thermodynamic and kinetic properties of proteins, including their structural stabilities, intermolecular binding affinities and enzymatic rates. Moreover, various structural biology methods, such as NMR or different spectroscopies, typically involve samples with relatively high protein concentration. Due to large sampling requirements, however, the accuracy of classical molecular dynamics (MD) simulations in capturing protein behavior at high concentration still remains largely untested. Here, we use explicit-solvent MD simulations and a total of 6.4 µs of simulated time to study wild-type (folded) and oxidatively damaged (unfolded) forms of villin headpiece at 6 mM and 9.2 mM protein concentration. We first perform an exhaustive set of simulations with multiple protein molecules in the simulation box using GROMOS 45a3 and 54a7 force fields together with different types of electrostatics treatment and solution ionic strengths. Surprisingly, the two villin headpiece variants exhibit similar aggregation behavior, despite the fact that their estimated aggregation propensities markedly differ. Importantly, regardless of the simulation protocol applied, wild-type villin headpiece consistently aggregates even under conditions at which it is experimentally known to be soluble. We demonstrate that aggregation is accompanied by a large decrease in the total potential energy, with not only hydrophobic, but also polar residues and backbone contributing substantially. The same effect is directly observed for two other major atomistic force fields (AMBER99SB-ILDN and CHARMM22-CMAP) as well as indirectly shown for additional two (AMBER94, OPLS-AAL), and is possibly due to a general overestimation of the potential energy of protein-protein interactions at the expense of water-water and water-protein interactions. Overall, our results suggest that current MD force fields may distort the

  7. Fast and accurate resonance assignment of small-to-large proteins by combining automated and manual approaches.

    PubMed

    Niklasson, Markus; Ahlner, Alexandra; Andresen, Cecilia; Marsh, Joseph A; Lundström, Patrik

    2015-01-01

    The process of resonance assignment is fundamental to most NMR studies of protein structure and dynamics. Unfortunately, the manual assignment of residues is tedious and time-consuming, and can represent a significant bottleneck for further characterization. Furthermore, while automated approaches have been developed, they are often limited in their accuracy, particularly for larger proteins. Here, we address this by introducing the software COMPASS, which, by combining automated resonance assignment with manual intervention, is able to achieve accuracy approaching that from manual assignments at greatly accelerated speeds. Moreover, by including the option to compensate for isotope shift effects in deuterated proteins, COMPASS is far more accurate for larger proteins than existing automated methods. COMPASS is an open-source project licensed under GNU General Public License and is available for download from http://www.liu.se/forskning/foass/tidigare-foass/patrik-lundstrom/software?l=en. Source code and binaries for Linux, Mac OS X and Microsoft Windows are available. PMID:25569628

  8. Network-based prediction of protein function

    PubMed Central

    Sharan, Roded; Ulitsky, Igor; Shamir, Ron

    2007-01-01

    Functional annotation of proteins is a fundamental problem in the post-genomic era. The recent availability of protein interaction networks for many model species has spurred on the development of computational methods for interpreting such data in order to elucidate protein function. In this review, we describe the current computational approaches for the task, including direct methods, which propagate functional information through the network, and module-assisted methods, which infer functional modules within the network and use those for the annotation task. Although a broad variety of interesting approaches has been developed, further progress in the field will depend on systematic evaluation of the methods and their dissemination in the biological community. PMID:17353930

  9. Computational Methods to Predict Protein Interaction Partners

    NASA Astrophysics Data System (ADS)

    Valencia, Alfonso; Pazos, Florencio

    In the new paradigm for studying biological phenomena represented by Systems Biology, cellular components are not considered in isolation but as forming complex networks of relationships. Protein interaction networks are among the first objects studied from this new point of view. Deciphering the interactome (the whole network of interactions for a given proteome) has been shown to be a very complex task. Computational techniques for detecting protein interactions have become standard tools for dealing with this problem, helping and complementing their experimental counterparts. Most of these techniques use genomic or sequence features intuitively related with protein interactions and are based on "first principles" in the sense that they do not involve training with examples. There are also other computational techniques that use other sources of information (i.e. structural information or even experimental data) or are based on training with examples.

  10. Accurate prediction of polarised high order electrostatic interactions for hydrogen bonded complexes using the machine learning method kriging.

    PubMed

    Hughes, Timothy J; Kandathil, Shaun M; Popelier, Paul L A

    2015-02-01

    As intermolecular interactions such as the hydrogen bond are electrostatic in origin, rigorous treatment of this term within force field methodologies should be mandatory. We present a method able of accurately reproducing such interactions for seven van der Waals complexes. It uses atomic multipole moments up to hexadecupole moment mapped to the positions of the nuclear coordinates by the machine learning method kriging. Models were built at three levels of theory: HF/6-31G(**), B3LYP/aug-cc-pVDZ and M06-2X/aug-cc-pVDZ. The quality of the kriging models was measured by their ability to predict the electrostatic interaction energy between atoms in external test examples for which the true energies are known. At all levels of theory, >90% of test cases for small van der Waals complexes were predicted within 1 kJ mol(-1), decreasing to 60-70% of test cases for larger base pair complexes. Models built on moments obtained at B3LYP and M06-2X level generally outperformed those at HF level. For all systems the individual interactions were predicted with a mean unsigned error of less than 1 kJ mol(-1). PMID:24274986

  11. DeepCNF-D: Predicting Protein Order/Disorder Regions by Weighted Deep Convolutional Neural Fields

    PubMed Central

    Wang, Sheng; Weng, Shunyan; Ma, Jianzhu; Tang, Qingming

    2015-01-01

    Intrinsically disordered proteins or protein regions are involved in key biological processes including regulation of transcription, signal transduction, and alternative splicing. Accurately predicting order/disorder regions ab initio from the protein sequence is a prerequisite step for further analysis of functions and mechanisms for these disordered regions. This work presents a learning method, weighted DeepCNF (Deep Convolutional Neural Fields), to improve the accuracy of order/disorder prediction by exploiting the long-range sequential information and the interdependency between adjacent order/disorder labels and by assigning different weights for each label during training and prediction to solve the label imbalance issue. Evaluated by the CASP9 and CASP10 targets, our method obtains 0.855 and 0.898 AUC values, which are higher than the state-of-the-art single ab initio predictors. PMID:26230689

  12. PLIF: A rapid, accurate method to detect and quantitatively assess protein-lipid interactions.

    PubMed

    Ceccato, Laurie; Chicanne, Gaëtan; Nahoum, Virginie; Pons, Véronique; Payrastre, Bernard; Gaits-Iacovoni, Frédérique; Viaud, Julien

    2016-01-01

    Phosphoinositides are a type of cellular phospholipid that regulate signaling in a wide range of cellular and physiological processes through the interaction between their phosphorylated inositol head group and specific domains in various cytosolic proteins. These lipids also influence the activity of transmembrane proteins. Aberrant phosphoinositide signaling is associated with numerous diseases, including cancer, obesity, and diabetes. Thus, identifying phosphoinositide-binding partners and the aspects that define their specificity can direct drug development. However, current methods are costly, time-consuming, or technically challenging and inaccessible to many laboratories. We developed a method called PLIF (for "protein-lipid interaction by fluorescence") that uses fluorescently labeled liposomes and tethered, tagged proteins or peptides to enable fast and reliable determination of protein domain specificity for given phosphoinositides in a membrane environment. We validated PLIF against previously known phosphoinositide-binding partners for various proteins and obtained relative affinity profiles. Moreover, PLIF analysis of the sorting nexin (SNX) family revealed not only that SNXs bound most strongly to phosphatidylinositol 3-phosphate (PtdIns3P or PI3P), which is known from analysis with other methods, but also that they interacted with other phosphoinositides, which had not previously been detected using other techniques. Different phosphoinositide partners, even those with relatively weak binding affinity, could account for the diverse functions of SNXs in vesicular trafficking and protein sorting. Because PLIF is sensitive, semiquantitative, and performed in a high-throughput manner, it may be used to screen for highly specific protein-lipid interaction inhibitors. PMID:27025878

  13. Prediction of Protein-Protein Interaction Sites with Machine-Learning-Based Data-Cleaning and Post-Filtering Procedures.

    PubMed

    Liu, Guang-Hui; Shen, Hong-Bin; Yu, Dong-Jun

    2016-04-01

    Accurately predicting protein-protein interaction sites (PPIs) is currently a hot topic because it has been demonstrated to be very useful for understanding disease mechanisms and designing drugs. Machine-learning-based computational approaches have been broadly utilized and demonstrated to be useful for PPI prediction. However, directly applying traditional machine learning algorithms, which often assume that samples in different classes are balanced, often leads to poor performance because of the severe class imbalance that exists in the PPI prediction problem. In this study, we propose a novel method for improving PPI prediction performance by relieving the severity of class imbalance using a data-cleaning procedure and reducing predicted false positives with a post-filtering procedure: First, a machine-learning-based data-cleaning procedure is applied to remove those marginal targets, which may potentially have a negative effect on training a model with a clear classification boundary, from the majority samples to relieve the severity of class imbalance in the original training dataset; then, a prediction model is trained on the cleaned dataset; finally, an effective post-filtering procedure is further used to reduce potential false positive predictions. Stringent cross-validation and independent validation tests on benchmark datasets demonstrated the efficacy of the proposed method, which exhibits highly competitive performance compared with existing state-of-the-art sequence-based PPIs predictors and should supplement existing PPI prediction methods. PMID:26563228

  14. Affinity regression predicts the recognition code of nucleic acid binding proteins

    PubMed Central

    Pelossof, Raphael; Singh, Irtisha; Yang, Julie L.; Weirauch, Matthew T.; Hughes, Timothy R.; Leslie, Christina S.

    2016-01-01

    Predicting the affinity profiles of nucleic acid-binding proteins directly from the protein sequence is a major unsolved problem. We present a statistical approach for learning the recognition code of a family of transcription factors (TFs) or RNA-binding proteins (RBPs) from high-throughput binding assays. Our method, called affinity regression, trains on protein binding microarray (PBM) or RNA compete experiments to learn an interaction model between proteins and nucleic acids, using only protein domain and probe sequences as inputs. By training on mouse homeodomain PBM profiles, our model correctly identifies residues that confer DNA-binding specificity and accurately predicts binding motifs for an independent set of divergent homeodomains. Similarly, learning from RNA compete profiles for diverse RBPs, our model can predict the binding affinities of held-out proteins and identify key RNA-binding residues. More broadly, we envision applying our method to model and predict biological interactions in any setting where there is a high-throughput ‘affinity’ readout. PMID:26571099

  15. Protein side chain conformation predictions with an MMGBSA energy function.

    PubMed

    Gaillard, Thomas; Panel, Nicolas; Simonson, Thomas

    2016-06-01

    The prediction of protein side chain conformations from backbone coordinates is an important task in structural biology, with applications in structure prediction and protein design. It is a difficult problem due to its combinatorial nature. We study the performance of an "MMGBSA" energy function, implemented in our protein design program Proteus, which combines molecular mechanics terms, a Generalized Born and Surface Area (GBSA) solvent model, with approximations that make the model pairwise additive. Proteus is not a competitor to specialized side chain prediction programs due to its cost, but it allows protein design applications, where side chain prediction is an important step and MMGBSA an effective energy model. We predict the side chain conformations for 18 proteins. The side chains are first predicted individually, with the rest of the protein in its crystallographic conformation. Next, all side chains are predicted together. The contributions of individual energy terms are evaluated and various parameterizations are compared. We find that the GB and SA terms, with an appropriate choice of the dielectric constant and surface energy coefficients, are beneficial for single side chain predictions. For the prediction of all side chains, however, errors due to the pairwise additive approximation overcome the improvement brought by these terms. We also show the crucial contribution of side chain minimization to alleviate the rigid rotamer approximation. Even without GB and SA terms, we obtain accuracies comparable to SCWRL4, a specialized side chain prediction program. In particular, we obtain a better RMSD than SCWRL4 for core residues (at a higher cost), despite our simpler rotamer library. Proteins 2016; 84:803-819. © 2016 Wiley Periodicals, Inc. PMID:26948696

  16. AIDA: ab initio domain assembly for automated multi-domain protein structure prediction and domain–domain interaction prediction

    PubMed Central

    Xu, Dong; Jaroszewski, Lukasz; Li, Zhanwen; Godzik, Adam

    2015-01-01

    Motivation: Most proteins consist of multiple domains, independent structural and evolutionary units that are often reshuffled in genomic rearrangements to form new protein architectures. Template-based modeling methods can often detect homologous templates for individual domains, but templates that could be used to model the entire query protein are often not available. Results: We have developed a fast docking algorithm ab initio domain assembly (AIDA) for assembling multi-domain protein structures, guided by the ab initio folding potential. This approach can be extended to discontinuous domains (i.e. domains with ‘inserted’ domains). When tested on experimentally solved structures of multi-domain proteins, the relative domain positions were accurately found among top 5000 models in 86% of cases. AIDA server can use domain assignments provided by the user or predict them from the provided sequence. The latter approach is particularly useful for automated protein structure prediction servers. The blind test consisting of 95 CASP10 targets shows that domain boundaries could be successfully determined for 97% of targets. Availability and implementation: The AIDA package as well as the benchmark sets used here are available for download at http://ffas.burnham.org/AIDA/. Contact: adam@sanfordburnham.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25701568

  17. Partner-Aware Prediction of Interacting Residues in Protein-Protein Complexes from Sequence Data

    PubMed Central

    Ahmad, Shandar; Mizuguchi, Kenji

    2011-01-01

    Computational prediction of residues that participate in protein-protein interactions is a difficult task, and state of the art methods have shown only limited success in this arena. One possible problem with these methods is that they try to predict interacting residues without incorporating information about the partner protein, although it is unclear how much partner information could enhance prediction performance. To address this issue, the two following comparisons are of crucial significance: (a) comparison between the predictability of inter-protein residue pairs, i.e., predicting exactly which residue pairs interact with each other given two protein sequences; this can be achieved by either combining conventional single-protein predictions or making predictions using a new model trained directly on the residue pairs, and the performance of these two approaches may be compared: (b) comparison between the predictability of the interacting residues in a single protein (irrespective of the partner residue or protein) from conventional methods and predictions converted from the pair-wise trained model. Using these two streams of training and validation procedures and employing similar two-stage neural networks, we showed that the models trained on pair-wise contacts outperformed the partner-unaware models in predicting both interacting pairs and interacting single-protein residues. Prediction performance decreased with the size of the conformational change upon complex formation; this trend is similar to docking, even though no structural information was used in our prediction. An example application that predicts two partner-specific interfaces of a protein was shown to be effective, highlighting the potential of the proposed approach. Finally, a preliminary attempt was made to score docking decoy poses using prediction of interacting residue pairs; this analysis produced an encouraging result. PMID:22194998

  18. A simple yet accurate correction for winner's curse can predict signals discovered in much larger genome scans

    PubMed Central

    Bigdeli, T. Bernard; Lee, Donghyung; Webb, Bradley Todd; Riley, Brien P.; Vladimirov, Vladimir I.; Fanous, Ayman H.; Kendler, Kenneth S.; Bacanu, Silviu-Alin

    2016-01-01

    Motivation: For genetic studies, statistically significant variants explain far less trait variance than ‘sub-threshold’ association signals. To dimension follow-up studies, researchers need to accurately estimate ‘true’ effect sizes at each SNP, e.g. the true mean of odds ratios (ORs)/regression coefficients (RRs) or Z-score noncentralities. Naïve estimates of effect sizes incur winner’s curse biases, which are reduced only by laborious winner’s curse adjustments (WCAs). Given that Z-scores estimates can be theoretically translated on other scales, we propose a simple method to compute WCA for Z-scores, i.e. their true means/noncentralities. Results:WCA of Z-scores shrinks these towards zero while, on P-value scale, multiple testing adjustment (MTA) shrinks P-values toward one, which corresponds to the zero Z-score value. Thus, WCA on Z-scores scale is a proxy for MTA on P-value scale. Therefore, to estimate Z-score noncentralities for all SNPs in genome scans, we propose FDR Inverse Quantile Transformation (FIQT). It (i) performs the simpler MTA of P-values using FDR and (ii) obtains noncentralities by back-transforming MTA P-values on Z-score scale. When compared to competitors, realistic simulations suggest that FIQT is more (i) accurate and (ii) computationally efficient by orders of magnitude. Practical application of FIQT to Psychiatric Genetic Consortium schizophrenia cohort predicts a non-trivial fraction of sub-threshold signals which become significant in much larger supersamples. Conclusions: FIQT is a simple, yet accurate, WCA method for Z-scores (and ORs/RRs, via simple transformations). Availability and Implementation: A 10 lines R function implementation is available at https://github.com/bacanusa/FIQT. Contact: sabacanu@vcu.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27187203

  19. Small-scale field experiments accurately scale up to predict density dependence in reef fish populations at large scales

    PubMed Central

    Steele, Mark A.; Forrester, Graham E.

    2005-01-01

    Field experiments provide rigorous tests of ecological hypotheses but are usually limited to small spatial scales. It is thus unclear whether these findings extrapolate to larger scales relevant to conservation and management. We show that the results of experiments detecting density-dependent mortality of reef fish on small habitat patches scale up to have similar effects on much larger entire reefs that are the size of small marine reserves and approach the scale at which some reef fisheries operate. We suggest that accurate scaling is due to the type of species interaction causing local density dependence and the fact that localized events can be aggregated to describe larger-scale interactions with minimal distortion. Careful extrapolation from small-scale experiments identifying species interactions and their effects should improve our ability to predict the outcomes of alternative management strategies for coral reef fishes and their habitats. PMID:16150721

  20. Small-scale field experiments accurately scale up to predict density dependence in reef fish populations at large scales.

    PubMed

    Steele, Mark A; Forrester, Graham E

    2005-09-20

    Field experiments provide rigorous tests of ecological hypotheses but are usually limited to small spatial scales. It is thus unclear whether these findings extrapolate to larger scales relevant to conservation and management. We show that the results of experiments detecting density-dependent mortality of reef fish on small habitat patches scale up to have similar effects on much larger entire reefs that are the size of small marine reserves and approach the scale at which some reef fisheries operate. We suggest that accurate scaling is due to the type of species interaction causing local density dependence and the fact that localized events can be aggregated to describe larger-scale interactions with minimal distortion. Careful extrapolation from small-scale experiments identifying species interactions and their effects should improve our ability to predict the outcomes of alternative management strategies for coral reef fishes and their habitats. PMID:16150721

  1. Effects of the inlet conditions and blood models on accurate prediction of hemodynamics in the stented coronary arteries

    NASA Astrophysics Data System (ADS)

    Jiang, Yongfei; Zhang, Jun; Zhao, Wanhua

    2015-05-01

    Hemodynamics altered by stent implantation is well-known to be closely related to in-stent restenosis. Computational fluid dynamics (CFD) method has been used to investigate the hemodynamics in stented arteries in detail and help to analyze the performances of stents. In this study, blood models with Newtonian or non-Newtonian properties were numerically investigated for the hemodynamics at steady or pulsatile inlet conditions respectively employing CFD based on the finite volume method. The results showed that the blood model with non-Newtonian property decreased the area of low wall shear stress (WSS) compared with the blood model with Newtonian property and the magnitude of WSS varied with the magnitude and waveform of the inlet velocity. The study indicates that the inlet conditions and blood models are all important for accurately predicting the hemodynamics. This will be beneficial to estimate the performances of stents and also help clinicians to select the proper stents for the patients.

  2. The general AMBER force field (GAFF) can accurately predict thermodynamic and transport properties of many ionic liquids.

    PubMed

    Sprenger, K G; Jaeger, Vance W; Pfaendtner, Jim

    2015-05-01

    We have applied molecular dynamics to calculate thermodynamic and transport properties of a set of 19 room-temperature ionic liquids. Since accurately simulating the thermophysical properties of solvents strongly depends upon the force field of choice, we tested the accuracy of the general AMBER force field, without refinement, for the case of ionic liquids. Electrostatic point charges were developed using ab initio calculations and a charge scaling factor of 0.8 to more accurately predict dynamic properties. The density, heat capacity, molar enthalpy of vaporization, self-diffusivity, and shear viscosity of the ionic liquids were computed and compared to experimentally available data, and good agreement across a wide range of cation and anion types was observed. Results show that, for a wide range of ionic liquids, the general AMBER force field, with no tuning of parameters, can reproduce a variety of thermodynamic and transport properties with similar accuracy to that of other published, often IL-specific, force fields. PMID:25853313

  3. Structure-based mutant stability predictions on proteins of unknown structure.

    PubMed

    Gonnelli, Giulia; Rooman, Marianne; Dehouck, Yves

    2012-10-31

    The ability to rapidly and accurately predict the effects of mutations on the physicochemical properties of proteins holds tremendous importance in the rational design of modified proteins for various types of industrial, environmental or pharmaceutical applications, as well as in elucidating the genetic background of complex diseases. In many cases, the absence of an experimentally resolved structure represents a major obstacle, since most currently available predictive software crucially depend on it. We investigate here the relevance of combining coarse-grained structure-based stability predictions with a simple comparative modeling procedure. Strikingly, our results show that the use of average to high quality structural models leads to virtually no loss in predictive power compared to the use of experimental structures. Even in the case of low quality models, the decrease in performance is quite limited and this combined approach remains markedly superior to other methods based exclusively on the analysis of sequence features. PMID:22782143

  4. How accurate are polymer models in the analysis of Förster resonance energy transfer experiments on proteins?

    NASA Astrophysics Data System (ADS)

    O'Brien, Edward P.; Morrison, Greg; Brooks, Bernard R.; Thirumalai, D.

    2009-03-01

    Single molecule Förster resonance energy transfer (FRET) experiments are used to infer the properties of the denatured state ensemble (DSE) of proteins. From the measured average FRET efficiency, ⟨E⟩, the distance distribution P(R ) is inferred by assuming that the DSE can be described as a polymer. The single parameter in the appropriate polymer model (Gaussian chain, wormlike chain, or self-avoiding walk) for P(R ) is determined by equating the calculated and measured ⟨E⟩. In order to assess the accuracy of this "standard procedure," we consider the generalized Rouse model (GRM), whose properties [⟨E⟩ and P(R )] can be analytically computed, and the Molecular Transfer Model for protein L for which accurate simulations can be carried out as a function of guanadinium hydrochloride (GdmCl) concentration. Using the precisely computed ⟨E⟩ for the GRM and protein L, we infer P(R ) using the standard procedure. We find that the mean end-to-end distance can be accurately inferred (less than 10% relative error) using ⟨E⟩ and polymer models for P(R ). However, the value extracted for the radius of gyration (Rg) and the persistence length (lp) are less accurate. For protein L, the errors in the inferred properties increase as the GdmCl concentration increases for all polymer models. The relative error in the inferred Rg and lp, with respect to the exact values, can be as large as 25% at the highest GdmCl concentration. We propose a self-consistency test, requiring measurements of ⟨E⟩ by attaching dyes to different residues in the protein, to assess the validity of describing DSE using the Gaussian model. Application of the self-consistency test to the GRM shows that even for this simple model, which exhibits an order→disorder transition, the Gaussian P(R ) is inadequate. Analysis of experimental data of FRET efficiencies with dyes at several locations for the cold shock protein, and simulations results for protein L, for which accurate FRET

  5. Finding the “Dark Matter” in Human and Yeast Protein Network Prediction and Modelling

    PubMed Central

    Lees, Jon G.; Reid, Adam J.; Yeats, Corin; Clegg, Andrew B.; Sanchez-Jimenez, Francisca; Orengo, Christine

    2010-01-01

    Accurate modelling of biological systems requires a deeper and more complete knowledge about the molecular components and their functional associations than we currently have. Traditionally, new knowledge on protein associations generated by experiments has played a central role in systems modelling, in contrast to generally less trusted bio-computational predictions. However, we will not achieve realistic modelling of complex molecular systems if the current experimental designs lead to biased screenings of real protein networks and leave large, functionally important areas poorly characterised. To assess the likelihood of this, we have built comprehensive network models of the yeast and human proteomes by using a meta-statistical integration of diverse computationally predicted protein association datasets. We have compared these predicted networks against combined experimental datasets from seven biological resources at different level of statistical significance. These eukaryotic predicted networks resemble all the topological and noise features of the experimentally inferred networks in both species, and we also show that this observation is not due to random behaviour. In addition, the topology of the predicted networks contains information on true protein associations, beyond the constitutive first order binary predictions. We also observe that most of the reliable predicted protein associations are experimentally uncharacterised in our models, constituting the hidden or “dark matter” of networks by analogy to astronomical systems. Some of this dark matter shows enrichment of particular functions and contains key functional elements of protein networks, such as hubs associated with important functional areas like the regulation of Ras protein signal transduction in human cells. Thus, characterising this large and functionally important dark matter, elusive to established experimental designs, may be crucial for modelling biological systems. In any case

  6. Finding the "dark matter" in human and yeast protein network prediction and modelling.

    PubMed

    Ranea, Juan A G; Morilla, Ian; Lees, Jon G; Reid, Adam J; Yeats, Corin; Clegg, Andrew B; Sanchez-Jimenez, Francisca; Orengo, Christine

    2010-01-01

    Accurate modelling of biological systems requires a deeper and more complete knowledge about the molecular components and their functional associations than we currently have. Traditionally, new knowledge on protein associations generated by experiments has played a central role in systems modelling, in contrast to generally less trusted bio-computational predictions. However, we will not achieve realistic modelling of complex molecular systems if the current experimental designs lead to biased screenings of real protein networks and leave large, functionally important areas poorly characterised. To assess the likelihood of this, we have built comprehensive network models of the yeast and human proteomes by using a meta-statistical integration of diverse computationally predicted protein association datasets. We have compared these predicted networks against combined experimental datasets from seven biological resources at different level of statistical significance. These eukaryotic predicted networks resemble all the topological and noise features of the experimentally inferred networks in both species, and we also show that this observation is not due to random behaviour. In addition, the topology of the predicted networks contains information on true protein associations, beyond the constitutive first order binary predictions. We also observe that most of the reliable predicted protein associations are experimentally uncharacterised in our models, constituting the hidden or "dark matter" of networks by analogy to astronomical systems. Some of this dark matter shows enrichment of particular functions and contains key functional elements of protein networks, such as hubs associated with important functional areas like the regulation of Ras protein signal transduction in human cells. Thus, characterising this large and functionally important dark matter, elusive to established experimental designs, may be crucial for modelling biological systems. In any case

  7. A scalable and accurate method for classifying protein-ligand binding geometries using a MapReduce approach.

    PubMed

    Estrada, T; Zhang, B; Cicotti, P; Armen, R S; Taufer, M

    2012-07-01

    We present a scalable and accurate method for classifying protein-ligand binding geometries in molecular docking. Our method is a three-step process: the first step encodes the geometry of a three-dimensional (3D) ligand conformation into a single 3D point in the space; the second step builds an octree by assigning an octant identifier to every single point in the space under consideration; and the third step performs an octree-based clustering on the reduced conformation space and identifies the most dense octant. We adapt our method for MapReduce and implement it in Hadoop. The load-balancing, fault-tolerance, and scalability in MapReduce allow screening of very large conformation spaces not approachable with traditional clustering methods. We analyze results for docking trials for 23 protein-ligand complexes for HIV protease, 21 protein-ligand complexes for Trypsin, and 12 protein-ligand complexes for P38alpha kinase. We also analyze cross docking trials for 24 ligands, each docking into 24 protein conformations of the HIV protease, and receptor ensemble docking trials for 24 ligands, each docking in a pool of HIV protease receptors. Our method demonstrates significant improvement over energy-only scoring for the accurate identification of native ligand geometries in all these docking assessments. The advantages of our clustering approach make it attractive for complex applications in real-world drug design efforts. We demonstrate that our method is particularly useful for clustering docking results using a minimal ensemble of representative protein conformational states (receptor ensemble docking), which is now a common strategy to address protein flexibility in molecular docking. PMID:22658682

  8. Protein Structure and Function Prediction Using I-TASSER

    PubMed Central

    Yang, Jianyi; Zhang, Yang

    2016-01-01

    I-TASSER is a hierarchical protocol for automated protein structure prediction and structure-based function annotation. Starting from the amino acid sequence of target proteins, I-TASSER first generates full-length atomic structural models from multiple threading alignments and iterative structural assembly simulations followed by atomic-level structure refinement. The biological functions of the protein, including ligand-binding sites, enzyme commission number, and gene ontology terms, are then inferred from known protein function databases based on sequence and structure profile comparisons. I-TASSER is freely available as both an on-line server and a stand-alone package. This unit describes how to use the I-TASSER protocol to generate structure and function prediction and how to interpret the prediction results, as well as alternative approaches for further improving the I-TASSER modeling quality for distant-homologous and multi-domain protein targets. PMID:26678386

  9. Accurate protein-peptide titration experiments by nuclear magnetic resonance using low-volume samples.

    PubMed

    Köhler, Christian; Recht, Raphaël; Quinternet, Marc; de Lamotte, Frederic; Delsuc, Marc-André; Kieffer, Bruno

    2015-01-01

    NMR spectroscopy allows measurements of very accurate values of equilibrium dissociation constants using chemical shift perturbation methods, provided that the concentrations of the binding partners are known with high precision and accuracy. The accuracy and precision of these experiments are improved if performed using individual capillary tubes, a method enabling full automation of the measurement. We provide here a protocol to set up and perform these experiments as well as a robust method to measure peptide concentrations using tryptophan as an internal standard. PMID:25749962

  10. DEPTH: a web server to compute depth and predict small-molecule binding cavities in proteins.

    PubMed

    Tan, Kuan Pern; Varadarajan, Raghavan; Madhusudhan, M S

    2011-07-01

    Depth measures the extent of atom/residue burial within a protein. It correlates with properties such as protein stability, hydrogen exchange rate, protein-protein interaction hot spots, post-translational modification sites and sequence variability. Our server, DEPTH, accurately computes depth and solvent-accessible surface area (SASA) values. We show that depth can be used to predict small molecule ligand binding cavities in proteins. Often, some of the residues lining a ligand binding cavity are both deep and solvent exposed. Using the depth-SASA pair values for a residue, its likelihood to form part of a small molecule binding cavity is estimated. The parameters of the method were calibrated over a training set of 900 high-resolution X-ray crystal structures of single-domain proteins bound to small molecules (molecular weight <1.5  KDa). The prediction accuracy of DEPTH is comparable to that of other geometry-based prediction methods including LIGSITE, SURFNET and Pocket-Finder (all with Matthew's correlation coefficient of ∼0.4) over a testing set of 225 single and multi-chain protein structures. Users have the option of tuning several parameters to detect cavities of different sizes, for example, geometrically flat binding sites. The input to the server is a protein 3D structure in PDB format. The users have the option of tuning the values of four parameters associated with the computation of residue depth and the prediction of binding cavities. The computed depths, SASA and binding cavity predictions are displayed in 2D plots and mapped onto 3D representations of the protein structure using Jmol. Links are provided to download the outputs. Our server is useful for all structural analysis based on residue depth and SASA, such as guiding site-directed mutagenesis experiments and small molecule docking exercises, in the context of protein functional annotation and drug discovery. PMID:21576233

  11. Using Bacteria to Determine Protein Kinase Specificity and Predict Target Substrates

    PubMed Central

    Lubner, Joshua M.; Church, George M.; Husson, Robert N.; Schwartz, Daniel

    2012-01-01

    The identification of protein kinase targets remains a significant bottleneck for our understanding of signal transduction in normal and diseased cellular states. Kinases recognize their substrates in part through sequence motifs on substrate proteins, which, to date, have most effectively been elucidated using combinatorial peptide library approaches. Here, we present and demonstrate the ProPeL method for easy and accurate discovery of kinase specificity motifs through the use of native bacterial proteomes that serve as in vivo libraries for thousands of simultaneous phosphorylation reactions. Using recombinant kinases expressed in E. coli followed by mass spectrometry, the approach accurately recapitulated the well-established motif preferences of human basophilic (Protein Kinase A) and acidophilic (Casein Kinase II) kinases. These motifs, derived for PKA and CK II using only bacterial sequence data, were then further validated by utilizing them in conjunction with the scan-x software program to computationally predict known human phosphorylation sites with high confidence. PMID:23300758

  12. Computational Prediction of RNA-Binding Proteins and Binding Sites

    PubMed Central

    Si, Jingna; Cui, Jing; Cheng, Jin; Wu, Rongling

    2015-01-01

    Proteins and RNA interaction have vital roles in many cellular processes such as protein synthesis, sequence encoding, RNA transfer, and gene regulation at the transcriptional and post-transcriptional levels. Approximately 6%–8% of all proteins are RNA-binding proteins (RBPs). Distinguishing these RBPs or their binding residues is a major aim of structural biology. Previously, a number of experimental methods were developed for the determination of protein–RNA interactions. However, these experimental methods are expensive, time-consuming, and labor-intensive. Alternatively, researchers have developed many computational approaches to predict RBPs and protein–RNA binding sites, by combining various machine learning methods and abundant sequence and/or structural features. There are three kinds of computational approaches, which are prediction from protein sequence, prediction from protein structure, and protein-RNA docking. In this paper, we review all existing studies of predictions of RNA-binding sites and RBPs and complexes, including data sets used in different approaches, sequence and structural features used in several predictors, prediction method classifications, performance comparisons, evaluation methods, and future directions. PMID:26540053

  13. Accurate design of megadalton-scale two-component icosahedral protein complexes.

    PubMed

    Bale, Jacob B; Gonen, Shane; Liu, Yuxi; Sheffler, William; Ellis, Daniel; Thomas, Chantz; Cascio, Duilio; Yeates, Todd O; Gonen, Tamir; King, Neil P; Baker, David

    2016-07-22

    Nature provides many examples of self- and co-assembling protein-based molecular machines, including icosahedral protein cages that serve as scaffolds, enzymes, and compartments for essential biochemical reactions and icosahedral virus capsids, which encapsidate and protect viral genomes and mediate entry into host cells. Inspired by these natural materials, we report the computational design and experimental characterization of co-assembling, two-component, 120-subunit icosahedral protein nanostructures with molecular weights (1.8 to 2.8 megadaltons) and dimensions (24 to 40 nanometers in diameter) comparable to those of small viral capsids. Electron microscopy, small-angle x-ray scattering, and x-ray crystallography show that 10 designs spanning three distinct icosahedral architectures form materials closely matching the design models. In vitro assembly of icosahedral complexes from independently purified components occurs rapidly, at rates comparable to those of viral capsids, and enables controlled packaging of molecular cargo through charge complementarity. The ability to design megadalton-scale materials with atomic-level accuracy and controllable assembly opens the door to a new generation of genetically programmable protein-based molecular machines. PMID:27463675

  14. A Prediction Model for Membrane Proteins Using Moments Based Features.

    PubMed

    Butt, Ahmad Hassan; Khan, Sher Afzal; Jamil, Hamza; Rasool, Nouman; Khan, Yaser Daanial

    2016-01-01

    The most expedient unit of the human body is its cell. Encapsulated within the cell are many infinitesimal entities and molecules which are protected by a cell membrane. The proteins that are associated with this lipid based bilayer cell membrane are known as membrane proteins and are considered to play a significant role. These membrane proteins exhibit their effect in cellular activities inside and outside of the cell. According to the scientists in pharmaceutical organizations, these membrane proteins perform key task in drug interactions. In this study, a technique is presented that is based on various computationally intelligent methods used for the prediction of membrane protein without the experimental use of mass spectrometry. Statistical moments were used to extract features and furthermore a Multilayer Neural Network was trained using backpropagation for the prediction of membrane proteins. Results show that the proposed technique performs better than existing methodologies. PMID:26966690

  15. A Prediction Model for Membrane Proteins Using Moments Based Features

    PubMed Central

    Butt, Ahmad Hassan; Khan, Sher Afzal; Jamil, Hamza; Rasool, Nouman; Khan, Yaser Daanial

    2016-01-01

    The most expedient unit of the human body is its cell. Encapsulated within the cell are many infinitesimal entities and molecules which are protected by a cell membrane. The proteins that are associated with this lipid based bilayer cell membrane are known as membrane proteins and are considered to play a significant role. These membrane proteins exhibit their effect in cellular activities inside and outside of the cell. According to the scientists in pharmaceutical organizations, these membrane proteins perform key task in drug interactions. In this study, a technique is presented that is based on various computationally intelligent methods used for the prediction of membrane protein without the experimental use of mass spectrometry. Statistical moments were used to extract features and furthermore a Multilayer Neural Network was trained using backpropagation for the prediction of membrane proteins. Results show that the proposed technique performs better than existing methodologies. PMID:26966690

  16. A structural alphabet for local protein structures: improved prediction methods.

    PubMed

    Etchebest, Catherine; Benros, Cristina; Hazout, Serge; de Brevern, Alexandre G

    2005-06-01

    Three-dimensional protein structures can be described with a library of 3D fragments that define a structural alphabet. We have previously proposed such an alphabet, composed of 16 patterns of five consecutive amino acids, called Protein Blocks (PBs). These PBs have been used to describe protein backbones and to predict local structures from protein sequences. The Q16 prediction rate reaches 40.7% with an optimization procedure. This article examines two aspects of PBs. First, we determine the effect of the enlargement of databanks on their definition. The results show that the geometrical features of the different PBs are preserved (local RMSD value equal to 0.41 A on average) and sequence-structure specificities reinforced when databanks are enlarged. Second, we improve the methods for optimizing PB predictions from sequences, revisiting the optimization procedure and exploring different local prediction strategies. Use of a statistical optimization procedure for the sequence-local structure relation improves prediction accuracy by 8% (Q16 = 48.7%). Better recognition of repetitive structures occurs without losing the prediction efficiency of the other local folds. Adding secondary structure prediction improved the accuracy of Q16 by only 1%. An entropy index (Neq), strongly related to the RMSD value of the difference between predicted PBs and true local structures, is proposed to estimate prediction quality. The Neq is linearly correlated with the Q16 prediction rate distributions, computed for a large set of proteins. An "expected" prediction rate QE16 is deduced with a mean error of 5%. PMID:15822101

  17. Predicting suitable optoelectronic properties of monoclinic VON semiconductor crystals for photovoltaics using accurate first-principles computations.

    PubMed

    Harb, Moussab

    2015-10-14

    Using accurate first-principles quantum calculations based on DFT (including the DFPT) with the range-separated hybrid HSE06 exchange-correlation functional, we can predict the essential fundamental properties (such as bandgap, optical absorption co-efficient, dielectric constant, charge carrier effective masses and exciton binding energy) of two stable monoclinic vanadium oxynitride (VON) semiconductor crystals for solar energy conversion applications. In addition to the predicted band gaps in the optimal range for making single-junction solar cells, both polymorphs exhibit a relatively high absorption efficiency in the visible range, high dielectric constant, high charge carrier mobility and much lower exciton binding energy than the thermal energy at room temperature. Moreover, their optical absorption, dielectric and exciton dissociation properties were found to be better than those obtained for semiconductors frequently utilized in photovoltaic devices such as Si, CdTe and GaAs. These novel results offer a great opportunity for this stoichiometric VON material to be properly synthesized and considered as a new good candidate for photovoltaic applications. PMID:26351755

  18. A scoring function based on solvation thermodynamics for protein structure prediction

    PubMed Central

    Du, Shiqiao; Harano, Yuichi; Kinoshita, Masahiro; Sakurai, Minoru

    2012-01-01

    We predict protein structure using our recently developed free energy function for describing protein stability, which is focused on solvation thermodynamics. The function is combined with the current most reliable sampling methods, i.e., fragment assembly (FA) and comparative modeling (CM). The prediction is tested using 11 small proteins for which high-resolution crystal structures are available. For 8 of these proteins, sequence similarities are found in the database, and the prediction is performed with CM. Fairly accurate models with average Cα root mean square deviation (RMSD) ∼ 2.0 Å are successfully obtained for all cases. For the rest of the target proteins, we perform the prediction following FA protocols. For 2 cases, we obtain predicted models with an RMSD ∼ 3.0 Å as the best-scored structures. For the other case, the RMSD remains larger than 7 Å. For all the 11 target proteins, our scoring function identifies the experimentally determined native structure as the best structure. Starting from the predicted structure, replica exchange molecular dynamics is performed to further refine the structures. However, we are unable to improve its RMSD toward the experimental structure. The exhaustive sampling by coarse-grained normal mode analysis around the native structures reveals that our function has a linear correlation with RMSDs < 3.0 Å. These results suggest that the function is quite reliable for the protein structure prediction while the sampling method remains one of the major limiting factors in it. The aspects through which the methodology could further be improved are discussed.

  19. Accurate electrical prediction of memory array through SEM-based edge-contour extraction using SPICE simulation

    NASA Astrophysics Data System (ADS)

    Shauly, Eitan; Rotstein, Israel; Peltinov, Ram; Latinski, Sergei; Adan, Ofer; Levi, Shimon; Menadeva, Ovadya

    2009-03-01

    The continues transistors scaling efforts, for smaller devices, similar (or larger) drive current/um and faster devices, increase the challenge to predict and to control the transistor off-state current. Typically, electrical simulators like SPICE, are using the design intent (as-drawn GDS data). At more sophisticated cases, the simulators are fed with the pattern after lithography and etch process simulations. As the importance of electrical simulation accuracy is increasing and leakage is becoming more dominant, there is a need to feed these simulators, with more accurate information extracted from physical on-silicon transistors. Our methodology to predict changes in device performances due to systematic lithography and etch effects was used in this paper. In general, the methodology consists on using the OPCCmaxTM for systematic Edge-Contour-Extraction (ECE) from transistors, taking along the manufacturing and includes any image distortions like line-end shortening, corner rounding and line-edge roughness. These measurements are used for SPICE modeling. Possible application of this new metrology is to provide a-head of time, physical and electrical statistical data improving time to market. In this work, we applied our methodology to analyze a small and large array's of 2.14um2 6T-SRAM, manufactured using Tower Standard Logic for General Purposes Platform. 4 out of the 6 transistors used "U-Shape AA", known to have higher variability. The predicted electrical performances of the transistors drive current and leakage current, in terms of nominal values and variability are presented. We also used the methodology to analyze an entire SRAM Block array. Study of an isolation leakage and variability are presented.

  20. Measurements of accurate x-ray scattering data of protein solutions using small stationary sample cells

    SciTech Connect

    Hong Xinguo; Hao Quan

    2009-01-15

    In this paper, we report a method of precise in situ x-ray scattering measurements on protein solutions using small stationary sample cells. Although reduction in the radiation damage induced by intense synchrotron radiation sources is indispensable for the correct interpretation of scattering data, there is still a lack of effective methods to overcome radiation-induced aggregation and extract scattering profiles free from chemical or structural damage. It is found that radiation-induced aggregation mainly begins on the surface of the sample cell and grows along the beam path; the diameter of the damaged region is comparable to the x-ray beam size. Radiation-induced aggregation can be effectively avoided by using a two-dimensional scan (2D mode), with an interval as small as 1.5 times the beam size, at low temperature (e.g., 4 deg. C). A radiation sensitive protein, bovine hemoglobin, was used to test the method. A standard deviation of less than 5% in the small angle region was observed from a series of nine spectra recorded in 2D mode, in contrast to the intensity variation seen using the conventional stationary technique, which can exceed 100%. Wide-angle x-ray scattering data were collected at a standard macromolecular diffraction station using the same data collection protocol and showed a good signal/noise ratio (better than the reported data on the same protein using a flow cell). The results indicate that this method is an effective approach for obtaining precise measurements of protein solution scattering.

  1. Accurate Estimation of Protein Folding and Unfolding Times: Beyond Markov State Models.

    PubMed

    Suárez, Ernesto; Adelman, Joshua L; Zuckerman, Daniel M

    2016-08-01

    Because standard molecular dynamics (MD) simulations are unable to access time scales of interest in complex biomolecular systems, it is common to "stitch together" information from multiple shorter trajectories using approximate Markov state model (MSM) analysis. However, MSMs may require significant tuning and can yield biased results. Here, by analyzing some of the longest protein MD data sets available (>100 μs per protein), we show that estimators constructed based on exact non-Markovian (NM) principles can yield significantly improved mean first-passage times (MFPTs) for protein folding and unfolding. In some cases, MSM bias of more than an order of magnitude can be corrected when identical trajectory data are reanalyzed by non-Markovian approaches. The NM analysis includes "history" information, higher order time correlations compared to MSMs, that is available in every MD trajectory. The NM strategy is insensitive to fine details of the states used and works well when a fine time-discretization (i.e., small "lag time") is used. PMID:27340835

  2. Truly Absorbed Microbial Protein Synthesis, Rumen Bypass Protein, Endogenous Protein, and Total Metabolizable Protein from Starchy and Protein-Rich Raw Materials: Model Comparison and Predictions.

    PubMed

    Parand, Ehsan; Vakili, Alireza; Mesgaran, Mohsen Danesh; van Duinkerken, Gert; Yu, Peiqiang

    2015-07-29

    This study was carried out to measure truly absorbed microbial protein synthesis, rumen bypass protein, and endogenous protein loss, as well as total metabolizable protein, from starchy and protein-rich raw feed materials with model comparisons. Predictions by the DVE2010 system as a more mechanistic model were compared with those of two other models, DVE1994 and NRC-2001, that are frequently used in common international feeding practice. DVE1994 predictions for intestinally digestible rumen undegradable protein (ARUP) for starchy concentrates were higher (27 vs 18 g/kg DM, p < 0.05, SEM = 1.2) than predictions by the NRC-2001, whereas there was no difference in predictions for ARUP from protein concentrates among the three models. DVE2010 and NRC-2001 had highest estimations of intestinally digestible microbial protein for starchy (92 g/kg DM in DVE2010 vs 46 g/kg DM in NRC-2001 and 67 g/kg DM in DVE1994, p < 0.05 SEM = 4) and protein concentrates (69 g/kg DM in NRC-2001 vs 31 g/kg DM in DVE1994 and 49 g/kg DM in DVE2010, p < 0.05 SEM = 4), respectively. Potential protein supplies predicted by tested models from starchy and protein concentrates are widely different, and comparable direct measurements are needed to evaluate the actual ability of different models to predict the potential protein supply to dairy cows from different feedstuffs. PMID:26118653

  3. On the Encoding of Proteins for Disordered Regions Prediction

    PubMed Central

    Becker, Julien; Maes, Francis; Wehenkel, Louis

    2013-01-01

    Disordered regions, i.e., regions of proteins that do not adopt a stable three-dimensional structure, have been shown to play various and critical roles in many biological processes. Predicting and understanding their formation is therefore a key sub-problem of protein structure and function inference. A wide range of machine learning approaches have been developed to automatically predict disordered regions of proteins. One key factor of the success of these methods is the way in which protein information is encoded into features. Recently, we have proposed a systematic methodology to study the relevance of various feature encodings in the context of disulfide connectivity pattern prediction. In the present paper, we adapt this methodology to the problem of predicting disordered regions and assess it on proteins from the 10th CASP competition, as well as on a very large subset of proteins extracted from PDB. Our results, obtained with ensembles of extremely randomized trees, highlight a novel feature function encoding the proximity of residues according to their accessibility to the solvent, which is playing the second most important role in the prediction of disordered regions, just after evolutionary information. Furthermore, even though our approach treats each residue independently, our results are very competitive in terms of accuracy with respect to the state-of-the-art. A web-application is available at http://m24.giga.ulg.ac.be:81/x3Disorder. PMID:24358161

  4. On the encoding of proteins for disordered regions prediction.

    PubMed

    Becker, Julien; Maes, Francis; Wehenkel, Louis

    2013-01-01

    Disordered regions, i.e., regions of proteins that do not adopt a stable three-dimensional structure, have been shown to play various and critical roles in many biological processes. Predicting and understanding their formation is therefore a key sub-problem of protein structure and function inference. A wide range of machine learning approaches have been developed to automatically predict disordered regions of proteins. One key factor of the success of these methods is the way in which protein information is encoded into features. Recently, we have proposed a systematic methodology to study the relevance of various feature encodings in the context of disulfide connectivity pattern prediction. In the present paper, we adapt this methodology to the problem of predicting disordered regions and assess it on proteins from the 10th CASP competition, as well as on a very large subset of proteins extracted from PDB. Our results, obtained with ensembles of extremely randomized trees, highlight a novel feature function encoding the proximity of residues according to their accessibility to the solvent, which is playing the second most important role in the prediction of disordered regions, just after evolutionary information. Furthermore, even though our approach treats each residue independently, our results are very competitive in terms of accuracy with respect to the state-of-the-art. A web-application is available at http://m24.giga.ulg.ac.be:81/x3Disorder. PMID:24358161

  5. JPPRED: Prediction of Types of J-Proteins from Imbalanced Data Using an Ensemble Learning Method

    PubMed Central

    Zhang, Lina; Zhang, Chengjin; Gao, Rui; Yang, Runtao

    2015-01-01

    Different types of J-proteins perform distinct functions in chaperone processes and diseases development. Accurate identification of types of J-proteins will provide significant clues to reveal the mechanism of J-proteins and contribute to developing drugs for diseases. In this study, an ensemble predictor called JPPRED for J-protein prediction is proposed with hybrid features, including split amino acid composition (SAAC), pseudo amino acid composition (PseAAC), and position specific scoring matrix (PSSM). To deal with the imbalanced benchmark dataset, the synthetic minority oversampling technique (SMOTE) and undersampling technique are applied. The average sensitivity of JPPRED based on above-mentioned individual feature spaces lies in the range of 0.744–0.851, indicating the discriminative power of these features. In addition, JPPRED yields the highest average sensitivity of 0.875 using the hybrid feature spaces of SAAC, PseAAC, and PSSM. Compared to individual base classifiers, JPPRED obtains more balanced and better performance for each type of J-proteins. To evaluate the prediction performance objectively, JPPRED is compared with previous study. Encouragingly, JPPRED obtains balanced performance for each type of J-proteins, which is significantly superior to that of the existing method. It is anticipated that JPPRED can be a potential candidate for J-protein prediction. PMID:26587542

  6. A Consensus Method for the Prediction of ‘Aggregation-Prone’ Peptides in Globular Proteins

    PubMed Central

    Tsolis, Antonios C.; Papandreou, Nikos C.; Iconomidou, Vassiliki A.; Hamodrakas, Stavros J.

    2013-01-01

    The purpose of this work was to construct a consensus prediction algorithm of ‘aggregation-prone’ peptides in globular proteins, combining existing tools. This allows comparison of the different algorithms and the production of more objective and accurate results. Eleven (11) individual methods are combined and produce AMYLPRED2, a publicly, freely available web tool to academic users (http://biophysics.biol.uoa.gr/AMYLPRED2), for the consensus prediction of amyloidogenic determinants/‘aggregation-prone’ peptides in proteins, from sequence alone. The performance of AMYLPRED2 indicates that it functions better than individual aggregation-prediction algorithms, as perhaps expected. AMYLPRED2 is a useful tool for identifying amyloid-forming regions in proteins that are associated with several conformational diseases, called amyloidoses, such as Altzheimer's, Parkinson's, prion diseases and type II diabetes. It may also be useful for understanding the properties of protein folding and misfolding and for helping to the control of protein aggregation/solubility in biotechnology (recombinant proteins forming bacterial inclusion bodies) and biotherapeutics (monoclonal antibodies and biopharmaceutical proteins). PMID:23326595

  7. Improved method for predicting protein fold patterns with ensemble classifiers.

    PubMed

    Chen, W; Liu, X; Huang, Y; Jiang, Y; Zou, Q; Lin, C

    2012-01-01

    Protein folding is recognized as a critical problem in the field of biophysics in the 21st century. Predicting protein-folding patterns is challenging due to the complex structure of proteins. In an attempt to solve this problem, we employed ensemble classifiers to improve prediction accuracy. In our experiments, 188-dimensional features were extracted based on the composition and physical-chemical property of proteins and 20-dimensional features were selected using a coupled position-specific scoring matrix. Compared with traditional prediction methods, these methods were superior in terms of prediction accuracy. The 188-dimensional feature-based method achieved 71.2% accuracy in five cross-validations. The accuracy rose to 77% when we used a 20-dimensional feature vector. These methods were used on recent data, with 54.2% accuracy. Source codes and dataset, together with web server and software tools for prediction, are available at: http://datamining.xmu.edu.cn/main/~cwc/ProteinPredict.html. PMID:22370884

  8. Protein-protein interactions prediction based on iterative clique extension with gene ontology filtering.

    PubMed

    Yang, Lei; Tang, Xianglong

    2014-01-01

    Cliques (maximal complete subnets) in protein-protein interaction (PPI) network are an important resource used to analyze protein complexes and functional modules. Clique-based methods of predicting PPI complement the data defection from biological experiments. However, clique-based predicting methods only depend on the topology of network. The false-positive and false-negative interactions in a network usually interfere with prediction. Therefore, we propose a method combining clique-based method of prediction and gene ontology (GO) annotations to overcome the shortcoming and improve the accuracy of predictions. According to different GO correcting rules, we generate two predicted interaction sets which guarantee the quality and quantity of predicted protein interactions. The proposed method is applied to the PPI network from the Database of Interacting Proteins (DIP) and most of the predicted interactions are verified by another biological database, BioGRID. The predicted protein interactions are appended to the original protein network, which leads to clique extension and shows the significance of biological meaning. PMID:24578640

  9. WeFold: A Coopetition for Protein Structure Prediction

    PubMed Central

    Khoury, George A.; Liwo, Adam; Khatib, Firas; Zhou, Hongyi; Chopra, Gaurav; Bacardit, Jaume; Bortot, Leandro O.; Faccioli, Rodrigo A.; Deng, Xin; He, Yi; Krupa, Pawel; Li, Jilong; Mozolewska, Magdalena A.; Sieradzan, Adam K.; Smadbeck, James; Wirecki, Tomasz; Cooper, Seth; Flatten, Jeff; Xu, Kefan; Baker, David; Cheng, Jianlin; Delbem, Alexandre C. B.; Floudas, Christodoulos A.; Keasar, Chen; Levitt, Michael; Popović, Zoran; Scheraga, Harold A.; Skolnick, Jeffrey; Crivelli, Silvia N.; Players, Foldit

    2014-01-01

    The protein structure prediction problem continues to elude scientists. Despite the introduction of many methods, only modest gains were made over the last decade for certain classes of prediction targets. To address this challenge, a social-media based worldwide collaborative effort, named WeFold, was undertaken by thirteen labs. During the collaboration, the labs were simultaneously competing with each other. Here, we present the first attempt at “coopetition” in scientific research applied to the protein structure prediction and refinement problems. The coopetition was possible by allowing the participating labs to contribute different components of their protein structure prediction pipelines and create new hybrid pipelines that they tested during CASP10. This manuscript describes both successes and areas needing improvement as identified throughout the first WeFold experiment and discusses the efforts that are underway to advance this initiative. A footprint of all contributions and structures are publicly accessible at http://www.wefold.org. PMID:24677212

  10. WeFold: a coopetition for protein structure prediction.

    PubMed

    Khoury, George A; Liwo, Adam; Khatib, Firas; Zhou, Hongyi; Chopra, Gaurav; Bacardit, Jaume; Bortot, Leandro O; Faccioli, Rodrigo A; Deng, Xin; He, Yi; Krupa, Pawel; Li, Jilong; Mozolewska, Magdalena A; Sieradzan, Adam K; Smadbeck, James; Wirecki, Tomasz; Cooper, Seth; Flatten, Jeff; Xu, Kefan; Baker, David; Cheng, Jianlin; Delbem, Alexandre C B; Floudas, Christodoulos A; Keasar, Chen; Levitt, Michael; Popović, Zoran; Scheraga, Harold A; Skolnick, Jeffrey; Crivelli, Silvia N

    2014-09-01

    The protein structure prediction problem continues to elude scientists. Despite the introduction of many methods, only modest gains were made over the last decade for certain classes of prediction targets. To address this challenge, a social-media based worldwide collaborative effort, named WeFold, was undertaken by 13 labs. During the collaboration, the laboratories were simultaneously competing with each other. Here, we present the first attempt at "coopetition" in scientific research applied to the protein structure prediction and refinement problems. The coopetition was possible by allowing the participating labs to contribute different components of their protein structure prediction pipelines and create new hybrid pipelines that they tested during CASP10. This manuscript describes both successes and areas needing improvement as identified throughout the first WeFold experiment and discusses the efforts that are underway to advance this initiative. A footprint of all contributions and structures are publicly accessible at http://www.wefold.org. PMID:24677212

  11. Computational Prediction of Effector Proteins in Fungi: Opportunities and Challenges.

    PubMed

    Sonah, Humira; Deshmukh, Rupesh K; Bélanger, Richard R

    2016-01-01

    Effector proteins are mostly secretory proteins that stimulate plant infection by manipulating the host response. Identifying fungal effector proteins and understanding their function is of great importance in efforts to curb losses to plant diseases. Recent advances in high-throughput sequencing technologies have facilitated the availability of several fungal genomes and 1000s of transcriptomes. As a result, the growing amount of genomic information has provided great opportunities to identify putative effector proteins in different fungal species. There is little consensus over the annotation and functionality of effector proteins, and mostly small secretory proteins are considered as effector proteins, a concept that tends to overestimate the number of proteins involved in a plant-pathogen interaction. With the characterization of Avr genes, criteria for computational prediction of effector proteins are becoming more efficient. There are 100s of tools available for the identification of conserved motifs, signature sequences and structural features in the proteins. Many pipelines and online servers, which combine several tools, are made available to perform genome-wide identification of effector proteins. In this review, available tools and pipelines, their strength and limitations for effective identification of fungal effector proteins are discussed. We also present an exhaustive list of classically secreted proteins along with their key conserved motifs found in 12 common plant pathogens (11 fungi and one oomycete) through an analytical pipeline. PMID:26904083

  12. Computational Prediction of Effector Proteins in Fungi: Opportunities and Challenges

    PubMed Central

    Sonah, Humira; Deshmukh, Rupesh K.; Bélanger, Richard R.

    2016-01-01

    Effector proteins are mostly secretory proteins that stimulate plant infection by manipulating the host response. Identifying fungal effector proteins and understanding their function is of great importance in efforts to curb losses to plant diseases. Recent advances in high-throughput sequencing technologies have facilitated the availability of several fungal genomes and 1000s of transcriptomes. As a result, the growing amount of genomic information has provided great opportunities to identify putative effector proteins in different fungal species. There is little consensus over the annotation and functionality of effector proteins, and mostly small secretory proteins are considered as effector proteins, a concept that tends to overestimate the number of proteins involved in a plant–pathogen interaction. With the characterization of Avr genes, criteria for computational prediction of effector proteins are becoming more efficient. There are 100s of tools available for the identification of conserved motifs, signature sequences and structural features in the proteins. Many pipelines and online servers, which combine several tools, are made available to perform genome-wide identification of effector proteins. In this review, available tools and pipelines, their strength and limitations for effective identification of fungal effector proteins are discussed. We also present an exhaustive list of classically secreted proteins along with their key conserved motifs found in 12 common plant pathogens (11 fungi and one oomycete) through an analytical pipeline. PMID:26904083

  13. PredictSNP2: A Unified Platform for Accurately Evaluating SNP Effects by Exploiting the Different Characteristics of Variants in Distinct Genomic Regions

    PubMed Central

    Brezovský, Jan

    2016-01-01

    An important message taken from human genome sequencing projects is that the human population exhibits approximately 99.9% genetic similarity. Variations in the remaining parts of the genome determine our identity, trace our history and reveal our heritage. The precise delineation of phenotypically causal variants plays a key role in providing accurate personalized diagnosis, prognosis, and treatment of inherited diseases. Several computational methods for achieving such delineation have been reported recently. However, their ability to pinpoint potentially deleterious variants is limited by the fact that their mechanisms of prediction do not account for the existence of different categories of variants. Consequently, their output is biased towards the variant categories that are most strongly represented in the variant databases. Moreover, most such methods provide numeric scores but not binary predictions of the deleteriousness of variants or confidence scores that would be more easily understood by users. We have constructed three datasets covering different types of disease-related variants, which were divided across five categories: (i) regulatory, (ii) splicing, (iii) missense, (iv) synonymous, and (v) nonsense variants. These datasets were used to develop category-optimal decision thresholds and to evaluate six tools for variant prioritization: CADD, DANN, FATHMM, FitCons, FunSeq2 and GWAVA. This evaluation revealed some important advantages of the category-based approach. The results obtained with the five best-performing tools were then combined into a consensus score. Additional comparative analyses showed that in the case of missense variations, protein-based predictors perform better than DNA sequence-based predictors. A user-friendly web interface was developed that provides easy access to the five tools’ predictions, and their consensus scores, in a user-understandable format tailored to the specific features of different categories of variations

  14. PredictSNP2: A Unified Platform for Accurately Evaluating SNP Effects by Exploiting the Different Characteristics of Variants in Distinct Genomic Regions.

    PubMed

    Bendl, Jaroslav; Musil, Miloš; Štourač, Jan; Zendulka, Jaroslav; Damborský, Jiří; Brezovský, Jan

    2016-05-01

    An important message taken from human genome sequencing projects is that the human population exhibits approximately 99.9% genetic similarity. Variations in the remaining parts of the genome determine our identity, trace our history and reveal our heritage. The precise delineation of phenotypically causal variants plays a key role in providing accurate personalized diagnosis, prognosis, and treatment of inherited diseases. Several computational methods for achieving such delineation have been reported recently. However, their ability to pinpoint potentially deleterious variants is limited by the fact that their mechanisms of prediction do not account for the existence of different categories of variants. Consequently, their output is biased towards the variant categories that are most strongly represented in the variant databases. Moreover, most such methods provide numeric scores but not binary predictions of the deleteriousness of variants or confidence scores that would be more easily understood by users. We have constructed three datasets covering different types of disease-related variants, which were divided across five categories: (i) regulatory, (ii) splicing, (iii) missense, (iv) synonymous, and (v) nonsense variants. These datasets were used to develop category-optimal decision thresholds and to evaluate six tools for variant prioritization: CADD, DANN, FATHMM, FitCons, FunSeq2 and GWAVA. This evaluation revealed some important advantages of the category-based approach. The results obtained with the five best-performing tools were then combined into a consensus score. Additional comparative analyses showed that in the case of missense variations, protein-based predictors perform better than DNA sequence-based predictors. A user-friendly web interface was developed that provides easy access to the five tools' predictions, and their consensus scores, in a user-understandable format tailored to the specific features of different categories of variations. To

  15. Predicting Protein-Protein Interactions from the Molecular to the Proteome Level.

    PubMed

    Keskin, Ozlem; Tuncbag, Nurcan; Gursoy, Attila

    2016-04-27

    Identification of protein-protein interactions (PPIs) is at the center of molecular biology considering the unquestionable role of proteins in cells. Combinatorial interactions result in a repertoire of multiple functions; hence, knowledge of PPI and binding regions naturally serve to functional proteomics and drug discovery. Given experimental limitations to find all interactions in a proteome, computational prediction/modeling of protein interactions is a prerequisite to proceed on the way to complete interactions at the proteome level. This review aims to provide a background on PPIs and their types. Computational methods for PPI predictions can use a variety of biological data including sequence-, evolution-, expression-, and structure-based data. Physical and statistical modeling are commonly used to integrate these data and infer PPI predictions. We review and list the state-of-the-art methods, servers, databases, and tools for protein-protein interaction prediction. PMID:27074302

  16. A predictive biophysical model of translational coupling to coordinate and control protein expression in bacterial operons

    PubMed Central

    Tian, Tian; Salis, Howard M.

    2015-01-01

    Natural and engineered genetic systems require the coordinated expression of proteins. In bacteria, translational coupling provides a genetically encoded mechanism to control expression level ratios within multi-cistronic operons. We have developed a sequence-to-function biophysical model of translational coupling to predict expression level ratios in natural operons and to design synthetic operons with desired expression level ratios. To quantitatively measure ribosome re-initiation rates, we designed and characterized 22 bi-cistronic operon variants with systematically modified intergenic distances and upstream translation rates. We then derived a thermodynamic free energy model to calculate de novo initiation rates as a result of ribosome-assisted unfolding of intergenic RNA structures. The complete biophysical model has only five free parameters, but was able to accurately predict downstream translation rates for 120 synthetic bi-cistronic and tri-cistronic operons with rationally designed intergenic regions and systematically increased upstream translation rates. The biophysical model also accurately predicted the translation rates of the nine protein atp operon, compared to ribosome profiling measurements. Altogether, the biophysical model quantitatively predicts how translational coupling controls protein expression levels in synthetic and natural bacterial operons, providing a deeper understanding of an important post-transcriptional regulatory mechanism and offering the ability to rationally engineer operons with desired behaviors. PMID:26117546

  17. Accurate determination of the diffusion coefficient of proteins by Fourier analysis with whole column imaging detection.

    PubMed

    Zarabadi, Atefeh S; Pawliszyn, Janusz

    2015-02-17

    Analysis in the frequency domain is considered a powerful tool to elicit precise information from spectroscopic signals. In this study, the Fourier transformation technique is employed to determine the diffusion coefficient (D) of a number of proteins in the frequency domain. Analytical approaches are investigated for determination of D from both experimental and data treatment viewpoints. The diffusion process is modeled to calculate diffusion coefficients based on the Fourier transformation solution to Fick's law equation, and its results are compared to time domain results. The simulations characterize optimum spatial and temporal conditions and demonstrate the noise tolerance of the method. The proposed model is validated by its application for the electropherograms from the diffusion path of a set of proteins. Real-time dynamic scanning is conducted to monitor dispersion by employing whole column imaging detection technology in combination with capillary isoelectric focusing (CIEF) and the imaging plug flow (iPF) experiment. These experimental techniques provide different peak shapes, which are utilized to demonstrate the Fourier transformation ability in extracting diffusion coefficients out of irregular shape signals. Experimental results confirmed that the Fourier transformation procedure substantially enhanced the accuracy of the determined values compared to those obtained in the time domain. PMID:25607375

  18. Simple Learned Weighted Sums of Inferior Temporal Neuronal Firing Rates Accurately Predict Human Core Object Recognition Performance.

    PubMed

    Majaj, Najib J; Hong, Ha; Solomon, Ethan A; DiCarlo, James J

    2015-09-30

    database of images for evaluating object recognition performance. We used multielectrode arrays to characterize hundreds of neurons in the visual ventral stream of nonhuman primates and measured the object recognition performance of >100 human observers. Remarkably, we found that simple learned weighted sums of firing rates of neurons in monkey inferior temporal (IT) cortex accurately predicted human performance. Although previous work led us to expect that IT would outperform V4, we were surprised by the quantitative precision with which simple IT-based linking hypotheses accounted for human behavior. PMID:26424887

  19. Simple Learned Weighted Sums of Inferior Temporal Neuronal Firing Rates Accurately Predict Human Core Object Recognition Performance

    PubMed Central

    Hong, Ha; Solomon, Ethan A.; DiCarlo, James J.

    2015-01-01

    database of images for evaluating object recognition performance. We used multielectrode arrays to characterize hundreds of neurons in the visual ventral stream of nonhuman primates and measured the object recognition performance of >100 human observers. Remarkably, we found that simple learned weighted sums of firing rates of neurons in monkey inferior temporal (IT) cortex accurately predicted human performance. Although previous work led us to expect that IT would outperform V4, we were surprised by the quantitative precision with which simple IT-based linking hypotheses accounted for human behavior. PMID:26424887

  20. A protein structural classes prediction method based on predicted secondary structure and PSI-BLAST profile.

    PubMed

    Ding, Shuyan; Li, Yan; Shi, Zhuoxing; Yan, Shoujiang

    2014-02-01

    Knowledge of protein secondary structural classes plays an important role in understanding protein folding patterns. In this paper, 25 features based on position-specific scoring matrices are selected to reflect evolutionary information. In combination with other 11 rational features based on predicted protein secondary structure sequences proposed by the previous researchers, a 36-dimensional representation feature vector is presented to predict protein secondary structural classes for low-similarity sequences. ASTRALtraining dataset is used to train and design our method, other three low-similarity datasets ASTRALtest, 25PDB and 1189 are used to test the proposed method. Comparisons with other methods show that our method is effective to predict protein secondary structural classes. Stand alone version of the proposed method (PSSS-PSSM) is written in MATLAB language and it can be downloaded from http://letsgob.com/bioinfo_PSSS_PSSM/. PMID:24067326

  1. Improving protein secondary structure prediction using a multi-modal BP method.

    PubMed

    Qu, Wu; Sui, Haifeng; Yang, Bingru; Qian, Wenbin

    2011-10-01

    Methods for predicting protein secondary structures provide information that is useful both in ab initio structure prediction and as additional restraints for fold recognition algorithms. Secondary structure predictions may also be used to guide the design of site directed mutagenesis studies, and to locate potential functionally important residues. In this article, we propose a multi-modal back propagation neural network (MMBP) method for predicting protein secondary structures. Using a Knowledge Discovery Theory based on Inner Cognitive Mechanism (KDTICM) method, we have constructed a compound pyramid model (CPM), which is composed of three layers of intelligent interface that integrate multi-modal back propagation neural network (MMBP), mixed-modal SVM (MMS), modified Knowledge Discovery in Databases (KDD(⁎)) process and so on. The CPM method is both an integrated web server and a standalone application that exploits recent advancements in knowledge discovery and machine learning to perform very accurate protein secondary structure predictions. Using a non-redundant test dataset of 256 proteins from RCASP256, the CPM method achieves an average Q(3) score of 86.13% (SOV99=84.66%). Extensive testing indicates that this is significantly better than any other method currently available. Assessments using RS126 and CB513 datasets indicate that the CPM method can achieve average Q(3) score approaching 83.99% (SOV99=80.25%) and 85.58% (SOV99=81.15%). By using both sequence and structure databases and by exploiting the latest techniques in machine learning it is possible to routinely predict protein secondary structure with an accuracy well above 80%. A program and web server, called CPM, which performs these secondary structure predictions, is accessible at http://kdd.ustb.edu.cn/protein_Web/. PMID:21880310

  2. Mathematical models for accurate prediction of atmospheric visibility with particular reference to the seasonal and environmental patterns in Hong Kong.

    PubMed

    Mui, K W; Wong, L T; Chung, L Y

    2009-11-01

    Atmospheric visibility impairment has gained increasing concern as it is associated with the existence of a number of aerosols as well as common air pollutants and produces unfavorable conditions for observation, dispersion, and transportation. This study analyzed the atmospheric visibility data measured in urban and suburban Hong Kong (two selected stations) with respect to time-matched mass concentrations of common air pollutants including nitrogen dioxide (NO(2)), nitrogen monoxide (NO), respirable suspended particulates (PM(10)), sulfur dioxide (SO(2)), carbon monoxide (CO), and meteorological parameters including air temperature, relative humidity, and wind speed. No significant difference in atmospheric visibility was reported between the two measurement locations (p > or = 0.6, t test); and good atmospheric visibility was observed more frequently in summer and autumn than in winter and spring (p < 0.01, t test). It was also found that atmospheric visibility increased with temperature but decreased with the concentrations of SO(2), CO, PM(10), NO, and NO(2). The results showed that atmospheric visibility was season dependent and would have significant correlations with temperature, the mass concentrations of PM(10) and NO(2), and the air pollution index API (correlation coefficients mid R: R mid R: > or = 0.7, p < or = 0.0001, t test). Mathematical expressions catering to the seasonal variations of atmospheric visibility were thus proposed. By comparison, the proposed visibility prediction models were more accurate than some existing regional models. In addition to improving visibility prediction accuracy, this study would be useful for understanding the context of low atmospheric visibility, exploring possible remedial measures, and evaluating the impact of air pollution and atmospheric visibility impairment in this region. PMID:18951139

  3. Accurate prediction of secreted substrates and identification of a conserved putative secretion signal for type III secretion systems

    SciTech Connect

    Samudrala, Ram; Heffron, Fred; McDermott, Jason E.

    2009-04-24

    The type III secretion system is an essential component for virulence in many Gram-negative bacteria. Though components of the secretion system apparatus are conserved, its substrates, effector proteins, are not. We have used a machine learning approach to identify new secreted effectors. The method integrates evolutionary measures, such as the pattern of homologs in a range of other organisms, and sequence-based features, such as G+C content, amino acid composition and the N-terminal 30 residues of the protein sequence. The method was trained on known effectors from Salmonella typhimurium and validated on a corresponding set of effectors from Pseudomonas syringae, after eliminating effectors with detectable sequence similarity. The method was able to identify all of the known effectors in P. syringae with a specificity of 84% and sensitivity of 82%. The reciprocal validation, training on P. syringae and validating on S. typhimurium, gave similar results with a specificity of 86% when the sensitivity level was 87%. These results show that type III effectors in disparate organisms share common features. We found that maximal performance is attained by including an N-terminal sequence of only 30 residues, which agrees with previous studies indicating that this region contains the secretion signal. We then used the method to define the most important residues in this putative secretion signal. Finally, we present novel predictions of secreted effectors in S. typhimurium, some of which have been experimentally validated, and apply the method to predict secreted effectors in the genetically intractable human pathogen Chlamydia trachomatis. This approach is a novel and effective way to identify secreted effectors in a broad range of pathogenic bacteria for further experimental characterization and provides insight into the nature of the type III secretion signal.

  4. PredictProtein—an open resource for online prediction of protein structural and functional features

    PubMed Central

    Yachdav, Guy; Kloppmann, Edda; Kajan, Laszlo; Hecht, Maximilian; Goldberg, Tatyana; Hamp, Tobias; Hönigschmid, Peter; Schafferhans, Andrea; Roos, Manfred; Bernhofer, Michael; Richter, Lothar; Ashkenazy, Haim; Punta, Marco; Schlessinger, Avner; Bromberg, Yana; Schneider, Reinhard; Vriend, Gerrit; Sander, Chris; Ben-Tal, Nir; Rost, Burkhard

    2014-01-01

    PredictProtein is a meta-service for sequence analysis that has been predicting structural and functional features of proteins since 1992. Queried with a protein sequence it returns: multiple sequence alignments, predicted aspects of structure (secondary structure, solvent accessibility, transmembrane helices (TMSEG) and strands, coiled-coil regions, disulfide bonds and disordered regions) and function. The service incorporates analysis methods for the identification of functional regions (ConSurf), homology-based inference of Gene Ontology terms (metastudent), comprehensive subcellular localization prediction (LocTree3), protein–protein binding sites (ISIS2), protein–polynucleotide binding sites (SomeNA) and predictions of the effect of point mutations (non-synonymous SNPs) on protein function (SNAP2). Our goal has always been to develop a system optimized to meet the demands of experimentalists not highly experienced in bioinformatics. To this end, the PredictProtein results are presented as both text and a series of intuitive, interactive and visually appealing figures. The web server and sources are available at http://ppopen.rostlab.org. PMID:24799431

  5. Efficient Prediction of Co-Complexed Proteins Based on Coevolution

    PubMed Central

    de Vienne, Damien M.; Azé, Jérôme

    2012-01-01

    The prediction of the network of protein-protein interactions (PPI) of an organism is crucial for the understanding of biological processes and for the development of new drugs. Machine learning methods have been successfully applied to the prediction of PPI in yeast by the integration of multiple direct and indirect biological data sources. However, experimental data are not available for most organisms. We propose here an ensemble machine learning approach for the prediction of PPI that depends solely on features independent from experimental data. We developed new estimators of the coevolution between proteins and combined them in an ensemble learning procedure. We applied this method to a dataset of known co-complexed proteins in Escherichia coli and compared it to previously published methods. We show that our method allows prediction of PPI with an unprecedented precision of 95.5% for the first 200 sorted pairs of proteins compared to 28.5% on the same dataset with the previous best method. A close inspection of the best predicted pairs allowed us to detect new or recently discovered interactions between chemotactic components, the flagellar apparatus and RNA polymerase complexes in E. coli. PMID:23152796

  6. Exploiting protein flexibility to predict the location of allosteric sites

    PubMed Central

    2012-01-01

    Background Allostery is one of the most powerful and common ways of regulation of protein activity. However, for most allosteric proteins identified to date the mechanistic details of allosteric modulation are not yet well understood. Uncovering common mechanistic patterns underlying allostery would allow not only a better academic understanding of the phenomena, but it would also streamline the design of novel therapeutic solutions. This relatively unexplored therapeutic potential and the putative advantages of allosteric drugs over classical active-site inhibitors fuel the attention allosteric-drug research is receiving at present. A first step to harness the regulatory potential and versatility of allosteric sites, in the context of drug-discovery and design, would be to detect or predict their presence and location. In this article, we describe a simple computational approach, based on the effect allosteric ligands exert on protein flexibility upon binding, to predict the existence and position of allosteric sites on a given protein structure. Results By querying the literature and a recently available database of allosteric sites, we gathered 213 allosteric proteins with structural information that we further filtered into a non-redundant set of 91 proteins. We performed normal-mode analysis and observed significant changes in protein flexibility upon allosteric-ligand binding in 70% of the cases. These results agree with the current view that allosteric mechanisms are in many cases governed by changes in protein dynamics caused by ligand binding. Furthermore, we implemented an approach that achieves 65% positive predictive value in identifying allosteric sites within the set of predicted cavities of a protein (stricter parameters set, 0.22 sensitivity), by combining the current analysis on dynamics with previous results on structural conservation of allosteric sites. We also analyzed four biological examples in detail, revealing that this simple coarse

  7. Accurate and reproducible detection of proteins in water using an extended-gate type organic transistor biosensor

    NASA Astrophysics Data System (ADS)

    Minamiki, Tsukuru; Minami, Tsuyoshi; Kurita, Ryoji; Niwa, Osamu; Wakida, Shin-ichi; Fukuda, Kenjiro; Kumaki, Daisuke; Tokito, Shizuo

    2014-06-01

    In this Letter, we describe an accurate antibody detection method using a fabricated extended-gate type organic field-effect-transistor (OFET), which can be operated at below 3 V. The protein-sensing portion of the designed device is the gate electrode functionalized with streptavidin. Streptavidin possesses high molecular recognition ability for biotin, which specifically allows for the detection of biotinylated proteins. Here, we attempted to detect biotinylated immunoglobulin G (IgG) and observed a shift of threshold voltage of the OFET upon the addition of the antibody in an aqueous solution with a competing bovine serum albumin interferent. The detection limit for the biotinylated IgG was 8 nM, which indicates the potential utility of the designed device in healthcare applications.

  8. A Support Vector Machine model for the prediction of proteotypic peptides for accurate mass and time proteomics

    SciTech Connect

    Webb-Robertson, Bobbie-Jo M.; Cannon, William R.; Oehmen, Christopher S.; Shah, Anuj R.; Gurumoorthi, Vidhya; Lipton, Mary S.; Waters, Katrina M.

    2008-07-01

    Motivation: The standard approach to identifying peptides based on accurate mass and elution time (AMT) compares these profiles obtained from a high resolution mass spectrometer to a database of peptides previously identified from tandem mass spectrometry (MS/MS) studies. It would be advantageous, with respect to both accuracy and cost, to only search for those peptides that are detectable by MS (proteotypic). Results: We present a Support Vector Machine (SVM) model that uses a simple descriptor space based on 35 properties of amino acid content, charge, hydrophilicity, and polarity for the quantitative prediction of proteotypic peptides. Using three independently derived AMT databases (Shewanella oneidensis, Salmonella typhimurium, Yersinia pestis) for training and validation within and across species, the SVM resulted in an average accuracy measure of ~0.8 with a standard deviation of less than 0.025. Furthermore, we demonstrate that these results are achievable with a small set of 12 variables and can achieve high proteome coverage. Availability: http://omics.pnl.gov/software/STEPP.php

  9. Sequence-based prediction of protein-peptide binding sites using support vector machine.

    PubMed

    Taherzadeh, Ghazaleh; Yang, Yuedong; Zhang, Tuo; Liew, Alan Wee-Chung; Zhou, Yaoqi

    2016-05-15

    Protein-peptide interactions are essential for all cellular processes including DNA repair, replication, gene-expression, and metabolism. As most protein-peptide interactions are uncharacterized, it is cost effective to investigate them computationally as the first step. All existing approaches for predicting protein-peptide binding sites, however, are based on protein structures despite the fact that the structures for most proteins are not yet solved. This article proposes the first machine-learning method called SPRINT to make Sequence-based prediction of Protein-peptide Residue-level Interactions. SPRINT yields a robust and consistent performance for 10-fold cross validations and independent test. The most important feature is evolution-generated sequence profiles. For the test set (1056 binding and non-binding residues), it yields a Matthews' Correlation Coefficient of 0.326 with a sensitivity of 64% and a specificity of 68%. This sequence-based technique shows comparable or more accurate than structure-based methods for peptide-binding site prediction. SPRINT is available as an online server at: http://sparks-lab.org/. © 2016 Wiley Periodicals, Inc. PMID:26833816

  10. Prediction of protein folding rates from simplified secondary structure alphabet.

    PubMed

    Huang, Jitao T; Wang, Titi; Huang, Shanran R; Li, Xin

    2015-10-21

    Protein folding is a very complicated and highly cooperative dynamic process. However, the folding kinetics is likely to depend more on a few key structural features. Here we find that secondary structures can determine folding rates of only large, multi-state folding proteins and fails to predict those for small, two-state proteins. The importance of secondary structures for protein folding is ordered as: extended β strand > α helix > bend > turn > undefined secondary structure>310 helix > isolated β strand > π helix. Only the first three secondary structures, extended β strand, α helix and bend, can achieve a good correlation with folding rates. This suggests that the rate-limiting step of protein folding would depend upon the formation of regular secondary structures and the buckling of chain. The reduced secondary structure alphabet provides a simplified description for the machine learning applications in protein design. PMID:26247139

  11. PPCM: Combing Multiple Classifiers to Improve Protein-Protein Interaction Prediction

    DOE PAGESBeta

    Yao, Jianzhuang; Guo, Hong; Yang, Xiaohan

    2015-01-01

    Determining protein-protein interaction (PPI) in biological systems is of considerable importance, and prediction of PPI has become a popular research area. Although different classifiers have been developed for PPI prediction, no single classifier seems to be able to predict PPI with high confidence. We postulated that by combining individual classifiers the accuracy of PPI prediction could be improved. We developed a method called protein-protein interaction prediction classifiers merger (PPCM), and this method combines output from two PPI prediction tools, GO2PPI and Phyloprof, using Random Forests algorithm. The performance of PPCM was tested by area under the curve (AUC) using anmore » assembled Gold Standard database that contains both positive and negative PPI pairs. Our AUC test showed that PPCM significantly improved the PPI prediction accuracy over the corresponding individual classifiers. We found that additional classifiers incorporated into PPCM could lead to further improvement in the PPI prediction accuracy. Furthermore, cross species PPCM could achieve competitive and even better prediction accuracy compared to the single species PPCM. This study established a robust pipeline for PPI prediction by integrating multiple classifiers using Random Forests algorithm. This pipeline will be useful for predicting PPI in nonmodel species.« less

  12. SAM-T08, HMM-based protein structure prediction

    PubMed Central

    Karplus, Kevin

    2009-01-01

    The SAM-T08 web server is a protein structure prediction server that provides several useful intermediate results in addition to the final predicted 3D structure: three multiple sequence alignments of putative homologs using different iterated search procedures, prediction of local structure features including various backbone and burial properties, calibrated E-values for the significance of template searches of PDB and residue–residue contact predictions. The server has been validated as part of the CASP8 assessment of structure prediction as having good performance across all classes of predictions. The SAM-T08 server is available at http://compbio.soe.ucsc.edu/SAM_T08/T08-query.html PMID:19483096

  13. Domain-mediated protein interaction prediction: From genome to network.

    PubMed

    Reimand, Jüri; Hui, Shirley; Jain, Shobhit; Law, Brian; Bader, Gary D

    2012-08-14

    Protein-protein interactions (PPIs), involved in many biological processes such as cellular signaling, are ultimately encoded in the genome. Solving the problem of predicting protein interactions from the genome sequence will lead to increased understanding of complex networks, evolution and human disease. We can learn the relationship between genomes and networks by focusing on an easily approachable subset of high-resolution protein interactions that are mediated by peptide recognition modules (PRMs) such as PDZ, WW and SH3 domains. This review focuses on computational prediction and analysis of PRM-mediated networks and discusses sequence- and structure-based interaction predictors, techniques and datasets for identifying physiologically relevant PPIs, and interpreting high-resolution interaction networks in the context of evolution and human disease. PMID:22561014

  14. Predicting protein concentrations with ELISA microarray assays, monotonic splines and Monte Carlo simulation

    SciTech Connect

    Daly, Don S.; Anderson, Kevin K.; White, Amanda M.; Gonzalez, Rachel M.; Varnum, Susan M.; Zangar, Richard C.

    2008-07-14

    Background: A microarray of enzyme-linked immunosorbent assays, or ELISA microarray, predicts simultaneously the concentrations of numerous proteins in a small sample. These predictions, however, are uncertain due to processing error and biological variability. Making sound biological inferences as well as improving the ELISA microarray process require require both concentration predictions and creditable estimates of their errors. Methods: We present a statistical method based on monotonic spline statistical models, penalized constrained least squares fitting (PCLS) and Monte Carlo simulation (MC) to predict concentrations and estimate prediction errors in ELISA microarray. PCLS restrains the flexible spline to a fit of assay intensity that is a monotone function of protein concentration. With MC, both modeling and measurement errors are combined to estimate prediction error. The spline/PCLS/MC method is compared to a common method using simulated and real ELISA microarray data sets. Results: In contrast to the rigid logistic model, the flexible spline model gave credible fits in almost all test cases including troublesome cases with left and/or right censoring, or other asymmetries. For the real data sets, 61% of the spline predictions were more accurate than their comparable logistic predictions; especially the spline predictions at the extremes of the prediction curve. The relative errors of 50% of comparable spline and logistic predictions differed by less than 20%. Monte Carlo simulation rendered acceptable asymmetric prediction intervals for both spline and logistic models while propagation of error produced symmetric intervals that diverged unrealistically as the standard curves approached horizontal asymptotes. Conclusions: The spline/PCLS/MC method is a flexible, robust alternative to a logistic/NLS/propagation-of-error method to reliably predict protein concentrations and estimate their errors. The spline method simplifies model selection and fitting

  15. Improving Protein Expression Prediction Using Extra Features and Ensemble Averaging

    PubMed Central

    Fernandes, Armando; Vinga, Susana

    2016-01-01

    The article focus is the improvement of machine learning models capable of predicting protein expression levels based on their codon encoding. Support vector regression (SVR) and partial least squares (PLS) were used to create the models. SVR yields predictions that surpass those of PLS. It is shown that it is possible to improve the models predictive ability by using two more input features, codon identification number and codon count, besides the already used codon bias and minimum free energy. In addition, applying ensemble averaging to the SVR or PLS models also improves the results even further. The present work motivates the test of different ensembles and features with the aim of improving the prediction models whose correlation coefficients are still far from perfect. These results are relevant for the optimization of codon usage and enhancement of protein expression levels in synthetic biology problems. PMID:26934190

  16. PIPs: human protein–protein interaction prediction database

    PubMed Central

    McDowall, Mark D.; Scott, Michelle S.; Barton, Geoffrey J.

    2009-01-01

    The PIPs database (http://www.compbio.dundee.ac.uk/www-pips) is a resource for studying protein–protein interactions in human. It contains predictions of >37 000 high probability interactions of which >34 000 are not reported in the interaction databases HPRD, BIND, DIP or OPHID. The interactions in PIPs were calculated by a Bayesian method that combines information from expression, orthology, domain co-occurrence, post-translational modifications and sub-cellular location. The predictions also take account of the topology of the predicted interaction network. The web interface to PIPs ranks predictions according to their likelihood of interaction broken down by the contribution from each information source and with easy access to the evidence that supports each prediction. Where data exists in OPHID, HPRD, DIP or BIND for a protein pair this is also reported in the output tables returned by a search. A network browser is included to allow convenient browsing of the interaction network for any protein in the database. The PIPs database provides a new resource on protein–protein interactions in human that is straightforward to browse, or can be exploited completely, for interaction network modelling. PMID:18988626

  17. PCI-SS: MISO dynamic nonlinear protein secondary structure prediction

    PubMed Central

    Green, James R; Korenberg, Michael J; Aboul-Magd, Mohammed O

    2009-01-01

    Background Since the function of a protein is largely dictated by its three dimensional configuration, determining a protein's structure is of fundamental importance to biology. Here we report on a novel approach to determining the one dimensional secondary structure of proteins (distinguishing α-helices, β-strands, and non-regular structures) from primary sequence data which makes use of Parallel Cascade Identification (PCI), a powerful technique from the field of nonlinear system identification. Results Using PSI-BLAST divergent evolutionary profiles as input data, dynamic nonlinear systems are built through a black-box approach to model the process of protein folding. Genetic algorithms (GAs) are applied in order to optimize the architectural parameters of the PCI models. The three-state prediction problem is broken down into a combination of three binary sub-problems and protein structure classifiers are built using 2 layers of PCI classifiers. Careful construction of the optimization, training, and test datasets ensures that no homology exists between any training and testing data. A detailed comparison between PCI and 9 contemporary methods is provided over a set of 125 new protein chains guaranteed to be dissimilar to all training data. Unlike other secondary structure prediction methods, here a web service is developed to provide both human- and machine-readable interfaces to PCI-based protein secondary structure prediction. This server, called PCI-SS, is available at . In addition to a dynamic PHP-generated web interface for humans, a Simple Object Access Protocol (SOAP) interface is added to permit invocation of the PCI-SS service remotely. This machine-readable interface facilitates incorporation of PCI-SS into multi-faceted systems biology analysis pipelines requiring protein secondary structure information, and greatly simplifies high-throughput analyses. XML is used to represent the input protein sequence data and also to encode the resulting

  18. ENTPRISE: An Algorithm for Predicting Human Disease-Associated Amino Acid Substitutions from Sequence Entropy and Predicted Protein Structures

    PubMed Central

    Zhou, Hongyi; Gao, Mu; Skolnick, Jeffrey

    2016-01-01

    The advance of next-generation sequencing technologies has made exome sequencing rapid and relatively inexpensive. A major application of exome sequencing is the identification of genetic variations likely to cause Mendelian diseases. This requires processing large amounts of sequence information and therefore computational approaches that can accurately and efficiently identify the subset of disease-associated variations are needed. The accuracy and high false positive rates of existing computational tools leave much room for improvement. Here, we develop a boosted tree regression machine-learning approach to predict human disease-associated amino acid variations by utilizing a comprehensive combination of protein sequence and structure features. On comparing our method, ENTPRISE, to the state-of-the-art methods SIFT, PolyPhen-2, MUTATIONASSESSOR, MUTATIONTASTER, FATHMM, ENTPRISE exhibits significant improvement. In particular, on a testing dataset consisting of only proteins with balanced disease-associated and neutral variations defined as having the ratio of neutral/disease-associated variations between 0.3 and 3, the Mathews Correlation Coefficient by ENTPRISE is 0.493 as compared to 0.432 by PPH2-HumVar, 0.406 by SIFT, 0.403 by MUTATIONASSESSOR, 0.402 by PPH2-HumDiv, 0.305 by MUTATIONTASTER, and 0.181 by FATHMM. ENTPRISE is then applied to nucleic acid binding proteins in the human proteome. Disease-associated predictions are shown to be highly correlated with the number of protein-protein interactions. Both these predictions and the ENTPRISE server are freely available for academic users as a web service at http://cssb.biology.gatech.edu/entprise/. PMID:26982818

  19. Enhanced prediction of conformational flexibility and phosphorylation in proteins.

    PubMed

    Swaminathan, Karthikeyan; Adamczak, Rafal; Porollo, Aleksey; Meller, Jarosław

    2010-01-01

    Many sequence-based predictors of structural and functional properties of proteins have been developed in the past. In this study, we developed new methods for predicting measures of conformational flexibility in proteins, including X-ray structure-derived temperature (B-) factors and the variance within NMR structural ensemble, as effectively measured by the solvent accessibility standard deviations (SASDs). We further tested whether these predicted measures of conformational flexibility in crystal lattices and solution, respectively, can be used to improve the prediction of phosphorylation in proteins. The latter is an example of a common post-translational modification that modulates protein function, e.g., by affecting interactions and conformational flexibility of phosphorylated sites. Using robust epsilon-insensitive support vector regression (ε-SVR) models, we assessed two specific representations of protein sequences: one based on the position-specific scoring matrices (PSSMs) derived from multiple sequence alignments, and an augmented representation that incorporates real-valued solvent accessibility and secondary structure predictions (RSA/SS) as additional measures of local structural propensities. We showed that a combination of PSSMs and real-valued SS/RSA predictions provides systematic improvements in the accuracy of both B-factors and SASD prediction. These intermediate predictions were subsequently combined into an enhanced predictor of phosphorylation that was shown to significantly outperform methods based on PSSM alone. We would like to stress that to the best of our knowledge, this is the first example of using predicted from sequence NMR structure-based measures of conformational flexibility in solution for the prediction of other properties of proteins. Phosphorylation prediction methods typically employ a two-class classification approach with the limitation that the set of negative examples used for training may include some sites that are

  20. Addressing the Role of Conformational Diversity in Protein Structure Prediction

    PubMed Central

    Parisi, Gustavo; Fornasari, Maria Silvina

    2016-01-01

    Computational modeling of tertiary structures has become of standard use to study proteins that lack experimental characterization. Unfortunately, 3D structure prediction methods and model quality assessment programs often overlook that an ensemble of conformers in equilibrium populates the native state of proteins. In this work we collected sets of publicly available protein models and the corresponding target structures experimentally solved and studied how they describe the conformational diversity of the protein. For each protein, we assessed the quality of the models against known conformers by several standard measures and identified those models ranked best. We found that model rankings are defined by both the selected target conformer and the similarity measure used. 70% of the proteins in our datasets show that different models are structurally closest to different conformers of the same protein target. We observed that model building protocols such as template-based or ab initio approaches describe in similar ways the conformational diversity of the protein, although for template-based methods this description may depend on the sequence similarity between target and template sequences. Taken together, our results support the idea that protein structure modeling could help to identify members of the native ensemble, highlight the importance of considering conformational diversity in protein 3D quality evaluations and endorse the study of the variability of the native structure for a meaningful biological analysis. PMID:27159429

  1. Addressing the Role of Conformational Diversity in Protein Structure Prediction.

    PubMed

    Palopoli, Nicolas; Monzon, Alexander Miguel; Parisi, Gustavo; Fornasari, Maria Silvina

    2016-01-01

    Computational modeling of tertiary structures has become of standard use to study proteins that lack experimental characterization. Unfortunately, 3D structure prediction methods and model quality assessment programs often overlook that an ensemble of conformers in equilibrium populates the native state of proteins. In this work we collected sets of publicly available protein models and the corresponding target structures experimentally solved and studied how they describe the conformational diversity of the protein. For each protein, we assessed the quality of the models against known conformers by several standard measures and identified those models ranked best. We found that model rankings are defined by both the selected target conformer and the similarity measure used. 70% of the proteins in our datasets show that different models are structurally closest to different conformers of the same protein target. We observed that model building protocols such as template-based or ab initio approaches describe in similar ways the conformational diversity of the protein, although for template-based methods this description may depend on the sequence similarity between target and template sequences. Taken together, our results support the idea that protein structure modeling could help to identify members of the native ensemble, highlight the importance of considering conformational diversity in protein 3D quality evaluations and endorse the study of the variability of the native structure for a meaningful biological analysis. PMID:27159429

  2. Identification, Analysis and Prediction of Protein Ubiquitination Sites

    PubMed Central

    Radivojac, Predrag; Vacic, Vladimir; Haynes, Chad; Cocklin, Ross R.; Mohan, Amrita; Heyen, Joshua W.; Goebl, Mark G.; Iakoucheva, Lilia M.

    2009-01-01

    Summary Ubiquitination plays an important role in many cellular processes and is implicated in many diseases. Experimental identification of ubiquitination sites is challenging due to rapid turnover of ubiquitinated proteins and the large size of the ubiquitin modifier. We identified 141 new ubiquitination sites using a combination of liquid chromatography, mass spectrometry and mutant yeast strains. Investigation of the sequence biases and structural preferences around known ubiquitination sites indicated that their properties were similar to those of intrinsically disordered protein regions. Using a combined set of new and previously known ubiquitination sites, we developed a random forest predictor of ubiquitination sites, UbPred. The class-balanced accuracy of UbPred reached 72%, with the area under the ROC curve at 80%. The application of UbPred showed that high confidence Rsp5 ubiquitin ligase substrates and proteins with very short half-lives were significantly enriched in the number of predicted ubiquitination sites. Proteome-wide prediction of ubiquitination sites in Saccharomyces cerevisiae indicated that highly ubiquitinated substrates were prevalent among transcription/enzyme regulators and proteins involved in cell cycle control. In the human proteome, cytoskeletal, cell cycle, regulatory and cancer-associated proteins display higher extent of ubiquitination than proteins from other functional categories. We show that gain and loss of predicted ubiquitination sites may likely represent a molecular mechanism behind a number of disease-associated mutations. UbPred is available at http://www.ubpred.org PMID:19722269

  3. Predicting Protein Function via Semantic Integration of Multiple Networks.

    PubMed

    Yu, Guoxian; Fu, Guangyuan; Wang, Jun; Zhu, Hailong

    2016-01-01

    Determining the biological functions of proteins is one of the key challenges in the post-genomic era. The rapidly accumulated large volumes of proteomic and genomic data drives to develop computational models for automatically predicting protein function in large scale. Recent approaches focus on integrating multiple heterogeneous data sources and they often get better results than methods that use single data source alone. In this paper, we investigate how to integrate multiple biological data sources with the biological knowledge, i.e., Gene Ontology (GO), for protein function prediction. We propose a method, called SimNet, to Semantically i ntegrate multiple functional association Networks derived from heterogenous data sources. SimNet firstly utilizes GO annotations of proteins to capture the semantic similarity between proteins and introduces a semantic kernel based on the similarity. Next, SimNet constructs a composite network, obtained as a weighted summation of individual networks, and aligns the network with the kernel to get the weights assigned to individual networks. Then, it applies a network-based classifier on the composite network to predict protein function. Experiment results on heterogenous proteomic data sources of Yeast, Human, Mouse, and Fly show that, SimNet not only achieves better (or comparable) results than other related competitive approaches, but also takes much less time. The Matlab codes of SimNet are available at https://sites.google.com/site/guoxian85/simnet. PMID:26800544

  4. DisPredict: A Predictor of Disordered Protein Using Optimized RBF Kernel

    PubMed Central

    Iqbal, Sumaiya; Hoque, Md Tamjidul

    2015-01-01

    Intrinsically disordered proteins or, regions perform important biological functions through their dynamic conformations during binding. Thus accurate identification of these disordered regions have significant implications in proper annotation of function, induced fold prediction and drug design to combat critical diseases. We introduce DisPredict, a disorder predictor that employs a single support vector machine with RBF kernel and novel features for reliable characterization of protein structure. DisPredict yields effective performance. In addition to 10-fold cross validation, training and testing of DisPredict was conducted with independent test datasets. The results were consistent with both the training and test error minimal. The use of multiple data sources, makes the predictor generic. The datasets used in developing the model include disordered regions of various length which are categorized as short and long having different compositions, different types of disorder, ranging from fully to partially disordered regions as well as completely ordered regions. Through comparison with other state of the art approaches and case studies, DisPredict is found to be a useful tool with competitive performance. DisPredict is available at https://github.com/tamjidul/DisPredict_v1.0. PMID:26517719

  5. A Systematic Review of Predictions of Survival in Palliative Care: How Accurate Are Clinicians and Who Are the Experts?

    PubMed Central

    Harris, Adam; Harries, Priscilla

    2016-01-01

    overall accuracy being reported. Data were extracted using a standardised tool, by one reviewer, which could have introduced bias. Devising search terms for prognostic studies is challenging. Every attempt was made to devise search terms that were sufficiently sensitive to detect all prognostic studies; however, it remains possible that some studies were not identified. Conclusion Studies of prognostic accuracy in palliative care are heterogeneous, but the evidence suggests that clinicians’ predictions are frequently inaccurate. No sub-group of clinicians was consistently shown to be more accurate than any other. Implications of Key Findings Further research is needed to understand how clinical predictions are formulated and how their accuracy can be improved. PMID:27560380

  6. Proteins and Their Interacting Partners: An Introduction to Protein-Ligand Binding Site Prediction Methods.

    PubMed

    Roche, Daniel Barry; Brackenridge, Danielle Allison; McGuffin, Liam James

    2015-01-01

    Elucidating the biological and biochemical roles of proteins, and subsequently determining their interacting partners, can be difficult and time consuming using in vitro and/or in vivo methods, and consequently the majority of newly sequenced proteins will have unknown structures and functions. However, in silico methods for predicting protein-ligand binding sites and protein biochemical functions offer an alternative practical solution. The characterisation of protein-ligand binding sites is essential for investigating new functional roles, which can impact the major biological research spheres of health, food, and energy security. In this review we discuss the role in silico methods play in 3D modelling of protein-ligand binding sites, along with their role in predicting biochemical functionality. In addition, we describe in detail some of the key alternative in silico prediction approaches that are available, as well as discussing the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and the Continuous Automated Model EvaluatiOn (CAMEO) projects, and their impact on developments in the field. Furthermore, we discuss the importance of protein function prediction methods for tackling 21st century problems. PMID:26694353

  7. Mimicking the folding pathway to improve homology-free protein structure prediction

    NASA Astrophysics Data System (ADS)

    Freed, Karl; Debartolo, Joe; Colubri, Andres; Jha, Abhishek; Fitzgerald, James; Sosnick, Tobin

    2010-03-01

    Since demonstrating that a protein's sequence encodes its structure, the prediction of structure from sequence remains an outstanding problem that impacts numerous scientific disciplines including many genome projects. By iteratively fixing secondary structure assignments of residues during Monte Carlo simulations of folding, our coarse grained model without information concerning homology or explicit side chains outperforms current homology-based secondary structure prediction methods for many proteins. The computationally rapid algorithm using only single residue (phi, psi) dihedral angle moves also generates tertiary structures of comparable accuracy to existing all-atom methods for many small proteins, particularly ones with low homology. Hence, given appropriate search strategies and scoring functions, reduced representations can be used for accurately predicting secondary structure as well as providing three-dimensional structures, thereby increasing the size of proteins approachable by homology-free methods and the accuracy of template methods whose accuracy depends on the quality of the input secondary structure. Inclusion of information from evolutionarily related sequences enhances the statistics and the accuracy of the predictions.

  8. Application of Gap-Constraints Given Sequential Frequent Pattern Mining for Protein Function Prediction

    PubMed Central

    Park, Hyeon Ah; Kim, Taewook; Li, Meijing; Shon, Ho Sun; Park, Jeong Seok; Ryu, Keun Ho

    2015-01-01

    Objectives Predicting protein function from the protein–protein interaction network is challenging due to its complexity and huge scale of protein interaction process along with inconsistent pattern. Previously proposed methods such as neighbor counting, network analysis, and graph pattern mining has predicted functions by calculating the rules and probability of patterns inside network. Although these methods have shown good prediction, difficulty still exists in searching several functions that are exceptional from simple rules and patterns as a result of not considering the inconsistent aspect of the interaction network. Methods In this article, we propose a novel approach using the sequential pattern mining method with gap-constraints. To overcome the inconsistency problem, we suggest frequent functional patterns to include every possible functional sequence—including patterns for which search is limited by the structure of connection or level of neighborhood layer. We also constructed a tree-graph with the most crucial interaction information of the target protein, and generated candidate sets to assign by sequential pattern mining allowing gaps. Results The parameters of pattern length, maximum gaps, and minimum support were given to find the best setting for the most accurate prediction. The highest accuracy rate was 0.972, which showed better results than the simple neighbor counting approach and link-based approach. Conclusion The results comparison with other approaches has confirmed that the proposed approach could reach more function candidates that previous methods could not obtain. PMID:25938021

  9. CoinFold: a web server for protein contact prediction and contact-assisted protein folding.

    PubMed

    Wang, Sheng; Li, Wei; Zhang, Renyu; Liu, Shiwang; Xu, Jinbo

    2016-07-01

    CoinFold (http://raptorx2.uchicago.edu/ContactMap/) is a web server for protein contact prediction and contact-assisted de novo structure prediction. CoinFold predicts contacts by integrating joint multi-family evolutionary coupling (EC) analysis and supervised machine learning. This joint EC analysis is unique in that it not only uses residue coevolution information in the target protein family, but also that in the related families which may have divergent sequences but similar folds. The supervised learning further improves contact prediction accuracy by making use of sequence profile, contact (distance) potential and other information. Finally, this server predicts tertiary structure of a sequence by feeding its predicted contacts and secondary structure to the CNS suite. Tested on the CASP and CAMEO targets, this server shows significant advantages over existing ones of similar category in both contact and tertiary structure prediction. PMID:27112569

  10. Neurodegenerative diseases: quantitative predictions of protein-RNA interactions.

    PubMed

    Cirillo, Davide; Agostini, Federico; Klus, Petr; Marchese, Domenica; Rodriguez, Silvia; Bolognesi, Benedetta; Tartaglia, Gian Gaetano

    2013-02-01

    Increasing evidence indicates that RNA plays an active role in a number of neurodegenerative diseases. We recently introduced a theoretical framework, catRAPID, to predict the binding ability of protein and RNA molecules. Here, we use catRAPID to investigate ribonucleoprotein interactions linked to inherited intellectual disability, amyotrophic lateral sclerosis, Creutzfeuld-Jakob, Alzheimer's, and Parkinson's diseases. We specifically focus on (1) RNA interactions with fragile X mental retardation protein FMRP; (2) protein sequestration caused by CGG repeats; (3) noncoding transcripts regulated by TAR DNA-binding protein 43 TDP-43; (4) autogenous regulation of TDP-43 and FMRP; (5) iron-mediated expression of amyloid precursor protein APP and α-synuclein; (6) interactions between prions and RNA aptamers. Our results are in striking agreement with experimental evidence and provide new insights in processes associated with neuronal function and misfunction. PMID:23264567

  11. Revisiting the prediction of protein function at CASP6.

    PubMed

    Pellegrini-Calace, Marialuisa; Soro, Simonetta; Tramontano, Anna

    2006-07-01

    The ability to predict the function of a protein, given its sequence and/or 3D structure, is an essential requirement for exploiting the wealth of data made available by genomics and structural genomics projects and is therefore raising increasing interest in the computational biology community. To foster developments in the area as well as to establish the state of the art of present methods, a function prediction category was tentatively introduced in the 6th edition of the Critical Assessment of Techniques for Protein Structure Prediction (CASP) worldwide experiment. The assessment of the performance of the methods was made difficult by at least two factors: (a) the experimentally determined function of the targets was not available at the time of assessment; (b) the experiment is run blindly, preventing verification of whether the convergence of different predictions towards the same functional annotation was due to the similarity of the methods or to a genuine signal detectable by different methodologies. In this work, we collected information about the methods used by the various predictors and revisited the results of the experiment by verifying how often and in which cases a convergent prediction was obtained by methods based on different rationale. We propose a method for classifying the type and redundancy of the methods. We also analyzed the cases in which a function for the target protein has become available. Our results show that predictions derived from a consensus of different methods can reach an accuracy as high as 80%. It follows that some of the predictions submitted to CASP6, once reanalyzed taking into account the type of converging methods, can provide very useful information to researchers interested in the function of the target proteins. PMID:16759228

  12. Prediction of beta-turns in proteins using neural networks.

    PubMed

    McGregor, M J; Flores, T P; Sternberg, M J

    1989-05-01

    The use of neural networks to improve empirical secondary structure prediction is explored with regard to the identification of the position and conformational class of beta-turns, a four-residue chain reversal. Recently an algorithm was developed for beta-turn predictions based on the empirical approach of Chou and Fasman using different parameters for three classes (I, II and non-specific) of beta-turns. In this paper, using the same data, an alternative approach to derive an empirical prediction method is used based on neural networks which is a general learning algorithm extensively used in artificial intelligence. Thus the results of the two approaches can be compared. The most severe test of prediction accuracy is the percentage of turn predictions that are correct and the neural network gives an overall improvement from 20.6% to 26.0%. The proportion of correctly predicted residues is 71%, compared to a chance level of about 58%. Thus neural networks provide a method of obtaining more accurate predictions from empirical data than a simpler method of deriving propensities. PMID:2748568

  13. Sequence-based feature prediction and annotation of proteins

    PubMed Central

    Juncker, Agnieszka S; Jensen, Lars J; Pierleoni, Andrea; Bernsel, Andreas; Tress, Michael L; Bork, Peer; von Heijne, Gunnar; Valencia, Alfonso; Ouzounis, Christos A; Casadio, Rita; Brunak, Søren

    2009-01-01

    A recent trend in computational methods for annotation of protein function is that many prediction tools are combined in complex workflows and pipelines to facilitate the analysis of feature combinations, for example, the entire repertoire of kinase-binding motifs in the human proteome. PMID:19226438

  14. Exploring Function Prediction in Protein Interaction Networks via Clustering Methods

    PubMed Central

    Trivodaliev, Kire; Bogojeska, Aleksandra; Kocarev, Ljupco

    2014-01-01

    Complex networks have recently become the focus of research in many fields. Their structure reveals crucial information for the nodes, how they connect and share information. In our work we analyze protein interaction networks as complex networks for their functional modular structure and later use that information in the functional annotation of proteins within the network. We propose several graph representations for the protein interaction network, each having different level of complexity and inclusion of the annotation information within the graph. We aim to explore what the benefits and the drawbacks of these proposed graphs are, when they are used in the function prediction process via clustering methods. For making this cluster based prediction, we adopt well established approaches for cluster detection in complex networks using most recent representative algorithms that have been proven as efficient in the task at hand. The experiments are performed using a purified and reliable Saccharomyces cerevisiae protein interaction network, which is then used to generate the different graph representations. Each of the graph representations is later analysed in combination with each of the clustering algorithms, which have been possibly modified and implemented to fit the specific graph. We evaluate results in regards of biological validity and function prediction performance. Our results indicate that the novel ways of presenting the complex graph improve the prediction process, although the computational complexity should be taken into account when deciding on a particular approach. PMID:24972109

  15. Protein structure prediction enhanced with evolutionary diversity : SPEED.

    SciTech Connect

    DeBartolo, J.; Hocky, G.; Wilde, M.; Xu, J.; Freed, K. F.; Sosnick, T. R.; Univ. of Chicago; Toyota Technological Inst. at Chicago

    2010-03-01

    For naturally occurring proteins, similar sequence implies similar structure. Consequently, multiple sequence alignments (MSAs) often are used in template-based modeling of protein structure and have been incorporated into fragment-based assembly methods. Our previous homology-free structure prediction study introduced an algorithm that mimics the folding pathway by coupling the formation of secondary and tertiary structure. Moves in the Monte Carlo procedure involve only a change in a single pair of {phi},{psi} backbone dihedral angles that are obtained from a Protein Data Bank-based distribution appropriate for each amino acid, conditional on the type and conformation of the flanking residues. We improve this method by using MSAs to enrich the sampling distribution, but in a manner that does not require structural knowledge of any protein sequence (i.e., not homologous fragment insertion). In combination with other tools, including clustering and refinement, the accuracies of the predicted secondary and tertiary structures are substantially improved and a global and position-resolved measure of confidence is introduced for the accuracy of the predictions. Performance of the method in the Critical Assessment of Structure Prediction (CASP8) is discussed.

  16. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields.

    PubMed

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-01

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility. PMID:26752681

  17. Improved hybrid optimization algorithm for 3D protein structure prediction.

    PubMed

    Zhou, Changjun; Hou, Caixia; Wei, Xiaopeng; Zhang, Qiang

    2014-07-01

    A new improved hybrid optimization algorithm - PGATS algorithm, which is based on toy off-lattice model, is presented for dealing with three-dimensional protein structure prediction problems. The algorithm combines the particle swarm optimization (PSO), genetic algorithm (GA), and tabu search (TS) algorithms. Otherwise, we also take some different improved strategies. The factor of stochastic disturbance is joined in the particle swarm optimization to improve the search ability; the operations of crossover and mutation that are in the genetic algorithm are changed to a kind of random liner method; at last tabu search algorithm is improved by appending a mutation operator. Through the combination of a variety of strategies and algorithms, the protein structure prediction (PSP) in a 3D off-lattice model is achieved. The PSP problem is an NP-hard problem, but the problem can be attributed to a global optimization problem of multi-extremum and multi-parameters. This is the theoretical principle of the hybrid optimization algorithm that is proposed in this paper. The algorithm combines local search and global search, which overcomes the shortcoming of a single algorithm, giving full play to the advantage of each algorithm. In the current universal standard sequences, Fibonacci sequences and real protein sequences are certified. Experiments show that the proposed new method outperforms single algorithms on the accuracy of calculating the protein sequence energy value, which is proved to be an effective way to predict the structure of proteins. PMID:25069136

  18. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields

    PubMed Central

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-01

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility. PMID:26752681

  19. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields

    NASA Astrophysics Data System (ADS)

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-01

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.

  20. A computational method to predict carbonylation sites in yeast proteins.

    PubMed

    Lv, H Q; Liu, J; Han, J Q; Zheng, J G; Liu, R L

    2016-01-01

    Several post-translational modifications (PTM) have been discussed in literature. Among a variety of oxidative stress-induced PTM, protein carbonylation is considered a biomarker of oxidative stress. Only certain proteins can be carbonylated because only four amino acid residues, namely lysine (K), arginine (R), threonine (T) and proline (P), are susceptible to carbonylation. The yeast proteome is an excellent model to explore oxidative stress, especially protein carbonylation. Current experimental approaches in identifying carbonylation sites are expensive, time-consuming and limited in their abilities to process proteins. Furthermore, there is no bioinformational method to predict carbonylation sites in yeast proteins. Therefore, we propose a computational method to predict yeast carbonylation sites. This method has total accuracies of 86.32, 85.89, 84.80, and 86.80% in predicting the carbonylation sites of K, R, T, and P, respectively. These results were confirmed by 10-fold cross-validation. The ability to identify carbonylation sites in different kinds of features was analyzed and the position-specific composition of the modification site-flanking residues was discussed. Additionally, a software tool has been developed to help with the calculations in this method. Datasets and the software are available at https://sourceforge.net/projects/hqlstudio/ files/CarSpred.Y/. PMID:27420944

  1. From Nonspecific DNA–Protein Encounter Complexes to the Prediction of DNA–Protein Interactions

    PubMed Central

    Gao, Mu; Skolnick, Jeffrey

    2009-01-01

    DNA–protein interactions are involved in many essential biological activities. Because there is no simple mapping code between DNA base pairs and protein amino acids, the prediction of DNA–protein interactions is a challenging problem. Here, we present a novel computational approach for predicting DNA-binding protein residues and DNA–protein interaction modes without knowing its specific DNA target sequence. Given the structure of a DNA-binding protein, the method first generates an ensemble of complex structures obtained by rigid-body docking with a nonspecific canonical B-DNA. Representative models are subsequently selected through clustering and ranking by their DNA–protein interfacial energy. Analysis of these encounter complex models suggests that the recognition sites for specific DNA binding are usually favorable interaction sites for the nonspecific DNA probe and that nonspecific DNA–protein interaction modes exhibit some similarity to specific DNA–protein binding modes. Although the method requires as input the knowledge that the protein binds DNA, in benchmark tests, it achieves better performance in identifying DNA-binding sites than three previously established methods, which are based on sophisticated machine-learning techniques. We further apply our method to protein structures predicted through modeling and demonstrate that our method performs satisfactorily on protein models whose root-mean-square Cα deviation from native is up to 5 Å from their native structures. This study provides valuable structural insights into how a specific DNA-binding protein interacts with a nonspecific DNA sequence. The similarity between the specific DNA–protein interaction mode and nonspecific interaction modes may reflect an important sampling step in search of its specific DNA targets by a DNA-binding protein. PMID:19343221

  2. Choosing negative examples for the prediction of protein-protein interactions

    PubMed Central

    Ben-Hur, Asa; Noble, William Stafford

    2006-01-01

    The protein-protein interaction networks of even well-studied model organisms are sketchy at best, highlighting the continued need for computational methods to help direct experimentalists in the search for novel interactions. This need has prompted the development of a number of methods for predicting protein-protein interactions based on various sources of data and methodologies. The common method for choosing negative examples for training a predictor of protein-protein interactions is based on annotations of cellular localization, and the observation that pairs of proteins that have different localization patterns are unlikely to interact. While this method leads to high quality sets of non-interacting proteins, we find that this choice can lead to biased estimates of prediction accuracy, because the constraints placed on the distribution of the negative examples makes the task easier. The effects of this bias are demonstrated in the context of both sequence-based and non-sequence based features used for predicting protein-protein interactions. PMID:16723005

  3. Structure based function prediction of proteins using fragment library frequency vectors

    PubMed Central

    Yadav, Akshay; Jayaraman, Valadi Krishnamoorthy

    2012-01-01

    The function of the protein is primarily dictated by its structure. Therefore it is far more logical to find the functional clues of the protein in its overall 3-dimensional fold or its global structure. In this paper, we have developed a novel Support Vector Machines (SVM) based prediction model for functional classification and prediction of proteins using features extracted from its global structure based on fragment libraries. Fragment libraries have been previously used for abintio modelling of proteins and protein structure comparisons. The query protein structure is broken down into a collection of short contiguous backbone fragments and this collection is discretized using a library of fragments. The input feature vector is frequency vector that counts the number of each library fragment in the collection of fragments by all-to-all fragment comparisons. SVM models were trained and optimised for obtaining the best 10-fold Cross validation accuracy for classification. As an example, this method was applied for prediction and classification of Cell Adhesion molecules (CAMs). Thirty-four different fragment libraries with sizes ranging from 4 to 400 and fragment lengths ranging from 4 to 12 were used for obtaining the best prediction model. The best 10-fold CV accuracy of 95.25% was obtained for library of 400 fragments of length 10. An accuracy of 87.5% was obtained on an unseen test dataset consisting of 20 CAMs and 20 NonCAMs. This shows that protein structure can be accurately and uniquely described using 400 representative fragments of length 10. PMID:23144557

  4. Pattern recognition methods for protein functional site prediction.

    PubMed

    Yang, Zheng Rong; Wang, Lipo; Young, Natasha; Trudgian, Dave; Chou, Kuo-Chen

    2005-10-01

    Protein functional site prediction is closely related to drug design, hence to public health. In order to save the cost and the time spent on identifying the functional sites in sequenced proteins in biology laboratory, computer programs have been widely used for decades. Many of them are implemented using the state-of-the-art pattern recognition algorithms, including decision trees, neural networks and support vector machines. Although the success of this effort has been obvious, advanced and new algorithms are still under development for addressing some difficult issues. This review will go through the major stages in developing pattern recognition algorithms for protein functional site prediction and outline the future research directions in this important area. PMID:16248799

  5. Using support vector machine for improving protein-protein interaction prediction utilizing domain interactions

    SciTech Connect

    Singhal, Mudita; Shah, Anuj R.; Brown, Roslyn N.; Adkins, Joshua N.

    2010-10-02

    Understanding protein interactions is essential to gain insights into the biological processes at the whole cell level. The high-throughput experimental techniques for determining protein-protein interactions (PPI) are error prone and expensive with low overlap amongst them. Although several computational methods have been proposed for predicting protein interactions there is definite room for improvement. Here we present DomainSVM, a predictive method for PPI that uses computationally inferred domain-domain interaction values in a Support Vector Machine framework to predict protein interactions. DomainSVM method utilizes evidence of multiple interacting domains to predict a protein interaction. It outperforms existing methods of PPI prediction by achieving very high explanation ratios, precision, specificity, sensitivity and F-measure values in a 10 fold cross-validation study conducted on the positive and negative PPIs in yeast. A Functional comparison study using GO annotations on the positive and the negative test sets is presented in addition to discussing novel PPI predictions in Salmonella Typhimurium.

  6. ORION: a web server for protein fold recognition and structure prediction using evolutionary hybrid profiles

    PubMed Central

    Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G.; Gelly, Jean-Christophe

    2016-01-01

    Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation —with Protein Blocks—, which give an accurate description of the local protein structure. ORION has recently been improved, increasing by 5% the quality of its results. The ORION web server accepts a single protein sequence as input and searches homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the ‘Hard’ category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/. PMID:27319297

  7. ORION: a web server for protein fold recognition and structure prediction using evolutionary hybrid profiles.

    PubMed

    Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G; Gelly, Jean-Christophe

    2016-01-01

    Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation -with Protein Blocks-, which give an accurate description of the local protein structure. ORION has recently been improved, increasing by 5% the quality of its results. The ORION web server accepts a single protein sequence as input and searches homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the 'Hard' category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/. PMID:27319297

  8. Computational prediction of virus-human protein-protein interactions using embedding kernelized heterogeneous data.

    PubMed

    Nourani, Esmaeil; Khunjush, Farshad; Durmuş, Saliha

    2016-05-24

    Pathogenic microorganisms exploit host cellular mechanisms and evade host defense mechanisms through molecular pathogen-host interactions (PHIs). Therefore, comprehensive analysis of these PHI networks should be an initial step for developing effective therapeutics against infectious diseases. Computational prediction of PHI data is gaining increasing demand because of scarcity of experimental data. Prediction of protein-protein interactions (PPIs) within PHI systems can be formulated as a classification problem, which requires the knowledge of non-interacting protein pairs. This is a restricting requirement since we lack datasets that report non-interacting protein pairs. In this study, we formulated the "computational prediction of PHI data" problem using kernel embedding of heterogeneous data. This eliminates the abovementioned requirement and enables us to predict new interactions without randomly labeling protein pairs as non-interacting. Domain-domain associations are used to filter the predicted results leading to 175 novel PHIs between 170 human proteins and 105 viral proteins. To compare our results with the state-of-the-art studies that use a binary classification formulation, we modified our settings to consider the same formulation. Detailed evaluations are conducted and our results provide more than 10 percent improvements for accuracy and AUC (area under the receiving operating curve) results in comparison with state-of-the-art methods. PMID:27072625

  9. Nanoparticles-cell association predicted by protein corona fingerprints

    NASA Astrophysics Data System (ADS)

    Palchetti, S.; Digiacomo, L.; Pozzi, D.; Peruzzi, G.; Micarelli, E.; Mahmoudi, M.; Caracciolo, G.

    2016-06-01

    In a physiological environment (e.g., blood and interstitial fluids) nanoparticles (NPs) will bind proteins shaping a ``protein corona'' layer. The long-lived protein layer tightly bound to the NP surface is referred to as the hard corona (HC) and encodes information that controls NP bioactivity (e.g. cellular association, cellular signaling pathways, biodistribution, and toxicity). Decrypting this complex code has become a priority to predict the NP biological outcomes. Here, we use a library of 16 lipid NPs of varying size (Ø ~ 100-250 nm) and surface chemistry (unmodified and PEGylated) to investigate the relationships between NP physicochemical properties (nanoparticle size, aggregation state and surface charge), protein corona fingerprints (PCFs), and NP-cell association. We found out that none of the NPs' physicochemical properties alone was exclusively able to account for association with human cervical cancer cell line (HeLa). For the entire library of NPs, a total of 436 distinct serum proteins were detected. We developed a predictive-validation modeling that provides a means of assessing the relative significance of the identified corona proteins. Interestingly, a minor fraction of the HC, which consists of only 8 PCFs were identified as main promoters of NP association with HeLa cells. Remarkably, identified PCFs have several receptors with high level of expression on the plasma membrane of HeLa cells.In a physiological environment (e.g., blood and interstitial fluids) nanoparticles (NPs) will bind proteins shaping a ``protein corona'' layer. The long-lived protein layer tightly bound to the NP surface is referred to as the hard corona (HC) and encodes information that controls NP bioactivity (e.g. cellular association, cellular signaling pathways, biodistribution, and toxicity). Decrypting this complex code has become a priority to predict the NP biological outcomes. Here, we use a library of 16 lipid NPs of varying size (Ø ~ 100-250 nm) and surface

  10. A new mixed-mode model for interpreting and predicting protein elution during isoelectric chromatofocusing.

    PubMed

    Choy, Derek Y C; Creagh, A Louise; von Lieres, Eric; Haynes, Charles

    2014-05-01

    Experimental data are combined with classic theories describing electrolytes in solution and at surfaces to define the primary mechanisms influencing protein retention and elution during isoelectric chromatofocusing (ICF) of proteins and protein mixtures. Those fundamental findings are used to derive a new model to understand and predict elution times of proteins during ICF. The model uses a modified form of the steric mass action (SMA) isotherm to account for both ion exchange and isoelectric focusing contributions to protein partitioning. The dependence of partitioning on pH is accounted for through the characteristic charge parameter m of the SMA isotherm and the application of Gouy-Chapman theory to define the dependence of the equilibrium binding constant Kbi on both m and ionic strength. Finally, the effects of changes in matrix surface pH on protein retention are quantified through a Donnan equilibrium type model. By accounting for isoelectric focusing, ion binding and exchange, and surface pH contributions to protein retention and elution, the model is shown to accurately capture the dependence of protein elution times on column operating conditions. PMID:24293057

  11. Nanoparticles-cell association predicted by protein corona fingerprints.

    PubMed

    Palchetti, S; Digiacomo, L; Pozzi, D; Peruzzi, G; Micarelli, E; Mahmoudi, M; Caracciolo, G

    2016-07-01

    In a physiological environment (e.g., blood and interstitial fluids) nanoparticles (NPs) will bind proteins shaping a "protein corona" layer. The long-lived protein layer tightly bound to the NP surface is referred to as the hard corona (HC) and encodes information that controls NP bioactivity (e.g. cellular association, cellular signaling pathways, biodistribution, and toxicity). Decrypting this complex code has become a priority to predict the NP biological outcomes. Here, we use a library of 16 lipid NPs of varying size (Ø≈ 100-250 nm) and surface chemistry (unmodified and PEGylated) to investigate the relationships between NP physicochemical properties (nanoparticle size, aggregation state and surface charge), protein corona fingerprints (PCFs), and NP-cell association. We found out that none of the NPs' physicochemical properties alone was exclusively able to account for association with human cervical cancer cell line (HeLa). For the entire library of NPs, a total of 436 distinct serum proteins were detected. We developed a predictive-validation modeling that provides a means of assessing the relative significance of the identified corona proteins. Interestingly, a minor fraction of the HC, which consists of only 8 PCFs were identified as main promoters of NP association with HeLa cells. Remarkably, identified PCFs have several receptors with high level of expression on the plasma membrane of HeLa cells. PMID:27279572

  12. Predicting the disruption by of a protein-ligand interaction

    PubMed Central

    Pible, Olivier; Vidaud, Claude; Plantevin, Sophie; Pellequer, Jean-Luc; Quéméneur, Eric

    2010-01-01

    The uranyl cation () can be suspected to interfere with the binding of essential metal cations to proteins, underlying some mechanisms of toxicity. A dedicated computational screen was used to identify binding sites within a set of nonredundant protein structures. The list of potential targets was compared to data from a small molecules interaction database to pinpoint specific examples where should be able to bind in the vicinity of an essential cation, and would be likely to affect the function of the corresponding protein. The C-reactive protein appeared as an interesting hit since its structure involves critical calcium ions in the binding of phosphorylcholine. Biochemical experiments confirmed the predicted binding site for and it was demonstrated by surface plasmon resonance assays that binding to CRP prevents the calcium-mediated binding of phosphorylcholine. Strikingly, the apparent affinity of for native CRP was almost 100-fold higher than that of Ca2+. This result exemplifies in the case of CRP the capability of our computational tool to predict effective binding sites for in proteins and is a first evidence of calcium substitution by the uranyl cation in a native protein. PMID:20842713

  13. How to measure and predict the molar absorption coefficient of a protein.

    PubMed Central

    Pace, C. N.; Vajdos, F.; Fee, L.; Grimsley, G.; Gray, T.

    1995-01-01

    The molar absorption coefficient, epsilon, of a protein is usually based on concentrations measured by dry weight, nitrogen, or amino acid analysis. The studies reported here suggest that the Edelhoch method is the best method for measuring epsilon for a protein. (This method is described by Gill and von Hippel [1989, Anal Biochem 182:319-326] and is based on data from Edelhoch [1967, Biochemistry 6:1948-1954]). The absorbance of a protein at 280 nm depends on the content of Trp, Tyr, and cystine (disulfide bonds). The average epsilon values for these chromophores in a sample of 18 well-characterized proteins have been estimated, and the epsilon values in water, propanol, 6 M guanidine hydrochloride (GdnHCl), and 8 M urea have been measured. For Trp, the average epsilon values for the proteins are less than the epsilon values measured in any of the solvents. For Tyr, the average epsilon values for the proteins are intermediate between those measured in 6 M GdnHCl and those measured in propanol. Based on a sample of 116 measured epsilon values for 80 proteins, the epsilon at 280 nm of a folded protein in water, epsilon (280), can best be predicted with this equation: epsilon (280) (M-1 cm-1) = (#Trp)(5,500) + (#Tyr)(1,490) + (#cystine)(125) These epsilon (280) values are quite reliable for proteins containing Trp residues, and less reliable for proteins that do not. However, the Edelhoch method is convenient and accurate, and the best approach is to measure rather than predict epsilon. PMID:8563639

  14. Toward Relatively General and Accurate Quantum Chemical Predictions of Solid-State 17O NMR Chemical Shifts in Various Biologically Relevant Oxygen-containing Compounds

    PubMed Central

    Rorick, Amber; Michael, Matthew A.; Yang, Liu; Zhang, Yong

    2015-01-01

    Oxygen is an important element in most biologically significant molecules and experimental solid-state 17O NMR studies have provided numerous useful structural probes to study these systems. However, computational predictions of solid-state 17O NMR chemical shift tensor properties are still challenging in many cases and in particular each of the prior computational work is basically limited to one type of oxygen-containing systems. This work provides the first systematic study of the effects of geometry refinement, method and basis sets for metal and non-metal elements in both geometry optimization and NMR property calculations of some biologically relevant oxygen-containing compounds with a good variety of XO bonding groups, X= H, C, N, P, and metal. The experimental range studied is of 1455 ppm, a major part of the reported 17O NMR chemical shifts in organic and organometallic compounds. A number of computational factors towards relatively general and accurate predictions of 17O NMR chemical shifts were studied to provide helpful and detailed suggestions for future work. For the studied various kinds of oxygen-containing compounds, the best computational approach results in a theory-versus-experiment correlation coefficient R2 of 0.9880 and mean absolute deviation of 13 ppm (1.9% of the experimental range) for isotropic NMR shifts and R2 of 0.9926 for all shift tensor properties. These results shall facilitate future computational studies of 17O NMR chemical shifts in many biologically relevant systems, and the high accuracy may also help refinement and determination of active-site structures of some oxygen-containing substrate bound proteins. PMID:26274812

  15. Predicting and improving the protein sequence alignment quality by support vector regression

    PubMed Central

    Lee, Minho; Jeong, Chan-seok; Kim, Dongsup

    2007-01-01

    Background For successful protein structure prediction by comparative modeling, in addition to identifying a good template protein with known structure, obtaining an accurate sequence alignment between a query protein and a template protein is critical. It has been known that the alignment accuracy can vary significantly depending on our choice of various alignment parameters such as gap opening penalty and gap extension penalty. Because the accuracy of sequence alignment is typically measured by comparing it with its corresponding structure alignment, there is no good way of evaluating alignment accuracy without knowing the structure of a query protein, which is obviously not available at the time of structure prediction. Moreover, there is no universal alignment parameter option that would always yield the optimal alignment. Results In this work, we develop a method to predict the quality of the alignment between a query and a template. We train the support vector regression (SVR) models to predict the MaxSub scores as a measure of alignment quality. The alignment between a query protein and a template of length n is transformed into a (n + 1)-dimensional feature vector, then it is used as an input to predict the alignment quality by the trained SVR model. Performance of our work is evaluated by various measures including Pearson correlation coefficient between the observed and predicted MaxSub scores. Result shows high correlation coefficient of 0.945. For a pair of query and template, 48 alignments are generated by changing alignment options. Trained SVR models are then applied to predict the MaxSub scores of those and to select the best alignment option which is chosen specifically to the query-template pair. This adaptive selection procedure results in 7.4% improvement of MaxSub scores, compared to those when the single best parameter option is used for all query-template pairs. Conclusion The present work demonstrates that the alignment quality can be