Science.gov

Sample records for accurately predict protein

  1. An Overview of Practical Applications of Protein Disorder Prediction and Drive for Faster, More Accurate Predictions

    PubMed Central

    Deng, Xin; Gumm, Jordan; Karki, Suman; Eickholt, Jesse; Cheng, Jianlin

    2015-01-01

    Protein disordered regions are segments of a protein chain that do not adopt a stable structure. Thus far, a variety of protein disorder prediction methods have been developed and have been widely used, not only in traditional bioinformatics domains, including protein structure prediction, protein structure determination and function annotation, but also in many other biomedical fields. The relationship between intrinsically-disordered proteins and some human diseases has played a significant role in disorder prediction in disease identification and epidemiological investigations. Disordered proteins can also serve as potential targets for drug discovery with an emphasis on the disordered-to-ordered transition in the disordered binding regions, and this has led to substantial research in drug discovery or design based on protein disordered region prediction. Furthermore, protein disorder prediction has also been applied to healthcare by predicting the disease risk of mutations in patients and studying the mechanistic basis of diseases. As the applications of disorder prediction increase, so too does the need to make quick and accurate predictions. To fill this need, we also present a new approach to predict protein residue disorder using wide sequence windows that is applicable on the genomic scale. PMID:26198229

  2. Highly Accurate Prediction of Protein-Protein Interactions via Incorporating Evolutionary Information and Physicochemical Characteristics

    PubMed Central

    Li, Zheng-Wei; You, Zhu-Hong; Chen, Xing; Gui, Jie; Nie, Ru

    2016-01-01

    Protein-protein interactions (PPIs) occur at almost all levels of cell function and play crucial roles in various cellular processes. Thus, identification of PPIs is critical for deciphering molecular mechanisms and providing further insight into biological processes. Although a variety of high-throughput experimental techniques have been developed to identify PPIs, the PPI pairs identified by experimental approaches cover only a small fraction of the whole PPI network, and those approaches have inherent disadvantages, such as being time-consuming and expensive and having a high false-positive rate. Therefore, it is urgent and imperative to develop automatic in silico approaches to predict PPIs efficiently and accurately. In this article, we propose a novel feature extraction method that mixes physicochemical and evolutionary information for predicting PPIs using our newly developed discriminative vector machine (DVM) classifier. The improvements of the proposed method consist mainly in introducing an effective feature extraction method that can capture discriminative features from the evolutionary information and physicochemical characteristics, and in then employing a powerful and robust DVM classifier. To the best of our knowledge, this is the first time that the DVM model has been applied to the field of bioinformatics. When applying the proposed method to the Yeast and Helicobacter pylori (H. pylori) datasets, we obtain excellent prediction accuracies of 94.35% and 90.61%, respectively. The computational results indicate that our method is effective and robust for predicting PPIs, and can be taken as a useful supplementary tool to the traditional experimental methods for future proteomics research. PMID:27571061

  3. Highly Accurate Prediction of Protein-Protein Interactions via Incorporating Evolutionary Information and Physicochemical Characteristics.

    PubMed

    Li, Zheng-Wei; You, Zhu-Hong; Chen, Xing; Gui, Jie; Nie, Ru

    2016-01-01

    Protein-protein interactions (PPIs) occur at almost all levels of cell function and play crucial roles in various cellular processes. Thus, identification of PPIs is critical for deciphering molecular mechanisms and providing further insight into biological processes. Although a variety of high-throughput experimental techniques have been developed to identify PPIs, the PPI pairs identified by experimental approaches cover only a small fraction of the whole PPI network, and those approaches have inherent disadvantages, such as being time-consuming and expensive and having a high false-positive rate. Therefore, it is urgent and imperative to develop automatic in silico approaches to predict PPIs efficiently and accurately. In this article, we propose a novel feature extraction method that mixes physicochemical and evolutionary information for predicting PPIs using our newly developed discriminative vector machine (DVM) classifier. The improvements of the proposed method consist mainly in introducing an effective feature extraction method that can capture discriminative features from the evolutionary information and physicochemical characteristics, and in then employing a powerful and robust DVM classifier. To the best of our knowledge, this is the first time that the DVM model has been applied to the field of bioinformatics. When applying the proposed method to the Yeast and Helicobacter pylori (H. pylori) datasets, we obtain excellent prediction accuracies of 94.35% and 90.61%, respectively. The computational results indicate that our method is effective and robust for predicting PPIs, and can be taken as a useful supplementary tool to the traditional experimental methods for future proteomics research. PMID:27571061

  5. SIFTER search: a web server for accurate phylogeny-based protein function prediction.

    PubMed

    Sahraeian, Sayed M; Luo, Kevin R; Brenner, Steven E

    2015-07-01

    We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access to precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. The SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded.

  6. SIFTER search: a web server for accurate phylogeny-based protein function prediction

    DOE PAGES

    Sahraeian, Sayed M.; Luo, Kevin R.; Brenner, Steven E.

    2015-05-15

    We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access to precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. Lastly, the SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded.

  7. SIFTER search: a web server for accurate phylogeny-based protein function prediction.

    PubMed

    Sahraeian, Sayed M; Luo, Kevin R; Brenner, Steven E

    2015-07-01

    We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access to precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. The SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded. PMID:25979264

  8. SIFTER search: a web server for accurate phylogeny-based protein function prediction

    SciTech Connect

    Sahraeian, Sayed M.; Luo, Kevin R.; Brenner, Steven E.

    2015-05-15

    We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access to precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. Lastly, the SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded.

  9. An Accurate Method for Prediction of Protein-Ligand Binding Site on Protein Surface Using SVM and Statistical Depth Function

    PubMed Central

    Wang, Kui; Gao, Jianzhao; Shen, Shiyi; Tuszynski, Jack A.; Ruan, Jishou

    2013-01-01

    Since proteins carry out their functions through interactions with other molecules, accurately identifying the protein-ligand binding site plays an important role in protein functional annotation and rational drug discovery. In the past two decades, many algorithms have been presented to predict the protein-ligand binding site. In this paper, we introduce a statistical depth function to define negative samples and propose an SVM-based method which integrates sequence and structural information to predict the binding site. The results show that the present method performs better than existing ones. The accuracy, sensitivity, and specificity on the training set are 77.55%, 56.15%, and 87.96%, respectively; on the independent test set, the accuracy, sensitivity, and specificity are 80.36%, 53.53%, and 92.38%, respectively. PMID:24195070
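
    The accuracy, sensitivity, and specificity figures quoted above are standard confusion-matrix ratios. The sketch below shows how such values are typically computed for a binary "binding residue / non-binding residue" classifier. It is illustrative only: the class and function names (`ConfusionCounts`, `count_outcomes`) and the example labels are assumptions, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of the four outcomes of a binary classifier (hypothetical helper)."""
    tp: int  # binding residues predicted as binding
    fp: int  # non-binding residues predicted as binding
    tn: int  # non-binding residues predicted as non-binding
    fn: int  # binding residues predicted as non-binding


def count_outcomes(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionCounts:
    """Tally outcomes from 0/1 labels; 1 means 'binding site residue'."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return ConfusionCounts(tp=tp, fp=fp, tn=tn, fn=fn)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all residues classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of true binding residues that were recovered."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of true non-binding residues that were rejected."""
    return c.tn / (c.tn + c.fp)


# Made-up labels for ten residues, purely for illustration.
example = count_outcomes(
    y_true=[1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    y_pred=[1, 1, 1, 0, 0, 0, 0, 0, 1, 1],
)
```

    A pattern like the one reported (high specificity, moderate sensitivity) is what these ratios give when non-binding residues greatly outnumber binding residues and the classifier is conservative about calling a residue "binding".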

  10. FastRNABindR: Fast and Accurate Prediction of Protein-RNA Interface Residues

    PubMed Central

    EL-Manzalawy, Yasser; Abbas, Mostafa; Malluhi, Qutaibah; Honavar, Vasant

    2016-01-01

    A wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses, are mediated by RNA-protein interactions. However, experimental determination of the structures of protein-RNA complexes is expensive and technically challenging. Hence, a number of computational tools have been developed for predicting protein-RNA interfaces. Some of the state-of-the-art protein-RNA interface predictors rely on position-specific scoring matrix (PSSM)-based encoding of the protein sequences. The computational effort needed to generate PSSMs severely limits the practical utility of protein-RNA interface prediction servers. In this work, we experiment with two approaches, random sampling and sequence similarity reduction, for extracting a representative reference database of protein sequences from more than 50 million protein sequences in UniRef100. Our results suggest that randomly sampled databases produce better PSSM profiles (in terms of the number of hits used to generate the profile and the distance of the generated profile to the corresponding profile generated using the entire UniRef100 data as well as the accuracy of the machine learning classifier trained using these profiles). Based on our results, we developed FastRNABindR, an improved version of RNABindR for predicting protein-RNA interface residues using PSSM profiles generated using 1% of the UniRef100 sequences sampled uniformly at random. To the best of our knowledge, FastRNABindR is the only protein-RNA interface residue prediction online server that requires generation of PSSM profiles for query sequences and accepts hundreds of protein sequences per submission. Our approach for determining the optimal BLAST database for a protein-RNA interface residue classification task has the potential of substantially speeding up, and hence increasing the practical utility of, other amino acid sequence-based predictors of protein-protein and protein
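
    The random-sampling step described above (keeping roughly 1% of a large sequence collection, chosen uniformly at random, as the reference database for profile generation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the FastRNABindR implementation: the function names `read_fasta` and `sample_records`, the independent per-record coin flip, and the fixed seed are all hypothetical choices.

```python
import random
from typing import Iterable, Iterator, Tuple

FastaRecord = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[FastaRecord]:
    """Stream (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sample_records(
    records: Iterable[FastaRecord], fraction: float, seed: int = 0
) -> Iterator[FastaRecord]:
    """Keep each record independently with probability `fraction`.

    Assumption: independent coin flips give an expected (not exact)
    sample size, which is convenient for streaming very large files.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    rng = random.Random(seed)
    for record in records:
        if rng.random() < fraction:
            yield record


# Toy input standing in for a large sequence database.
toy_fasta = [">seq1", "MKT", "AYI", ">seq2", "GSH", ">seq3", "MLV"]
subset = list(sample_records(read_fasta(toy_fasta), fraction=0.01))
```

    In a real pipeline the sampled sequences would be written back to a FASTA file and formatted as a BLAST database before PSSM generation; those steps are omitted here because they depend on the local BLAST installation.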

  11. FastRNABindR: Fast and Accurate Prediction of Protein-RNA Interface Residues.

    PubMed

    El-Manzalawy, Yasser; Abbas, Mostafa; Malluhi, Qutaibah; Honavar, Vasant

    2016-01-01

    A wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses, are mediated by RNA-protein interactions. However, experimental determination of the structures of protein-RNA complexes is expensive and technically challenging. Hence, a number of computational tools have been developed for predicting protein-RNA interfaces. Some of the state-of-the-art protein-RNA interface predictors rely on position-specific scoring matrix (PSSM)-based encoding of the protein sequences. The computational effort needed to generate PSSMs severely limits the practical utility of protein-RNA interface prediction servers. In this work, we experiment with two approaches, random sampling and sequence similarity reduction, for extracting a representative reference database of protein sequences from more than 50 million protein sequences in UniRef100. Our results suggest that randomly sampled databases produce better PSSM profiles (in terms of the number of hits used to generate the profile and the distance of the generated profile to the corresponding profile generated using the entire UniRef100 data as well as the accuracy of the machine learning classifier trained using these profiles). Based on our results, we developed FastRNABindR, an improved version of RNABindR for predicting protein-RNA interface residues using PSSM profiles generated using 1% of the UniRef100 sequences sampled uniformly at random. To the best of our knowledge, FastRNABindR is the only protein-RNA interface residue prediction online server that requires generation of PSSM profiles for query sequences and accepts hundreds of protein sequences per submission. Our approach for determining the optimal BLAST database for a protein-RNA interface residue classification task has the potential of substantially speeding up, and hence increasing the practical utility of, other amino acid sequence-based predictors of protein-protein and protein

  12. Accurate and Rigorous Prediction of the Changes in Protein Free Energies in a Large-Scale Mutation Scan.

    PubMed

    Gapsys, Vytautas; Michielssens, Servaas; Seeliger, Daniel; de Groot, Bert L

    2016-06-20

    The prediction of mutation-induced free-energy changes in protein thermostability or protein-protein binding is of particular interest in the fields of protein design, biotechnology, and bioengineering. Herein, we achieve remarkable accuracy in a scan of 762 mutations estimating changes in protein thermostability based on the first principles of statistical mechanics. The remaining error in the free-energy estimates appears to be due to three sources in approximately equal parts, namely sampling, force-field inaccuracies, and experimental uncertainty. We propose a consensus force-field approach, which, together with an increased sampling time, leads to a free-energy prediction accuracy that matches those reached in experiments. This versatile approach enables accurate free-energy estimates for diverse proteins, including the prediction of changes in the melting temperature of the membrane protein neurotensin receptor 1. PMID:27122231

  13. Combining Evolutionary Information and an Iterative Sampling Strategy for Accurate Protein Structure Prediction.

    PubMed

    Braun, Tatjana; Koehler Leman, Julia; Lange, Oliver F

    2015-12-01

    Recent work has shown that the accuracy of ab initio structure prediction can be significantly improved by integrating evolutionary information in the form of intra-protein residue-residue contacts. Following this seminal result, much effort has been put into the improvement of contact predictions. However, there is also a substantial need to develop structure prediction protocols tailored to the type of restraints gained by contact predictions. Here, we present a structure prediction protocol that combines evolutionary information with the resolution-adapted structural recombination approach of Rosetta, called RASREC. Compared to the classic Rosetta ab initio protocol, RASREC achieves improved sampling, better convergence and higher robustness against incorrect distance restraints, making it the ideal sampling strategy for the stated problem. To demonstrate the accuracy of our protocol, we tested the approach on a diverse set of 28 globular proteins. Our method is able to converge for 26 out of the 28 targets and improves the average TM-score of the entire benchmark set from 0.55 to 0.72 when compared to the top ranked models obtained by the EVFold web server using identical contact predictions. Using a smaller benchmark, we furthermore show that the prediction accuracy of our method is only slightly reduced when the contact prediction accuracy is comparatively low. This observation is of special interest for protein sequences that only have a limited number of homologs.
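
    For readers unfamiliar with the TM-score used as the benchmark metric above, the sketch below evaluates its scoring formula for a model that has already been superimposed on the native structure. This is a simplification and an assumption of the sketch: the full TM-score also searches for the superposition that maximizes the sum, which is not done here. The function names are hypothetical, and targets of 15 residues or fewer are rejected because the length-dependent scale is not defined for them by this formula.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8, in angstroms."""
    if target_length <= 15:
        raise ValueError("this sketch only handles targets longer than 15 residues")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score_fixed_superposition(distances: Sequence[float], target_length: int) -> float:
    """Score aligned residue pairs for one given superposition.

    `distances` holds the distance (angstroms) between each aligned
    model/native residue pair; residues of the target that are not
    aligned simply contribute nothing to the sum.
    """
    if len(distances) > target_length:
        raise ValueError("cannot align more residues than the target contains")
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# A perfect model of a 100-residue target scores 1.0.
perfect = tm_score_fixed_superposition([0.0] * 100, target_length=100)
```

    Because each term is bounded by 1 and the sum is divided by the target length, the score lies between 0 and 1 and, unlike RMSD, is not dominated by a few badly placed residues.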

  14. Towards Accurate Residue-Residue Hydrophobic Contact Prediction for Alpha Helical Proteins Via Integer Linear Optimization

    PubMed Central

    Rajgaria, R.; McAllister, S. R.; Floudas, C. A.

    2008-01-01

    A new optimization-based method is presented to predict the hydrophobic residue contacts in α-helical proteins. The proposed approach uses a high resolution distance dependent force field to calculate the interaction energy between different residues of a protein. The formulation predicts the hydrophobic contacts by minimizing the sum of these contact energies. These residue contacts are highly useful in narrowing down the conformational space searched by protein structure prediction algorithms. The proposed algorithm also offers the algorithmic advantage of producing a rank ordered list of the best contact sets. This model was tested on four independent α-helical protein test sets and was found to perform very well. The average accuracy of the predictions (separated by at least six residues) obtained using the presented method was approximately 66% for single domain proteins. The average true positive and false positive distances were also calculated for each protein test set and they are 8.87 Å and 14.67 Å respectively. PMID:18767158
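
    The idea of choosing a set of contacts that minimizes the summed contact energy, and of returning a rank-ordered list of the best sets, can be illustrated on a toy problem. The sketch below replaces the integer linear optimization of the paper with brute-force enumeration, which is only feasible for a handful of candidate pairs. The constraints shown (a fixed number of contacts, at most one contact per residue, a minimum sequence separation of six) and the energy values are assumptions made for illustration; they are not the paper's formulation or force field.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # residue indices (i, j) with i < j


def rank_contact_sets(
    energies: Dict[Pair, float],
    n_contacts: int,
    max_per_residue: int = 1,
    min_separation: int = 6,
    top: int = 3,
) -> List[Tuple[float, Tuple[Pair, ...]]]:
    """Enumerate feasible contact sets and return the `top` lowest-energy ones.

    Each candidate pair acts like a binary variable (selected or not);
    the objective is the sum of the selected pair energies.
    """
    # Discard pairs that are too close in sequence.
    candidates = sorted(p for p in energies if abs(p[1] - p[0]) >= min_separation)
    feasible = []
    for subset in combinations(candidates, n_contacts):
        usage: Dict[int, int] = {}
        for i, j in subset:
            usage[i] = usage.get(i, 0) + 1
            usage[j] = usage.get(j, 0) + 1
        # Enforce the per-residue contact limit.
        if all(count <= max_per_residue for count in usage.values()):
            feasible.append((sum(energies[p] for p in subset), subset))
    feasible.sort()
    return feasible[:top]


# Hypothetical pair energies (arbitrary units); lower is more favorable.
toy_energies = {
    (1, 8): -2.0,
    (1, 10): -1.5,
    (3, 10): -1.0,
    (3, 12): -0.5,
    (2, 4): -9.0,  # excluded: sequence separation below 6
}
ranked = rank_contact_sets(toy_energies, n_contacts=2)
```

    An integer linear programming solver explores the same search space implicitly, which is what makes the approach practical for full-length helical proteins.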

  15. Accurate prediction of cellular co-translational folding indicates proteins can switch from post- to co-translational folding

    NASA Astrophysics Data System (ADS)

    Nissley, Daniel A.; Sharma, Ajeet K.; Ahmed, Nabeel; Friedrich, Ulrike A.; Kramer, Günter; Bukau, Bernd; O'Brien, Edward P.

    2016-02-01

    The rates at which domains fold and codons are translated are important factors in determining whether a nascent protein will co-translationally fold and function or misfold and malfunction. Here we develop a chemical kinetic model that calculates a protein domain's co-translational folding curve during synthesis using only the domain's bulk folding and unfolding rates and codon translation rates. We show that this model accurately predicts the course of co-translational folding measured in vivo for four different protein molecules. We then make predictions for a number of different proteins in yeast and find that synonymous codon substitutions, which change translation-elongation rates, can switch some protein domains from folding post-translationally to folding co-translationally--a result consistent with previous experimental studies. Our approach explains essential features of co-translational folding curves and predicts how varying the translation rate at different codon positions along a transcript's coding sequence affects this self-assembly process.
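
    A rough way to see how folding rates and codon translation rates combine is a two-state relaxation evaluated codon by codon. The sketch below is not the authors' kinetic model: it assumes a deterministic dwell time of 1/rate at each codon, a domain that starts unfolded, and a hypothetical parameter `first_foldable_codon` marking the nascent-chain length at which the domain can begin to fold. All names and numbers are illustrative.

```python
import math
from typing import List, Sequence


def cotranslational_folding_curve(
    k_fold: float,
    k_unfold: float,
    codon_rates: Sequence[float],
    first_foldable_codon: int,
) -> List[float]:
    """Probability that the domain is folded after each codon is translated.

    During the dwell time at a codon, a two-state system relaxes toward
    its equilibrium folded fraction k_fold / (k_fold + k_unfold) with
    relaxation rate k_fold + k_unfold.
    """
    if k_fold < 0 or k_unfold < 0 or k_fold + k_unfold == 0:
        raise ValueError("rates must be non-negative and not both zero")
    p_eq = k_fold / (k_fold + k_unfold)
    relaxation = k_fold + k_unfold
    p_folded = 0.0  # assumption: the domain starts fully unfolded
    curve = []
    for index, rate in enumerate(codon_rates):
        if rate <= 0:
            raise ValueError("codon translation rates must be positive")
        if index >= first_foldable_codon:
            dwell = 1.0 / rate  # simplification: fixed, not random, dwell time
            p_folded = p_eq + (p_folded - p_eq) * math.exp(-relaxation * dwell)
        curve.append(p_folded)
    return curve


# Same domain, translated through fast versus slow codons (made-up rates, in 1/s).
fast = cotranslational_folding_curve(1.0, 0.1, [10.0] * 20, first_foldable_codon=5)
slow = cotranslational_folding_curve(1.0, 0.1, [1.0] * 20, first_foldable_codon=5)
```

    In this toy setting the slowly translated transcript reaches a higher folded probability by the last codon, which mirrors the qualitative point in the abstract that changing translation-elongation rates can shift a domain toward co-translational folding.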

  16. Accurate prediction of interfacial residues in two-domain proteins using evolutionary information: implications for three-dimensional modeling.

    PubMed

    Bhaskara, Ramachandra M; Padhi, Amrita; Srinivasan, Narayanaswamy

    2014-07-01

    With the preponderance of multidomain proteins in eukaryotic genomes, it is essential to recognize the constituent domains and their functions. Often function involves communication across the domain interfaces, and knowledge of the interacting sites is essential to our understanding of the structure-function relationship. Using evolutionary information extracted from homologous domains in at least two diverse domain architectures (single and multidomain), we predict the interface residues corresponding to domains from the two-domain proteins. We also use information from the three-dimensional structures of individual domains of two-domain proteins to train a naïve Bayes classifier model to predict the interfacial residues. Our predictions are highly accurate (∼85%) and specific (∼95%) to the domain-domain interfaces. This method is specific to multidomain proteins which contain domains in more than one protein architectural context. Using predicted residues to constrain domain-domain interaction, rigid-body docking was able to provide us with accurate full-length protein structures with correct orientation of domains. We believe that these results can be of considerable interest toward rational protein and interaction design, apart from providing us with valuable information on the nature of interactions.

  17. HAAD: A quick algorithm for accurate prediction of hydrogen atoms in protein structures.

    PubMed

    Li, Yunqi; Roy, Ambrish; Zhang, Yang

    2009-08-20

    Hydrogen constitutes nearly half of all atoms in proteins and their positions are essential for analyzing hydrogen-bonding interactions and refining atomic-level structures. However, most protein structures determined by experiments or computer prediction lack hydrogen coordinates. We present a new algorithm, HAAD, to predict the positions of hydrogen atoms based on the positions of heavy atoms. The algorithm is built on the basic rules of orbital hybridization followed by the optimization of steric repulsion and electrostatic interactions. We tested the algorithm using three independent data sets: ultra-high-resolution X-ray structures, structures determined by neutron diffraction, and NOE proton-proton distances. Compared with the widely used programs CHARMM and REDUCE, HAAD has a significantly higher accuracy, with the average RMSD of the predicted hydrogen atoms to the X-ray and neutron diffraction structures decreased by 26% and 11%, respectively. Furthermore, hydrogen atoms placed by HAAD have more matches with the NOE restraints and fewer clashes with heavy atoms. The average CPU cost by HAAD is 18 and 8 times lower than that of CHARMM and REDUCE, respectively. The significant advantage of HAAD in both the accuracy and the speed of the hydrogen additions should make HAAD a useful tool for the detailed study of protein structure and function. Both an executable and the source code of HAAD are freely available at http://zhang.bioinformatics.ku.edu/HAAD.

  18. CRYSpred: accurate sequence-based protein crystallization propensity prediction using sequence-derived structural characteristics.

    PubMed

    Mizianty, Marcin J; Kurgan, Lukasz A

    2012-01-01

    Relatively low success rates of X-ray crystallography, which is the most popular method for solving protein structures, motivate development of novel methods that support selection of tractable protein targets. This aspect is particularly important in the context of the current structural genomics efforts that allow for a certain degree of flexibility in the target selection. We propose CRYSpred, a novel in-silico crystallization propensity predictor that uses a set of 15 novel features which utilize a broad range of inputs including charge, hydrophobicity, and amino acid composition derived from the protein chain, and the solvent accessibility and disorder predicted from the protein sequence. Our method outperforms seven modern crystallization propensity predictors on three benchmark test datasets that are independent of the training dataset. The strong predictive performance offered by CRYSpred is attributed to the careful design of the features, utilization of the comprehensive set of inputs, and the usage of the Support Vector Machine classifier. The inputs utilized by CRYSpred are well-aligned with the existing rules-of-thumb that are used in the structural genomics studies. PMID:21919861

  20. Four-protein signature accurately predicts lymph node metastasis and survival in oral squamous cell carcinoma.

    PubMed

    Zanaruddin, Sharifah Nurain Syed; Saleh, Amyza; Yang, Yi-Hsin; Hamid, Sharifah; Mustafa, Wan Mahadzir Wan; Khairul Bariah, A A N; Zain, Rosnah Binti; Lau, Shin Hin; Cheong, Sok Ching

    2013-03-01

    The presence of lymph node (LN) metastasis significantly affects the survival of patients with oral squamous cell carcinoma (OSCC). Successful detection and removal of positive LNs are crucial in the treatment of this disease. Current evaluation methods still have their limitations in detecting the presence of tumor cells in the LNs, where up to a third of clinically diagnosed metastasis-negative (N0) patients actually have metastasis-positive LNs in the neck. We developed a molecular signature in the primary tumor that could predict LN metastasis in OSCC. A total of 211 cores from 55 individuals were included in the study. Eleven proteins were evaluated using immunohistochemical analysis in a tissue microarray. Of the 11 biomarkers evaluated using receiver operating curve analysis, epidermal growth factor receptor (EGFR), v-erb-b2 erythroblastic leukemia viral oncogene homolog 2 (HER-2/neu), laminin, gamma 2 (LAMC2), and ras homolog family member C (RHOC) were found to be significantly associated with the presence of LN metastasis. Unsupervised hierarchical clustering demonstrated that expression patterns of these 4 proteins could be used to differentiate specimens that have positive LN metastasis from those that are negative for LN metastasis. Collectively, EGFR, HER-2/neu, LAMC2, and RHOC have a specificity of 87.5% and a sensitivity of 70%, with a prognostic accuracy of 83.4% for LN metastasis. We also demonstrated that the LN signature could independently predict disease-specific survival (P = .036). The 4-protein LN signature validated in an independent set of samples strongly suggests that it could reliably distinguish patients with LN metastasis from those who were metastasis-free and therefore could be a prognostic tool for the management of patients with OSCC.

  1. Four-protein signature accurately predicts lymph node metastasis and survival in oral squamous cell carcinoma.

    PubMed

    Zanaruddin, Sharifah Nurain Syed; Saleh, Amyza; Yang, Yi-Hsin; Hamid, Sharifah; Mustafa, Wan Mahadzir Wan; Khairul Bariah, A A N; Zain, Rosnah Binti; Lau, Shin Hin; Cheong, Sok Ching

    2013-03-01

    The presence of lymph node (LN) metastasis significantly affects the survival of patients with oral squamous cell carcinoma (OSCC). Successful detection and removal of positive LNs are crucial in the treatment of this disease. Current evaluation methods still have their limitations in detecting the presence of tumor cells in the LNs, where up to a third of clinically diagnosed metastasis-negative (N0) patients actually have metastasis-positive LNs in the neck. We developed a molecular signature in the primary tumor that could predict LN metastasis in OSCC. A total of 211 cores from 55 individuals were included in the study. Eleven proteins were evaluated using immunohistochemical analysis in a tissue microarray. Of the 11 biomarkers evaluated using receiver operating characteristic curve analysis, epidermal growth factor receptor (EGFR), v-erb-b2 erythroblastic leukemia viral oncogene homolog 2 (HER-2/neu), laminin, gamma 2 (LAMC2), and ras homolog family member C (RHOC) were found to be significantly associated with the presence of LN metastasis. Unsupervised hierarchical clustering demonstrated that expression patterns of these 4 proteins could be used to differentiate specimens that have positive LN metastasis from those that are negative for LN metastasis. Collectively, EGFR, HER-2/neu, LAMC2, and RHOC have a specificity of 87.5% and a sensitivity of 70%, with a prognostic accuracy of 83.4% for LN metastasis. We also demonstrated that the LN signature could independently predict disease-specific survival (P = .036). Validation of the 4-protein LN signature in an independent set of samples strongly suggests that it could reliably distinguish patients with LN metastasis from those who were metastasis-free and therefore could be a prognostic tool for the management of patients with OSCC. PMID:23026198
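
    For readers less familiar with the reported figures, the following is a minimal sketch of how sensitivity, specificity and accuracy are derived from confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration; they are not data from the study.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Return sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of metastasis-positive cases that are detected
    specificity = tn / (tn + fp)  # share of metastasis-negative cases called negative
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # share of all cases called correctly
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }


# Hypothetical counts for illustration only; NOT taken from the study.
metrics = diagnostic_metrics(tp=7, fn=3, tn=35, fp=5)
print(metrics)
```

    Accuracy depends on the proportion of positive and negative cases in the cohort, so it cannot be recovered from sensitivity and specificity alone.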

  2. Accurate ab initio prediction of NMR chemical shifts of nucleic acids and nucleic acids/protein complexes

    PubMed Central

    Victora, Andrea; Möller, Heiko M.; Exner, Thomas E.

    2014-01-01

    NMR chemical shift predictions based on empirical methods are nowadays indispensable tools during resonance assignment and 3D structure calculation of proteins. However, owing to the very limited statistical data basis, such methods are still in their infancy in the field of nucleic acids, especially when non-canonical structures and nucleic acid complexes are considered. Here, we present an ab initio approach for predicting proton chemical shifts of arbitrary nucleic acid structures based on state-of-the-art fragment-based quantum chemical calculations. We tested our prediction method on a diverse set of nucleic acid structures including double-stranded DNA, hairpins, DNA/protein complexes and chemically-modified DNA. Overall, our quantum chemical calculations yield highly accurate predictions with mean absolute deviations of 0.3–0.6 ppm and correlation coefficients (r²) usually above 0.9. This will allow for identifying misassignments and validating 3D structures. Furthermore, our calculations reveal that chemical shifts of protons involved in hydrogen bonding are predicted significantly less accurately. This is in part caused by insufficient inclusion of solvation effects. However, it also points toward shortcomings of current force fields used for structure determination of nucleic acids. Our quantum chemical calculations could therefore provide input for force field optimization. PMID:25404135

  3. Accurate prediction of cellular co-translational folding indicates proteins can switch from post- to co-translational folding

    PubMed Central

    Nissley, Daniel A.; Sharma, Ajeet K.; Ahmed, Nabeel; Friedrich, Ulrike A.; Kramer, Günter; Bukau, Bernd; O'Brien, Edward P.

    2016-01-01

    The rates at which domains fold and codons are translated are important factors in determining whether a nascent protein will co-translationally fold and function or misfold and malfunction. Here we develop a chemical kinetic model that calculates a protein domain's co-translational folding curve during synthesis using only the domain's bulk folding and unfolding rates and codon translation rates. We show that this model accurately predicts the course of co-translational folding measured in vivo for four different protein molecules. We then make predictions for a number of different proteins in yeast and find that synonymous codon substitutions, which change translation-elongation rates, can switch some protein domains from folding post-translationally to folding co-translationally—a result consistent with previous experimental studies. Our approach explains essential features of co-translational folding curves and predicts how varying the translation rate at different codon positions along a transcript's coding sequence affects this self-assembly process. PMID:26887592

  4. Accurate prediction of cellular co-translational folding indicates proteins can switch from post- to co-translational folding.

    PubMed

    Nissley, Daniel A; Sharma, Ajeet K; Ahmed, Nabeel; Friedrich, Ulrike A; Kramer, Günter; Bukau, Bernd; O'Brien, Edward P

    2016-01-01

    The rates at which domains fold and codons are translated are important factors in determining whether a nascent protein will co-translationally fold and function or misfold and malfunction. Here we develop a chemical kinetic model that calculates a protein domain's co-translational folding curve during synthesis using only the domain's bulk folding and unfolding rates and codon translation rates. We show that this model accurately predicts the course of co-translational folding measured in vivo for four different protein molecules. We then make predictions for a number of different proteins in yeast and find that synonymous codon substitutions, which change translation-elongation rates, can switch some protein domains from folding post-translationally to folding co-translationally--a result consistent with previous experimental studies. Our approach explains essential features of co-translational folding curves and predicts how varying the translation rate at different codon positions along a transcript's coding sequence affects this self-assembly process. PMID:26887592

  5. Protein corona composition does not accurately predict hematocompatibility of colloidal gold nanoparticles.

    PubMed

    Dobrovolskaia, Marina A; Neun, Barry W; Man, Sonny; Ye, Xiaoying; Hansen, Matthew; Patri, Anil K; Crist, Rachael M; McNeil, Scott E

    2014-10-01

    Proteins bound to nanoparticle surfaces are known to affect particle clearance by influencing immune cell uptake and distribution to the organs of the mononuclear phagocytic system. The composition of the protein corona has been described for several types of nanomaterials, but the role of the corona in nanoparticle biocompatibility is not well established. In this study we investigate the role of nanoparticle surface properties (PEGylation) and incubation times on the protein coronas of colloidal gold nanoparticles. While neither incubation time nor PEG molecular weight affected the specific proteins in the protein corona, the total amount of protein binding was governed by the molecular weight of PEG coating. Furthermore, the composition of the protein corona did not correlate with nanoparticle hematocompatibility. Specialized hematological tests should be used to deduce nanoparticle hematotoxicity. From the clinical editor: It is overall unclear how the protein corona associated with colloidal gold nanoparticles may influence hematotoxicity. This study warns that PEGylation itself may be insufficient, because composition of the protein corona does not directly correlate with nanoparticle hematocompatibility. The authors suggest that specialized hematological tests must be used to deduce nanoparticle hematotoxicity.

  6. Accurate, conformation-dependent predictions of solvent effects on protein ionization constants

    PubMed Central

    Barth, P.; Alber, T.; Harbury, P. B.

    2007-01-01

    Predicting how aqueous solvent modulates the conformational transitions and influences the pKa values that regulate the biological functions of biomolecules remains an unsolved challenge. To address this problem, we developed FDPB_MF, a rotamer repacking method that exhaustively samples side chain conformational space and rigorously calculates multibody protein–solvent interactions. FDPB_MF predicts the effects on pKa values of various solvent exposures, large ionic strength variations, strong energetic couplings, structural reorganizations and sequence mutations. The method achieves high accuracy, with root mean square deviations within 0.3 pH unit of the experimental values measured for turkey ovomucoid third domain, hen lysozyme, Bacillus circulans xylanase, and human and Escherichia coli thioredoxins. FDPB_MF provides a faithful, quantitative assessment of electrostatic interactions in biological macromolecules. PMID:17360348

  7. Mathematical model accurately predicts protein release from an affinity-based delivery system.

    PubMed

    Vulic, Katarina; Pakulska, Malgosia M; Sonthalia, Rohit; Ramachandran, Arun; Shoichet, Molly S

    2015-01-10

    Affinity-based controlled release modulates the delivery of protein or small molecule therapeutics through transient dissociation/association. To understand which parameters can be used to tune release, we used a mathematical model based on simple binding kinetics. A comprehensive asymptotic analysis revealed three characteristic regimes for therapeutic release from affinity-based systems. These regimes can be controlled by diffusion or unbinding kinetics, and can exhibit release over either a single stage or two stages. This analysis fundamentally changes the way we think of controlling release from affinity-based systems and thereby explains some of the discrepancies in the literature on which parameters influence affinity-based release. The rate of protein release from affinity-based systems is determined by the balance of diffusion of the therapeutic agent through the hydrogel and the dissociation kinetics of the affinity pair. Equations for tuning protein release rate by altering the strength (KD) of the affinity interaction, the concentration of binding ligand in the system, the rate of dissociation (koff) of the complex, and the hydrogel size and geometry, are provided. We validated our model by collapsing the model simulations and the experimental data from a recently described affinity release system, to a single master curve. Importantly, this mathematical analysis can be applied to any single species affinity-based system to determine the parameters required for a desired release profile. PMID:25449806
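
    The "simple binding kinetics" referred to above can be written, under the usual assumption of reversible one-to-one binding between a diffusing protein P and an immobilized ligand L forming a complex C, roughly as follows. The symbols k_on, D, [P], [L] and [C] are illustrative notation and do not appear in the abstract; only K_D and k_off are named there. This is a generic sketch, not the authors' exact formulation.

```latex
\begin{aligned}
  P + L \;&\underset{k_{\mathrm{off}}}{\overset{k_{\mathrm{on}}}{\rightleftharpoons}}\; C,
  \qquad K_D = \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}} \\[4pt]
  \frac{\partial [P]}{\partial t} &= D\,\nabla^2 [P] - k_{\mathrm{on}}[P][L] + k_{\mathrm{off}}[C] \\[4pt]
  \frac{\partial [C]}{\partial t} &= k_{\mathrm{on}}[P][L] - k_{\mathrm{off}}[C]
\end{aligned}
```

    In this form the diffusion term competes with the unbinding term, which is one way to picture the diffusion-controlled and unbinding-controlled regimes described in the abstract.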

  8. PSSP-RFE: Accurate Prediction of Protein Structural Class by Recursive Feature Extraction from PSI-BLAST Profile, Physical-Chemical Property and Functional Annotations

    PubMed Central

    Yu, Sanjiu; Zhang, Yuan; Luo, Zhong; Yang, Hua; Zhou, Yue; Zheng, Xiaoqi

    2014-01-01

    Protein structure prediction is critical to functional annotation of the massively accumulated biological sequences, which prompts an imperative need for the development of high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction becomes an increasingly challenging task. Amongst most homological-based approaches, the accuracies of protein structural class prediction are sufficiently high for high similarity datasets, but still far from being satisfactory for low similarity datasets, i.e., below 40% in pairwise sequence similarity. Therefore, we present a novel method for accurate and reliable protein structural class prediction for both high and low similarity datasets. This method is based on Support Vector Machine (SVM) in conjunction with integrated features from position-specific score matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is also used to rank the integrated feature vectors through recursively removing the feature with the lowest ranking score. The definitive top features selected by SVM-RFE are input into the SVM engines to predict the structural class of a query protein. To validate our method, jackknife tests were applied to seven widely used benchmark datasets, reaching overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods in protein structural classification, especially for low similarity datasets. PMID:24675610

  9. aPPRove: An HMM-Based Method for Accurate Prediction of RNA-Pentatricopeptide Repeat Protein Binding Events.

    PubMed

    Harrison, Thomas; Ruiz, Jaime; Sloan, Daniel B; Ben-Hur, Asa; Boucher, Christina

    2016-01-01

    Pentatricopeptide repeat containing proteins (PPRs) bind to RNA transcripts originating from mitochondria and plastids. There are two classes of PPR proteins. The P class contains tandem P-type motif sequences, and the PLS class contains alternating P, L and S type sequences. In this paper, we describe a novel tool that predicts PPR-RNA interaction; specifically, our method, which we call aPPRove, determines where and how a PLS-class PPR protein will bind to RNA when given a PPR and one or more RNA transcripts by using a combinatorial binding code for site specificity proposed by Barkan et al. Our results demonstrate that aPPRove successfully locates how and where a PPR protein belonging to the PLS class can bind to RNA. For each binding event it outputs the binding site, the amino-acid-nucleotide interaction, and its statistical significance. Furthermore, we show that our method can be used to predict binding events for PLS-class proteins using a known edit site and the statistical significance of aligning the PPR protein to that site. In particular, we use our method to make a conjecture regarding an interaction between CLB19 and the second intronic region of ycf3. The aPPRove web server can be found at www.cs.colostate.edu/~approve. PMID:27560805

  10. aPPRove: An HMM-Based Method for Accurate Prediction of RNA-Pentatricopeptide Repeat Protein Binding Events

    PubMed Central

    Harrison, Thomas; Ruiz, Jaime; Sloan, Daniel B.; Ben-Hur, Asa; Boucher, Christina

    2016-01-01

    Pentatricopeptide repeat containing proteins (PPRs) bind to RNA transcripts originating from mitochondria and plastids. There are two classes of PPR proteins. The P class contains tandem P-type motif sequences, and the PLS class contains alternating P, L and S type sequences. In this paper, we describe a novel tool that predicts PPR-RNA interaction; specifically, our method, which we call aPPRove, determines where and how a PLS-class PPR protein will bind to RNA when given a PPR and one or more RNA transcripts by using a combinatorial binding code for site specificity proposed by Barkan et al. Our results demonstrate that aPPRove successfully locates how and where a PPR protein belonging to the PLS class can bind to RNA. For each binding event it outputs the binding site, the amino-acid-nucleotide interaction, and its statistical significance. Furthermore, we show that our method can be used to predict binding events for PLS-class proteins using a known edit site and the statistical significance of aligning the PPR protein to that site. In particular, we use our method to make a conjecture regarding an interaction between CLB19 and the second intronic region of ycf3. The aPPRove web server can be found at www.cs.colostate.edu/~approve. PMID:27560805

  11. A highly accurate protein structural class prediction approach using auto cross covariance transformation and recursive feature elimination.

    PubMed

    Li, Xiaowei; Liu, Taigang; Tao, Peiying; Wang, Chunhua; Chen, Lanming

    2015-12-01

    Structural class characterizes the overall folding type of a protein or its domain. Many methods have been proposed to improve the prediction accuracy of protein structural class in recent years, but it is still a challenge for the low-similarity sequences. In this study, we introduce a feature extraction technique based on auto cross covariance (ACC) transformation of position-specific score matrix (PSSM) to represent a protein sequence. Then support vector machine-recursive feature elimination (SVM-RFE) is adopted to select top K features according to their importance and these features are input to a support vector machine (SVM) to conduct the prediction. Performance evaluation of the proposed method is performed using the jackknife test on three low-similarity datasets, i.e., D640, 1189 and 25PDB. By means of this method, the overall accuracies of 97.2%, 96.2%, and 93.3% are achieved on these three datasets, which are higher than those of most existing methods. This suggests that the proposed method could serve as a very cost-effective tool for predicting protein structural class especially for low-similarity datasets.
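
    The elimination loop behind recursive feature elimination can be sketched in a few lines. This is a simplified illustration, not the authors' pipeline: in SVM-RFE the scores come from the weights of a linear SVM that is retrained after each removal, whereas the stand-in scorer below (difference of class means) is a hypothetical substitute used only to keep the example dependency-free. All names and the toy data are assumptions.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]
Scorer = Callable[[List[Row], List[int], List[int]], List[float]]


def mean_gap_scores(X: List[Row], y: List[int], features: List[int]) -> List[float]:
    """Stand-in scorer (NOT an SVM): |mean of class 1 - mean of class 0| per feature."""
    scores = []
    for j in features:
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return scores


def recursive_feature_elimination(
    X: List[Row], y: List[int], k: int, scorer: Scorer = mean_gap_scores
) -> List[int]:
    """Drop the lowest-scoring feature, re-score the rest, repeat until k remain."""
    remaining = list(range(len(X[0])))
    while len(remaining) > k:
        scores = scorer(X, y, remaining)  # re-scored on the surviving features
        worst = min(range(len(remaining)), key=scores.__getitem__)
        del remaining[worst]
    return remaining


# Hypothetical toy data: feature 0 separates the classes well, feature 2 not at all.
X = [[0.0, 1.0, 5.0], [0.1, 1.2, 4.0], [1.0, 1.5, 5.0], [0.9, 1.7, 4.0]]
y = [0, 0, 1, 1]
selected = recursive_feature_elimination(X, y, k=2)
print(selected)
```

    With a real SVM the re-scoring step matters, because the weight of each surviving feature can change once a correlated feature has been removed; the stand-in scorer treats features independently and therefore does not show that effect.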

  12. Microdosing of a Carbon-14 Labeled Protein in Healthy Volunteers Accurately Predicts Its Pharmacokinetics at Therapeutic Dosages.

    PubMed

    Vlaming, M L H; van Duijn, E; Dillingh, M R; Brands, R; Windhorst, A D; Hendrikse, N H; Bosgra, S; Burggraaf, J; de Koning, M C; Fidder, A; Mocking, J A J; Sandman, H; de Ligt, R A F; Fabriek, B O; Pasman, W J; Seinen, W; Alves, T; Carrondo, M; Peixoto, C; Peeters, P A M; Vaes, W H J

    2015-08-01

    Preclinical development of new biological entities (NBEs), such as human protein therapeutics, requires considerable expenditure of time and costs. Poor prediction of pharmacokinetics in humans further reduces net efficiency. In this study, we show for the first time that pharmacokinetic data of NBEs in humans can be successfully obtained early in the drug development process by the use of microdosing in a small group of healthy subjects combined with ultrasensitive accelerator mass spectrometry (AMS). After only minimal preclinical testing, we performed a first-in-human phase 0/phase 1 trial with a human recombinant therapeutic protein (RESCuing Alkaline Phosphatase, human recombinant placental alkaline phosphatase [hRESCAP]) to assess its safety and kinetics. Pharmacokinetic analysis showed dose linearity from microdose (53 μg) [14C]-hRESCAP to therapeutic doses (up to 5.3 mg) of the protein in healthy volunteers. This study demonstrates the value of a microdosing approach in a very small cohort for accelerating the clinical development of NBEs. PMID:25869840

  13. PredPPCrys: Accurate Prediction of Sequence Cloning, Protein Production, Purification and Crystallization Propensity from Protein Sequences Using Multi-Step Heterogeneous Feature Fusion and Selection

    PubMed Central

    Wang, Huilin; Wang, Mingjun; Tan, Hao; Li, Yuan; Zhang, Ziding; Song, Jiangning

    2014-01-01

    X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed ‘PredPPCrys’ using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. In addition, the predicted crystallization targets of

  14. Hounsfield unit density accurately predicts ESWL success.

    PubMed

    Magnuson, William J; Tomera, Kevin M; Lance, Raymond S

    2005-01-01

    Extracorporeal shockwave lithotripsy (ESWL) is a commonly used non-invasive treatment for urolithiasis. Helical CT scans provide much better and more detailed imaging of the patient with urolithiasis, including the ability to measure the density of urinary stones. In this study we tested the hypothesis that the density of urinary calculi as measured by CT can predict successful ESWL treatment. A total of 198 patients were treated at Alaska Urological Associates with ESWL between January 2002 and April 2004. Of these, 101 met the study inclusion criteria, with accessible CT scans and stones ranging from 5-15 mm. Follow-up imaging demonstrated stone freedom in 74.2%. The overall mean Hounsfield density values (HDV) for the stone-free and residual-stone groups were significantly different (93.61 vs 122.80, p < 0.0001). We determined by receiver operating characteristic (ROC) curve analysis that an HDV of 93 or less carries a 90% or better chance of stone freedom following ESWL for upper tract calculi between 5-15 mm.

  15. Mouse models of human AML accurately predict chemotherapy response

    PubMed Central

    Zuber, Johannes; Radtke, Ina; Pardee, Timothy S.; Zhao, Zhen; Rappaport, Amy R.; Luo, Weijun; McCurrach, Mila E.; Yang, Miao-Miao; Dolan, M. Eileen; Kogan, Scott C.; Downing, James R.; Lowe, Scott W.

    2009-01-01

    The genetic heterogeneity of cancer influences the trajectory of tumor progression and may underlie clinical variation in therapy response. To model such heterogeneity, we produced genetically and pathologically accurate mouse models of common forms of human acute myeloid leukemia (AML) and developed methods to mimic standard induction chemotherapy and efficiently monitor therapy response. We see that murine AMLs harboring two common human AML genotypes show remarkably diverse responses to conventional therapy that mirror clinical experience. Specifically, murine leukemias expressing the AML1/ETO fusion oncoprotein, associated with a favorable prognosis in patients, show a dramatic response to induction chemotherapy owing to robust activation of the p53 tumor suppressor network. Conversely, murine leukemias expressing MLL fusion proteins, associated with a dismal prognosis in patients, are drug-resistant due to an attenuated p53 response. Our studies highlight the importance of genetic information in guiding the treatment of human AML, functionally establish the p53 network as a central determinant of chemotherapy response in AML, and demonstrate that genetically engineered mouse models of human cancer can accurately predict therapy response in patients. PMID:19339691

  16. Accurate theoretical prediction of vibrational frequencies in an inhomogeneous dynamic environment: A case study of a glutamate molecule in water solution and in a protein-bound form

    NASA Astrophysics Data System (ADS)

    Speranskiy, Kirill; Kurnikova, Maria

    2004-07-01

    We propose a hierarchical approach to model vibrational frequencies of a ligand in a strongly fluctuating inhomogeneous environment such as a liquid solution or when bound to a macromolecule, e.g., a protein. Vibrational frequencies typically measured experimentally are ensemble averaged quantities which result (in part) from the influence of the strongly fluctuating solvent. Solvent fluctuations can be sampled effectively by a classical molecular simulation, which in our model serves as the first, low level of the hierarchy. At the second high level of the hierarchy a small subset of system coordinates is used to construct a patch of the potential surface (ab initio) relevant to the vibration in question. This subset of coordinates is under the influence of an instantaneous external force exerted by the environment. The force is calculated at the lower level of the hierarchy. The proposed methodology is applied to model vibrational frequencies of a glutamate in water and when bound to the Glutamate receptor protein and its mutant. Our results are in close agreement with the experimental values and frequency shifts measured by the Jayaraman group by the Fourier transform infrared spectroscopy [Q. Cheng et al., Biochem. 41, 1602 (2002)]. Our methodology proved useful in successfully reproducing vibrational frequencies of a ligand in such a soft, flexible, and strongly inhomogeneous protein as the Glutamate receptor.

  17. PredictSNP: robust and accurate consensus classifier for prediction of disease-related mutations.

    PubMed

    Bendl, Jaroslav; Stourac, Jan; Salanda, Ondrej; Pavelka, Antonin; Wieben, Eric D; Zendulka, Jaroslav; Brezovsky, Jan; Damborsky, Jiri

    2014-01-01

    Single nucleotide variants represent a prevalent form of genetic variation. Mutations in the coding regions are frequently associated with the development of various genetic diseases. Computational tools for the prediction of the effects of mutations on protein function are very important for analysis of single nucleotide variants and their prioritization for experimental characterization. Many computational tools are already widely employed for this purpose. Unfortunately, their comparison and further improvement is hindered by large overlaps between the training datasets and benchmark datasets, which lead to biased and overly optimistic reported performances. In this study, we have constructed three independent datasets by removing all duplicities, inconsistencies and mutations previously used in the training of evaluated tools. The benchmark dataset containing over 43,000 mutations was employed for the unbiased evaluation of eight established prediction tools: MAPP, nsSNPAnalyzer, PANTHER, PhD-SNP, PolyPhen-1, PolyPhen-2, SIFT and SNAP. The six best performing tools were combined into a consensus classifier PredictSNP, resulting into significantly improved prediction performance, and at the same time returned results for all mutations, confirming that consensus prediction represents an accurate and robust alternative to the predictions delivered by individual tools. A user-friendly web interface enables easy access to all eight prediction tools, the consensus classifier PredictSNP and annotations from the Protein Mutant Database and the UniProt database. The web server and the datasets are freely available to the academic community at http://loschmidt.chemi.muni.cz/predictsnp.
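
    As a rough illustration of consensus classification, the sketch below combines per-tool calls by simple majority vote and skips tools that return no result. The abstract does not state how PredictSNP combines votes or which six tools were selected, so the voting rule, the function name, the labels and the example calls are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Optional


def consensus_call(predictions: Dict[str, Optional[str]]) -> Optional[str]:
    """Majority vote over the tools that returned a label; None entries are skipped."""
    votes = Counter(label for label in predictions.values() if label is not None)
    if not votes:
        return None  # no tool produced a prediction for this mutation
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "tie"  # hypothetical marker; a real classifier needs a tie-break rule
    return ranked[0][0]


# Hypothetical per-tool calls for one mutation (tool names from the abstract,
# labels invented for illustration).
calls = {
    "SIFT": "deleterious",
    "PolyPhen-2": "deleterious",
    "MAPP": None,  # tool returned no result
    "PhD-SNP": "neutral",
    "SNAP": "deleterious",
    "PANTHER": None,  # tool returned no result
}
result = consensus_call(calls)
print(result)
```

    Because missing calls are skipped rather than counted, the consensus still yields a result whenever at least one tool answers, which is in the spirit of the abstract's remark that the consensus classifier returned results for all mutations.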

  18. On the Accurate Prediction of CME Arrival At the Earth

    NASA Astrophysics Data System (ADS)

    Zhang, Jie; Hess, Phillip

    2016-07-01

    We will discuss relevant issues regarding the accurate prediction of CME arrival at the Earth, from both observational and theoretical points of view. In particular, we clarify the importance of separating the study of CME ejecta from the ejecta-driven shock in interplanetary CMEs (ICMEs). For a number of CME-ICME events well observed by SOHO/LASCO, STEREO-A and STEREO-B, we carry out the 3-D measurements by superimposing geometries onto both the ejecta and sheath separately. These measurements are then used to constrain a Drag-Based Model, which is improved through a modification of including height dependence of the drag coefficient into the model. Combining all these factors allows us to create predictions for both fronts at 1 AU and compare with actual in-situ observations. We show an ability to predict the sheath arrival with an average error of under 4 hours, with an RMS error of about 1.5 hours. For the CME ejecta, the error is less than two hours with an RMS error within an hour. Through using the best observations of CMEs, we show the power of our method in accurately predicting CME arrival times. The limitation and implications of our accurate prediction method will be discussed.
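
    For orientation, the drag-based model is commonly written as an equation of motion in which the CME decelerates or accelerates toward the ambient solar wind speed. The form below is the widely used textbook version with the drag parameter allowed to vary with heliocentric distance; the specific height dependence adopted by the authors is not given in the abstract, and the symbols are illustrative notation.

```latex
\frac{dv}{dt} = -\gamma(r)\,\bigl(v - w\bigr)\,\lvert v - w \rvert
```

    Here v is the speed of the tracked front (ejecta or sheath), w is the ambient solar wind speed, r is the heliocentric distance and \gamma(r) is the drag parameter.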

  19. PredictSNP: Robust and Accurate Consensus Classifier for Prediction of Disease-Related Mutations

    PubMed Central

    Bendl, Jaroslav; Stourac, Jan; Salanda, Ondrej; Pavelka, Antonin; Wieben, Eric D.; Zendulka, Jaroslav; Brezovsky, Jan; Damborsky, Jiri

    2014-01-01

    Single nucleotide variants represent a prevalent form of genetic variation. Mutations in the coding regions are frequently associated with the development of various genetic diseases. Computational tools for the prediction of the effects of mutations on protein function are very important for analysis of single nucleotide variants and their prioritization for experimental characterization. Many computational tools are already widely employed for this purpose. Unfortunately, their comparison and further improvement is hindered by large overlaps between the training datasets and benchmark datasets, which lead to biased and overly optimistic reported performances. In this study, we have constructed three independent datasets by removing all duplicities, inconsistencies and mutations previously used in the training of evaluated tools. The benchmark dataset containing over 43,000 mutations was employed for the unbiased evaluation of eight established prediction tools: MAPP, nsSNPAnalyzer, PANTHER, PhD-SNP, PolyPhen-1, PolyPhen-2, SIFT and SNAP. The six best performing tools were combined into a consensus classifier PredictSNP, resulting into significantly improved prediction performance, and at the same time returned results for all mutations, confirming that consensus prediction represents an accurate and robust alternative to the predictions delivered by individual tools. A user-friendly web interface enables easy access to all eight prediction tools, the consensus classifier PredictSNP and annotations from the Protein Mutant Database and the UniProt database. The web server and the datasets are freely available to the academic community at http://loschmidt.chemi.muni.cz/predictsnp. PMID:24453961

  20. Passive samplers accurately predict PAH levels in resident crayfish.

    PubMed

    Paulik, L Blair; Smith, Brian W; Bergmann, Alan J; Sower, Greg J; Forsberg, Norman D; Teeguarden, Justin G; Anderson, Kim A

    2016-02-15

    Contamination of resident aquatic organisms is a major concern for environmental risk assessors. However, collecting organisms to estimate risk is often prohibitively time and resource-intensive. Passive sampling accurately estimates resident organism contamination, and it saves time and resources. This study used low density polyethylene (LDPE) passive water samplers to predict polycyclic aromatic hydrocarbon (PAH) levels in signal crayfish, Pacifastacus leniusculus. Resident crayfish were collected at 5 sites within and outside of the Portland Harbor Superfund Megasite (PHSM) in the Willamette River in Portland, Oregon. LDPE deployment was spatially and temporally paired with crayfish collection. Crayfish visceral and tail tissue, as well as water-deployed LDPE, were extracted and analyzed for 62 PAHs using GC-MS/MS. Freely-dissolved concentrations (Cfree) of PAHs in water were calculated from concentrations in LDPE. Carcinogenic risks were estimated for all crayfish tissues, using benzo[a]pyrene equivalent concentrations (BaPeq). ∑PAH were 5-20 times higher in viscera than in tails, and ∑BaPeq were 6-70 times higher in viscera than in tails. Eating only tail tissue of crayfish would therefore significantly reduce carcinogenic risk compared to also eating viscera. Additionally, PAH levels in crayfish were compared to levels in crayfish collected 10 years earlier. PAH levels in crayfish were higher upriver of the PHSM and unchanged within the PHSM after the 10-year period. Finally, a linear regression model predicted levels of 34 PAHs in crayfish viscera with an associated R-squared value of 0.52 (and a correlation coefficient of 0.72), using only the Cfree PAHs in water. On average, the model predicted PAH concentrations in crayfish tissue within a factor of 2.4 ± 1.8 of measured concentrations. This affirms that passive water sampling accurately estimates PAH contamination in crayfish. Furthermore, the strong predictive ability of this simple model suggests

  1. Plant diversity accurately predicts insect diversity in two tropical landscapes.

    PubMed

    Zhang, Kai; Lin, Siliang; Ji, Yinqiu; Yang, Chenxue; Wang, Xiaoyang; Yang, Chunyan; Wang, Hesheng; Jiang, Haisheng; Harrison, Rhett D; Yu, Douglas W

    2016-09-01

    Plant diversity surely determines arthropod diversity, but only moderate correlations between arthropod and plant species richness had been observed until Basset et al. (Science, 338, 2012, 1481) finally undertook an unprecedentedly comprehensive sampling of a tropical forest and demonstrated that plant species richness could indeed accurately predict arthropod species richness. We now require a high-throughput pipeline to operationalize this result so that we can (i) test competing explanations for tropical arthropod megadiversity, (ii) improve estimates of global eukaryotic species diversity, and (iii) use plant and arthropod communities as efficient proxies for each other, thus improving the efficiency of conservation planning and of detecting forest degradation and recovery. We therefore applied metabarcoding to Malaise-trap samples across two tropical landscapes in China. We demonstrate that plant species richness can accurately predict arthropod (mostly insect) species richness and that plant and insect community compositions are highly correlated, even in landscapes that are large, heterogeneous and anthropogenically modified. Finally, we review how metabarcoding makes feasible highly replicated tests of the major competing explanations for tropical megadiversity. PMID:27474399

  2. Basophile: Accurate Fragment Charge State Prediction Improves Peptide Identification Rates

    DOE PAGES

    Wang, Dong; Dasari, Surendra; Chambers, Matthew C.; Holman, Jerry D.; Chen, Kan; Liebler, Daniel; Orton, Daniel J.; Purvine, Samuel O.; Monroe, Matthew E.; Chung, Chang Y.; et al

    2013-03-07

    In shotgun proteomics, database search algorithms rely on fragmentation models to predict fragment ions that should be observed for a given peptide sequence. The most widely used strategy (Naive model) is oversimplified, cleaving all peptide bonds with equal probability to produce fragments of all charges below that of the precursor ion. More accurate models, based on fragmentation simulation, are too computationally intensive for on-the-fly use in database search algorithms. We have created an ordinal-regression-based model called Basophile that takes fragment size and basic residue distribution into account when determining the charge retention during CID/higher-energy collision induced dissociation (HCD) of charged peptides. This model improves the accuracy of predictions by reducing the number of unnecessary fragments that are routinely predicted for highly-charged precursors. Basophile increased the identification rates by 26% (on average) over the Naive model, when analyzing triply-charged precursors from ion trap data. Basophile achieves simplicity and speed by solving the prediction problem with an ordinal regression equation, which can be incorporated into any database search software for shotgun proteomic identification.

  3. Basophile: Accurate Fragment Charge State Prediction Improves Peptide Identification Rates

    SciTech Connect

    Wang, Dong; Dasari, Surendra; Chambers, Matthew C.; Holman, Jerry D.; Chen, Kan; Liebler, Daniel; Orton, Daniel J.; Purvine, Samuel O.; Monroe, Matthew E.; Chung, Chang Y.; Rose, Kristie L.; Tabb, David L.

    2013-03-07

    In shotgun proteomics, database search algorithms rely on fragmentation models to predict fragment ions that should be observed for a given peptide sequence. The most widely used strategy (Naive model) is oversimplified, cleaving all peptide bonds with equal probability to produce fragments of all charges below that of the precursor ion. More accurate models, based on fragmentation simulation, are too computationally intensive for on-the-fly use in database search algorithms. We have created an ordinal-regression-based model called Basophile that takes fragment size and basic residue distribution into account when determining the charge retention during CID/higher-energy collision induced dissociation (HCD) of charged peptides. This model improves the accuracy of predictions by reducing the number of unnecessary fragments that are routinely predicted for highly-charged precursors. Basophile increased the identification rates by 26% (on average) over the Naive model, when analyzing triply-charged precursors from ion trap data. Basophile achieves simplicity and speed by solving the prediction problem with an ordinal regression equation, which can be incorporated into any database search software for shotgun proteomic identification.
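
    The Naive model described in this abstract can be sketched in a few lines. This is an illustration only, not the Basophile code: the function name, the use of b/y ion labels, and the choice to keep at least a 1+ fragment for singly-charged precursors are assumptions made for the example.

```python
from typing import List, Tuple


def naive_fragments(peptide: str, precursor_charge: int) -> List[Tuple[str, int, int]]:
    """Enumerate (ion_type, fragment_length, charge) under a Naive-style model.

    Every peptide bond is cleaved, and every cleavage yields fragments at
    all charges below the precursor charge.
    """
    fragments = []
    # Assumption: a singly-charged precursor still yields 1+ fragments.
    max_charge = max(1, precursor_charge - 1)
    # A peptide of n residues has n - 1 peptide bonds.
    for cut in range(1, len(peptide)):
        for charge in range(1, max_charge + 1):
            fragments.append(("b", cut, charge))                 # N-terminal piece
            fragments.append(("y", len(peptide) - cut, charge))  # C-terminal piece
    return fragments
```

    For a 7-residue peptide with a 3+ precursor this enumerates 6 bonds x 2 charges x 2 ion types = 24 candidate fragments, which shows how the candidate list grows with precursor charge.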

  4. Passive samplers accurately predict PAH levels in resident crayfish.

    PubMed

    Paulik, L Blair; Smith, Brian W; Bergmann, Alan J; Sower, Greg J; Forsberg, Norman D; Teeguarden, Justin G; Anderson, Kim A

    2016-02-15

    Contamination of resident aquatic organisms is a major concern for environmental risk assessors. However, collecting organisms to estimate risk is often prohibitively time and resource-intensive. Passive sampling accurately estimates resident organism contamination, and it saves time and resources. This study used low density polyethylene (LDPE) passive water samplers to predict polycyclic aromatic hydrocarbon (PAH) levels in signal crayfish, Pacifastacus leniusculus. Resident crayfish were collected at 5 sites within and outside of the Portland Harbor Superfund Megasite (PHSM) in the Willamette River in Portland, Oregon. LDPE deployment was spatially and temporally paired with crayfish collection. Crayfish visceral and tail tissue, as well as water-deployed LDPE, were extracted and analyzed for 62 PAHs using GC-MS/MS. Freely-dissolved concentrations (Cfree) of PAHs in water were calculated from concentrations in LDPE. Carcinogenic risks were estimated for all crayfish tissues, using benzo[a]pyrene equivalent concentrations (BaPeq). ∑PAH were 5-20 times higher in viscera than in tails, and ∑BaPeq were 6-70 times higher in viscera than in tails. Eating only tail tissue of crayfish would therefore significantly reduce carcinogenic risk compared to also eating viscera. Additionally, PAH levels in crayfish were compared to levels in crayfish collected 10 years earlier. PAH levels in crayfish were higher upriver of the PHSM and unchanged within the PHSM after the 10-year period. Finally, a linear regression model predicted levels of 34 PAHs in crayfish viscera with an associated R-squared value of 0.52 (and a correlation coefficient of 0.72), using only the Cfree PAHs in water. On average, the model predicted PAH concentrations in crayfish tissue within a factor of 2.4 ± 1.8 of measured concentrations. This affirms that passive water sampling accurately estimates PAH contamination in crayfish. Furthermore, the strong predictive ability of this simple model suggests

  5. Accurate contact predictions using covariation techniques and machine learning

    PubMed Central

    Kosciolek, Tomasz

    2015-01-01

    Here we present the results of residue–residue contact predictions achieved in CASP11 by the CONSIP2 server, which is based around our MetaPSICOV contact prediction method. On a set of 40 target domains with a median family size of around 40 effective sequences, our server achieved an average top‐L/5 long‐range contact precision of 27%. The MetaPSICOV method is based on a combination of classical contact prediction features, enhanced with three distinct covariation methods embedded in a two‐stage neural network predictor. Some unique features of our approach are (1) the tuning between the classical and covariation features depending on the depth of the input alignment and (2) a hybrid approach to generate the deepest possible multiple‐sequence alignments by combining jackHMMer and HHblits. We discuss the CONSIP2 pipeline and our results, and show that where the method underperformed, the major factor was relying on a fixed set of parameters for the initial sequence alignments and not attempting to perform domain splitting as a preprocessing step. Proteins 2016; 84(Suppl 1):145–151. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc. PMID:26205532
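
    The top-L/5 long-range precision quoted in this abstract can be computed roughly as follows. This is a minimal sketch, not the CASP or CONSIP2 evaluation code: the function name, the data layout, and the default sequence-separation cutoff of 24 residues for "long-range" are assumptions, and assessments may define these details differently.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # residue index pair (i, j) with i < j


def top_l5_long_range_precision(
    scores: Dict[Pair, float],
    true_contacts: Set[Pair],
    length: int,
    min_separation: int = 24,  # assumed cutoff for "long-range"
) -> float:
    """Fraction of the top L/5 highest-scoring long-range pairs that are true contacts."""
    # Keep only pairs whose sequence separation counts as long-range.
    long_range = [
        (pair, score)
        for pair, score in scores.items()
        if abs(pair[1] - pair[0]) >= min_separation
    ]
    # Rank by predicted score, highest first.
    long_range.sort(key=lambda item: item[1], reverse=True)
    top = long_range[: max(1, length // 5)]
    if not top:
        return 0.0
    hits = sum(1 for pair, _ in top if pair in true_contacts)
    # Here precision is divided by the number of pairs actually selected.
    return hits / len(top)
```

    With L = 10 and a toy cutoff of 3, the two best long-range pairs are scored; if one of them is a true contact, the function returns 0.5.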

  6. PROMALS web server for accurate multiple protein sequence alignments.

    PubMed

    Pei, Jimin; Kim, Bong-Hyun; Tang, Ming; Grishin, Nick V

    2007-07-01

    Multiple sequence alignments are essential in homology inference, structure modeling, functional prediction and phylogenetic analysis. We developed a web server that constructs multiple protein sequence alignments using PROMALS, a progressive method that improves alignment quality by using additional homologs from PSI-BLAST searches and secondary structure predictions from PSIPRED. PROMALS shows higher alignment accuracy than other advanced methods, such as MUMMALS, ProbCons, MAFFT and SPEM. The PROMALS web server takes FASTA format protein sequences as input. The output includes a colored alignment augmented with information about sequence grouping, predicted secondary structures and positional conservation. The PROMALS web server is available at: http://prodata.swmed.edu/promals/ PMID:17452345

  7. A new protein structure representation for efficient protein function prediction.

    PubMed

    Maghawry, Huda A; Mostafa, Mostafa G M; Gharib, Tarek F

    2014-12-01

    One of the challenging problems in bioinformatics is the prediction of protein function. Protein function is the main key that can be used to classify different proteins. Protein function can be inferred experimentally with very small throughput or computationally with very high throughput. Computational methods are sequence based or structure based. Structure-based methods produce more accurate protein function prediction. In this article, we propose a new protein structure representation for efficient protein function prediction. The representation is based on three-dimensional patterns of protein residues. In the analysis, we used protein function based on enzyme activity through six mechanistically diverse enzyme superfamilies: amidohydrolase, crotonase, haloacid dehalogenase, isoprenoid synthase type I, and vicinal oxygen chelate. We applied three different classification methods, naïve Bayes, k-nearest neighbors, and random forest, to predict the enzyme superfamily of a given protein. The prediction accuracy using the proposed representation outperforms a recently introduced representation method that is based only on the distance patterns. The results show that the proposed representation achieved prediction accuracy up to 98%, with improvement of about 10% on average.

  8. Predicting protein dynamics from structural ensembles

    NASA Astrophysics Data System (ADS)

    Copperman, J.; Guenza, M. G.

    2015-12-01

    The biological properties of proteins are uniquely determined by their structure and dynamics. A protein in solution populates a structural ensemble of metastable configurations around the global fold. From overall rotation to local fluctuations, the dynamics of proteins can cover several orders of magnitude in time scales. We propose a simulation-free coarse-grained approach which utilizes knowledge of the important metastable folded states of the protein to predict the protein dynamics. This approach is based upon the Langevin Equation for Protein Dynamics (LE4PD), a Langevin formalism in the coordinates of the protein backbone. The linear modes of this Langevin formalism organize the fluctuations of the protein, so that more extended dynamical cooperativity relates to increasing energy barriers to mode diffusion. The accuracy of the LE4PD is verified by analyzing the predicted dynamics across a set of seven different proteins for which both relaxation data and NMR solution structures are available. Using experimental NMR conformers as the input structural ensembles, LE4PD predicts quantitatively accurate results, with correlation coefficient ρ = 0.93 to NMR backbone relaxation measurements for the seven proteins. The NMR solution structure derived ensemble and predicted dynamical relaxation is compared with molecular dynamics simulation-derived structural ensembles and LE4PD predictions and is consistent in the time scale of the simulations. The use of the experimental NMR conformers frees the approach from computationally demanding simulations.

  9. Fast and accurate predictions of covalent bonds in chemical space.

    PubMed

    Chang, K Y Samuel; Fias, Stijn; Ramakrishnan, Raghunathan; von Lilienfeld, O Anatole

    2016-05-01

    We assess the predictive accuracy of perturbation theory based estimates of changes in covalent bonding due to linear alchemical interpolations among molecules. We have investigated σ bonding to hydrogen, as well as σ and π bonding between main-group elements, occurring in small sets of iso-valence-electronic molecules with elements drawn from second to fourth rows in the p-block of the periodic table. Numerical evidence suggests that first order Taylor expansions of covalent bonding potentials can achieve high accuracy if (i) the alchemical interpolation is vertical (fixed geometry), (ii) it involves elements from the third and fourth rows of the periodic table, and (iii) an optimal reference geometry is used. This leads to near linear changes in the bonding potential, resulting in analytical predictions with chemical accuracy (∼1 kcal/mol). Second order estimates deteriorate the prediction. If initial and final molecules differ not only in composition but also in geometry, all estimates become substantially worse, with second order being slightly more accurate than first order. The independent particle approximation based second order perturbation theory performs poorly when compared to the coupled perturbed or finite difference approach. Taylor series expansions up to fourth order of the potential energy curve of highly symmetric systems indicate a finite radius of convergence, as illustrated for the alchemical stretching of H2+. Results are presented for (i) covalent bonds to hydrogen in 12 molecules with 8 valence electrons (CH4, NH3, H2O, HF, SiH4, PH3, H2S, HCl, GeH4, AsH3, H2Se, HBr); (ii) main-group single bonds in 9 molecules with 14 valence electrons (CH3F, CH3Cl, CH3Br, SiH3F, SiH3Cl, SiH3Br, GeH3F, GeH3Cl, GeH3Br); (iii) main-group double bonds in 9 molecules with 12 valence electrons (CH2O, CH2S, CH2Se, SiH2O, SiH2S, SiH2Se, GeH2O, GeH2S, GeH2Se); (iv) main-group triple bonds in 9 molecules with 10 valence electrons (HCN, HCP, HCAs, HSiN, HSi

  10. Fast and accurate predictions of covalent bonds in chemical space

    NASA Astrophysics Data System (ADS)

    Chang, K. Y. Samuel; Fias, Stijn; Ramakrishnan, Raghunathan; von Lilienfeld, O. Anatole

    2016-05-01

    We assess the predictive accuracy of perturbation theory based estimates of changes in covalent bonding due to linear alchemical interpolations among molecules. We have investigated σ bonding to hydrogen, as well as σ and π bonding between main-group elements, occurring in small sets of iso-valence-electronic molecules with elements drawn from second to fourth rows in the p-block of the periodic table. Numerical evidence suggests that first order Taylor expansions of covalent bonding potentials can achieve high accuracy if (i) the alchemical interpolation is vertical (fixed geometry), (ii) it involves elements from the third and fourth rows of the periodic table, and (iii) an optimal reference geometry is used. This leads to near linear changes in the bonding potential, resulting in analytical predictions with chemical accuracy (∼1 kcal/mol). Second order estimates deteriorate the prediction. If initial and final molecules differ not only in composition but also in geometry, all estimates become substantially worse, with second order being slightly more accurate than first order. The independent particle approximation based second order perturbation theory performs poorly when compared to the coupled perturbed or finite difference approach. Taylor series expansions up to fourth order of the potential energy curve of highly symmetric systems indicate a finite radius of convergence, as illustrated for the alchemical stretching of H2+. Results are presented for (i) covalent bonds to hydrogen in 12 molecules with 8 valence electrons (CH4, NH3, H2O, HF, SiH4, PH3, H2S, HCl, GeH4, AsH3, H2Se, HBr); (ii) main-group single bonds in 9 molecules with 14 valence electrons (CH3F, CH3Cl, CH3Br, SiH3F, SiH3Cl, SiH3Br, GeH3F, GeH3Cl, GeH3Br); (iii) main-group double bonds in 9 molecules with 12 valence electrons (CH2O, CH2S, CH2Se, SiH2O, SiH2S, SiH2Se, GeH2O, GeH2S, GeH2Se); (iv) main-group triple bonds in 9 molecules with 10 valence electrons (HCN, HCP, HCAs, HSiN, HSi

  12. Accurate Determination of Conformational Transitions in Oligomeric Membrane Proteins

    PubMed Central

    Sanz-Hernández, Máximo; Vostrikov, Vitaly V.; Veglia, Gianluigi; De Simone, Alfonso

    2016-01-01

    The structural dynamics governing collective motions in oligomeric membrane proteins play key roles in vital biomolecular processes at cellular membranes. In this study, we present a structural refinement approach that combines solid-state NMR experiments and molecular simulations to accurately describe concerted conformational transitions identifying the overall structural, dynamical, and topological states of oligomeric membrane proteins. The accuracy of the structural ensembles generated with this method is shown to reach the statistical error limit, and is further demonstrated by correctly reproducing orthogonal NMR data. We demonstrate the accuracy of this approach by characterising the pentameric state of phospholamban, a key player in the regulation of calcium uptake in the sarcoplasmic reticulum, and by probing its dynamical activation upon phosphorylation. Our results underline the importance of using an ensemble approach to characterise the conformational transitions that are often responsible for the biological function of oligomeric membrane protein states. PMID:26975211

  13. The SILAC Fly Allows for Accurate Protein Quantification in Vivo*

    PubMed Central

    Sury, Matthias D.; Chen, Jia-Xuan; Selbach, Matthias

    2010-01-01

    Stable isotope labeling by amino acids in cell culture (SILAC) is widely used to quantify protein abundance in tissue culture cells. Until now, the only multicellular organism completely labeled at the amino acid level was the laboratory mouse. The fruit fly Drosophila melanogaster is one of the most widely used small animal models in biology. Here, we show that feeding flies with SILAC-labeled yeast leads to almost complete labeling in the first filial generation. We used these “SILAC flies” to investigate sexual dimorphism of protein abundance in D. melanogaster. Quantitative proteome comparison of adult male and female flies revealed distinct biological processes specific for each sex. Using a tudor mutant that is defective for germ cell generation allowed us to differentiate between sex-specific protein expression in the germ line and somatic tissue. We identified many proteins with known sex-specific expression bias. In addition, several new proteins with a potential role in sexual dimorphism were identified. Collectively, our data show that the SILAC fly can be used to accurately quantify protein abundance in vivo. The approach is simple, fast, and cost-effective, making SILAC flies an attractive model system for the emerging field of in vivo quantitative proteomics. PMID:20525996

  14. Transmembrane beta-barrel protein structure prediction

    NASA Astrophysics Data System (ADS)

    Randall, Arlo; Baldi, Pierre

    Transmembrane β-barrel (TMB) proteins are embedded in the outer membranes of mitochondria, Gram-negative bacteria, and chloroplasts. These proteins perform critical functions, including active ion-transport and passive nutrient intake. Therefore, there is a need for accurate prediction of secondary and tertiary structures of TMB proteins. A variety of methods have been developed for predicting the secondary structure and these predictions are very useful for constructing a coarse topology of TMB structure; however, they do not provide enough information to construct a low-resolution tertiary structure for a TMB protein. In addition, while the overall structural architecture is well conserved among TMB proteins, the amino acid sequences are highly divergent. Thus, traditional homology modeling methods cannot be applied to many putative TMB proteins. Here, we describe TMBpro, a pipeline of methods for predicting TMB secondary structure, β-residue contacts, and finally tertiary structure. The tertiary prediction method relies on the specific construction rules that TMB proteins adhere to and on the predicted β-residue contacts to dramatically reduce the search space for the model building procedure.

  15. Protein Function Prediction: Problems and Pitfalls.

    PubMed

    Pearson, William R

    2015-01-01

    The characterization of new genomes based on their protein sets has been revolutionized by new sequencing technologies, but biologists seeking to exploit new sequence information are often frustrated by the challenges associated with accurately assigning biological functions to newly identified proteins. Here, we highlight some of the challenges in functional inference from sequence similarity. Investigators can improve the accuracy of function prediction by (1) being conservative about the evolutionary distance to a protein of known function; (2) considering the ambiguous meaning of "functional similarity," and (3) being aware of the limitations of annotations in functional databases. Protein function prediction does not offer "one-size-fits-all" solutions. Prediction strategies work better when the idiosyncrasies of function and functional annotation are better understood. PMID:26334923

  17. BPROMPT: A consensus server for membrane protein prediction.

    PubMed

    Taylor, Paul D; Attwood, Teresa K; Flower, Darren R

    2003-07-01

    Protein structure prediction is a cornerstone of bioinformatics research. Membrane proteins require their own prediction methods due to their intrinsically different composition. A variety of tools exist for topology prediction of membrane proteins, many of them available on the Internet. The server described in this paper, BPROMPT (Bayesian PRediction Of Membrane Protein Topology), uses a Bayesian Belief Network to combine the results of other prediction methods, providing a more accurate consensus prediction. Topology predictions with accuracies of 70% for prokaryotes and 53% for eukaryotes were achieved. BPROMPT can be accessed at http://www.jenner.ac.uk/BPROMPT. PMID:12824397

  18. Change in heat capacity accurately predicts vibrational coupling in enzyme catalyzed reactions.

    PubMed

    Arcus, Vickery L; Pudney, Christopher R

    2015-08-01

    The temperature dependence of kinetic isotope effects (KIEs) has been used to infer the vibrational coupling of the protein and/or substrate to the reaction coordinate, particularly in enzyme-catalyzed hydrogen transfer reactions. We find that a new model for the temperature dependence of experimentally determined observed rate constants (macromolecular rate theory, MMRT) is able to accurately predict the occurrence of vibrational coupling, even where the temperature dependence of the KIE fails. This model, which incorporates the change in heat capacity for enzyme catalysis, demonstrates remarkable consistency with both experiment and theory and in many respects is more robust than models used at present.
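
    The abstract does not state the MMRT rate equation. The sketch below uses the form commonly given in the MMRT literature, an Eyring-type expression in which the activation enthalpy and entropy vary with temperature through a heat capacity change; treat the function name, the reference temperature, and any parameter values as assumptions for illustration rather than the authors' fitted model.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J s
R = 8.314462618      # gas constant, J/(mol K)


def mmrt_ln_k(T: float, dH_T0: float, dS_T0: float, dCp: float, T0: float = 298.15) -> float:
    """Natural log of the rate constant under an MMRT-style expression.

    dH_T0 (J/mol) and dS_T0 (J/(mol K)) are the activation enthalpy and
    entropy at the reference temperature T0; dCp (J/(mol K)) is the
    activation heat capacity change. With dCp = 0 this is the Eyring equation.
    """
    dH = dH_T0 + dCp * (T - T0)            # temperature-dependent enthalpy
    dS = dS_T0 + dCp * math.log(T / T0)    # temperature-dependent entropy
    return math.log(K_B * T / H) - dH / (R * T) + dS / R
```

    A negative dCp lowers ln k on both sides of T0 relative to the Eyring line, which produces the curved temperature dependence that the model is meant to capture.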

  19. Accurately Predicting Complex Reaction Kinetics from First Principles

    NASA Astrophysics Data System (ADS)

    Green, William

    Many important systems contain a multitude of reactive chemical species, some of which react on a timescale faster than collisional thermalization, i.e. they never achieve a Boltzmann energy distribution. Usually it is impossible to fully elucidate the processes by experiments alone. Here we report recent progress toward predicting the time-evolving composition of these systems a priori: how unexpected reactions can be discovered on the computer, how reaction rates are computed from first principles, and how the many individual reactions are efficiently combined into a predictive simulation for the whole system. Some experimental tests of the a priori predictions are also presented.

  20. Does more accurate exposure prediction necessarily improve health effect estimates?

    PubMed

    Szpiro, Adam A; Paciorek, Christopher J; Sheppard, Lianne

    2011-09-01

    A unique challenge in air pollution cohort studies and similar applications in environmental epidemiology is that exposure is not measured directly at subjects' locations. Instead, pollution data from monitoring stations at some distance from the study subjects are used to predict exposures, and these predicted exposures are used to estimate the health effect parameter of interest. It is usually assumed that minimizing the error in predicting the true exposure will improve health effect estimation. We show in a simulation study that this is not always the case. We interpret our results in light of recently developed statistical theory for measurement error, and we discuss implications for the design and analysis of epidemiologic research.

  1. PREFACE: Protein protein interactions: principles and predictions

    NASA Astrophysics Data System (ADS)

    Nussinov, Ruth; Tsai, Chung-Jung

    2005-06-01

    Proteins are the 'workhorses' of the cell. Their roles span functions as diverse as being molecular machines and signalling. They carry out catalytic reactions, transport, form viral capsids, traverse membranes and form regulated channels, transmit information from DNA to RNA, making possible the synthesis of new proteins, and they are responsible for the degradation of unnecessary proteins and nucleic acids. They are the vehicles of the immune response and are responsible for viral entry into the cell. Given their importance, considerable effort has been centered on the prediction of protein function. A prime way to do this is through identification of binding partners. If the function of at least one of the components with which the protein interacts is known, that should let us assign its function(s) and the pathway(s) in which it plays a role. This holds since the vast majority of their chores in the living cell involve protein-protein interactions. Hence, through the intricate network of these interactions we can map cellular pathways, their interconnectivities and their dynamic regulation. Their identification is at the heart of functional genomics; their prediction is crucial for drug discovery. Knowledge of the pathway, its topology, length, and dynamics may provide useful information for forecasting side effects. The goal of predicting protein-protein interactions is daunting. Some associations are obligatory, others are continuously forming and dissociating. In principle, from the physical standpoint, any two proteins can interact, but under what conditions and at which strength? The principles of protein-protein interactions are general: the non-covalent interactions of two proteins are largely the outcome of the hydrophobic effect, which drives the interactions. In addition, hydrogen bonds and electrostatic interactions play important roles. Thus, many of the interactions observed in vitro are the outcome of experimental overexpression.
Protein disorder

  2. Is Three-Dimensional Soft Tissue Prediction by Software Accurate?

    PubMed

    Nam, Ki-Uk; Hong, Jongrak

    2015-11-01

    The authors assessed whether virtual surgery, performed with a soft tissue prediction program, could correctly simulate the actual surgical outcome, focusing on soft tissue movement. Preoperative and postoperative computed tomography (CT) data for 29 patients, who had undergone orthognathic surgery, were obtained and analyzed using the Simplant Pro software. The program made a predicted soft tissue image (A) based on presurgical CT data. After the operation, we obtained actual postoperative CT data and an actual soft tissue image (B) was generated. Finally, the 2 images (A and B) were superimposed and the differences between A and B were analyzed. Results were grouped in 2 classes: absolute values and vector values. In the absolute values, the left mouth corner was the most significant error point (2.36 mm). The right mouth corner (2.28 mm), labrale inferius (2.08 mm), and the pogonion (2.03 mm) also had significant errors. In vector values, prediction of the right-left side had a left-sided tendency, the superior-inferior had a superior tendency, and the anterior-posterior showed an anterior tendency. As a result, with this program, the position of points tended to be located more left, anterior, and superior than the "real" situation. There is a need to improve the prediction accuracy for soft tissue images. Such software is particularly valuable in predicting craniofacial soft tissue landmarks, such as the pronasale. With this software, landmark positions were most inaccurate in terms of anterior-posterior predictions.
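
    The two result classes in this abstract, absolute values and vector values, can be illustrated with a small landmark comparison. This is a sketch under assumptions: the abstract does not define how the absolute value was computed, so Euclidean distance is assumed here, and the function name, coordinate layout, and axis order are hypothetical.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float, float]  # assumed axes: (right-left, anterior-posterior, superior-inferior), in mm


def landmark_errors(
    predicted: Dict[str, Point], actual: Dict[str, Point]
) -> Dict[str, Tuple[float, Point]]:
    """Return, per landmark, the absolute error and the signed per-axis error."""
    errors = {}
    for name, pred in predicted.items():
        act = actual[name]
        # Vector value: signed difference on each axis (predicted minus actual).
        vector = tuple(p - a for p, a in zip(pred, act))
        # Absolute value: straight-line distance between the two positions.
        absolute = math.sqrt(sum(d * d for d in vector))
        errors[name] = (absolute, vector)
    return errors
```

    The signed components show the direction of a systematic bias, such as the left, anterior, and superior tendencies reported above, while the absolute value shows only its size.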

  3. Towards Accurate Ab Initio Predictions of the Spectrum of Methane

    NASA Technical Reports Server (NTRS)

    Schwenke, David W.; Kwak, Dochan (Technical Monitor)

    2001-01-01

    We have carried out extensive ab initio calculations of the electronic structure of methane, and these results are used to compute vibrational energy levels. We include basis set extrapolations, core-valence correlation, relativistic effects, and Born-Oppenheimer breakdown terms in our calculations. Our ab initio predictions of the lowest lying levels are superb.

  4. Standardized EEG interpretation accurately predicts prognosis after cardiac arrest

    PubMed Central

    Rossetti, Andrea O.; van Rootselaar, Anne-Fleur; Wesenberg Kjaer, Troels; Horn, Janneke; Ullén, Susann; Friberg, Hans; Nielsen, Niklas; Rosén, Ingmar; Åneman, Anders; Erlinge, David; Gasche, Yvan; Hassager, Christian; Hovdenes, Jan; Kjaergaard, Jesper; Kuiper, Michael; Pellis, Tommaso; Stammet, Pascal; Wanscher, Michael; Wetterslev, Jørn; Wise, Matt P.; Cronberg, Tobias

    2016-01-01

    Objective: To identify reliable predictors of outcome in comatose patients after cardiac arrest using a single routine EEG and standardized interpretation according to the terminology proposed by the American Clinical Neurophysiology Society. Methods: In this cohort study, 4 EEG specialists, blinded to outcome, evaluated prospectively recorded EEGs in the Target Temperature Management trial (TTM trial) that randomized patients to 33°C vs 36°C. Routine EEG was performed in patients still comatose after rewarming. EEGs were classified into highly malignant (suppression, suppression with periodic discharges, burst-suppression), malignant (periodic or rhythmic patterns, pathological or nonreactive background), and benign EEG (absence of malignant features). Poor outcome was defined as best Cerebral Performance Category score 3–5 until 180 days. Results: Eight TTM sites randomized 202 patients. EEGs were recorded in 103 patients at a median 77 hours after cardiac arrest; 37% had a highly malignant EEG and all had a poor outcome (specificity 100%, sensitivity 50%). Any malignant EEG feature had a low specificity to predict poor prognosis (48%) but if 2 malignant EEG features were present specificity increased to 96% (p < 0.001). Specificity and sensitivity were not significantly affected by targeted temperature or sedation. A benign EEG was found in 1% of the patients with a poor outcome. Conclusions: Highly malignant EEG after rewarming reliably predicted poor outcome in half of patients without false predictions. An isolated finding of a single malignant feature did not predict poor outcome whereas a benign EEG was highly predictive of a good outcome. PMID:26865516
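
The specificity and sensitivity figures quoted in this abstract follow the standard confusion-matrix definitions. As a minimal sketch, the Python below computes both from counts of true/false positives and negatives. The counts in the usage lines are hypothetical and chosen only to illustrate the arithmetic; they are not taken from the trial.

```python
def sensitivity(tp, fn):
    """Fraction of actual positives (here: poor outcomes) that the test flags."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of actual negatives (here: good outcomes) that the test clears."""
    return tn / (tn + fp)


# Hypothetical counts, NOT the study data: a test that never raises a
# false alarm (fp = 0) but misses half of the poor outcomes.
example_sensitivity = sensitivity(tp=5, fn=5)
example_specificity = specificity(tn=10, fp=0)
```

A test with zero false positives has 100% specificity regardless of how many true positives it misses, which is why a highly specific marker can still have modest sensitivity.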

  5. How Accurately Can We Predict Eclipses for Algol? (Poster abstract)

    NASA Astrophysics Data System (ADS)

    Turner, D.

    2016-06-01

    (Abstract only) beta Persei, or Algol, is a very well known eclipsing binary system consisting of a late B-type dwarf that is regularly eclipsed by a GK subgiant every 2.867 days. Eclipses, which last about 8 hours, are regular enough that predictions for times of minima are published in various places, Sky & Telescope magazine and The Observer's Handbook, for example. But eclipse minimum lasts for less than a half hour, whereas subtle mistakes in the current ephemeris for the star can result in predictions that are off by a few hours or more. The Algol system is fairly complex, with the Algol A and Algol B eclipsing system also orbited by Algol C with an orbital period of nearly 2 years. Added to that are complex long-term O-C variations with a periodicity of almost two centuries that, although suggested by Hoffmeister to be spurious, fit the type of light travel time variations expected for a fourth star also belonging to the system. The AB sub-system also undergoes mass transfer events that add complexities to its O-C behavior. Is it actually possible to predict precise times of eclipse minima for Algol months in advance given such complications, or is it better to encourage ongoing observations of the star so that O-C variations can be tracked in real time?

  6. Accurate and predictive antibody repertoire profiling by molecular amplification fingerprinting

    PubMed Central

    Khan, Tarik A.; Friedensohn, Simon; de Vries, Arthur R. Gorter; Straszewski, Jakub; Ruscheweyh, Hans-Joachim; Reddy, Sai T.

    2016-01-01

    High-throughput antibody repertoire sequencing (Ig-seq) provides quantitative molecular information on humoral immunity. However, Ig-seq is compromised by biases and errors introduced during library preparation and sequencing. By using synthetic antibody spike-in genes, we determined that primer bias from multiplex polymerase chain reaction (PCR) library preparation resulted in antibody frequencies with only 42 to 62% accuracy. Additionally, Ig-seq errors resulted in antibody diversity measurements being overestimated by up to 5000-fold. To rectify this, we developed molecular amplification fingerprinting (MAF), which uses unique molecular identifier (UID) tagging before and during multiplex PCR amplification, which enabled tagging of transcripts while accounting for PCR efficiency. Combined with a bioinformatic pipeline, MAF bias correction led to measurements of antibody frequencies with up to 99% accuracy. We also used MAF to correct PCR and sequencing errors, resulting in enhanced accuracy of full-length antibody diversity measurements, achieving 98 to 100% error correction. Using murine MAF-corrected data, we established a quantitative metric of recent clonal expansion—the intraclonal diversity index—which measures the number of unique transcripts associated with an antibody clone. We used this intraclonal diversity index along with antibody frequencies and somatic hypermutation to build a logistic regression model for prediction of the immunological status of clones. The model was able to predict clonal status with high confidence but only when using MAF error and bias corrected Ig-seq data. Improved accuracy by MAF provides the potential to greatly advance Ig-seq and its utility in immunology and biotechnology. PMID:26998518

  7. Accurate predictions for the production of vaporized water

    SciTech Connect

    Morin, E.; Montel, F.

    1995-12-31

    The production of water vaporized in the gas phase is controlled by the local conditions around the wellbore. The pressure gradient applied to the formation creates a sharp increase of the molar water content in the hydrocarbon phase approaching the well; this leads to a drop in the pore water saturation around the wellbore. The extent of the dehydrated zone which is formed is the key controlling the bottom-hole content of vaporized water. The maximum water content in the hydrocarbon phase at a given pressure, temperature and salinity is corrected by capillarity or adsorption phenomena depending on the actual water saturation. Describing the mass transfer of the water between the hydrocarbon phases and the aqueous phase into the tubing gives a clear idea of vaporization effects on the formation of scales. Field examples are presented for gas fields with temperatures ranging between 140°C and 180°C, where water vaporization effects are significant. Conditions for salt plugging in the tubing are predicted.

  8. Change in BMI accurately predicted by social exposure to acquaintances.

    PubMed

    Oloritun, Rahman O; Ouarda, Taha B M J; Moturu, Sai; Madan, Anmol; Pentland, Alex Sandy; Khayal, Inas

    2013-01-01

    Research has mostly focused on obesity and not on processes of BMI change more generally, although these may be key factors that lead to obesity. Studies have suggested that obesity is affected by social ties. However, these studies used survey-based data collection techniques that may be biased toward selecting only close friends and relatives. In this study, mobile phone sensing techniques were used to routinely capture social interaction data in an undergraduate dorm. By automating the capture of social interaction data, the limitations of self-reported social exposure data are avoided. This study attempts to understand and develop a model that best describes the change in BMI using social interaction data. We evaluated a cohort of 42 college students in a co-located university dorm, using social interaction data captured automatically via mobile phones together with survey-based health-related information. We determined the most predictive variables for change in BMI using the least absolute shrinkage and selection operator (LASSO) method. The selected variables, with gender, healthy diet category, and ability to manage stress, were used to build multiple linear regression models that estimate the effect of exposure and individual factors on change in BMI. We identified the best model using Akaike Information Criterion (AIC) and R². This study found a model that explains 68% (p<0.0001) of the variation in change in BMI. The model combined social interaction data, especially from acquaintances, and personal health-related information to explain change in BMI. This is the first study taking into account both interactions with different levels of social interaction and personal health-related information. Social interactions with acquaintances accounted for more than half the variation in change in BMI. This suggests the importance of not only individual health information but also the significance of social interactions with people we are exposed to, even people we may not consider as close friends.
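
The abstract names AIC and R² as the criteria for choosing among regression models. The sketch below shows how those two quantities might be computed from observed and fitted values. It assumes the common least-squares form of AIC (n ln(RSS/n) + 2k, dropping constants) and counts only regression coefficients in k; the authors' exact formulation is not stated, and all data here are hypothetical.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - RSS / TSS."""
    mean_obs = sum(observed) / len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    tss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - rss / tss


def aic_least_squares(observed, predicted, n_params):
    """AIC for a Gaussian least-squares fit, up to an additive constant."""
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return n * math.log(rss / n) + 2 * n_params


# Hypothetical change-in-BMI values and fitted values from two candidate models.
bmi_change = [0.5, -0.2, 1.1, 0.3, -0.6, 0.8]
fit_with_predictors = [0.4, -0.1, 1.0, 0.2, -0.5, 0.9]            # assumed 3 coefficients
fit_intercept_only = [sum(bmi_change) / len(bmi_change)] * len(bmi_change)  # 1 coefficient

aic_full = aic_least_squares(bmi_change, fit_with_predictors, n_params=3)
aic_null = aic_least_squares(bmi_change, fit_intercept_only, n_params=1)
best_model = "with predictors" if aic_full < aic_null else "intercept only"
```

Lower AIC is better: the 2k term penalizes extra coefficients, so a larger model wins only if it reduces the residual sum of squares enough to offset the penalty.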

  9. Accurate prediction of solvent accessibility using neural networks-based regression.

    PubMed

    Adamczak, Rafał; Porollo, Aleksey; Meller, Jarosław

    2004-09-01

    Accurate prediction of relative solvent accessibilities (RSAs) of amino acid residues in proteins may be used to facilitate protein structure prediction and functional annotation. Toward that goal we developed a novel method for improved prediction of RSAs. Contrary to other machine learning-based methods from the literature, we do not impose a classification problem with arbitrary boundaries between the classes. Instead, we seek a continuous approximation of the real-value RSA using nonlinear regression, with several feed forward and recurrent neural networks, which are then combined into a consensus predictor. A set of 860 protein structures derived from the PFAM database was used for training, whereas validation of the results was carefully performed on several nonredundant control sets comprising a total of 603 structures derived from new Protein Data Bank structures and had no homology to proteins included in the training. Two classes of alternative predictors were developed for comparison with the regression-based approach: one based on the standard classification approach and the other based on a semicontinuous approximation with the so-called thermometer encoding. Furthermore, a weighted approximation, with errors being scaled by the observed levels of variability in RSA for equivalent residues in families of homologous structures, was applied in order to improve the results. The effects of including evolutionary profiles and the growth of sequence databases were assessed. In accord with the observed levels of variability in RSA for different ranges of RSA values, the regression accuracy is higher for buried than for exposed residues, with overall 15.3-15.8% mean absolute errors and correlation coefficients between the predicted and experimental values of 0.64-0.67 on different control sets. The new method outperforms classification-based algorithms when the real value predictions are projected onto two-class classification problems with several commonly
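
The abstract contrasts three target representations for RSA: a real value (regression), a thermometer code (semicontinuous), and two classes (classification). The sketch below illustrates one common form of each, together with the mean absolute error used to report accuracy. The number of thermometer levels and the 25% buried/exposed threshold are assumptions made for illustration, not values taken from the paper.

```python
def thermometer_encode(rsa_percent, n_levels=10):
    """Encode an RSA value (0-100) as cumulative binary units.

    Unit i is 1 when the value reaches the i-th threshold, so larger
    values switch on more units, like mercury rising in a thermometer.
    """
    step = 100.0 / n_levels
    return [1 if rsa_percent >= (i + 1) * step else 0 for i in range(n_levels)]


def to_two_class(rsa_percent, threshold=25.0):
    """Project a real-valued RSA onto a buried/exposed label (threshold assumed)."""
    return "exposed" if rsa_percent >= threshold else "buried"


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted RSA."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


code_for_35 = thermometer_encode(35)   # three units switched on
label_for_35 = to_two_class(35)
```

Because a real-valued prediction can be thresholded afterwards at any cut-off, a single regression model can be evaluated against several two-class definitions without retraining.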

  10. Rapid identification of sequences for orphan enzymes to power accurate protein annotation.

    PubMed

    Ramkissoon, Kevin R; Miller, Jennifer K; Ojha, Sunil; Watson, Douglas S; Bomar, Martha G; Galande, Amit K; Shearer, Alexander G

    2013-01-01

    The power of genome sequencing depends on the ability to understand what those genes and their proteins products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the "back catalog" of enzymology--"orphan enzymes," those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme "back catalog" is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology's "back catalog" another powerful tool to drive accurate genome annotation.

  11. Semiempirical prediction of protein folds

    NASA Astrophysics Data System (ADS)

    Fernández, Ariel; Colubri, Andrés; Appignanesi, Gustavo

    2001-08-01

    We introduce a semiempirical approach to predict ab initio expeditious pathways and native backbone geometries of proteins that fold under in vitro renaturation conditions. The algorithm is engineered to incorporate a discrete codification of local steric hindrances that constrain the movements of the peptide backbone throughout the folding process. Thus, the torsional state of the chain is assumed to be conditioned by the fact that hopping from one basin of attraction to another in the Ramachandran map (local potential energy surface) of each residue is energetically more costly than the search for a specific (Φ, Ψ) torsional state within a single basin. A combinatorial procedure is introduced to evaluate coarsely defined torsional states of the chain defined "modulo basins" and translate them into meaningful patterns of long range interactions. Thus, an algorithm for structure prediction is designed based on the fact that local contributions to the potential energy may be subsumed into time-evolving conformational constraints defining sets of restricted backbone geometries whereupon the patterns of nonbonded interactions are constructed. The predictive power of the algorithm is assessed by (a) computing ab initio folding pathways for mammalian ubiquitin that ultimately yield a stable structural pattern reproducing all of its native features, (b) determining the nucleating event that triggers the hydrophobic collapse of the chain, and (c) comparing coarse predictions of the stable folds of moderately large proteins (N~100) with structural information extracted from the protein data bank.

  12. Accurate refinement of docked protein complexes using evolutionary information and deep learning.

    PubMed

    Akbal-Delibas, Bahar; Farhoodi, Roshanak; Pomplun, Marc; Haspel, Nurit

    2016-06-01

    One of the major challenges for protein docking methods is to accurately discriminate native-like structures from false positives. Docking methods are often inaccurate and the results have to be refined and re-ranked to obtain native-like complexes and remove outliers. In a previous work, we introduced AccuRefiner, a machine learning based tool for refining protein-protein complexes. Given a docked complex, the refinement tool produces a small set of refined versions of the input complex, with lower root-mean-square-deviation (RMSD) of atomic positions with respect to the native structure. The method employs a unique ranking tool that accurately predicts the RMSD of docked complexes with respect to the native structure. In this work, we use a deep learning network with a similar set of features and five layers. We show that a properly trained deep learning network can accurately predict the RMSD of a docked complex with 1.40 Å error margin on average, by approximating the complex relationship between a wide set of scoring function terms and the RMSD of a docked structure. The network was trained on 35000 unbound docking complexes generated by RosettaDock. We tested our method on 25 different putative docked complexes produced also by RosettaDock for five proteins that were not included in the training data. The results demonstrate that the high accuracy of the ranking tool enables AccuRefiner to consistently choose the refinement candidates with lower RMSD values compared to the coarsely docked input structures. PMID:26846813
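
The quantity this abstract ranks by, the RMSD of atomic positions, can be sketched in a few lines. The function below assumes the two structures are already superimposed and that atoms are listed in the same order; optimal superposition (for example by the Kabsch algorithm) is a separate step not shown here. The coordinates in the usage line are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z).

    Assumes the structures are pre-aligned and atom i in coords_a
    corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical two-atom example: every atom is displaced by 1 Å along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
example_rmsd = rmsd(native, docked)
```

A refinement step of the kind described would aim to produce candidates whose RMSD to the native structure is lower than that of the input complex.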

  14. A multi-objective optimization approach accurately resolves protein domain architectures

    PubMed Central

    Bernardes, J.S.; Vieira, F.R.J.; Zaverucha, G.; Carbone, A.

    2016-01-01

    Motivation: Given a protein sequence and a number of potential domains matching it, what are the domain content and the most likely domain architecture for the sequence? This problem is of fundamental importance in protein annotation, constituting one of the main steps of all predictive annotation strategies. On the other hand, when potential domains are several and in conflict because of overlapping domain boundaries, finding a solution for the problem might become difficult. An accurate prediction of the domain architecture of a multi-domain protein provides important information for function prediction, comparative genomics and molecular evolution. Results: We developed DAMA (Domain Annotation by a Multi-objective Approach), a novel approach that identifies architectures through a multi-objective optimization algorithm combining scores of domain matches, previously observed multi-domain co-occurrence and domain overlapping. DAMA has been validated on a known benchmark dataset based on CATH structural domain assignments and on the set of Plasmodium falciparum proteins. When compared with existing tools on both datasets, it outperforms all of them. Availability and implementation: DAMA software is implemented in C++ and the source code can be found at http://www.lcqb.upmc.fr/DAMA. Contact: juliana.silva_bernardes@upmc.fr or alessandra.carbone@lip6.fr Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26458889

  15. Scoring docking conformations using predicted protein interfaces

    PubMed Central

    2014-01-01

    Background: Since proteins function by interacting with other molecules, analysis of protein-protein interactions is essential for comprehending biological processes. Whereas understanding of atomic interactions within a complex is especially useful for drug design, limitations of experimental techniques have restricted their practical use. Despite progress in docking predictions, there is still room for improvement. In this study, we contribute to this topic by proposing T-PioDock, a framework for detection of a native-like docked complex 3D structure. T-PioDock supports the identification of near-native conformations from 3D models that docking software produced by scoring those models using binding interfaces predicted by the interface predictor, Template based Protein Interface Prediction (T-PIP). Results: First, exhaustive evaluation of interface predictors demonstrates that T-PIP, whose predictions are customised to target complexity, is a state-of-the-art method. Second, comparative study between T-PioDock and other state-of-the-art scoring methods establishes T-PioDock as the best performing approach. Moreover, there is good correlation between T-PioDock performance and quality of docking models, which suggests that progress in docking will lead to even better results at recognising near-native conformations. Conclusion: Accurate identification of near-native conformations remains a challenging task. Although availability of 3D complexes will benefit template-based methods such as T-PioDock, we have identified specific limitations which need to be addressed. First, docking software is still not able to produce native-like models for every target. Second, current interface predictors do not explicitly consider pairwise residue interactions between proteins and their interacting partners, which leaves ambiguity when assessing quality of complex conformations. PMID:24906633

  16. De Novo Protein Structure Prediction

    NASA Astrophysics Data System (ADS)

    Hung, Ling-Hong; Ngan, Shing-Chung; Samudrala, Ram

    An unparalleled amount of sequence data is being made available from large-scale genome sequencing efforts. The data provide a shortcut to the determination of the function of a gene of interest, as long as there is an existing sequenced gene with similar sequence and of known function. This has spurred structural genomic initiatives with the goal of determining as many protein folds as possible (Brenner and Levitt, 2000; Burley, 2000; Brenner, 2001; Heinemann et al., 2001). The purpose of this is twofold: First, the structure of a gene product can often lead to direct inference of its function. Second, since the function of a protein is dependent on its structure, direct comparison of the structures of gene products can be more sensitive than the comparison of sequences of genes for detecting homology. Presently, structural determination by crystallography and NMR techniques is still slow and expensive in terms of manpower and resources, despite attempts to automate the processes. Computer structure prediction algorithms, while not providing the accuracy of the traditional techniques, are extremely quick and inexpensive and can provide useful low-resolution data for structure comparisons (Bonneau and Baker, 2001). Given the immense number of structures which the structural genomic projects are attempting to solve, there would be a considerable gain even if the computer structure prediction approach were applicable to a subset of proteins.

  17. ChIP-seq Accurately Predicts Tissue-Specific Activity of Enhancers

    SciTech Connect

    Visel, Axel; Blow, Matthew J.; Li, Zirong; Zhang, Tao; Akiyama, Jennifer A.; Holt, Amy; Plajzer-Frick, Ingrid; Shoukry, Malak; Wright, Crystal; Chen, Feng; Afzal, Veena; Ren, Bing; Rubin, Edward M.; Pennacchio, Len A.

    2009-02-01

    A major yet unresolved quest in decoding the human genome is the identification of the regulatory sequences that control the spatial and temporal expression of genes. Distant-acting transcriptional enhancers are particularly challenging to uncover since they are scattered amongst the vast non-coding portion of the genome. Evolutionary sequence constraint can facilitate the discovery of enhancers, but fails to predict when and where they are active in vivo. Here, we performed chromatin immunoprecipitation with the enhancer-associated protein p300, followed by massively-parallel sequencing, to map several thousand in vivo binding sites of p300 in mouse embryonic forebrain, midbrain, and limb tissue. We tested 86 of these sequences in a transgenic mouse assay, which in nearly all cases revealed reproducible enhancer activity in those tissues predicted by p300 binding. Our results indicate that in vivo mapping of p300 binding is a highly accurate means for identifying enhancers and their associated activities and suggest that such datasets will be useful to study the role of tissue-specific enhancers in human biology and disease on a genome-wide scale.

  18. The DynaMine webserver: predicting protein dynamics from sequence.

    PubMed

    Cilia, Elisa; Pancsa, Rita; Tompa, Peter; Lenaerts, Tom; Vranken, Wim F

    2014-07-01

    Protein dynamics are important for understanding protein function. Unfortunately, accurate protein dynamics information is difficult to obtain: here we present the DynaMine webserver, which provides predictions for the fast backbone movements of proteins directly from their amino-acid sequence. DynaMine rapidly produces a profile describing the statistical potential for such movements at residue-level resolution. The predicted values have meaning on an absolute scale and go beyond the traditional binary classification of residues as ordered or disordered, thus allowing for direct dynamics comparisons between protein regions. Through this webserver, we provide molecular biologists with an efficient and easy to use tool for predicting the dynamical characteristics of any protein of interest, even in the absence of experimental observations. The prediction results are visualized and can be directly downloaded. The DynaMine webserver, including instructive examples describing the meaning of the profiles, is available at http://dynamine.ibsquare.be.

  19. MM-ISMSA: An Ultrafast and Accurate Scoring Function for Protein-Protein Docking.

    PubMed

    Klett, Javier; Núñez-Salgado, Alfonso; Dos Santos, Helena G; Cortés-Cabrera, Álvaro; Perona, Almudena; Gil-Redondo, Rubén; Abia, David; Gago, Federico; Morreale, Antonio

    2012-09-11

    An ultrafast and accurate scoring function for protein-protein docking is presented. It includes (1) a molecular mechanics (MM) part based on a 12-6 Lennard-Jones potential; (2) an electrostatic component based on an implicit solvent model (ISM) with individual desolvation penalties for each partner in the protein-protein complex plus a hydrogen bonding term; and (3) a surface area (SA) contribution to account for the loss of water contacts upon protein-protein complex formation. The accuracy and performance of the scoring function, termed MM-ISMSA, have been assessed by (1) comparing the total binding energies, the electrostatic term, and its components (charge-charge and individual desolvation energies), as well as the per residue contributions, to results obtained with well-established methods such as APBSA or MM-PB(GB)SA for a set of 1242 decoy protein-protein complexes and (2) testing its ability to recognize the docking solution closest to the experimental structure as that providing the most favorable total binding energy. For this purpose, a test set consisting of 15 protein-protein complexes with known 3D structure mixed with 10 decoys for each complex was used. The correlation between the values afforded by MM-ISMSA and those from the other methods is quite remarkable (r² ∼ 0.9), and only 0.2-5.0 s (depending on the number of residues) are spent on a single calculation including an all vs all pairwise energy decomposition. On the other hand, MM-ISMSA correctly identifies the best docking solution as that closest to the experimental structure in 80% of the cases. Finally, MM-ISMSA can process molecular dynamics trajectories and reports the results as averaged values with their standard deviations. MM-ISMSA has been implemented as a plugin to the widely used molecular graphics program PyMOL, although it can also be executed in command-line mode. MM-ISMSA is distributed free of charge to nonprofit organizations.
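
The molecular mechanics term mentioned in this abstract is a 12-6 Lennard-Jones potential. As a minimal sketch, the function below evaluates the textbook form E(r) = 4ε[(σ/r)^12 − (σ/r)^6] for a single atom pair. The parameter values are hypothetical reduced units; the force-field parameters, cut-offs, and any scaling used by MM-ISMSA itself are not given in the abstract and are not represented here.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Textbook 12-6 Lennard-Jones energy for one atom pair.

    epsilon: depth of the energy well (hypothetical units)
    sigma:   separation at which the energy crosses zero
    """
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


# The energy is zero at r = sigma and reaches its minimum, -epsilon,
# at r = 2**(1/6) * sigma.
energy_at_sigma = lennard_jones(1.0)
energy_at_minimum = lennard_jones(2 ** (1 / 6))
```

The r^-12 term models short-range repulsion and the r^-6 term models dispersion attraction; summing this pair energy over all intermolecular atom pairs gives a van der Waals contribution to a docking score.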

  20. Structure Prediction of Protein Complexes

    NASA Astrophysics Data System (ADS)

    Pierce, Brian; Weng, Zhiping

    Protein-protein interactions are critical for biological function. They directly and indirectly influence the biological systems of which they are a part. Antibodies bind with antigens to detect and stop viruses and other infectious agents. Cell signaling is performed in many cases through the interactions between proteins. Many diseases involve protein-protein interactions on some level, including cancer and prion diseases.

  1. Predicting the protein-protein interactions using primary structures with predicted protein surface

    PubMed Central

    2010-01-01

    Background: Many biological functions involve various protein-protein interactions (PPIs). Elucidating such interactions is crucial for understanding general principles of cellular systems. Previous studies have shown the potential of predicting PPIs based on only sequence information. Compared to approaches that require other auxiliary information, these sequence-based approaches can be applied to a broader range of applications. Results: This study presents a novel sequence-based method based on the assumption that protein-protein interactions are more related to amino acids at the surface than those at the core. The present method considers surface information and maintains the advantage of relying on only sequence data by including an accessible surface area (ASA) predictor recently proposed by the authors. This study also reports the experiments conducted to evaluate a) the performance of PPI prediction achieved by including the predicted surface and b) the quality of the predicted surface in comparison with the surface obtained from structures. The experimental results show that surface information helps to predict interacting protein pairs. Furthermore, the prediction performance achieved by using the surface estimated with the ASA predictor is close to that using the surface obtained from protein structures. Conclusion: This work presents a sequence-based method that takes into account surface information for predicting PPIs. The proposed procedure of surface identification improves the prediction performance with an F-measure of 5.1%. The extracted surfaces are also valuable in other biomedical applications that require similar information. PMID:20122202
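
The F-measure cited in this abstract is conventionally the harmonic mean of precision and recall. The sketch below computes it from confusion-matrix counts, assuming the balanced (F1) form; the abstract does not say which weighting was used, and the counts in the usage line are hypothetical.

```python
def f_measure(tp, fp, fn):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    precision = tp / (tp + fp)   # fraction of predicted interactions that are real
    recall = tp / (tp + fn)      # fraction of real interactions that are predicted
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a set of predicted interacting protein pairs.
example_f1 = f_measure(tp=8, fp=2, fn=2)
```

Unlike plain accuracy, the F-measure ignores true negatives, which makes it a common choice when non-interacting pairs vastly outnumber interacting ones.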

  2. RAP: Accurate and Fast Motif Finding Based on Protein-Binding Microarray Data

    PubMed Central

    Orenstein, Yaron; Mick, Eran

    2013-01-01

    The novel high-throughput technology of protein-binding microarrays (PBMs) measures binding intensity of a transcription factor to thousands of DNA probe sequences. Several algorithms have been developed to extract binding-site motifs from these data. Such motifs are commonly represented by positional weight matrices. Previous studies have shown that the motifs produced by these algorithms are either accurate in predicting in vitro binding or similar to previously published motifs, but not both. In this work, we present a new simple algorithm to infer binding-site motifs from PBM data. It outperforms prior art both in predicting in vitro binding and in producing motifs similar to literature motifs. Our results challenge previous claims that motifs with lower information content are better models for transcription-factor binding specificity. Moreover, we tested the effect of motif length and side positions flanking the “core” motif in the binding site. We show that side positions have a significant effect and should not be removed, as commonly done. A large drop in the results quality of all methods is observed between in vitro and in vivo binding prediction. The software is available on acgt.cs.tau.ac.il/rap. PMID:23464877
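
The abstract notes that binding-site motifs are commonly represented by positional weight matrices. The sketch below illustrates that representation only: it builds a log-odds matrix from a handful of aligned sites and scans a probe for the best-scoring window. It is not the RAP algorithm, whose details the abstract does not give; the sites, pseudocount, and uniform background are all hypothetical choices.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds weight matrix from equal-length aligned sites."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pwm


def score(pwm, seq):
    """Sum the per-position weights for a sequence as long as the motif."""
    return sum(column[base] for column, base in zip(pwm, seq))


def best_window(pwm, probe):
    """Return (score, offset) of the highest-scoring window in a probe."""
    width = len(pwm)
    return max((score(pwm, probe[i:i + width]), i)
               for i in range(len(probe) - width + 1))


# Hypothetical aligned binding sites and probe sequence.
example_sites = ["TGACA", "TGACT", "TGACA", "TCACA"]
example_pwm = build_pwm(example_sites)
best_score, best_offset = best_window(example_pwm, "GGTGACAGG")
```

Extending the aligned sites by a few flanking bases on either side would add columns to the matrix, which is one way to examine the contribution of side positions that the abstract reports as significant.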

  3. DSP: a protein shape string and its profile prediction server.

    PubMed

    Sun, Jiangming; Tang, Shengnan; Xiong, Wenwei; Cong, Peisheng; Li, Tonghua

    2012-07-01

    Many studies have demonstrated that shape string is an extremely important structure representation, since it is more complete than the classical secondary structure. The shape string provides detailed information also in the regions denoted random coil. But few services are provided for systematic analysis of protein shape string. To fill this gap, we have developed an accurate shape string predictor based on two innovative technologies: a knowledge-driven sequence alignment and a sequence shape string profile method. The performance on blind test data demonstrates that the proposed method can be used for accurate prediction of protein shape string. The DSP server provides both predicted shape string and sequence shape string profile for each query sequence. Using this information, the users can compare protein structure or display protein evolution in shape string space. The DSP server is available at both http://cheminfo.tongji.edu.cn/dsp/ and its main mirror http://chemcenter.tongji.edu.cn/dsp/.

  4. Mind-set and close relationships: when bias leads to (In)accurate predictions.

    PubMed

    Gagné, F M; Lydon, J E

    2001-07-01

    The authors investigated whether mind-set influences the accuracy of relationship predictions. Because people are more biased in their information processing when thinking about implementing an important goal, relationship predictions made in an implemental mind-set were expected to be less accurate than those made in a more impartial deliberative mind-set. In Study 1, open-ended thoughts of students about to leave for university were coded for mind-set. In Study 2, mind-set about a major life goal was assessed using a self-report measure. In Study 3, mind-set was experimentally manipulated. Overall, mind-set interacted with forecasts to predict relationship survival. Forecasts were more accurate in a deliberative mind-set than in an implemental mind-set. This effect was more pronounced for long-term than for short-term relationship survival. Finally, deliberatives were not pessimistic; implementals were unduly optimistic.

  5. Protein Residue Contacts and Prediction Methods

    PubMed Central

    Adhikari, Badri

    2016-01-01

    In the field of computational structural proteomics, contact predictions have shown new prospects of solving the longstanding problem of ab initio protein structure prediction. In the last few years, application of deep learning algorithms and availability of large protein sequence databases, combined with improvement in methods that derive contacts from multiple sequence alignments, have shown a huge increase in the precision of contact prediction. In addition, these predicted contacts have also been used to build three-dimensional models from scratch. In this chapter, we briefly discuss many elements of protein residue–residue contacts and the methods available for prediction, focusing on a state-of-the-art contact prediction tool, DNcon. Illustrating with a case study, we describe how DNcon can be used to make ab initio contact predictions for a given protein sequence and discuss how the predicted contacts may be analyzed and evaluated. PMID:27115648

  6. Modeling methodology for the accurate and prompt prediction of symptomatic events in chronic diseases.

    PubMed

    Pagán, Josué; Risco-Martín, José L; Moya, José M; Ayala, José L

    2016-08-01

    Prediction of symptomatic crises in chronic diseases allows decisions to be taken before the symptoms occur, such as the intake of drugs to avoid the symptoms or the activation of medical alarms. The prediction horizon is in this case an important parameter in order to fulfill the pharmacokinetics of medications, or the time response of medical services. This paper presents a study about the prediction limits of a chronic disease with symptomatic crises: the migraine. For that purpose, this work develops a methodology to build predictive migraine models and to improve these predictions beyond the limits of the initial models. The maximum prediction horizon is analyzed, and its dependency on the selected features is studied. A strategy for model selection is proposed to tackle the trade-off between conservative but robust predictive models and less accurate predictions with longer horizons. The obtained results show a prediction horizon close to 40 min, which is in the time range of the drug pharmacokinetics. Experiments have been performed in a realistic scenario where input data have been acquired in an ambulatory clinical study by the deployment of a non-intrusive Wireless Body Sensor Network. Our results provide an effective methodology for the selection of the future horizon in the development of prediction algorithms for diseases experiencing symptomatic crises. PMID:27260782

  7. Mechanism for accurate, protein-assisted DNA annealing by Deinococcus radiodurans DdrB.

    PubMed

    Sugiman-Marangos, Seiji N; Weiss, Yoni M; Junop, Murray S

    2016-04-19

    Accurate pairing of DNA strands is essential for repair of DNA double-strand breaks (DSBs). How cells achieve accurate annealing when large regions of single-strand DNA are unpaired has remained unclear despite many efforts focused on understanding proteins, which mediate this process. Here we report the crystal structure of a single-strand annealing protein [DdrB (DNA damage response B)] in complex with a partially annealed DNA intermediate to 2.2 Å. This structure and supporting biochemical data reveal a mechanism for accurate annealing involving DdrB-mediated proofreading of strand complementarity. DdrB promotes high-fidelity annealing by constraining specific bases from unauthorized association and only releases annealed duplex when bound strands are fully complementary. To our knowledge, this mechanism provides the first understanding for how cells achieve accurate, protein-assisted strand annealing under biological conditions that would otherwise favor misannealing.

  8. MUFOLD: A new solution for protein 3D structure prediction

    PubMed Central

    Zhang, Jingfen; Wang, Qingguo; Barz, Bogdan; He, Zhiquan; Kosztin, Ioan; Shang, Yi; Xu, Dong

    2010-01-01

    There have been steady improvements in protein structure prediction during the past 2 decades. However, current methods are still far from consistently predicting structural models accurately with computing power accessible to common users. Toward achieving more accurate and efficient structure prediction, we developed a number of novel methods and integrated them into a software package, MUFOLD. First, a systematic protocol was developed to identify useful templates and fragments from Protein Data Bank for a given target protein. Then, an efficient process was applied for iterative coarse-grain model generation and evaluation at the Cα or backbone level. In this process, we construct models using interresidue spatial restraints derived from alignments by multidimensional scaling, evaluate and select models through clustering and static scoring functions, and iteratively improve the selected models by integrating spatial restraints and previous models. Finally, the full-atom models were evaluated using molecular dynamics simulations based on structural changes under simulated heating. We have continuously improved the performance of MUFOLD by using a benchmark of 200 proteins from the Astral database, where no template with >25% sequence identity to any target protein is included. The average root-mean-square deviation of the best models from the native structures is 4.28 Å, which shows significant and systematic improvement over our previous methods. The computing time of MUFOLD is much shorter than many other tools, such as Rosetta. MUFOLD demonstrated some success in the 2008 community-wide experiment for protein structure prediction CASP8. PMID:19927325
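
    The root-mean-square deviation reported above measures the average distance between matched atoms of a model and the native structure. A minimal sketch follows, assuming the two coordinate sets are already optimally superimposed and listed in the same residue order (no alignment step is performed); the function name and coordinates are hypothetical.

```python
import math


def rmsd(model, native):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(model, native)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(model))


# Two-atom toy example: the second atom is displaced by 2 Å along z.
example_rmsd = rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 2)])
```

    In the toy example the summed squared displacement is 4 over two atoms, so the RMSD is the square root of 2.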

  9. A Single Linear Prediction Filter that Accurately Predicts the AL Index

    NASA Astrophysics Data System (ADS)

    McPherron, R. L.; Chu, X.

    2015-12-01

    The AL index is a measure of the strength of the westward electrojet flowing along the auroral oval. It has two components: one from the global DP-2 current system and a second from the DP-1 current that is more localized near midnight. It is generally believed that the index is a very poor measure of these currents because of its dependence on the distance of stations from the source of the two currents. In fact over season and solar cycle the coupling strength defined as the steady state ratio of the output AL to the input coupling function varies by a factor of four. There are four factors that lead to this variation. First is the equinoctial effect that modulates coupling strength with peaks (strongest coupling) at the equinoxes. Second is the saturation of the polar cap potential which decreases coupling strength as the strength of the driver increases. Since saturation occurs more frequently at solar maximum we obtain the result that maximum coupling strength occurs at equinox at solar minimum. A third factor is ionospheric conductivity with stronger coupling at summer solstice as compared to winter. The fourth factor is the definition of a solar wind coupling function appropriate to a given index. We have developed an optimum coupling function depending on solar wind speed, density, transverse magnetic field, and IMF clock angle which is better than previous functions. Using this we have determined the seasonal variation of coupling strength and developed an inverse function that modulates the optimum coupling function so that all seasonal variation is removed. In a similar manner we have determined the dependence of coupling strength on solar wind driver strength. The inverse of this function is used to scale a linear prediction filter thus eliminating the dependence on driver strength. Our result is a single linear filter that is adjusted in a nonlinear manner by driver strength and an optimum coupling function that is seasonally modulated. Together this

  10. A review of the kinetic detail required for accurate predictions of normal shock waves

    NASA Technical Reports Server (NTRS)

    Muntz, E. P.; Erwin, Daniel A.; Pham-Van-diep, Gerald C.

    1991-01-01

    Several aspects of the kinetic models used in the collision phase of Monte Carlo direct simulations have been studied. Accurate molecular velocity distribution function predictions require a significantly increased number of computational cells in one maximum slope shock thickness, compared to predictions of macroscopic properties. The shape of the highly repulsive portion of the interatomic potential for argon is not well modeled by conventional interatomic potentials; this portion of the potential controls high Mach number shock thickness predictions, indicating that the specification of the energetic repulsive portion of interatomic or intermolecular potentials must be chosen with care for correct modeling of nonequilibrium flows at high temperatures. It has been shown for inverse power potentials that the assumption of variable hard sphere scattering provides accurate predictions of the macroscopic properties in shock waves, by comparison with simulations in which differential scattering is employed in the collision phase. On the other hand, velocity distribution functions are not well predicted by the variable hard sphere scattering model for softer potentials at higher Mach numbers.

  11. Can phenological models predict tree phenology accurately under climate change conditions?

    NASA Astrophysics Data System (ADS)

    Chuine, Isabelle; Bonhomme, Marc; Legave, Jean Michel; García de Cortázar-Atauri, Inaki; Charrier, Guillaume; Lacointe, André; Améglio, Thierry

    2014-05-01

    The onset of the growing season of trees has been globally earlier by 2.3 days/decade during the last 50 years because of global warming and this trend is predicted to continue according to climate forecasts. The effect of temperature on plant phenology is however not linear because temperature has a dual effect on bud development. On one hand, low temperatures are necessary to break bud dormancy, and on the other hand higher temperatures are necessary to promote bud cell growth afterwards. Increasing phenological changes in temperate woody species have strong impacts on forest tree distribution and productivity, as well as crop cultivation areas. Accurate predictions of tree phenology are therefore a prerequisite to understand and foresee the impacts of climate change on forests and agrosystems. Different process-based models have been developed in the last two decades to predict the date of budburst or flowering of woody species. There are two main families: (1) one-phase models which consider only the ecodormancy phase and make the assumption that endodormancy is always broken before adequate climatic conditions for cell growth occur; and (2) two-phase models which consider both the endodormancy and ecodormancy phases and predict a date of dormancy break which varies from year to year. So far, one-phase models have been able to predict accurately tree bud break and flowering under historical climate. However, because they do not consider what happens prior to ecodormancy, and especially the possible negative effect of winter temperature warming on dormancy break, it seems unlikely that they can provide accurate predictions in future climate conditions. It is indeed well known that a lack of low temperature results in abnormal pattern of bud break and development in temperate fruit trees. An accurate modelling of the dormancy break date has thus become a major issue in phenology modelling. Two-phase phenological models predict that global warming should delay

  12. Protein function prediction based on data fusion and functional interrelationship.

    PubMed

    Meng, Jun; Wekesa, Jael-Sanyanda; Shi, Guan-Li; Luan, Yu-Shi

    2016-04-01

    One of the challenging tasks of bioinformatics is to predict more accurate and confident protein functions from genomics and proteomics datasets. Computational approaches use a variety of high throughput experimental data, such as protein-protein interaction (PPI), protein sequences and phylogenetic profiles, to predict protein functions. This paper presents a method that uses transductive multi-label learning algorithm by integrating multiple data sources for classification. Multiple proteomics datasets are integrated to make inferences about functions of unknown proteins and use a directed bi-relational graph to assign labels to unannotated proteins. Our method, bi-relational graph based transductive multi-label function annotation (Bi-TMF) uses functional correlation and topological PPI network properties on both the training and testing datasets to predict protein functions through data fusion of the individual kernel result. The main purpose of our proposed method is to enhance the performance of classifier integration for protein function prediction algorithms. Experimental results demonstrate the effectiveness and efficiency of Bi-TMF on multi-sources datasets in yeast, human and mouse benchmarks. Bi-TMF outperforms other recently proposed methods. PMID:26869536

  13. Protein structure prediction using hybrid AI methods

    SciTech Connect

    Guan, X.; Mural, R.J.; Uberbacher, E.C.

    1993-11-01

    This paper describes a new approach for predicting protein structures based on Artificial Intelligence methods and genetic algorithms. We combine nearest neighbor searching algorithms, neural networks, heuristic rules and genetic algorithms to form an integrated system to predict protein structures from their primary amino acid sequences. First we describe our methods and how they are integrated, and then apply our methods to several protein sequences. The results are very close to the real structures obtained by crystallography. Parallel genetic algorithms are also implemented.

  14. Can phenological models predict tree phenology accurately in the future? The unrevealed hurdle of endodormancy break.

    PubMed

    Chuine, Isabelle; Bonhomme, Marc; Legave, Jean-Michel; García de Cortázar-Atauri, Iñaki; Charrier, Guillaume; Lacointe, André; Améglio, Thierry

    2016-10-01

    The onset of the growing season of trees has been earlier by 2.3 days per decade during the last 40 years in temperate Europe because of global warming. The effect of temperature on plant phenology is, however, not linear because temperature has a dual effect on bud development. On one hand, low temperatures are necessary to break bud endodormancy, and, on the other hand, higher temperatures are necessary to promote bud cell growth afterward. Different process-based models have been developed in the last decades to predict the date of budbreak of woody species. They predict that global warming should delay or compromise endodormancy break at the species equatorward range limits leading to a delay or even impossibility to flower or set new leaves. These models are classically parameterized with flowering or budbreak dates only, with no information on the endodormancy break date because this information is very scarce. Here, we evaluated the efficiency of a set of phenological models to accurately predict the endodormancy break dates of three fruit trees. Our results show that models calibrated solely with budbreak dates usually do not accurately predict the endodormancy break date. Providing endodormancy break date for the model parameterization results in much more accurate prediction of the latter, with, however, a higher error than that on budbreak dates. Most importantly, we show that models not calibrated with endodormancy break dates can generate large discrepancies in forecasted budbreak dates when using climate scenarios as compared to models calibrated with endodormancy break dates. This discrepancy increases with mean annual temperature and is therefore the strongest after 2050 in the southernmost regions. Our results point to the urgent need for large-scale measurements of endodormancy break dates in forest and fruit trees to yield more robust projections of phenological changes in the near future. PMID:27272707

  16. Predicting the Dynamics of Protein Abundance

    PubMed Central

    Mehdi, Ahmed M.; Patrick, Ralph; Bailey, Timothy L.; Bodén, Mikael

    2014-01-01

    Protein synthesis is finely regulated across all organisms, from bacteria to humans, and its integrity underpins many important processes. Emerging evidence suggests that the dynamic range of protein abundance is greater than that observed at the transcript level. Technological breakthroughs now mean that sequencing-based measurement of mRNA levels is routine, but protocols for measuring protein abundance remain both complex and expensive. This paper introduces a Bayesian network that integrates transcriptomic and proteomic data to predict protein abundance and to model the effects of its determinants. We aim to use this model to follow a molecular response over time, from condition-specific data, in order to understand adaptation during processes such as the cell cycle. With microarray data now available for many conditions, the general utility of a protein abundance predictor is broad. Whereas most quantitative proteomics studies have focused on higher organisms, we developed a predictive model of protein abundance for both Saccharomyces cerevisiae and Schizosaccharomyces pombe to explore the latitude at the protein level. Our predictor primarily relies on mRNA level, mRNA–protein interaction, mRNA folding energy and half-life, and tRNA adaptation. The combination of key features, allowing for the low certainty and uneven coverage of experimental observations, gives comparatively minor but robust prediction accuracy. The model substantially improved the analysis of protein regulation during the cell cycle: predicted protein abundance identified twice as many cell-cycle-associated proteins as experimental mRNA levels. Predicted protein abundance was more dynamic than observed mRNA expression, agreeing with experimental protein abundance from a human cell line. We illustrate how the same model can be used to predict the folding energy of mRNA when protein abundance is available, lending credence to the emerging view that mRNA folding affects translation

  17. Accurate similarity index based on activity and connectivity of node for link prediction

    NASA Astrophysics Data System (ADS)

    Li, Longjie; Qian, Lvjian; Wang, Xiaoping; Luo, Shishun; Chen, Xiaoyun

    2015-05-01

    Recent years have witnessed a growth in available network data; however, much of that data is incomplete. Link prediction, which can find the missing links of a network, plays an important role in the research and analysis of complex networks. Based on the assumption that two unconnected nodes which are highly similar are very likely to have an interaction, most of the existing algorithms solve the link prediction problem by computing nodes' similarities. The fundamental requirement of those algorithms is accurate and effective similarity indices. In this paper, we propose a new similarity index, namely similarity based on activity and connectivity (SAC), which performs link prediction more accurately. To compute the similarity between two nodes, this index employs the average activity of these two nodes in their common neighborhood and the connectivities between them and their common neighbors. The higher the average activity is and the stronger the connectivities are, the more similar the two nodes are. The proposed index not only commendably distinguishes the contributions of paths but also incorporates the influence of endpoints. Therefore, it can achieve a better predicting result. To verify the performance of SAC, we conduct experiments on 10 real-world networks. Experimental results demonstrate that SAC outperforms the compared baselines.
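
    The abstract does not give the SAC formula, so the sketch below illustrates the general similarity-based approach with the much simpler common-neighbors index instead: every unconnected node pair is scored by the number of neighbors the two nodes share, and higher scores suggest likelier missing links. The function name and the toy graph are hypothetical.

```python
from itertools import combinations


def common_neighbor_scores(adjacency):
    """Score each unconnected node pair by its number of shared neighbors.

    `adjacency` maps each node to the set of its neighbors (undirected graph).
    This is the plain common-neighbors index, not the SAC index.
    """
    scores = {}
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # already linked, nothing to predict
        scores[(u, v)] = len(adjacency[u] & adjacency[v])
    return scores


# Toy graph: a and d are not linked but share the neighbors b and c.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}
toy_scores = common_neighbor_scores(toy_graph)
```

    In the toy graph the only unconnected pair is (a, d), which receives a score of 2.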

  18. Accurate prediction of the linear viscoelastic properties of highly entangled mono and bidisperse polymer melts.

    PubMed

    Stephanou, Pavlos S; Mavrantzas, Vlasis G

    2014-06-01

    We present a hierarchical computational methodology which permits the accurate prediction of the linear viscoelastic properties of entangled polymer melts directly from the chemical structure, chemical composition, and molecular architecture of the constituent chains. The method entails three steps: execution of long molecular dynamics simulations with moderately entangled polymer melts, self-consistent mapping of the accumulated trajectories onto a tube model and parameterization or fine-tuning of the model on the basis of detailed simulation data, and use of the modified tube model to predict the linear viscoelastic properties of significantly higher molecular weight (MW) melts of the same polymer. Predictions are reported for the zero-shear-rate viscosity η0 and the spectra of storage G'(ω) and loss G″(ω) moduli for several mono and bidisperse cis- and trans-1,4 polybutadiene melts as well as for their MW dependence, and are found to be in remarkable agreement with experimentally measured rheological data. PMID:24908037

  19. Accurate prediction of the linear viscoelastic properties of highly entangled mono and bidisperse polymer melts

    NASA Astrophysics Data System (ADS)

    Stephanou, Pavlos S.; Mavrantzas, Vlasis G.

    2014-06-01

    We present a hierarchical computational methodology which permits the accurate prediction of the linear viscoelastic properties of entangled polymer melts directly from the chemical structure, chemical composition, and molecular architecture of the constituent chains. The method entails three steps: execution of long molecular dynamics simulations with moderately entangled polymer melts, self-consistent mapping of the accumulated trajectories onto a tube model and parameterization or fine-tuning of the model on the basis of detailed simulation data, and use of the modified tube model to predict the linear viscoelastic properties of significantly higher molecular weight (MW) melts of the same polymer. Predictions are reported for the zero-shear-rate viscosity η0 and the spectra of storage G'(ω) and loss G″(ω) moduli for several mono and bidisperse cis- and trans-1,4 polybutadiene melts as well as for their MW dependence, and are found to be in remarkable agreement with experimentally measured rheological data.

  1. Toward more accurate ancestral protein genotype-phenotype reconstructions with the use of species tree-aware gene trees.

    PubMed

    Groussin, Mathieu; Hobbs, Joanne K; Szöllősi, Gergely J; Gribaldo, Simonetta; Arcus, Vickery L; Gouy, Manolo

    2015-01-01

    The resurrection of ancestral proteins provides direct insight into how natural selection has shaped proteins found in nature. By tracing substitutions along a gene phylogeny, ancestral proteins can be reconstructed in silico and subsequently synthesized in vitro. This elegant strategy reveals the complex mechanisms responsible for the evolution of protein functions and structures. However, to date, all protein resurrection studies have used simplistic approaches for ancestral sequence reconstruction (ASR), including the assumption that a single sequence alignment alone is sufficient to accurately reconstruct the history of the gene family. The impact of such shortcuts on conclusions about ancestral functions has not been investigated. Here, we show with simulations that utilizing information on species history using a model that accounts for the duplication, horizontal transfer, and loss (DTL) of genes statistically increases ASR accuracy. This underscores the importance of the tree topology in the inference of putative ancestors. We validate our in silico predictions using in vitro resurrection of the LeuB enzyme for the ancestor of the Firmicutes, a major and ancient bacterial phylum. With this particular protein, our experimental results demonstrate that information on the species phylogeny results in a biochemically more realistic and kinetically more stable ancestral protein. Additional resurrection experiments with different proteins are necessary to statistically quantify the impact of using species tree-aware gene trees on ancestral protein phenotypes. Nonetheless, our results suggest the need for incorporating both sequence and DTL information in future studies of protein resurrections to accurately define the genotype-phenotype space in which proteins diversify. PMID:25371435

  2. Prediction of Accurate Thermochemistry of Medium and Large Sized Radicals Using Connectivity-Based Hierarchy (CBH).

    PubMed

    Sengupta, Arkajyoti; Raghavachari, Krishnan

    2014-10-14

    Accurate modeling of the chemical reactions in many diverse areas such as combustion, photochemistry, or atmospheric chemistry strongly depends on the availability of thermochemical information of the radicals involved. However, accurate thermochemical investigations of radical systems using state of the art composite methods have mostly been restricted to the study of hydrocarbon radicals of modest size. In an alternative approach, systematic error-canceling thermochemical hierarchy of reaction schemes can be applied to yield accurate results for such systems. In this work, we have extended our connectivity-based hierarchy (CBH) method to the investigation of radical systems. We have calibrated our method using a test set of 30 medium sized radicals to evaluate their heats of formation. The CBH-rad30 test set contains radicals containing diverse functional groups as well as cyclic systems. We demonstrate that the sophisticated error-canceling isoatomic scheme (CBH-2) with modest levels of theory is adequate to provide heats of formation accurate to ∼1.5 kcal/mol. Finally, we predict heats of formation of 19 other large and medium sized radicals for which the accuracy of available heats of formation are less well-known. PMID:26588131

  3. The MULTICOM protein tertiary structure prediction system.

    PubMed

    Li, Jilong; Bhattacharya, Debswapna; Cao, Renzhi; Adhikari, Badri; Deng, Xin; Eickholt, Jesse; Cheng, Jianlin

    2014-01-01

    With the expansion of genomics and proteomics data aided by the rapid progress of next-generation sequencing technologies, computational prediction of protein three-dimensional structure is an essential part of modern structural genomics initiatives. Prediction of protein structure through understanding of the theories behind protein sequence-structure relationship, however, remains one of the most challenging problems in contemporary life sciences. Here, we describe MULTICOM, a multi-level combination technique, intended to predict moderate- to high-resolution structure of a protein through a novel approach of combining multiple sources of complementary information derived from the experimentally solved protein structures in the Protein Data Bank. The MULTICOM web server is freely available at http://sysbio.rnet.missouri.edu/multicom_toolbox/.

  4. A Novel Method for Accurate Operon Predictions in All Sequenced Prokaryotes

    SciTech Connect

    Price, Morgan N.; Huang, Katherine H.; Alm, Eric J.; Arkin, Adam P.

    2004-12-01

    We combine comparative genomic measures and the distance separating adjacent genes to predict operons in 124 completely sequenced prokaryotic genomes. Our method automatically tailors itself to each genome using sequence information alone, and thus can be applied to any prokaryote. For Escherichia coli K12 and Bacillus subtilis, our method is 85 and 83% accurate, respectively, which is similar to the accuracy of methods that use the same features but are trained on experimentally characterized transcripts. In Halobacterium NRC-1 and in Helicobacter pylori, our method correctly infers that genes in operons are separated by shorter distances than they are in E. coli, and its predictions using distance alone are more accurate than distance-only predictions trained on a database of E. coli transcripts. We use microarray data from six phylogenetically diverse prokaryotes to show that combining intergenic distance with comparative genomic measures further improves accuracy and that our method is broadly effective. Finally, we survey operon structure across 124 genomes, and find several surprises: H. pylori has many operons, contrary to previous reports; Bacillus anthracis has an unusual number of pseudogenes within conserved operons; and Synechocystis PCC6803 has many operons even though it has unusually wide spacings between conserved adjacent genes.
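
    The distance feature mentioned in this abstract can be illustrated with a minimal sketch. It assumes genes are given as (name, start, end, strand) records; the Gene type, the function names, and the fixed 50 bp cutoff are hypothetical and are not taken from the paper, which instead tailors its model to each genome.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def adjacent_same_strand_pairs(genes: List[Gene]) -> List[Tuple[Gene, Gene, int]]:
    """Return adjacent same-strand gene pairs with their intergenic gap in bp."""
    ordered = sorted(genes, key=lambda g: g.start)
    pairs = []
    for upstream, downstream in zip(ordered, ordered[1:]):
        if upstream.strand == downstream.strand:
            # The gap is negative when the two genes overlap.
            gap = downstream.start - upstream.end - 1
            pairs.append((upstream, downstream, gap))
    return pairs


def naive_operon_calls(genes: List[Gene], max_gap: int = 50) -> List[Tuple[str, str]]:
    """Call a pair 'same operon' when the gap is small (hypothetical fixed cutoff)."""
    return [(a.name, b.name)
            for a, b, gap in adjacent_same_strand_pairs(genes)
            if gap <= max_gap]


# Made-up coordinates for illustration only.
genes = [
    Gene("a", 1, 100, "+"),
    Gene("b", 110, 300, "+"),
    Gene("c", 600, 900, "+"),
    Gene("d", 905, 1200, "-"),
]
gaps = [gap for _, _, gap in adjacent_same_strand_pairs(genes)]
calls = naive_operon_calls(genes)
print(gaps, calls)
```

    A fixed cutoff is the crudest possible use of distance; the abstract's point is that the distance distribution differs between genomes, so any real threshold or model would have to be estimated per genome.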

  5. Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and Nonlocality in Chemical Space.

    PubMed

    Hansen, Katja; Biegler, Franziska; Ramakrishnan, Raghunathan; Pronobis, Wiktor; von Lilienfeld, O Anatole; Müller, Klaus-Robert; Tkatchenko, Alexandre

    2015-06-18

    Simultaneously accurate and efficient prediction of molecular properties throughout chemical compound space is a critical ingredient toward rational compound design in chemical and pharmaceutical industries. Aiming toward this goal, we develop and apply a systematic hierarchy of efficient empirical methods to estimate atomization and total energies of molecules. These methods range from a simple sum over atoms, to addition of bond energies, to pairwise interatomic force fields, reaching to the more sophisticated machine learning approaches that are capable of describing collective interactions between many atoms or bonds. In the case of equilibrium molecular geometries, even simple pairwise force fields demonstrate prediction accuracy comparable to benchmark energies calculated using density functional theory with hybrid exchange-correlation functionals; however, accounting for the collective many-body interactions proves to be essential for approaching the “holy grail” of chemical accuracy of 1 kcal/mol for both equilibrium and out-of-equilibrium geometries. This remarkable accuracy is achieved by a vectorized representation of molecules (so-called Bag of Bonds model) that exhibits strong nonlocality in chemical space. In addition, the same representation allows us to predict accurate electronic properties of molecules, such as their polarizability and molecular frontier orbital energies.
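
    The "Bag of Bonds" representation named in this abstract can be sketched roughly as follows. This is a simplified illustration: it groups Coulomb-like pair terms Z_i Z_j / r_ij by element pair, sorts each bag, and zero-pads to a fixed size. Single-atom terms are omitted, and the toy geometry, units, and bag sizes are assumptions made for the example, not data from the paper.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

Atom = Tuple[str, int, Tuple[float, float, float]]  # (symbol, nuclear charge, xyz)
BagKey = Tuple[str, str]


def bag_of_bonds(atoms: List[Atom]) -> Dict[BagKey, List[float]]:
    """Group pairwise terms Z_i * Z_j / r_ij into one bag per element pair."""
    bags: Dict[BagKey, List[float]] = defaultdict(list)
    for (s1, z1, r1), (s2, z2, r2) in combinations(atoms, 2):
        key = tuple(sorted((s1, s2)))
        bags[key].append(z1 * z2 / math.dist(r1, r2))
    # Sorting inside each bag makes the result independent of atom ordering.
    return {key: sorted(vals, reverse=True) for key, vals in bags.items()}


def to_vector(bags: Dict[BagKey, List[float]], bag_sizes: Dict[BagKey, int]) -> List[float]:
    """Concatenate the bags in a fixed key order, zero-padding each to its size."""
    vec: List[float] = []
    for key in sorted(bag_sizes):
        entries = bags.get(key, [])
        vec.extend(entries + [0.0] * (bag_sizes[key] - len(entries)))
    return vec


# Toy water-like geometry in arbitrary length units (illustrative only).
toy = [
    ("O", 8, (0.0, 0.0, 0.0)),
    ("H", 1, (1.0, 0.0, 0.0)),
    ("H", 1, (0.0, 1.0, 0.0)),
]
bags = bag_of_bonds(toy)
vec = to_vector(bags, {("H", "H"): 2, ("H", "O"): 3})
print(vec)
```

    Padding every bag to a fixed size gives molecules of different composition vectors of equal length, which is what lets a kernel or regression model compare them directly.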

  6. Machine learning predictions of molecular properties: Accurate many-body potentials and nonlocality in chemical space

    SciTech Connect

    Hansen, Katja; Biegler, Franziska; Ramakrishnan, Raghunathan; Pronobis, Wiktor; von Lilienfeld, O. Anatole; Müller, Klaus-Robert; Tkatchenko, Alexandre

    2015-06-04

    Simultaneously accurate and efficient prediction of molecular properties throughout chemical compound space is a critical ingredient toward rational compound design in chemical and pharmaceutical industries. Aiming toward this goal, we develop and apply a systematic hierarchy of efficient empirical methods to estimate atomization and total energies of molecules. These methods range from a simple sum over atoms, to addition of bond energies, to pairwise interatomic force fields, reaching to the more sophisticated machine learning approaches that are capable of describing collective interactions between many atoms or bonds. In the case of equilibrium molecular geometries, even simple pairwise force fields demonstrate prediction accuracy comparable to benchmark energies calculated using density functional theory with hybrid exchange-correlation functionals; however, accounting for the collective many-body interactions proves to be essential for approaching the “holy grail” of chemical accuracy of 1 kcal/mol for both equilibrium and out-of-equilibrium geometries. This remarkable accuracy is achieved by a vectorized representation of molecules (so-called Bag of Bonds model) that exhibits strong nonlocality in chemical space. The same representation allows us to predict accurate electronic properties of molecules, such as their polarizability and molecular frontier orbital energies.

  7. Machine learning predictions of molecular properties: Accurate many-body potentials and nonlocality in chemical space

    DOE PAGES

    Hansen, Katja; Biegler, Franziska; Ramakrishnan, Raghunathan; Pronobis, Wiktor; von Lilienfeld, O. Anatole; Müller, Klaus-Robert; Tkatchenko, Alexandre

    2015-06-04

    Simultaneously accurate and efficient prediction of molecular properties throughout chemical compound space is a critical ingredient toward rational compound design in chemical and pharmaceutical industries. Aiming toward this goal, we develop and apply a systematic hierarchy of efficient empirical methods to estimate atomization and total energies of molecules. These methods range from a simple sum over atoms, to addition of bond energies, to pairwise interatomic force fields, reaching to the more sophisticated machine learning approaches that are capable of describing collective interactions between many atoms or bonds. In the case of equilibrium molecular geometries, even simple pairwise force fields demonstrate prediction accuracy comparable to benchmark energies calculated using density functional theory with hybrid exchange-correlation functionals; however, accounting for the collective many-body interactions proves to be essential for approaching the “holy grail” of chemical accuracy of 1 kcal/mol for both equilibrium and out-of-equilibrium geometries. This remarkable accuracy is achieved by a vectorized representation of molecules (so-called Bag of Bonds model) that exhibits strong nonlocality in chemical space. The same representation allows us to predict accurate electronic properties of molecules, such as their polarizability and molecular frontier orbital energies.

  8. Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and Nonlocality in Chemical Space

    PubMed Central

    2015-01-01

    Simultaneously accurate and efficient prediction of molecular properties throughout chemical compound space is a critical ingredient toward rational compound design in chemical and pharmaceutical industries. Aiming toward this goal, we develop and apply a systematic hierarchy of efficient empirical methods to estimate atomization and total energies of molecules. These methods range from a simple sum over atoms, to addition of bond energies, to pairwise interatomic force fields, reaching to the more sophisticated machine learning approaches that are capable of describing collective interactions between many atoms or bonds. In the case of equilibrium molecular geometries, even simple pairwise force fields demonstrate prediction accuracy comparable to benchmark energies calculated using density functional theory with hybrid exchange-correlation functionals; however, accounting for the collective many-body interactions proves to be essential for approaching the “holy grail” of chemical accuracy of 1 kcal/mol for both equilibrium and out-of-equilibrium geometries. This remarkable accuracy is achieved by a vectorized representation of molecules (so-called Bag of Bonds model) that exhibits strong nonlocality in chemical space. In addition, the same representation allows us to predict accurate electronic properties of molecules, such as their polarizability and molecular frontier orbital energies. PMID:26113956

  9. Development and Validation of a Multidisciplinary Tool for Accurate and Efficient Rotorcraft Noise Prediction (MUTE)

    NASA Technical Reports Server (NTRS)

    Liu, Yi; Anusonti-Inthra, Phuriwat; Diskin, Boris

    2011-01-01

    A physics-based, systematically coupled, multidisciplinary prediction tool (MUTE) for rotorcraft noise was developed and validated with a wide range of flight configurations and conditions. MUTE is an aggregation of multidisciplinary computational tools that accurately and efficiently model the physics of the source of rotorcraft noise, and predict the noise at far-field observer locations. It uses systematic coupling approaches among multiple disciplines including Computational Fluid Dynamics (CFD), Computational Structural Dynamics (CSD), and high fidelity acoustics. Within MUTE, advanced high-order CFD tools are used around the rotor blade to predict the transonic flow (shock wave) effects, which generate the high-speed impulsive noise. Predictions of the blade-vortex interaction noise in low speed flight are also improved by using the Particle Vortex Transport Method (PVTM), which preserves the wake flow details required for blade/wake and fuselage/wake interactions. The accuracy of the source noise prediction is further improved by utilizing a coupling approach between CFD and CSD, so that the effects of key structural dynamics, elastic blade deformations, and trim solutions are correctly represented in the analysis. The blade loading information and/or the flow field parameters around the rotor blade predicted by the CFD/CSD coupling approach are used to predict the acoustic signatures at far-field observer locations with a high-fidelity noise propagation code (WOPWOP3). The predicted results from the MUTE tool for rotor blade aerodynamic loading and far-field acoustic signatures are compared and validated with a variety of experimental data sets, such as UH60-A data, DNW test data and HART II test data.

  10. Fitmunk: improving protein structures by accurate, automatic modeling of side-chain conformations.

    PubMed

    Porebski, Przemyslaw Jerzy; Cymborowski, Marcin; Pasenkiewicz-Gierula, Marta; Minor, Wladek

    2016-02-01

    Improvements in crystallographic hardware and software have allowed automated structure-solution pipelines to approach a near-‘one-click’ experience for the initial determination of macromolecular structures. However, in many cases the resulting initial model requires a laborious, iterative process of refinement and validation. A new method has been developed for the automatic modeling of side-chain conformations that takes advantage of rotamer-prediction methods in a crystallographic context. The algorithm, which is based on deterministic dead-end elimination (DEE) theory, uses new dense conformer libraries and a hybrid energy function derived from experimental data and prior information about rotamer frequencies to find the optimal conformation of each side chain. In contrast to existing methods, which incorporate the electron-density term into protein-modeling frameworks, the proposed algorithm is designed to take advantage of the highly discriminatory nature of electron-density maps. This method has been implemented in the program Fitmunk, which uses extensive conformational sampling. This improves the accuracy of the modeling and makes it a versatile tool for crystallographic model building, refinement and validation. Fitmunk was extensively tested on over 115 new structures, as well as a subset of 1100 structures from the PDB. It is demonstrated that the ability of Fitmunk to model more than 95% of side chains accurately is beneficial for improving the quality of crystallographic protein models, especially at medium and low resolutions. Fitmunk can be used for model validation of existing structures and as a tool to assess whether side chains are modeled optimally or could be better fitted into electron density. Fitmunk is available as a web service at http://kniahini.med.virginia.edu/fitmunk/server/ or at http://fitmunk.bitbucket.org/.
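
    Dead-end elimination, on which the abstract says the algorithm is based, can be illustrated with a generic sketch of the Goldstein pruning criterion. The abstract does not describe Fitmunk's actual energy function, conformer libraries, or DEE variant, so everything below (function names, positions "A" and "B", rotamer labels, and energies) is a hypothetical toy example.

```python
from typing import Callable, Dict, Set


def goldstein_dee(
    self_e: Dict[str, Dict[str, float]],
    pair_e: Callable[[str, str, str, str], float],
) -> Dict[str, Set[str]]:
    """Prune rotamers that cannot be part of the lowest-energy assignment."""
    alive = {pos: set(rots) for pos, rots in self_e.items()}
    changed = True
    while changed:
        changed = False
        for i in alive:
            for r in list(alive[i]):
                for t in alive[i] - {r}:
                    # Goldstein criterion: r is dead if switching to t lowers
                    # the energy even in the context most favorable to r.
                    gap = self_e[i][r] - self_e[i][t]
                    for j in alive:
                        if j != i:
                            gap += min(
                                pair_e(i, r, j, s) - pair_e(i, t, j, s)
                                for s in alive[j]
                            )
                    if gap > 0:
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


# Toy energies (arbitrary units): two positions with two rotamers each.
SELF = {"A": {"a1": 0.0, "a2": 5.0}, "B": {"b1": 0.0, "b2": 1.0}}
PAIR = {frozenset({("A", "a2"), ("B", "b2")}): -1.0}


def pair_energy(i: str, r: str, j: str, s: str) -> float:
    """Symmetric pair-energy lookup; pairs not listed contribute zero."""
    return PAIR.get(frozenset({(i, r), (j, s)}), 0.0)


survivors = goldstein_dee(SELF, pair_energy)
print(survivors)
```

    The pruning is deterministic and never removes a rotamer belonging to the global minimum, which is why DEE suits a setting where one optimal conformation per side chain is wanted.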

  11. Fitmunk: improving protein structures by accurate, automatic modeling of side-chain conformations

    PubMed Central

    Porebski, Przemyslaw Jerzy; Cymborowski, Marcin; Pasenkiewicz-Gierula, Marta; Minor, Wladek

    2016-01-01

    Improvements in crystallographic hardware and software have allowed automated structure-solution pipelines to approach a near-‘one-click’ experience for the initial determination of macromolecular structures. However, in many cases the resulting initial model requires a laborious, iterative process of refinement and validation. A new method has been developed for the automatic modeling of side-chain conformations that takes advantage of rotamer-prediction methods in a crystallographic context. The algorithm, which is based on deterministic dead-end elimination (DEE) theory, uses new dense conformer libraries and a hybrid energy function derived from experimental data and prior information about rotamer frequencies to find the optimal conformation of each side chain. In contrast to existing methods, which incorporate the electron-density term into protein-modeling frameworks, the proposed algorithm is designed to take advantage of the highly discriminatory nature of electron-density maps. This method has been implemented in the program Fitmunk, which uses extensive conformational sampling. This improves the accuracy of the modeling and makes it a versatile tool for crystallographic model building, refinement and validation. Fitmunk was extensively tested on over 115 new structures, as well as a subset of 1100 structures from the PDB. It is demonstrated that the ability of Fitmunk to model more than 95% of side chains accurately is beneficial for improving the quality of crystallographic protein models, especially at medium and low resolutions. Fitmunk can be used for model validation of existing structures and as a tool to assess whether side chains are modeled optimally or could be better fitted into electron density. Fitmunk is available as a web service at http://kniahini.med.virginia.edu/fitmunk/server/ or at http://fitmunk.bitbucket.org/. PMID:26894674

  12. Fitmunk: improving protein structures by accurate, automatic modeling of side-chain conformations.

    PubMed

    Porebski, Przemyslaw Jerzy; Cymborowski, Marcin; Pasenkiewicz-Gierula, Marta; Minor, Wladek

    2016-02-01

    Improvements in crystallographic hardware and software have allowed automated structure-solution pipelines to approach a near-‘one-click’ experience for the initial determination of macromolecular structures. However, in many cases the resulting initial model requires a laborious, iterative process of refinement and validation. A new method has been developed for the automatic modeling of side-chain conformations that takes advantage of rotamer-prediction methods in a crystallographic context. The algorithm, which is based on deterministic dead-end elimination (DEE) theory, uses new dense conformer libraries and a hybrid energy function derived from experimental data and prior information about rotamer frequencies to find the optimal conformation of each side chain. In contrast to existing methods, which incorporate the electron-density term into protein-modeling frameworks, the proposed algorithm is designed to take advantage of the highly discriminatory nature of electron-density maps. This method has been implemented in the program Fitmunk, which uses extensive conformational sampling. This improves the accuracy of the modeling and makes it a versatile tool for crystallographic model building, refinement and validation. Fitmunk was extensively tested on over 115 new structures, as well as a subset of 1100 structures from the PDB. It is demonstrated that the ability of Fitmunk to model more than 95% of side chains accurately is beneficial for improving the quality of crystallographic protein models, especially at medium and low resolutions. Fitmunk can be used for model validation of existing structures and as a tool to assess whether side chains are modeled optimally or could be better fitted into electron density. Fitmunk is available as a web service at http://kniahini.med.virginia.edu/fitmunk/server/ or at http://fitmunk.bitbucket.org/. PMID:26894674

  13. Accurate Prediction of Severe Allergic Reactions by a Small Set of Environmental Parameters (NDVI, Temperature)

    PubMed Central

    Andrianaki, Maria; Azariadis, Kalliopi; Kampouri, Errika; Theodoropoulou, Katerina; Lavrentaki, Katerina; Kastrinakis, Stelios; Kampa, Marilena; Agouridakis, Panagiotis; Pirintsos, Stergios; Castanas, Elias

    2015-01-01

    Severe allergic reactions of unknown etiology, necessitating a hospital visit, have an important impact on the lives of affected individuals and impose a major economic burden on societies. The prediction of clinically severe allergic reactions would be of great importance, but current attempts have been limited by the lack of a well-founded applicable methodology and the wide spatiotemporal distribution of allergic reactions. The valid prediction of severe allergies (and especially those needing hospital treatment) in a region could alert health authorities and implicated individuals to take appropriate preemptive measures. In the present report we have collected visits for serious allergic reactions of unknown etiology from two major hospitals on the island of Crete, for two distinct time periods (validation and test sets). We have used the Normalized Difference Vegetation Index (NDVI), a satellite-based, freely available measurement, which is an indicator of live green vegetation at a given geographic area, and a set of meteorological data to develop a model capable of describing and predicting severe allergic reaction frequency. Our analysis has retained NDVI and temperature as accurate identifiers and predictors of increased hospital visits for severe allergic reactions. Our approach may contribute towards the development of satellite-based modules for the prediction of severe allergic reactions in specific, well-defined geographical areas. It could also probably be used for the prediction of other environment-related diseases and conditions. PMID:25794106
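
    For readers unfamiliar with the index, NDVI is conventionally computed from near-infrared and red reflectance as (NIR - Red) / (NIR + Red). The abstract does not state the formula, so the sketch below uses the standard definition; the function name and the reflectance values are illustrative assumptions, not data from the study.

```python
def ndvi(nir: float, red: float) -> float:
    """Standard NDVI from near-infrared and red reflectance; range is [-1, 1]."""
    total = nir + red
    if total == 0:
        raise ValueError("NDVI is undefined when NIR + Red is zero")
    return (nir - red) / total


# Illustrative reflectances: dense vegetation reflects far more NIR than red.
vegetated = ndvi(nir=0.5, red=0.1)
bare = ndvi(nir=0.2, red=0.2)
print(round(vegetated, 3), bare)
```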

  14. Microstructure-Dependent Gas Adsorption: Accurate Predictions of Methane Uptake in Nanoporous Carbons

    SciTech Connect

    Ihm, Yungok; Cooper, Valentino R; Gallego, Nidia C; Contescu, Cristian I; Morris, James R

    2014-01-01

    We demonstrate a successful, efficient framework for predicting gas adsorption properties in real materials based on first-principles calculations, with a specific comparison of experiment and theory for methane adsorption in activated carbons. These carbon materials have different pore size distributions, leading to a variety of uptake characteristics. Utilizing these distributions, we accurately predict experimental uptakes and heats of adsorption without empirical potentials or lengthy simulations. We demonstrate that materials with smaller pores have higher heats of adsorption, leading to a higher gas density in these pores. This pore-size dependence must be accounted for, in order to predict and understand the adsorption behavior. The theoretical approach combines: (1) ab initio calculations with a van der Waals density functional to determine adsorbent-adsorbate interactions, and (2) a thermodynamic method that predicts equilibrium adsorption densities by directly incorporating the calculated potential energy surface in a slit pore model. The predicted uptake at P=20 bar and T=298 K is in excellent agreement for all five activated carbon materials used. This approach uses only the pore-size distribution as an input, with no fitting parameters or empirical adsorbent-adsorbate interactions, and thus can be easily applied to other adsorbent-adsorbate combinations.

  15. Accurate verification of the conserved-vector-current and standard-model predictions

    SciTech Connect

    Sirlin, A.; Zucchini, R.

    1986-10-20

    An approximate analytic calculation of O(Zα²) corrections to Fermi decays is presented. When the analysis of Koslowsky et al. is modified to take into account the new results, it is found that each of the eight accurately studied ℱt values differs from the average by ≲1σ, thus significantly improving the comparison of experiments with conserved-vector-current predictions. The new ℱt values are lower than before, which also brings experiments into very good agreement with the three-generation standard model, at the level of its quantum corrections.

  16. Signature Product Code for Predicting Protein-Protein Interactions

    SciTech Connect

    Martin, Shawn B.; Brown, William M.

    2004-09-25

    The SigProdV1.0 software consists of four programs which together allow the prediction of protein-protein interactions using only amino acid sequences and experimental data. The software is based on the use of tensor products of amino acid trimers coupled with classifiers known as support vector machines. Essentially the program looks for amino acid trimer pairs which occur more frequently in protein pairs which are known to interact. These trimer pairs are then used to make predictions about unknown protein pairs. A detailed description of the method can be found in the paper: S. Martin, D. Roe, J.L. Faulon. "Predicting protein-protein interactions using signature products," Bioinformatics, available online from Advance Access, Aug. 19, 2004.
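
    The trimer-pair idea described above can be sketched in a few lines. This is a simplified illustration that uses plain overlapping trimers and a sparse outer (tensor) product of their counts; the published signature definition and the SigProd programs themselves may differ, and the function names and sequences here are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple


def trimer_counts(seq: str) -> Counter:
    """Count overlapping amino acid trimers in a sequence."""
    return Counter(seq[i:i + 3] for i in range(len(seq) - 2))


def pair_features(seq_a: str, seq_b: str) -> Dict[Tuple[str, str], int]:
    """Sparse tensor product of trimer counts for an ordered protein pair."""
    counts_a, counts_b = trimer_counts(seq_a), trimer_counts(seq_b)
    return {
        (ta, tb): na * nb
        for ta, na in counts_a.items()
        for tb, nb in counts_b.items()
    }


# Made-up toy sequences for illustration only.
features = pair_features("MKVMKV", "AKVL")
print(features)
```

    Feature vectors like this, computed for pairs known to interact and pairs assumed not to, are the kind of input a support vector machine could then be trained on.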

  17. Multi-omics integration accurately predicts cellular state in unexplored conditions for Escherichia coli

    PubMed Central

    Kim, Minseung; Rai, Navneet; Zorraquino, Violeta; Tagkopoulos, Ilias

    2016-01-01

    A significant obstacle in training predictive cell models is the lack of integrated data sources. We develop semi-supervised normalization pipelines and perform experimental characterization (growth, transcriptional, proteome) to create Ecomics, a consistent, quality-controlled multi-omics compendium for Escherichia coli with cohesive meta-data information. We then use this resource to train a multi-scale model that integrates four omics layers to predict genome-wide concentrations and growth dynamics. The genetic and environmental ontology reconstructed from the omics data is substantially different and complementary to the genetic and chemical ontologies. The integration of different layers confers an incremental increase in the prediction performance, as does the information about the known gene regulatory and protein-protein interactions. The predictive performance of the model ranges from 0.54 to 0.87 for the various omics layers, which far exceeds various baselines. This work provides an integrative framework of omics-driven predictive modelling that is broadly applicable to guide biological discovery. PMID:27713404

  18. CRYSTALP2: sequence-based protein crystallization propensity prediction

    PubMed Central

    Kurgan, Lukasz; Razib, Ali A; Aghakhani, Sara; Dick, Scott; Mizianty, Marcin; Jahandideh, Samad

    2009-01-01

    Background: Current protocols yield crystals for <30% of known proteins, indicating that automatically identifying crystallizable proteins may improve high-throughput structural genomics efforts. We introduce CRYSTALP2, a kernel-based method that predicts the propensity of a given protein sequence to produce diffraction-quality crystals. This method utilizes the composition and collocation of amino acids, isoelectric point, and hydrophobicity, as estimated from the primary sequence, to generate predictions. CRYSTALP2 extends its predecessor, CRYSTALP, by enabling predictions for sequences of unrestricted size and by providing improved prediction quality. Results: A significant majority of the collocations used by CRYSTALP2 include residues with high conformational entropy, or low entropy and high potential to mediate crystal contacts; notably, such residues are utilized by surface entropy reduction methods. We show that the collocations provide complementary information to the hydrophobicity and isoelectric point. Tests on four datasets show that CRYSTALP2 outperforms several existing sequence-based predictors (CRYSTALP, OB-score, and SECRET). CRYSTALP2's accuracy, MCC, and AROC range between 69.3 and 77.5%, 0.39 and 0.55, and 0.72 and 0.79, respectively. Our predictions are similar in quality and are complementary to the predictions of the most recent ParCrys and XtalPred methods. Our results also suggest that, as work in protein crystallization continues (thereby enlarging the population of proteins with known crystallization propensities), the prediction quality of the CRYSTALP2 method should increase. The prediction model and the datasets used in this contribution can be downloaded from . Conclusion: CRYSTALP2 provides relatively accurate crystallization propensity predictions for a given protein chain that either outperform or complement the existing approaches. The proposed method can be used to support current efforts towards improving the success rate in obtaining
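
    The accuracy and MCC figures quoted in this abstract are standard binary-classification measures. As a reminder of how they are derived from a confusion matrix, here is a minimal sketch; the counts are invented for illustration and do not come from the CRYSTALP2 evaluation.

```python
import math
from typing import Tuple


def accuracy_and_mcc(tp: int, tn: int, fp: int, fn: int) -> Tuple[float, float]:
    """Accuracy and Matthews correlation coefficient from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Hypothetical counts: 50 crystallizable and 50 non-crystallizable proteins.
acc, mcc = accuracy_and_mcc(tp=40, tn=35, fp=15, fn=10)
print(f"accuracy={acc:.3f} MCC={mcc:.3f}")
```

    Unlike accuracy, MCC stays informative when the two classes are unbalanced, which is presumably why it is reported alongside accuracy and AROC here.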

  19. PREDITOR: a web server for predicting protein torsion angle restraints

    PubMed Central

    Berjanskii, Mark V.; Neal, Stephen; Wishart, David S.

    2006-01-01

    Every year between 500 and 1000 peptide and protein structures are determined by NMR and deposited into the Protein Data Bank. However, the process of NMR structure determination continues to be a manually intensive and time-consuming task. One of the most tedious and error-prone aspects of this process involves the determination of torsion angle restraints including phi, psi, omega and chi angles. Most methods require many days of additional experiments, painstaking measurements or complex calculations. Here we wish to describe a web server, called PREDITOR, which greatly accelerates and simplifies this task. PREDITOR accepts sequence and/or chemical shift data as input and generates torsion angle predictions (with predicted errors) for phi, psi, omega and chi-1 angles. PREDITOR combines sequence alignment methods with advanced chemical shift analysis techniques to generate its torsion angle predictions. The method is fast (<40 s per protein) and accurate, with 88% of phi/psi predictions being within 30° of the correct values, 84% of chi-1 predictions being correct and 99.97% of omega angles being correct. PREDITOR is 35 times faster and up to 20% more accurate than any existing method. PREDITOR also provides accurate assessments of the torsion angle errors so that the torsion angle constraints can be readily fed into standard structure refinement programs, such as CNS, XPLOR, AMBER and CYANA. Other unique features to PREDITOR include dihedral angle prediction via PDB structure mapping, automated chemical shift re-referencing (to improve accuracy), prediction of proline cis/trans states and a simple user interface. The PREDITOR website is located at: . PMID:16845087
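
    A statistic such as "within 30° of the correct values" has to respect the periodicity of torsion angles, since -175° and 170° are only 15° apart. The sketch below shows one way to compute such a fraction. The abstract does not give the exact criterion used (for example, whether phi and psi must both fall within the tolerance), so this treats each angle independently, and the function names and angle values are made up.

```python
from typing import Sequence


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def fraction_within(
    predicted: Sequence[float],
    reference: Sequence[float],
    tolerance: float = 30.0,
) -> float:
    """Fraction of predicted angles within `tolerance` degrees of the reference."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(
        angular_difference(p, r) <= tolerance
        for p, r in zip(predicted, reference)
    )
    return hits / len(predicted)


# Illustrative angles only.
predicted = [-175.0, 60.0, -60.0, 120.0]
reference = [170.0, 65.0, -120.0, 125.0]
score = fraction_within(predicted, reference)
print(score)
```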

  20. Predicting the fission yeast protein interaction network.

    PubMed

    Pancaldi, Vera; Saraç, Omer S; Rallis, Charalampos; McLean, Janel R; Převorovský, Martin; Gould, Kathleen; Beyer, Andreas; Bähler, Jürg

    2012-04-01

    A systems-level understanding of biological processes and information flow requires the mapping of cellular component interactions, among which protein-protein interactions are particularly important. Fission yeast (Schizosaccharomyces pombe) is a valuable model organism for which no systematic protein-interaction data are available. We exploited gene and protein properties, global genome regulation datasets, and conservation of interactions between budding and fission yeast to predict fission yeast protein interactions in silico. We have extensively tested our method in three ways: first, by predicting with 70-80% accuracy a selected high-confidence test set; second, by recapitulating interactions between members of the well-characterized SAGA co-activator complex; and third, by verifying predicted interactions of the Cbf11 transcription factor using mass spectrometry of TAP-purified protein complexes. Given the importance of the pathway in cell physiology and human disease, we explore the predicted sub-networks centered on the Tor1/2 kinases. Moreover, we predict the histidine kinases Mak1/2/3 to be vital hubs in the fission yeast stress response network, and we suggest interactors of argonaute 1, the principal component of the siRNA-mediated gene silencing pathway, lost in budding yeast but preserved in S. pombe. Of the new high-quality interactions that were discovered after we started this work, 73% were found in our predictions. Even though any predicted interactome is imperfect, the protein network presented here can provide a valuable basis to explore biological processes and to guide wet-lab experiments in fission yeast and beyond. Our predicted protein interactions are freely available through PInt, an online resource on our website (www.bahlerlab.info/PInt).

  1. Genome-wide Membrane Protein Structure Prediction

    PubMed Central

    Piccoli, Stefano; Suku, Eda; Garonzi, Marianna; Giorgetti, Alejandro

    2013-01-01

    Transmembrane proteins allow cells to extensively communicate with the external world in a very accurate and specific way. They form principal nodes in several signaling pathways and attract large interest in therapeutic intervention, as the majority of pharmaceutical compounds target membrane proteins. Thus, according to the current genome annotation methods, a detailed structural/functional characterization at the protein level of each of the elements codified in the genome is also required. The extreme difficulty in obtaining high-resolution three-dimensional structures calls for computational approaches. Here we review to what extent the efforts made in the last few years, combining the structural characterization of membrane proteins with protein bioinformatics techniques, could help describe membrane proteins at a genome-wide scale. In particular we analyze the use of comparative modeling techniques as a way of overcoming the lack of high-resolution three-dimensional structures in the human membrane proteome. PMID:24403851

  2. Predicting protein-peptide interactions from scratch

    NASA Astrophysics Data System (ADS)

    Yan, Chengfei; Xu, Xianjin; Zou, Xiaoqin; Zou lab Team

    Protein-peptide interactions play an important role in many cellular processes. The ability to predict protein-peptide complex structures is valuable for mechanistic investigation and therapeutic development. Due to the high flexibility of peptides and lack of templates for homology modeling, predicting protein-peptide complex structures is extremely challenging. Recently, we have developed a novel docking framework for protein-peptide structure prediction. Specifically, given the sequence of a peptide and a 3D structure of the protein, initial conformations of the peptide are built through protein threading. Then, the peptide is globally and flexibly docked onto the protein using a novel iterative approach. Finally, the sampled modes are scored and ranked by a statistical potential-based energy scoring function that was derived for protein-peptide interactions from statistical mechanics principles. Our docking methodology has been tested on the PeptiDB database and compared with other protein-peptide docking methods. Systematic analysis shows significantly improved results compared to the performances of the existing methods. Our method is computationally efficient and suitable for large-scale applications. Supported by NSF CAREER Award 0953839 (XZ) and NIH R01GM109980 (XZ).

  3. Year 2 Report: Protein Function Prediction Platform

    SciTech Connect

    Zhou, C E

    2012-04-27

    Upon completion of our second year of development in a 3-year development cycle, we have completed a prototype protein structure-function annotation and function prediction system: Protein Function Prediction (PFP) platform (v.0.5). We have met our milestones for Years 1 and 2 and are positioned to continue development in completion of our original statement of work, or a reasonable modification thereof, in service to DTRA Programs involved in diagnostics and medical countermeasures research and development. The PFP platform is a multi-scale computational modeling system for protein structure-function annotation and function prediction. As of this writing, PFP is the only existing fully automated, high-throughput, multi-scale modeling, whole-proteome annotation platform, and represents a significant advance in the field of genome annotation (Fig. 1). PFP modules perform protein functional annotations at the sequence, systems biology, protein structure, and atomistic levels of biological complexity (Fig. 2). Because these approaches provide orthogonal means of characterizing proteins and suggesting protein function, PFP processing maximizes the protein functional information that can currently be gained by computational means. Comprehensive annotation of pathogen genomes is essential for bio-defense applications in pathogen characterization, threat assessment, and medical countermeasure design and development in that it can short-cut the time and effort required to select and characterize protein biomarkers.

  4. Quantitative assessment of protein function prediction programs.

    PubMed

    Rodrigues, B N; Steffens, M B R; Raittz, R T; Santos-Weiss, I C R; Marchaukoski, J N

    2015-12-21

    Fast prediction of protein function is essential for high-throughput sequencing analysis. Bioinformatic resources provide cheaper and faster techniques for function prediction and have helped to accelerate the process of protein sequence characterization. In this study, we assessed protein function prediction programs that accept amino acid sequences as input. We analyzed the classification, equality, and similarity between programs, and, additionally, compared program performance. The following programs were selected for our assessment: Blast2GO, InterProScan, PANTHER, Pfam, and ScanProsite. This selection was based on the high number of citations (over 500), fully automatic analysis, and the possibility of returning a single best classification per sequence. We tested these programs using 12 gold standard datasets from four different sources. The gold standard classification of the databases was based on expert analysis, the Protein Data Bank, or the Structure-Function Linkage Database. We found that the miss rate among the programs is globally over 50%. Furthermore, we observed little overlap in the correct predictions from each program. Therefore, a combination of multiple types of sources and methods, including experimental data, protein-protein interaction, and data mining, may be the best way to generate more reliable predictions and decrease the miss rate.

  5. Quantitative assessment of protein function prediction programs.

    PubMed

    Rodrigues, B N; Steffens, M B R; Raittz, R T; Santos-Weiss, I C R; Marchaukoski, J N

    2015-01-01

    Fast prediction of protein function is essential for high-throughput sequencing analysis. Bioinformatic resources provide cheaper and faster techniques for function prediction and have helped to accelerate the process of protein sequence characterization. In this study, we assessed protein function prediction programs that accept amino acid sequences as input. We analyzed the classification, equality, and similarity between programs, and, additionally, compared program performance. The following programs were selected for our assessment: Blast2GO, InterProScan, PANTHER, Pfam, and ScanProsite. This selection was based on the high number of citations (over 500), fully automatic analysis, and the possibility of returning a single best classification per sequence. We tested these programs using 12 gold standard datasets from four different sources. The gold standard classification of the databases was based on expert analysis, the Protein Data Bank, or the Structure-Function Linkage Database. We found that the miss rate among the programs is globally over 50%. Furthermore, we observed little overlap in the correct predictions from each program. Therefore, a combination of multiple types of sources and methods, including experimental data, protein-protein interaction, and data mining, may be the best way to generate more reliable predictions and decrease the miss rate. PMID:26782400

  6. Integrating multiple networks for protein function prediction

    PubMed Central

    2015-01-01

    Background High throughput techniques produce multiple functional association networks. Integrating these networks can enhance the accuracy of protein function prediction. Many algorithms have been introduced to generate a composite network, which is obtained as a weighted sum of individual networks. The weight assigned to an individual network reflects its benefit towards the protein functional annotation inference. A classifier is then trained on the composite network for predicting protein functions. However, since these techniques model the optimization of the composite network and the prediction tasks as separate objectives, the resulting composite network is not necessarily optimal for the follow-up protein function prediction. Results We address this issue by modeling the optimization of the composite network and the prediction problems within a unified objective function. In particular, we use a kernel target alignment technique and the loss function of a network based classifier to jointly adjust the weights assigned to the individual networks. We show that the proposed method, called MNet, can achieve a performance that is superior (with respect to different evaluation criteria) to related techniques using the multiple networks of four example species (yeast, human, mouse, and fly) annotated with thousands (or hundreds) of GO terms. Conclusion MNet can effectively integrate multiple networks for protein function prediction and is robust to the input parameters. Supplementary data is available at https://sites.google.com/site/guoxian85/home/mnet. The Matlab code of MNet is available upon request. PMID:25707434
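
    The two ingredients named in this abstract, a composite network formed as a weighted sum of individual networks and a kernel target alignment score, can be illustrated with a minimal sketch. This is not the MNet objective itself, which the abstract does not spell out; the function names and the list-of-lists matrix representation are assumptions made for illustration, and the alignment uses the standard definition (Frobenius inner product of the kernel with y y^T, normalized by both Frobenius norms).

```python
import math


def composite_network(networks, weights):
    """Weighted sum of equally sized adjacency matrices (lists of lists)."""
    n = len(networks[0])
    return [
        [sum(w * net[i][j] for w, net in zip(weights, networks)) for j in range(n)]
        for i in range(n)
    ]


def frobenius_inner(a, b):
    """Frobenius inner product: sum of element-wise products of two matrices."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def kernel_target_alignment(kernel, labels):
    """Alignment between a kernel matrix and the ideal kernel y y^T (labels in {-1, +1})."""
    target = [[yi * yj for yj in labels] for yi in labels]
    denom = math.sqrt(frobenius_inner(kernel, kernel) * frobenius_inner(target, target))
    return frobenius_inner(kernel, target) / denom if denom else 0.0


if __name__ == "__main__":
    identity = [[1, 0], [0, 1]]
    swap = [[0, 1], [1, 0]]
    combined = composite_network([identity, swap], [0.5, 2.0])
    print(combined)
    print(kernel_target_alignment(combined, [1, -1]))
```

    A joint method would adjust the weights so that the composite network scores well on alignment and on the classifier loss together; the sketch only evaluates a fixed choice of weights.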

  7. ILT based defect simulation of inspection images accurately predicts mask defect printability on wafer

    NASA Astrophysics Data System (ADS)

    Deep, Prakash; Paninjath, Sankaranarayanan; Pereira, Mark; Buck, Peter

    2016-05-01

    At advanced technology nodes mask complexity has increased because of large-scale use of resolution enhancement technologies (RET), which include Optical Proximity Correction (OPC), Inverse Lithography Technology (ILT) and Source Mask Optimization (SMO). The number of defects detected during inspection of such masks has increased drastically, and differentiation of critical and non-critical defects is more challenging, complex and time consuming. Because of significant defectivity of EUVL masks and non-availability of actinic inspection, it is important and also challenging to predict the criticality of defects for printability on wafer. This is one of the significant barriers for the adoption of EUVL for semiconductor manufacturing. Techniques to decide criticality of defects from images captured using non-actinic inspection are desired until actinic inspection becomes available. High resolution inspection of photomask images detects many defects which are used for process and mask qualification. Repairing all defects is not practical and probably not required; however, it is imperative to know which defects are severe enough to impact the wafer before repair. Additionally, a wafer printability check is always desired after repairing a defect. AIMS™ review is the industry standard for this; however, doing AIMS™ review for all defects is expensive and very time consuming. A fast, accurate and economical mechanism is desired which can predict defect printability on wafer accurately and quickly from images captured using a high resolution inspection machine. Predicting defect printability from such images is challenging due to the fact that the high resolution images do not correlate with actual mask contours. The challenge is increased by the use of optical conditions during inspection that differ from the actual scanner conditions, and defects found in such images do not have correlation with actual impact on wafer. Our automated defect simulation tool predicts

  8. Chemical shift prediction for denatured proteins.

    PubMed

    Prestegard, James H; Sahu, Sarata C; Nkari, Wendy K; Morris, Laura C; Live, David; Gruta, Christian

    2013-02-01

    While chemical shift prediction has played an important role in aspects of protein NMR that include identification of secondary structure, generation of torsion angle constraints for structure determination, and assignment of resonances in spectra of intrinsically disordered proteins, interest has arisen more recently in using it in alternate assignment strategies for crosspeaks in (1)H-(15)N HSQC spectra of sparsely labeled proteins. One such approach involves correlation of crosspeaks in the spectrum of the native protein with those observed in the spectrum of the denatured protein, followed by assignment of the peaks in the latter spectrum. As in the case of disordered proteins, predicted chemical shifts can aid in these assignments. Some previously developed empirical formulas for chemical shift prediction have depended on basis data sets of 20 pentapeptides. In each case the central residue was varied among the 20 common amino acids, with the flanking residues held constant throughout the given series. However, previous choices of solvent conditions and flanking residues make the parameters in these formulas less than ideal for general application to denatured proteins. Here, we report (1)H and (15)N shifts for a set of alanine based pentapeptides under the low pH urea denaturing conditions that are more appropriate for sparse label assignments. New parameters have been derived and a Perl script was created to facilitate comparison with other parameter sets. A small, but significant, improvement in shift predictions for denatured ubiquitin is demonstrated.

  9. Signature Product Code for Predicting Protein-Protein Interactions

    2004-09-25

    The SigProdV1.0 software consists of four programs which together allow the prediction of protein-protein interactions using only amino acid sequences and experimental data. The software is based on the use of tensor products of amino acid trimers coupled with classifiers known as support vector machines. Essentially the program looks for amino acid trimer pairs which occur more frequently in protein pairs which are known to interact. These trimer pairs are then used to make predictions about unknown protein pairs. A detailed description of the method can be found in the paper: S. Martin, D. Roe, J.L. Faulon. "Predicting protein-protein interactions using signature products," Bioinformatics, available online from Advance Access, Aug. 19, 2004.
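
    The idea of pairing amino acid trimers across two sequences can be sketched as follows. This is an illustration only and not the SigProd code: the function names are hypothetical, and the choice to multiply trimer counts and to sort each trimer pair (so the features do not depend on the order of the two proteins) is an assumption about how such a product feature might be built. The resulting sparse vector is the kind of input a support vector machine could be trained on.

```python
from collections import Counter


def trimer_counts(sequence):
    """Count every overlapping amino acid trimer in a sequence."""
    return Counter(sequence[i:i + 3] for i in range(len(sequence) - 2))


def trimer_pair_features(seq_a, seq_b):
    """Product-style features: for each trimer pair, multiply the two counts.

    Keys are sorted so that (seq_a, seq_b) and (seq_b, seq_a) give the same
    features (an assumption made for this sketch).
    """
    counts_a = trimer_counts(seq_a)
    counts_b = trimer_counts(seq_b)
    features = Counter()
    for trimer_a, count_a in counts_a.items():
        for trimer_b, count_b in counts_b.items():
            features[tuple(sorted((trimer_a, trimer_b)))] += count_a * count_b
    return features


if __name__ == "__main__":
    # Toy sequences, not real proteins.
    print(trimer_pair_features("AAAA", "ACD"))
```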

  10. CoMOGrad and PHOG: From Computer Vision to Fast and Accurate Protein Tertiary Structure Retrieval

    PubMed Central

    Karim, Rezaul; Aziz, Mohd. Momin Al; Shatabda, Swakkhar; Rahman, M. Sohel; Mia, Md. Abul Kashem; Zaman, Farhana; Rakin, Salman

    2015-01-01

    The number of entries in a structural database of proteins is increasing day by day. Methods for retrieving protein tertiary structures from such a large database have turned out to be the key to comparative analysis of structures, which plays an important role in understanding proteins and their functions. In this paper, we present fast and accurate methods for the retrieval of proteins having tertiary structures similar to a query protein from a large database. Our proposed methods borrow ideas from the field of computer vision. The speed and accuracy of our methods come from the two newly introduced features, the co-occurrence matrix of the oriented gradient and the pyramid histogram of oriented gradient, and the use of Euclidean distance as the distance measure. Experimental results clearly indicate the superiority of our approach in both running time and accuracy. Our method is readily available for use from this website: http://research.buet.ac.bd:8080/Comograd/. PMID:26293226
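
    Retrieval by Euclidean distance over fixed-length feature vectors can be sketched in a few lines. The feature vectors below are placeholders; computing CoMOGrad or PHOG descriptors from a structure is not shown, and the function names and dictionary layout are assumptions for illustration.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_similar(query, database, top_k=3):
    """Return the names of the top_k database entries closest to the query."""
    ranked = sorted(
        database.items(),
        key=lambda item: euclidean_distance(query, item[1]),
    )
    return [name for name, _ in ranked[:top_k]]


if __name__ == "__main__":
    # Hypothetical two-dimensional descriptors keyed by made-up structure names.
    database = {"structure_a": [0.0, 0.0], "structure_b": [3.0, 4.0], "structure_c": [1.0, 1.0]}
    print(retrieve_similar([0.0, 0.0], database, top_k=2))
```

    A linear scan like this costs one distance computation per database entry, which is part of why a cheap distance measure matters for large structural databases.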

  11. Toward an Accurate Prediction of the Arrival Time of Geomagnetic-Effective Coronal Mass Ejections

    NASA Astrophysics Data System (ADS)

    Shi, T.; Wang, Y.; Wan, L.; Cheng, X.; Ding, M.; Zhang, J.

    2015-12-01

    Accurately predicting the arrival of coronal mass ejections (CMEs) to the Earth based on remote images is of critical significance for the study of space weather. Here we make a statistical study of 21 Earth-directed CMEs, specifically exploring the relationship between CME initial speeds and transit times. The initial speed of a CME is obtained by fitting the CME with the Graduated Cylindrical Shell model and is thus free of projection effects. We then use the drag force model to fit results of the transit time versus the initial speed. By adopting different drag regimes, i.e., the viscous, aerodynamics, and hybrid regimes, we get similar results, with a least mean estimation error of the hybrid model of 12.9 hr. CMEs with a propagation angle (the angle between the propagation direction and the Sun-Earth line) larger than their half-angular widths arrive at the Earth with an angular deviation caused by factors other than the radial solar wind drag. The drag force model cannot be reliably applied to such events. If we exclude these events in the sample, the prediction accuracy can be improved, i.e., the estimation error reduces to 6.8 hr. This work suggests that it is viable to predict the arrival time of CMEs to the Earth based on the initial parameters with fairly good accuracy. Thus, it provides a method of forecasting space weather 1-5 days following the occurrence of CMEs.

  12. Intermolecular potentials and the accurate prediction of the thermodynamic properties of water

    SciTech Connect

    Shvab, I.; Sadus, Richard J.

    2013-11-21

    The ability of intermolecular potentials to correctly predict the thermodynamic properties of liquid water at a density of 0.998 g/cm³ for a wide range of temperatures (298–650 K) and pressures (0.1–700 MPa) is investigated. Molecular dynamics simulations are reported for the pressure, thermal pressure coefficient, thermal expansion coefficient, isothermal and adiabatic compressibilities, isobaric and isochoric heat capacities, and Joule-Thomson coefficient of liquid water using the non-polarizable SPC/E and TIP4P/2005 potentials. The results are compared with both experiment data and results obtained from the ab initio-based Matsuoka-Clementi-Yoshimine non-additive (MCYna) [J. Li, Z. Zhou, and R. J. Sadus, J. Chem. Phys. 127, 154509 (2007)] potential, which includes polarization contributions. The data clearly indicate that both the SPC/E and TIP4P/2005 potentials are only in qualitative agreement with experiment, whereas the polarizable MCYna potential predicts some properties within experimental uncertainty. This highlights the importance of polarizability for the accurate prediction of the thermodynamic properties of water, particularly at temperatures beyond 298 K.

  13. VORFFIP-driven dock: V-D2OCK, a fast and accurate protein docking strategy.

    PubMed

    Segura, Joan; Marín-López, Manuel Alejandro; Jones, Pamela F; Oliva, Baldo; Fernandez-Fuentes, Narcis

    2015-01-01

    The experimental determination of the structure of protein complexes cannot keep pace with the generation of interactomic data, hence resulting in an ever-expanding gap. As the structural details of protein complexes are central to a full understanding of the function and dynamics of the cell machinery, alternative strategies are needed to circumvent the bottleneck in structure determination. Computational protein docking is a valid and valuable approach to model the structure of protein complexes. In this work, we describe a novel computational strategy to predict the structure of protein complexes based on data-driven docking: VORFFIP-driven dock (V-D2OCK). This new approach makes use of our newly described method to predict functional sites in protein structures, VORFFIP, to define the region to be sampled during docking and structural clustering to reduce the number of models to be examined by users. V-D2OCK has been benchmarked using a validated and diverse set of protein complexes and compared to a state-of-the-art docking method. The speed and accuracy compared to contemporary tools justify the potential use of V-D2OCK for high-throughput, genome-wide, protein docking. Finally, we have developed a web interface that allows users to browse and visualize V-D2OCK predictions from the convenience of their web browsers.

  14. PSI: A Comprehensive and Integrative Approach for Accurate Plant Subcellular Localization Prediction

    PubMed Central

    Chen, Ming

    2013-01-01

    Predicting the subcellular localization of proteins overcomes the major drawbacks of high-throughput localization experiments, which are costly and time-consuming. However, current subcellular localization predictors are limited in scope and accuracy. In particular, most predictors perform well on certain locations or with certain data sets but poorly on others. Here, we present PSI, a novel high accuracy web server for plant subcellular localization prediction. PSI derives the wisdom of multiple specialized predictors via a joint approach of group decision making strategy and machine learning methods to give an integrated best result. The overall accuracy obtained (up to 93.4%) was higher than that of the best individual predictor (CELLO) by ∼10.7%. The precision of each predictable subcellular location (more than 80%) far exceeds that of the individual predictors. It can also deal with multi-localization proteins. PSI is expected to be a powerful tool in protein location engineering as well as in plant sciences, while the strategy employed could be applied to other integrative problems. A user-friendly web server, PSI, has been developed for free access at http://bis.zju.edu.cn/psi/. PMID:24194827

  15. Accurate single-sequence prediction of solvent accessible surface area using local and global features.

    PubMed

    Faraggi, Eshel; Zhou, Yaoqi; Kloczkowski, Andrzej

    2014-11-01

    We present a new approach for predicting the Accessible Surface Area (ASA) using a General Neural Network (GENN). The novelty of the new approach lies in not using residue mutation profiles generated by multiple sequence alignments as descriptive inputs. Instead we use solely sequential window information and global features such as single-residue and two-residue compositions of the chain. The resulting predictor is both much more efficient than sequence alignment-based predictors and of comparable accuracy to them. Introduction of the global inputs significantly helps achieve this comparable accuracy. The predictor, termed ASAquick, is tested on predicting the ASA of globular proteins and found to perform similarly well for so-called easy and hard cases, indicating generalizability and possible usability for de-novo protein structure prediction. The source code and Linux executables for GENN and ASAquick are available from Research and Information Systems at http://mamiris.com, from the SPARKS Lab at http://sparks-lab.org, and from the Battelle Center for Mathematical Medicine at http://mathmed.org. PMID:25204636
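
    The two kinds of single-sequence input described here, a local window around each residue and global composition of the whole chain, can be sketched as below. This is not the ASAquick feature code: the window half-width, the padding character "X", and the function names are assumptions chosen for illustration, and no neural network is included.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_windows(sequence, half_width, pad="X"):
    """One fixed-length window per residue, padded at the chain ends."""
    padded = pad * half_width + sequence + pad * half_width
    return [padded[i:i + 2 * half_width + 1] for i in range(len(sequence))]


def single_residue_composition(sequence):
    """Fraction of the chain made up by each of the 20 standard amino acids."""
    counts = Counter(sequence)
    n = len(sequence)
    return {aa: (counts[aa] / n if n else 0.0) for aa in AMINO_ACIDS}


def two_residue_composition(sequence):
    """Fraction of adjacent residue pairs of each observed type."""
    pairs = Counter(sequence[i:i + 2] for i in range(len(sequence) - 1))
    total = sum(pairs.values())
    return {pair: count / total for pair, count in pairs.items()}


if __name__ == "__main__":
    chain = "AACD"  # toy sequence
    print(residue_windows(chain, half_width=1))
    print(single_residue_composition(chain)["A"])
    print(two_residue_composition(chain))
```

    Because none of these features requires a multiple sequence alignment, they can be computed in time linear in the chain length, which is consistent with the efficiency claim in the abstract.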

  16. Defining and predicting structurally conserved regions in protein superfamilies

    PubMed Central

    Huang, Ivan K.; Grishin, Nick V.

    2013-01-01

    Motivation: The structures of homologous proteins are generally better conserved than their sequences. This phenomenon is demonstrated by the prevalence of structurally conserved regions (SCRs) even in highly divergent protein families. Defining SCRs requires the comparison of two or more homologous structures and is affected by their availability and divergence, and our ability to deduce structurally equivalent positions among them. In the absence of multiple homologous structures, it is necessary to predict SCRs of a protein using information from only a set of homologous sequences and (if available) a single structure. Accurate SCR predictions can benefit homology modelling and sequence alignment. Results: Using pairwise DaliLite alignments among a set of homologous structures, we devised a simple measure of structural conservation, termed structural conservation index (SCI). SCI was used to distinguish SCRs from non-SCRs. A database of SCRs was compiled from 386 SCOP superfamilies containing 6489 protein domains. Artificial neural networks were then trained to predict SCRs with various features deduced from a single structure and homologous sequences. Assessment of the predictions via a 5-fold cross-validation method revealed that predictions based on features derived from a single structure perform similarly to ones based on homologous sequences, while combining sequence and structural features was optimal in terms of accuracy (0.755) and Matthews correlation coefficient (0.476). These results suggest that even without information from multiple structures, it is still possible to effectively predict SCRs for a protein. Finally, inspection of the structures with the worst predictions pinpoints difficulties in SCR definitions. Availability: The SCR database and the prediction server can be found at http://prodata.swmed.edu/SCR. Contact: 91huangi@gmail.com or grishin@chop.swmed.edu Supplementary information: Supplementary data are available at Bioinformatics
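
    The two evaluation measures quoted in this abstract, accuracy and the Matthews correlation coefficient (MCC), follow directly from the confusion matrix of a binary predictor. The sketch below uses the standard definitions; the function names and the toy labels are illustrative and are not taken from the paper.

```python
import math


def confusion_counts(true_labels, predicted_labels):
    """Return (tp, tn, fp, fn) for binary labels given as 0/1 or booleans."""
    tp = tn = fp = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth and predicted:
            tp += 1
        elif not truth and not predicted:
            tn += 1
        elif predicted:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    counts = confusion_counts([1, 1, 0, 0], [1, 0, 0, 0])
    print(accuracy(*counts), matthews_corrcoef(*counts))
```

    MCC is often reported alongside accuracy because it stays informative when the two classes (here, SCR and non-SCR positions) are unevenly represented.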

  17. JPred4: a protein secondary structure prediction server.

    PubMed

    Drozdetskiy, Alexey; Cole, Christian; Procter, James; Barton, Geoffrey J

    2015-07-01

    JPred4 (http://www.compbio.dundee.ac.uk/jpred4) is the latest version of the popular JPred protein secondary structure prediction server which provides predictions by the JNet algorithm, one of the most accurate methods for secondary structure prediction. In addition to protein secondary structure, JPred also makes predictions of solvent accessibility and coiled-coil regions. The JPred service runs up to 94 000 jobs per month and has carried out over 1.5 million predictions in total for users in 179 countries. The JPred4 web server has been re-implemented in the Bootstrap framework and JavaScript to improve its design, usability and accessibility from mobile devices. JPred4 features higher accuracy, with a blind three-state (α-helix, β-strand and coil) secondary structure prediction accuracy of 82.0% while solvent accessibility prediction accuracy has been raised to 90% for residues <5% accessible. Reporting of results is enhanced both on the website and through the optional email summaries and batch submission results. Predictions are now presented in SVG format with options to view full multiple sequence alignments with and without gaps and insertions. Finally, the help-pages have been updated and tool-tips added as well as step-by-step tutorials. PMID:25883141

  18. JPred4: a protein secondary structure prediction server

    PubMed Central

    Drozdetskiy, Alexey; Cole, Christian; Procter, James; Barton, Geoffrey J.

    2015-01-01

    JPred4 (http://www.compbio.dundee.ac.uk/jpred4) is the latest version of the popular JPred protein secondary structure prediction server which provides predictions by the JNet algorithm, one of the most accurate methods for secondary structure prediction. In addition to protein secondary structure, JPred also makes predictions of solvent accessibility and coiled-coil regions. The JPred service runs up to 94 000 jobs per month and has carried out over 1.5 million predictions in total for users in 179 countries. The JPred4 web server has been re-implemented in the Bootstrap framework and JavaScript to improve its design, usability and accessibility from mobile devices. JPred4 features higher accuracy, with a blind three-state (α-helix, β-strand and coil) secondary structure prediction accuracy of 82.0% while solvent accessibility prediction accuracy has been raised to 90% for residues <5% accessible. Reporting of results is enhanced both on the website and through the optional email summaries and batch submission results. Predictions are now presented in SVG format with options to view full multiple sequence alignments with and without gaps and insertions. Finally, the help-pages have been updated and tool-tips added as well as step-by-step tutorials. PMID:25883141

  19. Reduced alphabet for protein folding prediction.

    PubMed

    Huang, Jitao T; Wang, Titi; Huang, Shanran R; Li, Xin

    2015-04-01

    What are the key building blocks that would have been needed to construct complex protein folds? This is an important issue for understanding protein folding mechanisms and guiding de novo protein design. Twenty naturally occurring amino acids and eight secondary structures constitute a 28-letter alphabet to determine folding kinetics and mechanism. Here we predict folding kinetic rates of proteins from many reduced alphabets. We find that a reduced alphabet of 10 letters achieves good correlation with folding rates, close to the one achieved by the full 28-letter alphabet. Many other reduced alphabets are not significantly correlated to folding rates. The finding suggests that not all amino acids and secondary structures are equally important for protein folding. The foldable sequence of a protein could be designed using at least 10 folding units, which can either promote or inhibit protein folding. Reducing alphabet cardinality without losing key folding kinetic information opens the door to potentially faster machine learning and data mining applications in protein structure prediction, sequence alignment and protein design.

  20. Direct Pressure Monitoring Accurately Predicts Pulmonary Vein Occlusion During Cryoballoon Ablation

    PubMed Central

    Kosmidou, Ioanna; Wooden, Shannnon; Jones, Brian; Deering, Thomas; Wickliffe, Andrew; Dan, Dan

    2013-01-01

    Cryoballoon ablation (CBA) is an established therapy for atrial fibrillation (AF). Pulmonary vein (PV) occlusion is essential for achieving antral contact and PV isolation and is typically assessed by contrast injection. We present a novel method of direct pressure monitoring for assessment of PV occlusion. Transcatheter pressure is monitored during balloon advancement to the PV antrum. Pressure is recorded via a single pressure transducer connected to the inner lumen of the cryoballoon. Pressure curve characteristics are used to assess occlusion in conjunction with fluoroscopic or intracardiac echocardiography (ICE) guidance. PV occlusion is confirmed when loss of typical left atrial (LA) pressure waveform is observed with recordings of PA pressure characteristics (no A wave and rapid V wave upstroke). Complete pulmonary vein occlusion as assessed with this technique has been confirmed with concurrent contrast utilization during the initial testing of the technique and has been shown to be highly accurate and readily reproducible. We evaluated the efficacy of this novel technique in 35 patients. A total of 128 veins were assessed for occlusion with the cryoballoon utilizing the pressure monitoring technique; occlusive pressure was demonstrated in 113 veins with resultant successful pulmonary vein isolation in 111 veins (98.2%). Occlusion was confirmed with subsequent contrast injection during the initial ten procedures, after which contrast utilization was rapidly reduced or eliminated given the highly accurate identification of occlusive pressure waveform with limited initial training. Verification of PV occlusive pressure during CBA is a novel approach to assessing effective PV occlusion and it accurately predicts electrical isolation. Utilization of this method results in significant decrease in fluoroscopy time and volume of contrast. PMID:23485956

  1. A fast and accurate method to predict 2D and 3D aerodynamic boundary layer flows

    NASA Astrophysics Data System (ADS)

    Bijleveld, H. A.; Veldman, A. E. P.

    2014-12-01

    A quasi-simultaneous interaction method is applied to predict 2D and 3D aerodynamic flows. This method is suitable for offshore wind turbine design software as it is a very accurate and computationally reasonably cheap method. This study shows the results for a NACA 0012 airfoil. The two applied solvers converge to the experimental values when the grid is refined. We also show that in separation the eigenvalues remain positive, thus avoiding the Goldstein singularity at separation. In 3D we show a flow over a dent in which separation occurs. A rotating flat plate is used to show the applicability of the method for rotating flows. The shown capabilities of the method indicate that the quasi-simultaneous interaction method is suitable for design methods for offshore wind turbine blades.

  2. Distance scaling method for accurate prediction of slowly varying magnetic fields in satellite missions

    NASA Astrophysics Data System (ADS)

    Zacharias, Panagiotis P.; Chatzineofytou, Elpida G.; Spantideas, Sotirios T.; Capsalis, Christos N.

    2016-07-01

    In the present work, the determination of the magnetic behavior of localized magnetic sources from near-field measurements is examined. The distance power law of the magnetic field fall-off is used in various cases to accurately predict the magnetic signature of an equipment under test (EUT) consisting of multiple alternating current (AC) magnetic sources. Therefore, parameters concerning the location of the observation points (magnetometers) are studied towards this scope. The results clearly show that these parameters are independent of the EUT's size and layout. Additionally, the techniques developed in the present study enable the placing of the magnetometers close to the EUT, thus achieving high signal-to-noise ratio (SNR). Finally, the proposed method is verified by real measurements, using a mobile phone as an EUT.
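
    A distance power law of the kind mentioned in this abstract can be sketched as B(r) = B(r_ref) * (r_ref / r)^n. The abstract does not give the exact formulation used, so the single-exponent form, the two-point estimate of n, and the function names below are assumptions for illustration; for an ideal magnetic dipole the exponent is 3.

```python
import math


def falloff_exponent(r1, b1, r2, b2):
    """Estimate the power-law exponent n from field magnitudes at two distances."""
    return math.log(b1 / b2) / math.log(r2 / r1)


def scale_field(b_ref, r_ref, r_target, exponent):
    """Extrapolate a field magnitude from r_ref to r_target with a power law."""
    return b_ref * (r_ref / r_target) ** exponent


if __name__ == "__main__":
    # Hypothetical near-field readings (arbitrary units) at 1 m and 2 m.
    n = falloff_exponent(1.0, 8.0, 2.0, 1.0)
    print(n)
    print(scale_field(8.0, 1.0, 4.0, n))
```

    Measuring close to the source, where the signal is strong, and scaling outward is what allows a high signal-to-noise ratio; the estimate is only as good as the assumption that one exponent holds over the whole distance range.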

  3. Differential contribution of visual and auditory information to accurately predict the direction and rotational motion of a visual stimulus.

    PubMed

    Park, Seoung Hoon; Kim, Seonjin; Kwon, MinHyuk; Christou, Evangelos A

    2016-03-01

    Vision and auditory information are critical for perception and to enhance the ability of an individual to respond accurately to a stimulus. However, it is unknown whether visual and auditory information contribute differentially to identify the direction and rotational motion of the stimulus. The purpose of this study was to determine the ability of an individual to accurately predict the direction and rotational motion of the stimulus based on visual and auditory information. In this study, we recruited 9 expert table-tennis players and used table-tennis service as our experimental model. Participants watched recorded services with different levels of visual and auditory information. The goal was to anticipate the direction of the service (left or right) and the rotational motion of service (topspin, sidespin, or cut). We recorded their responses and quantified the following outcomes: (i) directional accuracy and (ii) rotational motion accuracy. The response accuracy was the accurate predictions relative to the total number of trials. The ability of the participants to predict the direction of the service accurately increased with additional visual information but not with auditory information. In contrast, the ability of the participants to predict the rotational motion of the service accurately increased with the addition of auditory information to visual information but not with additional visual information alone. In conclusion, this finding demonstrates that visual information enhances the ability of an individual to accurately predict the direction of the stimulus, whereas additional auditory information enhances the ability of an individual to accurately predict the rotational motion of stimulus.

  4. In vitro transcription accurately predicts lac repressor phenotype in vivo in Escherichia coli

    PubMed Central

    2014-01-01

    A multitude of studies have looked at the in vivo and in vitro behavior of the lac repressor binding to DNA and effector molecules in order to study transcriptional repression; however, these studies are not always reconcilable. Here we use in vitro transcription to directly mimic the in vivo system in order to build a self-consistent set of experiments to directly compare in vivo and in vitro genetic repression. A thermodynamic model of the lac repressor binding to operator DNA and effector is used to link DNA occupancy to either normalized in vitro mRNA product or normalized in vivo fluorescence of a regulated gene, YFP. Accurate measurements of repressor, DNA and effector concentrations were made both in vivo and in vitro, allowing for direct modeling of the entire thermodynamic equilibrium. In vivo repression profiles are accurately predicted from the given in vitro parameters when molecular crowding is considered. Interestingly, our measured repressor–operator DNA affinity differs significantly from previous in vitro measurements. The literature values are unable to replicate in vivo binding data. We therefore conclude that the repressor-DNA affinity is much weaker than previously thought. This finding would suggest that in vitro techniques that are specifically designed to mimic the in vivo process may be necessary to replicate the native system. PMID:25097824

  5. Measuring solar reflectance Part I: Defining a metric that accurately predicts solar heat gain

    SciTech Connect

    Levinson, Ronnen; Akbari, Hashem; Berdahl, Paul

    2010-05-14

    Solar reflectance can vary with the spectral and angular distributions of incident sunlight, which in turn depend on surface orientation, solar position and atmospheric conditions. A widely used solar reflectance metric based on the ASTM Standard E891 beam-normal solar spectral irradiance underestimates the solar heat gain of a spectrally selective 'cool colored' surface because this irradiance contains a greater fraction of near-infrared light than typically found in ordinary (unconcentrated) global sunlight. At mainland U.S. latitudes, this metric R_E891BN can underestimate the annual peak solar heat gain of a typical roof or pavement (slope ≤ 5:12 [23°]) by as much as 89 W m⁻², and underestimate its peak surface temperature by up to 5 K. Using R_E891BN to characterize roofs in a building energy simulation can exaggerate the economic value N of annual cool-roof net energy savings by as much as 23%. We define clear-sky air mass one global horizontal ('AM1GH') solar reflectance R_g,0, a simple and easily measured property that more accurately predicts solar heat gain. R_g,0 predicts the annual peak solar heat gain of a roof or pavement to within 2 W m⁻², and overestimates N by no more than 3%. R_g,0 is well suited to rating the solar reflectances of roofs, pavements and walls. We show in Part II that R_g,0 can be easily and accurately measured with a pyranometer, a solar spectrophotometer or version 6 of the Solar Spectrum Reflectometer.

  6. Measuring solar reflectance - Part I: Defining a metric that accurately predicts solar heat gain

    SciTech Connect

    Levinson, Ronnen; Akbari, Hashem; Berdahl, Paul

    2010-09-15

    Solar reflectance can vary with the spectral and angular distributions of incident sunlight, which in turn depend on surface orientation, solar position and atmospheric conditions. A widely used solar reflectance metric based on the ASTM Standard E891 beam-normal solar spectral irradiance underestimates the solar heat gain of a spectrally selective 'cool colored' surface because this irradiance contains a greater fraction of near-infrared light than typically found in ordinary (unconcentrated) global sunlight. At mainland US latitudes, this metric R_E891BN can underestimate the annual peak solar heat gain of a typical roof or pavement (slope ≤ 5:12 [23°]) by as much as 89 W m⁻², and underestimate its peak surface temperature by up to 5 K. Using R_E891BN to characterize roofs in a building energy simulation can exaggerate the economic value N of annual cool roof net energy savings by as much as 23%. We define clear sky air mass one global horizontal ('AM1GH') solar reflectance R_g,0, a simple and easily measured property that more accurately predicts solar heat gain. R_g,0 predicts the annual peak solar heat gain of a roof or pavement to within 2 W m⁻², and overestimates N by no more than 3%. R_g,0 is well suited to rating the solar reflectances of roofs, pavements and walls. We show in Part II that R_g,0 can be easily and accurately measured with a pyranometer, a solar spectrophotometer or version 6 of the Solar Spectrum Reflectometer. (author)

  7. Using protein binding site prediction to improve protein docking.

    PubMed

    Huang, Bingding; Schroeder, Michael

    2008-10-01

    Predicting protein interaction interfaces and protein complexes are two important related problems. For interface prediction, there are a number of tools, such as PPI-Pred, PPISP, PINUP, Promate, and SPPIDER, which predict enzyme-inhibitor interfaces with success rates of 23% to 55% and other interfaces with 10% to 28% on a benchmark dataset of 62 complexes. Here, we develop metaPPI, a meta server for interface prediction. It significantly improves prediction success rates to 70% for enzyme-inhibitor and 44% for other interfaces. As shown with Promate, predicted interfaces can be used to improve protein docking. Here, we follow this idea using the meta server instead of individual predictions. We confirm that filtering with predicted interfaces significantly improves candidate generation in rigid-body docking based on shape complementarity. Finally, we show that the initial ranking of candidate solutions in rigid-body docking can be further improved for the class of enzyme-inhibitor complexes by a geometrical scoring which rewards deep pockets. A web server of metaPPI is available at scoppi.tu-dresden.de/metappi. The source code of our docking algorithm BDOCK is also available at www.biotec.tu-dresden.de/~bhuang/bdock.

  8. Multi-level machine learning prediction of protein-protein interactions in Saccharomyces cerevisiae.

    PubMed

    Zubek, Julian; Tatjewski, Marcin; Boniecki, Adam; Mnich, Maciej; Basu, Subhadip; Plewczynski, Dariusz

    2015-01-01

    Accurate identification of protein-protein interactions (PPI) is the key step in understanding proteins' biological functions, which are typically context-dependent. Many existing PPI predictors rely on aggregated features from protein sequences; however, only a few methods exploit local information about specific residue contacts. In this work we present a two-stage machine learning approach for prediction of protein-protein interactions. We start with the carefully filtered data on protein complexes available for Saccharomyces cerevisiae in the Protein Data Bank (PDB) database. First, we build linear descriptions of interacting and non-interacting sequence segment pairs based on their inter-residue distances. Secondly, we train machine learning classifiers to predict binary segment interactions for any two short sequence fragments. The final prediction of the protein-protein interaction is done using the 2D matrix representation of all-against-all possible interacting sequence segments of both analysed proteins. The level-I predictor achieves 0.88 AUC for micro-scale, i.e., residue-level prediction. The level-II predictor improves the results further by a more complex learning paradigm. We perform a 30-fold macro-scale, i.e., protein-level, cross-validation experiment. The level-II predictor using PSIPRED-predicted secondary structure reaches 0.70 precision, 0.68 recall, and 0.70 AUC, whereas other popular methods provide results below the 0.6 threshold (recall, precision, AUC). Our results demonstrate that the multi-scale sequence feature aggregation procedure is able to improve the machine learning results by more than 10% as compared to other sequence representations. Prepared datasets and source code for our experimental pipeline are freely available for download from: http://zubekj.github.io/mlppi/ (open source Python implementation, OS independent).

  9. Multi-level machine learning prediction of protein-protein interactions in Saccharomyces cerevisiae.

    PubMed

    Zubek, Julian; Tatjewski, Marcin; Boniecki, Adam; Mnich, Maciej; Basu, Subhadip; Plewczynski, Dariusz

    2015-01-01

    Accurate identification of protein-protein interactions (PPI) is the key step in understanding proteins' biological functions, which are typically context-dependent. Many existing PPI predictors rely on aggregated features from protein sequences; however, only a few methods exploit local information about specific residue contacts. In this work we present a two-stage machine learning approach for prediction of protein-protein interactions. We start with the carefully filtered data on protein complexes available for Saccharomyces cerevisiae in the Protein Data Bank (PDB) database. First, we build linear descriptions of interacting and non-interacting sequence segment pairs based on their inter-residue distances. Secondly, we train machine learning classifiers to predict binary segment interactions for any two short sequence fragments. The final prediction of the protein-protein interaction is done using the 2D matrix representation of all-against-all possible interacting sequence segments of both analysed proteins. The level-I predictor achieves 0.88 AUC for micro-scale, i.e., residue-level prediction. The level-II predictor improves the results further by a more complex learning paradigm. We perform a 30-fold macro-scale, i.e., protein-level, cross-validation experiment. The level-II predictor using PSIPRED-predicted secondary structure reaches 0.70 precision, 0.68 recall, and 0.70 AUC, whereas other popular methods provide results below the 0.6 threshold (recall, precision, AUC). Our results demonstrate that the multi-scale sequence feature aggregation procedure is able to improve the machine learning results by more than 10% as compared to other sequence representations. Prepared datasets and source code for our experimental pipeline are freely available for download from: http://zubekj.github.io/mlppi/ (open source Python implementation, OS independent). PMID:26157620

  10. PROMALS3D web server for accurate multiple protein sequence and structure alignments.

    PubMed

    Pei, Jimin; Tang, Ming; Grishin, Nick V

    2008-07-01

    Multiple sequence alignments are essential in computational sequence and structural analysis, with applications in homology detection, structure modeling, function prediction and phylogenetic analysis. We report PROMALS3D web server for constructing alignments for multiple protein sequences and/or structures using information from available 3D structures, database homologs and predicted secondary structures. PROMALS3D shows higher alignment accuracy than a number of other advanced methods. Input of PROMALS3D web server can be FASTA format protein sequences, PDB format protein structures and/or user-defined alignment constraints. The output page provides alignments with several formats, including a colored alignment augmented with useful information about sequence grouping, predicted secondary structures and consensus sequences. Intermediate results of sequence and structural database searches are also available. The PROMALS3D web server is available at: http://prodata.swmed.edu/promals3d/. PMID:18503087

  11. Neural network definitions of highly predictable protein secondary structure classes

    SciTech Connect

    Lapedes, A.; Steeg, E.; Farber, R.

    1994-02-01

    We use two co-evolving neural networks to determine new classes of protein secondary structure which are significantly more predictable from local amino acid sequence than the conventional secondary structure classification. Accurate prediction of the conventional secondary structure classes (alpha helix, beta strand, and coil) from primary sequence has long been an important problem in computational molecular biology. Neural networks have been a popular method to attempt to predict these conventional secondary structure classes. Accuracy has been disappointingly low. The algorithm presented here uses neural networks to simultaneously examine both sequence and structure data, and to evolve new classes of secondary structure that can be predicted from sequence with significantly higher accuracy than the conventional classes. These new classes have both similarities to, and differences from, the conventional alpha helix, beta strand and coil.

  12. A Simple and Accurate Model to Predict Responses to Multi-electrode Stimulation in the Retina.

    PubMed

    Maturana, Matias I; Apollo, Nicholas V; Hadjinicolaou, Alex E; Garrett, David J; Cloherty, Shaun L; Kameneva, Tatiana; Grayden, David B; Ibbotson, Michael R; Meffin, Hamish

    2016-04-01

    Implantable electrode arrays are widely used in therapeutic stimulation of the nervous system (e.g. cochlear, retinal, and cortical implants). Currently, most neural prostheses use serial stimulation (i.e. one electrode at a time) despite this severely limiting the repertoire of stimuli that can be applied. Methods to reliably predict the outcome of multi-electrode stimulation have not been available. Here, we demonstrate that a linear-nonlinear model accurately predicts neural responses to arbitrary patterns of stimulation using in vitro recordings from single retinal ganglion cells (RGCs) stimulated with a subretinal multi-electrode array. In the model, the stimulus is projected onto a low-dimensional subspace and then undergoes a nonlinear transformation to produce an estimate of spiking probability. The low-dimensional subspace is estimated using principal components analysis, which gives the neuron's electrical receptive field (ERF), i.e. the electrodes to which the neuron is most sensitive. Our model suggests that stimulation proportional to the ERF yields a higher efficacy given a fixed amount of power when compared to equal amplitude stimulation on up to three electrodes. We find that the model captures the responses of all the cells recorded in the study, suggesting that it will generalize to most cell types in the retina. The model is computationally efficient to evaluate and, therefore, appropriate for future real-time applications including stimulation strategies that make use of recorded neural activity to improve the stimulation strategy. PMID:27035143
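
    The two stages of the linear-nonlinear model described above (projection of the stimulus, then a nonlinear map to spiking probability) can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the subspace is taken to be one-dimensional, the ERF weights are supplied directly rather than estimated by principal components analysis, and the logistic nonlinearity with its gain and threshold parameters is a hypothetical choice, not necessarily the one used by the authors.

```python
import math

def ln_spike_probability(stimulus, erf_weights, gain, threshold):
    """Linear-nonlinear sketch: project the stimulus, then apply a sigmoid.

    stimulus     -- current amplitude on each electrode
    erf_weights  -- assumed electrical receptive field (one weight per electrode)
    gain, threshold -- hypothetical parameters of the logistic nonlinearity
    """
    # Linear stage: project the multi-electrode stimulus onto the ERF.
    drive = sum(s * w for s, w in zip(stimulus, erf_weights))
    # Nonlinear stage: map the projected drive to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-gain * (drive - threshold)))

# Hypothetical three-electrode example.
erf = [0.8, 0.5, 0.2]
weak = ln_spike_probability([0.5, 0.0, 0.0], erf, gain=4.0, threshold=1.0)
strong = ln_spike_probability([1.0, 1.0, 1.0], erf, gain=4.0, threshold=1.0)
print(weak < strong)
```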

  13. Accurate load prediction by BEM with airfoil data from 3D RANS simulations

    NASA Astrophysics Data System (ADS)

    Schneider, Marc S.; Nitzsche, Jens; Hennings, Holger

    2016-09-01

    In this paper, two methods for the extraction of airfoil coefficients from 3D CFD simulations of a wind turbine rotor are investigated, and these coefficients are used to improve the load prediction of a BEM code. The coefficients are extracted from a number of steady RANS simulations, using either averaging of velocities in annular sections, or an inverse BEM approach for determination of the induction factors in the rotor plane. It is shown that these 3D rotor polars are able to capture the rotational augmentation at the inner part of the blade as well as the load reduction by 3D effects close to the blade tip. They are used as input to a simple BEM code and the results of this BEM with 3D rotor polars are compared to the predictions of BEM with 2D airfoil coefficients plus common empirical corrections for stall delay and tip loss. While BEM with 2D airfoil coefficients produces a very different radial distribution of loads than the RANS simulation, the BEM with 3D rotor polars manages to reproduce the loads from RANS very accurately for a variety of load cases, as long as the blade pitch angle is not too different from the cases from which the polars were extracted.

  14. A Simple and Accurate Model to Predict Responses to Multi-electrode Stimulation in the Retina

    PubMed Central

    Maturana, Matias I.; Apollo, Nicholas V.; Hadjinicolaou, Alex E.; Garrett, David J.; Cloherty, Shaun L.; Kameneva, Tatiana; Grayden, David B.; Ibbotson, Michael R.; Meffin, Hamish

    2016-01-01

    Implantable electrode arrays are widely used in therapeutic stimulation of the nervous system (e.g. cochlear, retinal, and cortical implants). Currently, most neural prostheses use serial stimulation (i.e. one electrode at a time) despite this severely limiting the repertoire of stimuli that can be applied. Methods to reliably predict the outcome of multi-electrode stimulation have not been available. Here, we demonstrate that a linear-nonlinear model accurately predicts neural responses to arbitrary patterns of stimulation using in vitro recordings from single retinal ganglion cells (RGCs) stimulated with a subretinal multi-electrode array. In the model, the stimulus is projected onto a low-dimensional subspace and then undergoes a nonlinear transformation to produce an estimate of spiking probability. The low-dimensional subspace is estimated using principal components analysis, which gives the neuron’s electrical receptive field (ERF), i.e. the electrodes to which the neuron is most sensitive. Our model suggests that stimulation proportional to the ERF yields a higher efficacy given a fixed amount of power when compared to equal amplitude stimulation on up to three electrodes. We find that the model captures the responses of all the cells recorded in the study, suggesting that it will generalize to most cell types in the retina. The model is computationally efficient to evaluate and, therefore, appropriate for future real-time applications including stimulation strategies that make use of recorded neural activity to improve the stimulation strategy. PMID:27035143

  15. Cloud Prediction of Protein Structure and Function with PredictProtein for Debian

    PubMed Central

    Kaján, László; Yachdav, Guy; Vicedo, Esmeralda; Steinegger, Martin; Mirdita, Milot; Angermüller, Christof; Böhm, Ariane; Domke, Simon; Ertl, Julia; Mertes, Christian; Reisinger, Eva; Rost, Burkhard

    2013-01-01

    We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome. PMID:23971032

  16. Cloud prediction of protein structure and function with PredictProtein for Debian.

    PubMed

    Kaján, László; Yachdav, Guy; Vicedo, Esmeralda; Steinegger, Martin; Mirdita, Milot; Angermüller, Christof; Böhm, Ariane; Domke, Simon; Ertl, Julia; Mertes, Christian; Reisinger, Eva; Staniewski, Cedric; Rost, Burkhard

    2013-01-01

    We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome.

  17. Hierarchical Ensemble Methods for Protein Function Prediction

    PubMed Central

    2014-01-01

    Protein function prediction is a complex multiclass multilabel classification problem, characterized by multiple issues such as the incompleteness of the available annotations, the integration of multiple sources of high dimensional biomolecular data, the unbalance of several functional classes, and the difficulty of univocally determining negative examples. Moreover, the hierarchical relationships between functional classes that characterize both the Gene Ontology and FunCat taxonomies motivate the development of hierarchy-aware prediction methods that showed significantly better performances than hierarchical-unaware “flat” prediction methods. In this paper, we provide a comprehensive review of hierarchical methods for protein function prediction based on ensembles of learning machines. According to this general approach, a separate learning machine is trained to learn a specific functional term and then the resulting predictions are assembled in a “consensus” ensemble decision, taking into account the hierarchical relationships between classes. The main hierarchical ensemble methods proposed in the literature are discussed in the context of existing computational methods for protein function prediction, highlighting their characteristics, advantages, and limitations. Open problems of this exciting research area of computational biology are finally considered, outlining novel perspectives for future research. PMID:25937954

  18. Accurate First-Principles Spectra Predictions for Planetological and Astrophysical Applications at Various T-Conditions

    NASA Astrophysics Data System (ADS)

    Rey, M.; Nikitin, A. V.; Tyuterev, V.

    2014-06-01

    Knowledge of near infrared intensities of rovibrational transitions of polyatomic molecules is essential for the modeling of various planetary atmospheres, brown dwarfs and for other astrophysical applications [1,2,3]. For example, to analyze exoplanets, atmospheric models have been developed, creating a need for accurate spectroscopic data. Consequently, the spectral characterization of such planetary objects relies on the necessity of having adequate and reliable molecular data in extreme conditions (temperature, optical path length, pressure). On the other hand, in the modeling of astrophysical opacities, millions of lines are generally involved and the line-by-line extraction is clearly not feasible in laboratory measurements. It is thus suggested that this large amount of data could be interpreted only by reliable theoretical predictions. There exist essentially two theoretical approaches for the computation and prediction of spectra. The first one is based on empirically-fitted effective spectroscopic models. Another way for computing energies, line positions and intensities is based on global variational calculations using ab initio surfaces. They do not yet reach the spectroscopic accuracy stricto sensu but implicitly account for all intramolecular interactions including resonance couplings in a wide spectral range. The final aim of this work is to provide reliable predictions which could be quantitatively accurate with respect to the precision of available observations and as complete as possible. All this thus requires extensive first-principles quantum mechanical calculations essentially based on three necessary ingredients which are (i) accurate intramolecular potential energy surface and dipole moment surface components well-defined in a large range of vibrational displacements and (ii) efficient computational methods combined with suitable choices of coordinates to account for molecular symmetry properties and to achieve a good numerical

  19. Development of a New Model for Accurate Prediction of Cloud Water Deposition on Vegetation

    NASA Astrophysics Data System (ADS)

    Katata, G.; Nagai, H.; Wrzesinsky, T.; Klemm, O.; Eugster, W.; Burkard, R.

    2006-12-01

    Scarcity of water resources in arid and semi-arid areas is of great concern in the light of population growth and food shortages. Several experiments focusing on cloud (fog) water deposition on the land surface suggest that cloud water plays an important role in water resource in such regions. A one-dimensional vegetation model including the process of cloud water deposition on vegetation has been developed to better predict cloud water deposition on the vegetation. New schemes to calculate capture efficiency of leaf, cloud droplet size distribution, and gravitational flux of cloud water were incorporated in the model. Model calculations were compared with the data acquired at the Norway spruce forest at the Waldstein site, Germany. High performance of the model was confirmed by comparisons of calculated net radiation, sensible and latent heat, and cloud water fluxes over the forest with measurements. The present model provided a better prediction of measured turbulent and gravitational fluxes of cloud water over the canopy than the Lovett model, which is a commonly used cloud water deposition model. Detailed calculations of evapotranspiration and of turbulent exchange of heat and water vapor within the canopy and the modifications are necessary for accurate prediction of cloud water deposition. Numerical experiments to examine the dependence of cloud water deposition on the vegetation species (coniferous and broad-leaved trees, flat and cylindrical grasses) and structures (Leaf Area Index (LAI) and canopy height) are performed using the presented model. The results indicate that the differences of leaf shape and size have a large impact on cloud water deposition. Cloud water deposition also varies with the growth of vegetation and seasonal change of LAI. We found that the coniferous trees whose height and LAI are 24 m and 2.0 m² m⁻², respectively, produce the largest amount of cloud water deposition in all combinations of vegetation species and structures in the

  20. Prediction and integration of regulatory and protein-protein interactions

    SciTech Connect

    Wichadakul, Duangdao; McDermott, Jason E.; Samudrala, Ram

    2009-04-20

    Knowledge of transcriptional regulatory interactions (TRIs) is essential for exploring functional genomics and systems biology in any organism. While several results from genome-wide analysis of transcriptional regulatory networks are available, they are limited to model organisms such as yeast [1] and worm [2]. Beyond these networks, experiments on TRIs study only individual genes and proteins of specific interest. In this chapter, we present a method for the integration of various data sets to predict TRIs for 54 organisms in the Bioverse [3]. We describe how to compile and handle various formats and identifiers of data sets from different sources, and how to predict the TRIs using a homology-based approach, utilizing the compiled data sets. Integrated data sets include experimentally verified TRIs, binding sites of transcription factors, promoter sequences, protein sub-cellular localization, and protein families. Predicted TRIs expand the networks of gene regulation for a large number of organisms. The integration of experimentally verified and predicted TRIs with other known protein-protein interactions (PPIs) gives insight into specific pathways, network motifs, and the topological dynamics of an integrated network with gene expression under different conditions, essential for exploring functional genomics and systems biology.

  1. Can radiation therapy treatment planning system accurately predict surface doses in postmastectomy radiation therapy patients?

    SciTech Connect

    Wong, Sharon; Back, Michael; Tan, Poh Wee; Lee, Khai Mun; Baggarley, Shaun; Lu, Jaide Jay

    2012-07-01

    Skin doses have been an important factor in the dose prescription for breast radiotherapy. Recent advances in radiotherapy treatment techniques, such as intensity-modulated radiation therapy (IMRT) and new treatment schemes such as hypofractionated breast therapy have made the precise determination of the surface dose necessary. Detailed information of the dose at various depths of the skin is also critical in designing new treatment strategies. The purpose of this work was to assess the accuracy of surface dose calculation by a clinically used treatment planning system and those measured by thermoluminescence dosimeters (TLDs) in a customized chest wall phantom. This study involved the construction of a chest wall phantom for skin dose assessment. Seven TLDs were distributed throughout each right chest wall phantom to give adequate representation of measured radiation doses. Point doses from the CMS Xio® treatment planning system (TPS) were calculated for each relevant TLD position and the results correlated. There was no significant difference between the absorbed doses measured by TLD and the doses calculated by the TPS (p > 0.05, 1-tailed). Dose accuracy of up to 2.21% was found. The deviations from the calculated absorbed doses were overall larger (3.4%) when wedges and bolus were used. 3D radiotherapy TPS is a useful and accurate tool to assess the accuracy of surface dose. Our studies have shown that radiation treatment accuracy expressed as a comparison between calculated doses (by TPS) and measured doses (by TLD dosimetry) can be accurately predicted for tangential treatment of the chest wall after mastectomy.

  2. Bifunctional Spin Labeling of Muscle Proteins: Accurate Rotational Dynamics, Orientation, and Distance by EPR.

    PubMed

    Thompson, Andrew R; Binder, Benjamin P; McCaffrey, Jesse E; Svensson, Bengt; Thomas, David D

    2015-01-01

    While EPR allows for the characterization of protein structure and function due to its exquisite sensitivity to spin label dynamics, orientation, and distance, these measurements are often limited in sensitivity due to the use of labels that are attached via flexible monofunctional bonds, incurring additional disorder and nanosecond dynamics. In this chapter, we present methods for using a bifunctional spin label (BSL) to measure muscle protein structure and dynamics. We demonstrate that bifunctional attachment eliminates nanosecond internal rotation of the spin label, thereby allowing the accurate measurement of protein backbone rotational dynamics, including microsecond-to-millisecond motions by saturation transfer EPR. BSL also allows for accurate determination of helix orientation and disorder in mechanically and magnetically aligned systems, due to the label's stereospecific attachment. Similarly, labeling with a pair of BSL greatly enhances the resolution and accuracy of distance measurements measured by double electron-electron resonance (DEER). Finally, when BSL is applied to a protein with high helical content in an assembly with high orientational order (e.g., muscle fiber or membrane), two-probe DEER experiments can be combined with single-probe EPR experiments on an oriented sample in a process we call BEER, which has the potential for ab initio high-resolution structure determination. PMID:26477249

  3. Accurate determination of protein methionine oxidation by stable isotope labeling and LC-MS analysis.

    PubMed

    Liu, Hongcheng; Ponniah, Gomathinayagam; Neill, Alyssa; Patel, Rekha; Andrien, Bruce

    2013-12-17

    Methionine (Met) oxidation is a major modification of proteins, which converts Met to Met sulfoxide as the common product. It is challenging to determine the level of Met sulfoxide, because it can be generated during sample preparation and analysis as an artifact. To determine the level of Met sulfoxide in proteins accurately, an isotope labeling and LC-MS peptide mapping method was developed. Met residues in proteins were fully oxidized using hydrogen peroxide enriched with ¹⁸O atoms before sample preparation. Therefore, it was impossible to generate Met sulfoxide as an artifact during sample preparation. The molecular weight difference of 2 Da between Met sulfoxide with the ¹⁶O atom and Met sulfoxide with the ¹⁸O atom was used to differentiate and calculate the level of Met sulfoxide in the sample originally. Using a recombinant monoclonal antibody as a model protein, much lower levels of Met sulfoxide were detected for the two susceptible Met residues with this new method compared to a typical peptide mapping procedure. The results demonstrated efficient elimination of the analytical artifact during LC-MS peptide mapping for the measurement of Met sulfoxide. This method can thus be used when accurate determination of the level of Met sulfoxide is critical.
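
    The calculation implied above can be sketched in Python: after full oxidation with 18O-enriched peroxide, Met that was already oxidized in the sample carries 16O, while Met oxidized by the reagent carries 18O, so the original sulfoxide level is the 16O share of the two peak intensities. The function name and the intensities are hypothetical, and this simplified version ignores isotopic overlap between the two peaks and incomplete 18O enrichment, which a real analysis would likely need to correct for.

```python
def original_met_sulfoxide_fraction(intensity_16o, intensity_18o):
    """Fraction of a Met residue that was oxidized before labeling.

    intensity_16o -- peak intensity of the peptide with 16O Met sulfoxide
    intensity_18o -- peak intensity of the peptide with 18O Met sulfoxide (+2 Da)
    """
    total = intensity_16o + intensity_18o
    if total <= 0:
        raise ValueError("peak intensities must sum to a positive value")
    return intensity_16o / total

# Hypothetical intensities for one susceptible Met residue.
fraction = original_met_sulfoxide_fraction(2.0e5, 9.8e6)
print(f"original Met sulfoxide: {fraction:.1%}")
```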

  4. TIMP2•IGFBP7 biomarker panel accurately predicts acute kidney injury in high-risk surgical patients

    PubMed Central

    Gunnerson, Kyle J.; Shaw, Andrew D.; Chawla, Lakhmir S.; Bihorac, Azra; Al-Khafaji, Ali; Kashani, Kianoush; Lissauer, Matthew; Shi, Jing; Walker, Michael G.; Kellum, John A.

    2016-01-01

    BACKGROUND Acute kidney injury (AKI) is an important complication in surgical patients. Existing biomarkers and clinical prediction models underestimate the risk for developing AKI. We recently reported data from two trials of 728 and 408 critically ill adult patients in whom urinary TIMP2•IGFBP7 (NephroCheck, Astute Medical) was used to identify patients at risk of developing AKI. Here we report a preplanned analysis of surgical patients from both trials to assess whether urinary tissue inhibitor of metalloproteinase 2 (TIMP-2) and insulin-like growth factor–binding protein 7 (IGFBP7) accurately identify surgical patients at risk of developing AKI. STUDY DESIGN We enrolled adult surgical patients at risk for AKI who were admitted to one of 39 intensive care units across Europe and North America. The primary end point was moderate-severe AKI (equivalent to KDIGO [Kidney Disease Improving Global Outcomes] stages 2–3) within 12 hours of enrollment. Biomarker performance was assessed using the area under the receiver operating characteristic curve, integrated discrimination improvement, and category-free net reclassification improvement. RESULTS A total of 375 patients were included in the final analysis of whom 35 (9%) developed moderate-severe AKI within 12 hours. The area under the receiver operating characteristic curve for [TIMP-2]•[IGFBP7] alone was 0.84 (95% confidence interval, 0.76–0.90; p < 0.0001). Biomarker performance was robust in sensitivity analysis across predefined subgroups (urgency and type of surgery). CONCLUSION For postoperative surgical intensive care unit patients, a single urinary TIMP2•IGFBP7 test accurately identified patients at risk for developing AKI within the ensuing 12 hours and its inclusion in clinical risk prediction models significantly enhances their performance. LEVEL OF EVIDENCE Prognostic study, level I. PMID:26816218

  5. Progress and challenges in predicting protein interfaces

    PubMed Central

    Krawczyk, Konrad; Knapp, Bernhard; Nebel, Jean-Christophe; Deane, Charlotte M.

    2016-01-01

    The majority of biological processes are mediated via protein–protein interactions. Determination of residues participating in such interactions improves our understanding of molecular mechanisms and facilitates the development of therapeutics. Experimental approaches to identifying interacting residues, such as mutagenesis, are costly and time-consuming and thus, computational methods for this purpose could streamline conventional pipelines. Here we review the field of computational protein interface prediction. We make a distinction between methods which address proteins in general and those targeted at antibodies, owing to the radically different binding mechanism of antibodies. We organize the multitude of currently available methods hierarchically based on required input and prediction principles to provide an overview of the field. PMID:25971595

  6. Using support vector machine and evolutionary profiles to predict antifreeze protein sequences.

    PubMed

    Zhao, Xiaowei; Ma, Zhiqiang; Yin, Minghao

    2012-01-01

    Antifreeze proteins (AFPs) are ice-binding proteins. Accurate identification of new AFPs is important in understanding ice-protein interactions and creating novel ice-binding domains in other proteins. In this paper, an accurate method, called AFP_PSSM, has been developed for predicting antifreeze proteins using a support vector machine (SVM) and position specific scoring matrix (PSSM) profiles. This is the first study in which evolutionary information in the form of PSSM profiles has been successfully used for predicting antifreeze proteins. Tested by 10-fold cross validation and independent test, the accuracy of the proposed method reaches 82.67% for the training dataset and 93.01% for the testing dataset, respectively. These results indicate that our predictor is a useful tool for predicting antifreeze proteins. A web server (AFP_PSSM) that implements the proposed predictor is freely available.

  7. Comparative modeling: the state of the art and protein drug target structure prediction.

    PubMed

    Liu, Tianyun; Tang, Grace W; Capriotti, Emidio

    2011-07-01

    The goal of computational protein structure prediction is to provide three-dimensional (3D) structures with resolution comparable to experimental results. Comparative modeling, which predicts the 3D structure of a protein based on its sequence similarity to homologous structures, is the most accurate computational method for structure prediction. In the last two decades, significant progress has been made on comparative modeling methods. Using the large number of protein structures deposited in the Protein Data Bank (~65,000), automatic prediction pipelines are generating a tremendous number of models (~1.9 million) for sequences whose structures have not been experimentally determined. Accurate models are suitable for a wide range of applications, such as prediction of protein binding sites, prediction of the effect of protein mutations, and structure-guided virtual screening. In particular, comparative modeling has enabled structure-based drug design against protein targets with unknown structures. In this review, we describe the theoretical basis of comparative modeling, the available automatic methods and databases, and the algorithms to evaluate the accuracy of predicted structures. Finally, we discuss relevant applications in the prediction of important drug target proteins, focusing on the G protein-coupled receptor (GPCR) and protein kinase families.

  8. Predicting accurate fluorescent spectra for high molecular weight polycyclic aromatic hydrocarbons using density functional theory

    NASA Astrophysics Data System (ADS)

    Powell, Jacob; Heider, Emily C.; Campiglia, Andres; Harper, James K.

    2016-10-01

    The ability of density functional theory (DFT) methods to predict accurate fluorescence spectra for polycyclic aromatic hydrocarbons (PAHs) is explored. Two methods, PBE0 and CAM-B3LYP, are evaluated both in the gas phase and in solution. Spectra for several of the most toxic PAHs are predicted and compared to experiment, including three isomers of C24H14 and a PAH containing heteroatoms. Unusually high-resolution experimental spectra are obtained for comparison by analyzing each PAH at 4.2 K in an n-alkane matrix. All theoretical spectra visually conform to the profiles of the experimental data but are systematically offset by a small amount. Specifically, when solvent is included the PBE0 functional overestimates peaks by 16.1 ± 6.6 nm while CAM-B3LYP underestimates the same transitions by 14.5 ± 7.6 nm. These calculated spectra can be empirically corrected to decrease the uncertainties to 6.5 ± 5.1 and 5.7 ± 5.1 nm for the PBE0 and CAM-B3LYP methods, respectively. A comparison of computed spectra in the gas phase indicates that the inclusion of n-octane shifts peaks by +11 nm on average and this change is roughly equivalent for PBE0 and CAM-B3LYP. An automated approach for comparing spectra is also described that minimizes residuals between a given theoretical spectrum and all available experimental spectra. This approach identifies the correct spectrum in all cases and excludes approximately 80% of the incorrect spectra, demonstrating that an automated search of theoretical libraries of spectra may eventually become feasible.

  9. PETs: A Stable and Accurate Predictor of Protein-Protein Interacting Sites Based on Extremely-Randomized Trees.

    PubMed

    Xia, Bin; Zhang, Hong; Li, Qianmu; Li, Tao

    2015-12-01

    Protein-protein interaction (PPI) plays crucial roles in the performance of various biological processes. A variety of methods are dedicated to identifying whether proteins have interaction residues, but it is often more crucial to recognize each amino acid. In practical applications, the stability of a prediction model is as important as its accuracy. However, random sampling, which is widely used in previous prediction models, often produces large differences between trained models. In this paper, a Predictor of protein-protein interaction sites based on Extremely-randomized Trees (PETs) is proposed to improve the prediction accuracy while maintaining the prediction stability. In PETs, a cluster-based sampling strategy is proposed to ensure the model stability: first, the training dataset is divided into subsets using specific features; second, the subsets are clustered using K-means; and finally the samples are selected from each cluster. Using the proposed sampling strategy, samples which have different types of significant features could be selected independently from different clusters. The evaluation shows that PETs is able to achieve better accuracy while maintaining good stability. The source code and toolkit are available at https://github.com/BinXia/PETs.
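
    The three-step sampling strategy described above can be sketched roughly as follows. This is not the PETs implementation (see the repository linked in the abstract for that): the subsets are assumed to be given already, K-means is a plain Lloyd's iteration with a deterministic start, and the names `kmeans`, `cluster_sample`, and the toy feature vectors are hypothetical.

```python
import random


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    centers = [list(p) for p in points[:k]]  # deterministic init: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_sample(subsets, k, per_cluster, seed=0):
    """Cluster each subset, then draw up to `per_cluster` samples per cluster."""
    rng = random.Random(seed)
    selected = []
    for subset in subsets:
        labels = kmeans(subset, k)
        for c in range(k):
            members = [p for p, lab in zip(subset, labels) if lab == c]
            selected.extend(rng.sample(members, min(per_cluster, len(members))))
    return selected


# Toy data: two subsets, each with two well-separated groups of residues.
subsets = [
    [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)],
    [(1.0, 1.0), (1.1, 1.0), (9.0, 9.0), (9.2, 9.0)],
]
picked = cluster_sample(subsets, k=2, per_cluster=1)
print(len(picked))  # 4: one sample from each cluster of each subset
```

    Drawing a fixed number of samples from every cluster, instead of sampling the whole training set at random, is what keeps each kind of sample represented in every training run.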

  10. Effects of Protein Conformation in Docking: Improved Pose Prediction through Protein Pocket Adaptation

    PubMed Central

    Jain, Ajay N.

    2009-01-01

    Computational methods for docking ligands have been shown to be remarkably dependent on precise protein conformation, where acceptable results in pose prediction have been generally possible only in the artificial case of re-docking a ligand into a protein binding site whose conformation was determined in the presence of the same ligand (the “cognate” docking problem). In such cases, on well curated protein/ligand complexes, accurate dockings can be returned as top-scoring over 75% of the time using tools such as Surflex-Dock. A critical application of docking in modeling for lead optimization requires accurate pose prediction for novel ligands, ranging from simple synthetic analogs to very different molecular scaffolds. Typical results for widely used programs in the “cross-docking case” (making use of a single fixed protein conformation) have rates closer to 20% success. By making use of protein conformations from multiple complexes, Surflex-Dock yields an average success rate of 61% across eight pharmaceutically relevant targets. Following docking, protein pocket adaptation and rescoring identifies single pose families that are correct an average of 67% of the time. Consideration of the best of two pose families (from alternate scoring regimes) yields a 75% mean success rate. PMID:19340588

  13. Fast and Accurate Prediction of Numerical Relativity Waveforms from Binary Black Hole Coalescences Using Surrogate Models.

    PubMed

    Blackman, Jonathan; Field, Scott E; Galley, Chad R; Szilágyi, Béla; Scheel, Mark A; Tiglio, Manuel; Hemberger, Daniel A

    2015-09-18

    Simulating a binary black hole coalescence by solving Einstein's equations is computationally expensive, requiring days to months of supercomputing time. Using reduced order modeling techniques, we construct an accurate surrogate model, which is evaluated in a millisecond to a second, for numerical relativity (NR) waveforms from nonspinning binary black hole coalescences with mass ratios in [1, 10] and durations corresponding to about 15 orbits before merger. We assess the model's uncertainty and show that our modeling strategy predicts NR waveforms not used for the surrogate's training with errors nearly as small as the numerical error of the NR code. Our model includes all spherical-harmonic ${}_{-2}Y_{\ell m}$ waveform modes resolved by the NR code up to $\ell = 8$. We compare our surrogate model to effective one body waveforms from $50\,M_{\odot}$ to $300\,M_{\odot}$ for advanced LIGO detectors and find that the surrogate is always more faithful (by at least an order of magnitude in most cases). PMID:26430979

  14. How accurately can we predict the melting points of drug-like compounds?

    PubMed

    Tetko, Igor V; Sushko, Yurii; Novotarskyi, Sergii; Patiny, Luc; Kondratov, Ivan; Petrenko, Alexander E; Charochkina, Larisa; Asiri, Abdullah M

    2014-12-22

    This article contributes a highly accurate model for predicting the melting points (MPs) of medicinal chemistry compounds. The model was developed using the largest published data set, comprising more than 47k compounds. The distributions of MPs in drug-like and drug lead sets showed that >90% of molecules melt within [50,250]°C. The final model calculated an RMSE of less than 33 °C for molecules from this temperature interval, which is the most important for medicinal chemistry users. This performance was achieved using a consensus model that performed calculations to a significantly higher accuracy than the individual models. We found that compounds with reactive and unstable groups were overrepresented among outlying compounds. These compounds could decompose during storage or measurement, thus introducing experimental errors. While filtering the data by removing outliers generally increased the accuracy of individual models, it did not significantly affect the results of the consensus models. The three distance-to-model measures analyzed did not allow us to flag molecules whose MP values fell outside the applicability domain of the model. We believe that this negative result and the public availability of data from this article will encourage future studies to develop better approaches to define the applicability domain of models. The final model, MP data, and identified reactive groups are available online at http://ochem.eu/article/55638. PMID:25489863

  15. A survey of factors contributing to accurate theoretical predictions of atomization energies and molecular structures

    NASA Astrophysics Data System (ADS)

    Feller, David; Peterson, Kirk A.; Dixon, David A.

    2008-11-01

    High level electronic structure predictions of thermochemical properties and molecular structure are capable of accuracy rivaling the very best experimental measurements as a result of rapid advances in hardware, software, and methodology. Despite the progress, real world limitations require practical approaches designed for handling general chemical systems that rely on composite strategies in which a single, intractable calculation is replaced by a series of smaller calculations. As typically implemented, these approaches produce a final, or "best," estimate that is constructed from one major component, fine-tuned by multiple corrections that are assumed to be additive. Though individually much smaller than the original, unmanageable computational problem, these corrections are nonetheless extremely costly. This study presents a survey of the widely varying magnitude of the most important components contributing to the atomization energies and structures of 106 small molecules. It combines large Gaussian basis sets and coupled cluster theory up to quadruple excitations for all systems. In selected cases, the effects of quintuple excitations and/or full configuration interaction were also considered. The availability of reliable experimental data for most of the molecules permits an expanded statistical analysis of the accuracy of the approach. In cases where reliable experimental information is currently unavailable, the present results are expected to provide some of the most accurate benchmark values available.

  16. Accurate prediction of band gaps and optical properties of HfO2

    NASA Astrophysics Data System (ADS)

    Ondračka, Pavel; Holec, David; Nečas, David; Zajíčková, Lenka

    2016-10-01

    We report on optical properties of various polymorphs of hafnia predicted within the framework of density functional theory. The full potential linearised augmented plane wave method was employed together with the Tran-Blaha modified Becke-Johnson potential (TB-mBJ) for exchange and local density approximation for correlation. Unit cells of monoclinic, cubic and tetragonal crystalline, and a simulated annealing-based model of amorphous hafnia were fully relaxed with respect to internal positions and lattice parameters. Electronic structures and band gaps for monoclinic, cubic, tetragonal and amorphous hafnia were calculated using three different TB-mBJ parametrisations and the results were critically compared with the available experimental and theoretical reports. Conceptual differences between a straightforward comparison of experimental measurements to a calculated band gap on the one hand and to a whole electronic structure (density of electronic states) on the other hand, were pointed out, suggesting the latter should be used whenever possible. Finally, dielectric functions were calculated at two levels, using the random phase approximation without local field effects and with a more accurate Bethe-Salpeter equation (BSE) to account for excitonic effects. We conclude that a satisfactory agreement with experimental data for HfO2 was obtained only in the latter case.

  17. Accurate prediction of V1 location from cortical folds in a surface coordinate system

    PubMed Central

    Hinds, Oliver P.; Rajendran, Niranjini; Polimeni, Jonathan R.; Augustinack, Jean C.; Wiggins, Graham; Wald, Lawrence L.; Rosas, H. Diana; Potthast, Andreas; Schwartz, Eric L.; Fischl, Bruce

    2008-01-01

    Previous studies demonstrated substantial variability of the location of primary visual cortex (V1) in stereotaxic coordinates when linear volume-based registration is used to match volumetric image intensities (Amunts et al., 2000). However, other qualitative reports of V1 location (Smith, 1904; Stensaas et al., 1974; Rademacher et al., 1993) suggested a consistent relationship between V1 and the surrounding cortical folds. Here, the relationship between folds and the location of V1 is quantified using surface-based analysis to generate a probabilistic atlas of human V1. High-resolution (about 200 μm) magnetic resonance imaging (MRI) at 7 T of ex vivo human cerebral hemispheres allowed identification of the full area via the stria of Gennari: a myeloarchitectonic feature specific to V1. Separate, whole-brain scans were acquired using MRI at 1.5 T to allow segmentation and mesh reconstruction of the cortical gray matter. For each individual, V1 was manually identified in the high-resolution volume and projected onto the cortical surface. Surface-based intersubject registration (Fischl et al., 1999b) was performed to align the primary cortical folds of individual hemispheres to those of a reference template representing the average folding pattern. An atlas of V1 location was constructed by computing the probability of V1 inclusion for each cortical location in the template space. This probabilistic atlas of V1 exhibits low prediction error compared to previous V1 probabilistic atlases built in volumetric coordinates. The increased predictability observed under surface-based registration suggests that the location of V1 is more accurately predicted by the cortical folds than by the shape of the brain embedded in the volume of the skull. In addition, the high quality of this atlas provides direct evidence that surface-based intersubject registration methods are superior to volume-based methods at superimposing functional areas of cortex, and therefore are better

  18. Comparative motif discovery combined with comparative transcriptomics yields accurate targetome and enhancer predictions.

    PubMed

    Naval-Sánchez, Marina; Potier, Delphine; Haagen, Lotte; Sánchez, Máximo; Munck, Sebastian; Van de Sande, Bram; Casares, Fernando; Christiaens, Valerie; Aerts, Stein

    2013-01-01

    The identification of transcription factor binding sites, enhancers, and transcriptional target genes often relies on the integration of gene expression profiling and computational cis-regulatory sequence analysis. Methods for the prediction of cis-regulatory elements can take advantage of comparative genomics to increase signal-to-noise levels. However, gene expression data are usually derived from only one species. Here we investigate tissue-specific cross-species gene expression profiling by high-throughput sequencing, combined with cross-species motif discovery. First, we compared different methods for expression level quantification and cross-species integration using Tag-seq data. Using the optimal pipeline, we derived a set of genes with conserved expression during retinal determination across Drosophila melanogaster, Drosophila yakuba, and Drosophila virilis. These genes are enriched for binding sites of eye-related transcription factors including the zinc-finger Glass, a master regulator of photoreceptor differentiation. Validation of predicted Glass targets using RNA-seq in homozygous glass mutants confirms that the majority of our predictions are expressed downstream from Glass. Finally, we tested nine candidate enhancers by in vivo reporter assays and found eight of them to drive GFP in the eye disc, of which seven colocalize with the Glass protein, namely, scrt, chp, dpr10, CG6329, retn, Lim3, and dmrt99B. In conclusion, we show for the first time the combined use of cross-species expression profiling with cross-species motif discovery as a method to define a core developmental program, and we augment the candidate Glass targetome from a single known target gene, lozenge, to at least 62 conserved transcriptional targets. PMID:23070853

  19. Predicting protein-ligand and protein-peptide interfaces

    NASA Astrophysics Data System (ADS)

    Bertolazzi, Paola; Guerra, Concettina; Liuzzi, Giampaolo

    2014-06-01

    The paper deals with the identification of binding sites and concentrates on interactions involving small interfaces. In particular we focus our attention on two major interface types, namely protein-ligand and protein-peptide interfaces. As concerns protein-ligand binding site prediction, we classify the most interesting methods and approaches into four main categories: (a) shape-based methods, (b) alignment-based methods, (c) graph-theoretic approaches and (d) machine learning methods. Class (a) encompasses those methods which employ, in some way, geometric information about the protein surface. Methods falling into class (b) address the prediction problem as an alignment problem, i.e. finding protein-ligand atom pairs that occupy spatially equivalent positions. Graph theoretic approaches, class (c), are mainly based on the definition of a particular graph, known as the protein contact graph, and then apply some sophisticated methods from graph theory to discover subgraphs or score similarities for uncovering functional sites. The last class (d) contains those methods that are based on the learn-from-examples paradigm and that are able to take advantage of the large amount of data available on known protein-ligand pairs. As for protein-peptide interfaces, due to the often disordered nature of the regions involved in binding, shape similarity is no longer a determining factor. Then, in geometry-based methods, geometry is accounted for by providing the relative position of the atoms surrounding the peptide residues in known structures. Finally, also for protein-peptide interfaces, we present a classification of some successful machine learning methods. Indeed, they can be categorized in the way adopted to construct the learning examples. In particular, we envisage three main methods: distance functions, structure and potentials and structure alignment.

  20. Predicting Resistance Mutations Using Protein Design Algorithms

    SciTech Connect

    Frey, K.; Georgiev, I; Donald, B; Anderson, A

    2010-01-01

    Drug resistance resulting from mutations to the target is an unfortunately common phenomenon that limits the lifetime of many of the most successful drugs. In contrast to the investigation of mutations after clinical exposure, it would be powerful to be able to incorporate strategies early in the development process to predict and overcome the effects of possible resistance mutations. Here we present a unique prospective application of an ensemble-based protein design algorithm, K*, to predict potential resistance mutations in dihydrofolate reductase from Staphylococcus aureus using positive design to maintain catalytic function and negative design to interfere with binding of a lead inhibitor. Enzyme inhibition assays show that three of the four highly-ranked predicted mutants are active yet display lower affinity (18-, 9-, and 13-fold) for the inhibitor. A crystal structure of the top-ranked mutant enzyme validates the predicted conformations of the mutated residues and the structural basis of the loss of potency. The use of protein design algorithms to predict resistance mutations could be incorporated in a lead design strategy against any target that is susceptible to mutational resistance.

  1. Protein Structure Prediction with Evolutionary Algorithms

    SciTech Connect

    Hart, W.E.; Krasnogor, N.; Pelta, D.A.; Smith, J.

    1999-02-08

    Evolutionary algorithms have been successfully applied to a variety of molecular structure prediction problems. In this paper we reconsider the design of genetic algorithms that have been applied to a simple protein structure prediction problem. Our analysis considers the impact of several algorithmic factors for this problem: the conformational representation, the energy formulation and the way in which infeasible conformations are penalized. Further, we empirically evaluated the impact of these factors on a small set of polymer sequences. Our analysis leads to specific recommendations for both GAs as well as other heuristic methods for solving PSP on the HP model.
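
    To make the three factors concrete, here is a minimal sketch of an HP-model energy function on a 2D square lattice. The choices shown are assumptions for illustration and not necessarily those studied in the paper: the conformation is encoded as absolute moves (U, D, L, R), each non-consecutive H-H lattice contact contributes -1, and infeasible (self-overlapping) conformations are penalized by a fixed amount per repeated lattice site. The name `hp_energy` and the penalty value are hypothetical.

```python
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def hp_energy(sequence, moves, overlap_penalty=10):
    """Energy of an HP chain folded on a 2D square lattice.

    sequence: string of 'H' (hydrophobic) and 'P' (polar) residues.
    moves:    one absolute direction per bond, so len(sequence) - 1 characters.
    """
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move per bond")

    # Decode the move string into lattice coordinates.
    coords = [(0, 0)]
    for move in moves:
        dx, dy = MOVES[move]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))

    # Infeasibility: number of residues placed on an already occupied site.
    collisions = len(coords) - len(set(coords))

    # Count H-H pairs that are lattice neighbors but not chain neighbors.
    contacts = 0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):
            if sequence[i] == "H" and sequence[j] == "H":
                (xi, yi), (xj, yj) = coords[i], coords[j]
                if abs(xi - xj) + abs(yi - yj) == 1:
                    contacts += 1

    return -contacts + overlap_penalty * collisions


print(hp_energy("HPPH", "RUL"))  # -1: the two H residues touch
print(hp_energy("HPPH", "RRR"))  # 0: extended chain, no contacts
print(hp_energy("HPPH", "RLR"))  # 19: one contact, two overlapping residues
```

    A penalty term like this lets a genetic algorithm keep infeasible individuals in the population while still steering the search toward self-avoiding folds; rejecting or repairing such individuals are alternative designs.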

  2. Consistent probabilistic outputs for protein function prediction

    PubMed Central

    Obozinski, Guillaume; Lanckriet, Gert; Grant, Charles; Jordan, Michael I; Noble, William Stafford

    2008-01-01

    In predicting hierarchical protein function annotations, such as terms in the Gene Ontology (GO), the simplest approach makes predictions for each term independently. However, this approach has the unfortunate consequence that the predictor may assign to a single protein a set of terms that are inconsistent with one another; for example, the predictor may assign a specific GO term to a given protein ('purine nucleotide binding') but not assign the parent term ('nucleotide binding'). Such predictions are difficult to interpret. In this work, we focus on methods for calibrating and combining independent predictions to obtain a set of probabilistic predictions that are consistent with the topology of the ontology. We call this procedure 'reconciliation'. We begin with a baseline method for predicting GO terms from a collection of data types using an ensemble of discriminative classifiers. We apply the method to a previously described benchmark data set, and we demonstrate that the resulting predictions are frequently inconsistent with the topology of the GO. We then consider 11 distinct reconciliation methods: three heuristic methods; four variants of a Bayesian network; an extension of logistic regression to the structured case; and three novel projection methods - isotonic regression and two variants of a Kullback-Leibler projection method. We evaluate each method in three different modes - per term, per protein and joint - corresponding to three types of prediction tasks. Although the principal goal of reconciliation is interpretability, it is important to assess whether interpretability comes at a cost in terms of precision and recall. Indeed, we find that many apparently reasonable reconciliation methods yield reconciled probabilities with significantly lower precision than the original, unreconciled estimates. On the other hand, we find that isotonic regression usually performs better than the underlying, unreconciled method, and almost never performs worse
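
    The kind of inconsistency described in this abstract, and one simple way to remove it, can be sketched as follows. The repair shown (raise every term to the maximum probability among itself and its descendants) is an illustrative heuristic only and is not claimed to match any of the 11 reconciliation methods evaluated in the paper. The probabilities and the function names are hypothetical.

```python
def find_violations(prob, parents):
    """Return (child, parent) pairs where the child is more probable than the parent."""
    return [
        (child, parent)
        for child, parent_list in parents.items()
        for parent in parent_list
        if prob[child] > prob[parent]
    ]


def reconcile_max(prob, parents):
    """Set each term to the maximum probability among itself and its descendants."""
    children = {term: [] for term in prob}
    for child, parent_list in parents.items():
        for parent in parent_list:
            children[parent].append(child)

    memo = {}

    def best(term):
        # Memoized depth-first walk; the ontology is assumed to be acyclic.
        if term not in memo:
            memo[term] = max([prob[term]] + [best(c) for c in children[term]])
        return memo[term]

    return {term: best(term) for term in prob}


# Independent per-term predictions for one protein (made-up numbers).
prob = {
    "binding": 0.9,
    "nucleotide binding": 0.4,
    "purine nucleotide binding": 0.7,
}
# Each term maps to its parent terms in the ontology.
parents = {
    "nucleotide binding": ["binding"],
    "purine nucleotide binding": ["nucleotide binding"],
}

print(find_violations(prob, parents))
# [('purine nucleotide binding', 'nucleotide binding')]
reconciled = reconcile_max(prob, parents)
print(reconciled["nucleotide binding"])  # 0.7
```

    After reconciliation no child term is more probable than any of its parents, so the output respects the topology of the ontology.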

  3. Combining physicochemical and evolutionary information for protein contact prediction.

    PubMed

    Schneider, Michael; Brock, Oliver

    2014-01-01

    We introduce a novel contact prediction method that achieves high prediction accuracy by combining evolutionary and physicochemical information about native contacts. We obtain evolutionary information from multiple-sequence alignments and physicochemical information from predicted ab initio protein structures. These structures represent low-energy states in an energy landscape and thus capture the physicochemical information encoded in the energy function. Such low-energy structures are likely to contain native contacts, even if their overall fold is not native. To differentiate native from non-native contacts in those structures, we develop a graph-based representation of the structural context of contacts. We then use this representation to train a support vector machine classifier to identify the most likely native contacts in otherwise non-native structures. The resulting contact predictions are highly accurate. As a result of combining two sources of information--evolutionary and physicochemical--we maintain prediction accuracy even when only a few sequence homologs are present. We show that the predicted contacts help to improve ab initio structure prediction. A web service is available at http://compbio.robotics.tu-berlin.de/epc-map/.

  4. Predicting disease-related proteins based on clique backbone in protein-protein interaction network.

    PubMed

    Yang, Lei; Zhao, Xudong; Tang, Xianglong

    2014-01-01

    Network biology integrates different kinds of data, including physical or functional networks and disease gene sets, to interpret human disease. A clique (maximal complete subgraph) in a protein-protein interaction network is a topological module and possesses inherent biological significance. A disease-related clique is possibly associated with complex diseases. Fully identifying disease components in a clique is conducive to uncovering disease mechanisms. This paper proposes an approach for predicting disease proteins based on cliques in a protein-protein interaction network. To tolerate false positive and false negative interactions in protein networks, clique extension and scoring of predicted disease proteins with gene ontology terms are introduced to the clique-based method. The precision of predicted disease proteins is verified by disease phenotypes and stays steadily above 95%. The predicted disease proteins associated with cliques can partly complement the mapping between genotype and phenotype, and provide clues for understanding the pathogenesis of serious diseases.
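
    The clique definition used above (a maximal complete subgraph) can be made concrete with a short sketch. The code below enumerates maximal cliques with the basic Bron-Kerbosch recursion; the abstract does not say which enumeration algorithm the authors used, so this choice is an assumption, and the toy interaction list and function names are hypothetical.

```python
def build_adjacency(edges):
    """Turn an undirected edge list into a dict of neighbour sets."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion.

    adj: dict mapping each node to the set of its neighbours.
    Returns a list of frozensets, one per maximal clique.
    """
    cliques = []

    def expand(current, candidates, excluded):
        # No candidates and nothing excluded: current cannot be extended.
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for node in list(candidates):
            expand(current | {node}, candidates & adj[node], excluded & adj[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical toy interaction network: one triangle plus a pendant edge.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
toy_cliques = maximal_cliques(build_adjacency(toy_edges))
```

    On the toy network this yields the triangle A-B-C and the edge C-D; the extension and GO-based scoring steps described in the abstract are not reproduced here.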

  5. Predicting Protein Function Using Multiple Kernels.

    PubMed

    Yu, Guoxian; Rangwala, Huzefa; Domeniconi, Carlotta; Zhang, Guoji; Zhang, Zili

    2015-01-01

    High-throughput experimental techniques provide a wide variety of heterogeneous proteomic data sources. To exploit the information spread across multiple sources for protein function prediction, these data sources are transformed into kernels and then integrated into a composite kernel. Several methods first optimize the weights on these kernels to produce a composite kernel, and then train a classifier on the composite kernel. As such, these approaches result in an optimal composite kernel, but not necessarily in an optimal classifier. On the other hand, some approaches optimize the loss of binary classifiers and learn weights for the different kernels iteratively. For multi-class or multi-label data, these methods have to optimize the weights on these kernels separately for each of the labels, which is computationally expensive and ignores the correlation among labels. In this paper, we propose a method called Predicting Protein Function using Multiple Kernels (ProMK). ProMK iteratively optimizes the phases of learning optimal weights and reduces the empirical loss of a multi-label classifier for each of the labels simultaneously. ProMK can integrate kernels selectively and downgrade the weights on noisy kernels. We investigate the performance of ProMK on several publicly available protein function prediction benchmarks and synthetic datasets. We show that the proposed approach performs better than previously proposed protein function prediction approaches that integrate multiple data sources and multi-label multiple kernel learning methods. The code for our proposed method is available at https://sites.google.com/site/guoxian85/promk.

  6. Unilateral Prostate Cancer Cannot be Accurately Predicted in Low-Risk Patients

    SciTech Connect

    Isbarn, Hendrik; Karakiewicz, Pierre I.; Vogel, Susanne

    2010-07-01

    Purpose: Hemiablative therapy (HAT) is increasing in popularity for treatment of patients with low-risk prostate cancer (PCa). The validity of this therapeutic modality, which exclusively treats PCa within a single prostate lobe, rests on accurate staging. We tested the accuracy of unilaterally unremarkable biopsy findings in cases of low-risk PCa patients who are potential candidates for HAT. Methods and Materials: The study population consisted of 243 men with clinical stage ≤T2a, a prostate-specific antigen (PSA) concentration of <10 ng/ml, a biopsy-proven Gleason sum of ≤6, and a maximum of 2 ipsilateral positive biopsy results out of 10 or more cores. All men underwent a radical prostatectomy, and pathology stage was used as the gold standard. Univariable and multivariable logistic regression models were tested for significant predictors of unilateral, organ-confined PCa. These predictors consisted of PSA, %fPSA (defined as the quotient of free [uncomplexed] PSA divided by the total PSA), clinical stage (T2a vs. T1c), gland volume, and number of positive biopsy cores (2 vs. 1). Results: Despite unilateral stage at biopsy, bilateral or even non-organ-confined PCa was reported in 64% of all patients. In multivariable analyses, no variable could clearly and independently predict the presence of unilateral PCa. This was reflected in an overall accuracy of 58% (95% confidence interval, 50.6-65.8%). Conclusions: Two-thirds of patients with unilateral low-risk PCa, confirmed by clinical stage and biopsy findings, have bilateral or non-organ-confined PCa at radical prostatectomy. This alarming finding questions the safety and validity of HAT.

  7. CASP11--An Evaluation of a Modular BCL::Fold-Based Protein Structure Prediction Pipeline.

    PubMed

    Fischer, Axel W; Heinze, Sten; Putnam, Daniel K; Li, Bian; Pino, James C; Xia, Yan; Lopez, Carlos F; Meiler, Jens

    2016-01-01

    In silico prediction of a protein's tertiary structure remains an unsolved problem. The community-wide Critical Assessment of Protein Structure Prediction (CASP) experiment provides a double-blind study to evaluate improvements in protein structure prediction algorithms. We developed a protein structure prediction pipeline employing a three-stage approach, consisting of low-resolution topology search, high-resolution refinement, and molecular dynamics simulation to predict the tertiary structure of proteins from the primary structure alone or including distance restraints from predicted residue-residue contacts, nuclear magnetic resonance (NMR) nuclear Overhauser effect (NOE) experiments, or mass spectroscopy (MS) cross-linking (XL) data. The protein structure prediction pipeline was evaluated in the CASP11 experiment on twenty regular protein targets as well as thirty-three 'assisted' protein targets, which also had distance restraints available. Although the low-resolution topology search module was able to sample models with a global distance test total score (GDT_TS) value greater than 30% for twelve out of twenty proteins, frequently it was not possible to select the most accurate models for refinement, resulting in a general decay of model quality over the course of the prediction pipeline. In this study, we provide a detailed overall analysis, study one target protein in more detail as it travels through the protein structure prediction pipeline, and evaluate the impact of limited experimental data.
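
    For readers unfamiliar with the GDT_TS score quoted above, the sketch below computes a simplified version: the average, over the cutoffs 1, 2, 4 and 8 Å, of the percentage of residues whose model-to-native distance is within the cutoff. It assumes the per-residue distances come from a single, already computed superposition, whereas the full score searches over many superpositions to maximise each percentage. The function name and example distances are hypothetical.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS from per-residue distances (in angstroms).

    distances: model-to-native distance of each aligned residue after
               one fixed superposition (a simplifying assumption).
    Returns a percentage between 0 and 100.
    """
    n = len(distances)
    if n == 0:
        raise ValueError("at least one residue distance is required")
    # Fraction of residues within each cutoff, then averaged.
    fractions = [sum(d <= cutoff for d in distances) / n for cutoff in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical distances for a five-residue toy model.
toy_distances = [0.5, 1.5, 3.0, 6.0, 10.0]
toy_score = gdt_ts(toy_distances)
```

    With these toy distances the four fractions are 1/5, 2/5, 3/5 and 4/5, giving a score of about 50%, comfortably above the 30% threshold mentioned in the abstract.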

  8. Improving DOE-2's RESYS routine: User defined functions to provide more accurate part load energy use and humidity predictions

    SciTech Connect

    Henderson, Hugh I.; Parker, Danny; Huang, Yu J.

    2000-08-04

    In hourly energy simulations, it is important to properly predict the performance of air conditioning systems over a range of full and part load operating conditions. An important component of these calculations is to properly consider the performance of the cycling air conditioner and how it interacts with the building. This paper presents improved approaches to properly account for the part load performance of residential and light commercial air conditioning systems in DOE-2. First, more accurate correlations are given to predict the degradation of system efficiency at part load conditions. In addition, a user-defined function for RESYS is developed that provides improved predictions of air conditioner sensible and latent capacity at part load conditions. The user function also provides more accurate predictions of space humidity by adding 'lumped' moisture capacitance into the calculations. The improved cooling coil model and the addition of moisture capacitance predict humidity swings that are more representative of the performance observed in real buildings.

  9. High IFIT1 expression predicts improved clinical outcome, and IFIT1 along with MGMT more accurately predicts prognosis in newly diagnosed glioblastoma.

    PubMed

    Zhang, Jin-Feng; Chen, Yao; Lin, Guo-Shi; Zhang, Jian-Dong; Tang, Wen-Long; Huang, Jian-Huang; Chen, Jin-Shou; Wang, Xing-Fu; Lin, Zhi-Xiong

    2016-06-01

    Interferon-induced protein with tetratricopeptide repeat 1 (IFIT1) plays a key role in growth suppression and apoptosis promotion in cancer cells. Interferon was reported to induce the expression of IFIT1 and inhibit the expression of O-6-methylguanine-DNA methyltransferase (MGMT). This study aimed to investigate the expression of IFIT1, the correlation between IFIT1 and MGMT, and their impact on the clinical outcome in newly diagnosed glioblastoma. The expression of IFIT1 and MGMT and their correlation were investigated in the tumor tissues from 70 patients with newly diagnosed glioblastoma. The effects on progression-free survival and overall survival were evaluated. Of 70 cases, 57 (81.4%) tissue samples showed high expression of IFIT1 by immunostaining. The χ² test indicated that the expression of IFIT1 and MGMT was negatively correlated (r = -0.288, P = .016). Univariate and multivariate analyses confirmed high IFIT1 expression as a favorable prognostic indicator for progression-free survival (P = .005 and .017) and overall survival (P = .001 and .001), respectively. Patients with 2 favorable factors (high IFIT1 and low MGMT) had an improved prognosis as compared with others. The results demonstrated significantly increased expression of IFIT1 in newly diagnosed glioblastoma tissue. The negative correlation between IFIT1 and MGMT expression may be triggered by interferon. High IFIT1 can be a predictive biomarker of favorable clinical outcome, and IFIT1 along with MGMT more accurately predicts prognosis in newly diagnosed glioblastoma. PMID:26980050

  10. Protein Markers Predict Survival in Glioma Patients.

    PubMed

    Stetson, Lindsay C; Dazard, Jean-Eudes; Barnholtz-Sloan, Jill S

    2016-07-01

    Glioblastoma multiforme (GBM) is a genomically complex and aggressive primary adult brain tumor, with a median survival time of 12-14 months. The heterogeneous nature of this disease has made the identification and validation of prognostic biomarkers difficult. Using reverse phase protein array data from 203 primary untreated GBM patients, we have identified a set of 13 proteins with prognostic significance. Our protein signature predictive of glioblastoma (PROTGLIO) patient survival model was constructed and validated on independent data sets and was shown to significantly predict survival in GBM patients (log-rank test: p = 0.0009). Using a multivariate Cox proportional hazards model, we have shown that our PROTGLIO model is distinct from other known GBM prognostic factors (age at diagnosis, extent of surgical resection, postoperative Karnofsky performance score (KPS), treatment with temozolomide (TMZ) chemoradiation, and methylation of the MGMT gene). Tenfold cross-validation repetition of our model generation procedure confirmed validation of PROTGLIO. The model was further validated on an independent set of isocitrate dehydrogenase wild-type (IDHwt) lower grade gliomas (LGG), a portion of which progress rapidly to GBM. The PROTGLIO model contains proteins, such as Cox-2 and Annexin 1, involved in inflammatory response, pointing to potential therapeutic interventions. The PROTGLIO model is a simple and effective predictor of overall survival in glioblastoma patients, making it potentially useful in clinical practice of glioblastoma multiforme. PMID:27143410

  11. Predicting protein functions from PPI networks using functional aggregation.

    PubMed

    Hou, Jingyu; Chi, Xiaoxiao

    2012-11-01

    Predicting protein functions computationally from massive protein-protein interaction (PPI) data generated by high-throughput technology is one of the challenges and fundamental problems in the post-genomic era. Although there have been many approaches developed for computationally predicting protein functions, the mutual correlations among proteins in terms of protein functions have not been thoroughly investigated and incorporated into existing prediction methods, especially in voting based prediction methods. In this paper, we propose an innovative method to predict protein functions from PPI data by aggregating the functional correlations among relevant proteins using the Choquet-Integral in fuzzy theory. This functional aggregation measures the real impact of each relevant protein function on the final prediction results, and reduces the impact of repeated functional information on the prediction. Accordingly, a new protein similarity and a new iterative prediction algorithm are proposed in this paper. The experimental evaluations on real PPI datasets demonstrate the effectiveness of our method.

  12. Transferring network topological knowledge for predicting protein-protein interactions.

    PubMed

    Xu, Qian; Xiang, Evan Wei; Yang, Qiang

    2011-10-01

    Protein-protein interactions (PPIs) play an important role in cellular processes within a cell. An important task is to determine the existence of interactions among proteins. Unfortunately, the existing biological experimental techniques are expensive, time-consuming and labor-intensive. The network structures of many such networks are sparse, incomplete and noisy. Thus, state-of-the-art methods for link prediction in these networks often cannot give satisfactory prediction results, especially when some networks are extremely sparse. Noticing that we typically have more than one PPI network available, we naturally wonder whether it is possible to 'transfer' the linkage knowledge from some existing, relatively dense networks to a sparse network, to improve the prediction performance. Noticing that a network structure can be modeled using a matrix model, we introduce the well-known collective matrix factorization technique to 'transfer' usable linkage knowledge from relatively dense interaction network to a sparse target network. Our approach is to establish a correspondence between a source network and a target network via network-wide similarities. We test this method on two real PPI networks, Helicobacter pylori (as a target network) and human (as a source network). Our experimental results show that our method can achieve higher performance as compared with some baseline methods. PMID:21770035

  13. Predicting protein-protein interactions in the post synaptic density.

    PubMed

    Bar-shira, Ossnat; Chechik, Gal

    2013-09-01

    The post synaptic density (PSD) is a specialization of the cytoskeleton at the synaptic junction, composed of hundreds of different proteins. Characterizing the protein components of the PSD and their interactions can help elucidate the mechanism of long-term changes in synaptic plasticity, which underlie learning and memory. Unfortunately, our knowledge of the proteome and interactome of the PSD is still partial and noisy. In this study we describe a computational framework to improve the reconstruction of the PSD network. The approach is based on learning the characteristics of PSD protein interactions from a set of trusted interactions, expanding this set with data collected from large scale repositories, and then predicting novel interactions with proteins that are suspected to reside in the PSD. Using this method we obtained thirty predicted interactions, more than half of which have supporting evidence in the literature. We discuss in detail two of these new interactions, Lrrtm1 with PSD-95 and Src with Capg. The first may take part in a mechanism underlying glutamatergic dysfunction in schizophrenia. The second suggests an alternative mechanism to regulate dendritic spine maturation.

  14. High-throughput prediction of RNA, DNA and protein binding regions mediated by intrinsic disorder.

    PubMed

    Peng, Zhenling; Kurgan, Lukasz

    2015-10-15

    Intrinsically disordered proteins and regions (IDPs and IDRs) lack stable 3D structure under physiological conditions in vitro, are common in eukaryotes, and facilitate interactions with RNA, DNA and proteins. Current methods for prediction of IDPs and IDRs do not provide insights into their functions, except for a handful of methods that address predictions of protein-binding regions. We report a first-of-its-kind computational method, DisoRDPbind, for high-throughput prediction of RNA, DNA and protein binding residues located in IDRs from protein sequences. DisoRDPbind is implemented using a runtime-efficient multi-layered design that utilizes information extracted from physicochemical properties of amino acids, sequence complexity, putative secondary structure and disorder, and sequence alignment. Empirical tests demonstrate that it provides accurate predictions that are competitive with other predictors of disorder-mediated protein binding regions and complementary to the methods that predict RNA- and DNA-binding residues annotated based on crystal structures. Application in Homo sapiens, Mus musculus, Caenorhabditis elegans and Drosophila melanogaster proteomes reveals that RNA- and DNA-binding proteins predicted by DisoRDPbind complement and overlap with the corresponding known binding proteins collected from several sources. Also, the number of the putative protein-binding regions predicted with DisoRDPbind correlates with the promiscuity of proteins in the corresponding protein-protein interaction networks. Webserver: http://biomine.ece.ualberta.ca/DisoRDPbind/.

  15. Rapid calculation of accurate atomic charges for proteins via the electronegativity equalization method.

    PubMed

    Ionescu, Crina-Maria; Geidl, Stanislav; Svobodová Vařeková, Radka; Koča, Jaroslav

    2013-10-28

    We focused on the parametrization and evaluation of empirical models for fast and accurate calculation of conformationally dependent atomic charges in proteins. The models were based on the electronegativity equalization method (EEM), and the parametrization procedure was tailored to proteins. We used large protein fragments as reference structures and fitted the EEM model parameters using atomic charges computed by three population analyses (Mulliken, Natural, iterative Hirshfeld), at the Hartree-Fock level with two basis sets (6-31G*, 6-31G**) and in two environments (gas phase, implicit solvation). We parametrized and successfully validated 24 EEM models. When tested on insulin and ubiquitin, all models reproduced quantum mechanics level charges well and were consistent with respect to population analysis and basis set. Specifically, the models showed on average a correlation of 0.961, RMSD 0.097 e, and average absolute error per atom 0.072 e. The EEM models can be used with the freely available EEM implementation EEM_SOLVER.
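
    For orientation, the electronegativity equalization idea can be written compactly. The form below is the commonly used EEM system and is not quoted from the abstract; the symbols (per-atom-type parameters A_i and B_i, the scaling constant κ, interatomic distances R_ij, total charge Q, and the equalized molecular electronegativity χ̄) are notational assumptions.

```latex
% Effective electronegativity of every atom i is equalized to a common value,
% subject to conservation of the total molecular charge Q.
\begin{aligned}
  \chi_i &= A_i + B_i\, q_i + \kappa \sum_{j \neq i} \frac{q_j}{R_{ij}} = \bar{\chi},
  \qquad i = 1, \dots, N, \\
  \sum_{i=1}^{N} q_i &= Q .
\end{aligned}
```

    This is a linear system of N + 1 equations in the N charges q_i and χ̄, so charges follow from a single linear solve once A_i, B_i and κ have been fitted, which is consistent with the speed the abstract emphasizes.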

  16. Accurate and Efficient Resolution of Overlapping Isotopic Envelopes in Protein Tandem Mass Spectra

    PubMed Central

    Xiao, Kaijie; Yu, Fan; Fang, Houqin; Xue, Bingbing; Liu, Yan; Tian, Zhixin

    2015-01-01

    It has long been an analytical challenge to accurately and efficiently resolve extremely dense overlapping isotopic envelopes (OIEs) in protein tandem mass spectra to confidently identify proteins. Here, we report a computationally efficient method, called OIE_CARE, to resolve OIEs by calculating the relative deviation between the ideal and observed experimental abundance. In the OIE_CARE method, the ideal experimental abundance of a particular overlapping isotopic peak (OIP) is first calculated for all the OIEs sharing this OIP. The relative deviation (RD) of the overall observed experimental abundance of this OIP relative to the summed ideal value is then calculated. The final individual abundance of the OIP for each OIE is the individual ideal experimental abundance multiplied by 1 + RD. Initial studies were performed using higher-energy collisional dissociation tandem mass spectra on myoglobin (with direct infusion) and the intact E. coli proteome (with liquid chromatographic separation). Comprehensive data at the protein and proteome levels, high confidence and good reproducibility were achieved. The resolving method reported here can, in principle, be extended to resolve any envelope-type overlapping data for which the corresponding theoretical reference values are available. PMID:26439836
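
    The arithmetic described above can be sketched in a few lines. The code below reads the relative deviation as (observed - summed ideal) / summed ideal, which is one interpretation of the abstract's wording rather than a confirmed formula from the paper; the function name and the example abundances are hypothetical.

```python
def resolve_overlapping_peak(ideal_abundances, observed_abundance):
    """Split one overlapping isotopic peak (OIP) among the envelopes sharing it.

    ideal_abundances:   ideal abundance of this OIP for each envelope (OIE).
    observed_abundance: total observed abundance of the OIP.
    Returns (per-envelope abundances, relative deviation RD).
    """
    total_ideal = sum(ideal_abundances)
    if total_ideal <= 0:
        raise ValueError("summed ideal abundance must be positive")
    # Relative deviation of the observed total from the summed ideal value
    # (assumed definition; see the lead-in).
    rd = (observed_abundance - total_ideal) / total_ideal
    # Each envelope's share is its ideal abundance scaled by (1 + RD).
    resolved = [ideal * (1.0 + rd) for ideal in ideal_abundances]
    return resolved, rd


# Hypothetical example: two envelopes share a peak observed at 440 units.
toy_resolved, toy_rd = resolve_overlapping_peak([300.0, 100.0], 440.0)
```

    Under this reading the resolved abundances always sum to the observed abundance, and each envelope keeps its ideal proportion of the shared peak.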

  17. Mitotic Protein CSPP1 Interacts with CENP-H Protein to Coordinate Accurate Chromosome Oscillation in Mitosis.

    PubMed

    Zhu, Lijuan; Wang, Zhikai; Wang, Wenwen; Wang, Chunli; Hua, Shasha; Su, Zeqi; Brako, Larry; Garcia-Barrio, Minerva; Ye, Mingliang; Wei, Xuan; Zou, Hanfa; Ding, Xia; Liu, Lifang; Liu, Xing; Yao, Xuebiao

    2015-11-01

    Mitotic chromosome segregation is orchestrated by the dynamic interaction of spindle microtubules with the kinetochores. During chromosome alignment, kinetochore-bound microtubules undergo dynamic cycles between growth and shrinkage, leading to an oscillatory movement of chromosomes along the spindle axis. Although kinetochore protein CENP-H serves as a molecular control of kinetochore-microtubule dynamics, the mechanistic link between CENP-H and kinetochore microtubules (kMT) has remained less characterized. Here, we show that CSPP1 is a kinetochore protein essential for accurate chromosome movements in mitosis. CSPP1 binds to CENP-H in vitro and in vivo. Suppression of CSPP1 perturbs proper mitotic progression and compromises the satisfaction of spindle assembly checkpoint. In addition, chromosome oscillation is greatly attenuated in CSPP1-depleted cells, similar to what was observed in the CENP-H-depleted cells. Importantly, CSPP1 depletion enhances velocity of kinetochore movement, and overexpression of CSPP1 decreases the speed, suggesting that CSPP1 promotes kMT stability during cell division. Specific perturbation of CENP-H/CSPP1 interaction using a membrane-permeable competing peptide resulted in a transient mitotic arrest and chromosome segregation defect. Based on these findings, we propose that CSPP1 cooperates with CENP-H on kinetochores to serve as a novel regulator of kMT dynamics for accurate chromosome segregation.

  18. Electrostatics of proteins in dielectric solvent continua. I. An accurate and efficient reaction field description

    SciTech Connect

    Bauer, Sebastian; Mathias, Gerald; Tavan, Paul

    2014-03-14

    We present a reaction field (RF) method which accurately solves the Poisson equation for proteins embedded in dielectric solvent continua at a computational effort comparable to that of an electrostatics calculation with polarizable molecular mechanics (MM) force fields. The method combines an approach originally suggested by Egwolf and Tavan [J. Chem. Phys. 118, 2039 (2003)] with concepts generalizing the Born solution [Z. Phys. 1, 45 (1920)] for a solvated ion. First, we derive an exact representation according to which the sources of the RF potential and energy are inducible atomic anti-polarization densities and atomic shielding charge distributions. Modeling these atomic densities by Gaussians leads to an approximate representation. Here, the strengths of the Gaussian shielding charge distributions are directly given in terms of the static partial charges as defined, e.g., by standard MM force fields for the various atom types, whereas the strengths of the Gaussian anti-polarization densities are calculated by a self-consistency iteration. The atomic volumes are also described by Gaussians. To account for covalently overlapping atoms, their effective volumes are calculated by another self-consistency procedure, which guarantees that the dielectric function ε(r) is close to one everywhere inside the protein. The Gaussian widths σ_i of the atoms i are parameters of the RF approximation. The remarkable accuracy of the method is demonstrated by comparison with Kirkwood's analytical solution for a spherical protein [J. Chem. Phys. 2, 351 (1934)] and with computationally expensive grid-based numerical solutions for simple model systems in dielectric continua including a di-peptide (Ac-Ala-NHMe) as modeled by a standard MM force field. The latter example shows how weakly the RF conformational free energy landscape depends on the parameters σ_i. A summarizing discussion highlights the achievements of the new theory and of its approximate solution

  19. Electrostatics of proteins in dielectric solvent continua. I. An accurate and efficient reaction field description.

    PubMed

    Bauer, Sebastian; Mathias, Gerald; Tavan, Paul

    2014-03-14

    We present a reaction field (RF) method which accurately solves the Poisson equation for proteins embedded in dielectric solvent continua at a computational effort comparable to that of an electrostatics calculation with polarizable molecular mechanics (MM) force fields. The method combines an approach originally suggested by Egwolf and Tavan [J. Chem. Phys. 118, 2039 (2003)] with concepts generalizing the Born solution [Z. Phys. 1, 45 (1920)] for a solvated ion. First, we derive an exact representation according to which the sources of the RF potential and energy are inducible atomic anti-polarization densities and atomic shielding charge distributions. Modeling these atomic densities by Gaussians leads to an approximate representation. Here, the strengths of the Gaussian shielding charge distributions are directly given in terms of the static partial charges as defined, e.g., by standard MM force fields for the various atom types, whereas the strengths of the Gaussian anti-polarization densities are calculated by a self-consistency iteration. The atomic volumes are also described by Gaussians. To account for covalently overlapping atoms, their effective volumes are calculated by another self-consistency procedure, which guarantees that the dielectric function ε(r) is close to one everywhere inside the protein. The Gaussian widths σ(i) of the atoms i are parameters of the RF approximation. The remarkable accuracy of the method is demonstrated by comparison with Kirkwood's analytical solution for a spherical protein [J. Chem. Phys. 2, 351 (1934)] and with computationally expensive grid-based numerical solutions for simple model systems in dielectric continua including a di-peptide (Ac-Ala-NHMe) as modeled by a standard MM force field. The latter example shows how weakly the RF conformational free energy landscape depends on the parameters σ(i). A summarizing discussion highlights the achievements of the new theory and of its approximate solution particularly by

  20. Accurate prediction model of bead geometry in crimping butt of the laser brazing using generalized regression neural network

    NASA Astrophysics Data System (ADS)

    Rong, Y. M.; Chang, Y.; Huang, Y.; Zhang, G. J.; Shao, X. Y.

    2015-12-01

    Few studies have concentrated on predicting the bead geometry for laser brazing with crimping butt. This paper addresses the accurate prediction of the bead profile by developing a generalized regression neural network (GRNN) algorithm. First, a GRNN model was developed and trained to decrease the prediction error that may be influenced by the sample size. Then the prediction accuracy was demonstrated by comparison with other articles and with a back propagation artificial neural network (BPNN) algorithm. Finally, the reliability and stability of the GRNN model were discussed in terms of average relative error (ARE), mean square error (MSE) and root mean square error (RMSE); the maximum ARE and MSE were 6.94% and 0.0303, clearly less than those (14.28% and 0.0832) predicted by BPNN. The prediction accuracy was thus improved by at least a factor of 2, and the stability was also increased.
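
    A GRNN in its standard form is a Gaussian-kernel weighted average of the training targets, so a minimal version fits in a few lines. The sketch below shows that standard form together with the three error measures named in the abstract; it is not the authors' model, the smoothing parameter sigma and the toy data are hypothetical, and ARE is assumed here to mean the mean of |predicted - true| / |true|.

```python
import math


def grnn_predict(train_x, train_y, query, sigma=0.5):
    """Standard GRNN output: kernel-weighted average of training targets.

    train_x: list of feature tuples; train_y: list of targets;
    query:   feature tuple to predict; sigma: smoothing parameter.
    """
    weights = []
    for x in train_x:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, query))
        weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("sigma too small: all kernel weights underflowed")
    return sum(w * y for w, y in zip(weights, train_y)) / total


def error_metrics(y_true, y_pred):
    """Average relative error, mean square error and root mean square error."""
    n = len(y_true)
    are = sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)) / n
    mse = sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n
    return {"ARE": are, "MSE": mse, "RMSE": math.sqrt(mse)}


# Hypothetical one-dimensional training data.
toy_x = [(0.0,), (1.0,), (2.0,)]
toy_y = [1.0, 3.0, 5.0]
toy_prediction = grnn_predict(toy_x, toy_y, (1.0,), sigma=0.1)
toy_metrics = error_metrics([2.0, 4.0], [2.2, 3.8])
```

    Because the only free parameter is sigma, a model of this kind has no iterative weight training, which is one plausible reason it behaves more stably than a back-propagation network on small samples.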

  1. Towards more accurate wind and solar power prediction by improving NWP model physics

    NASA Astrophysics Data System (ADS)

    Steiner, Andrea; Köhler, Carmen; von Schumann, Jonas; Ritter, Bodo

    2014-05-01

    The growing importance and successive expansion of renewable energies raise new challenges for decision makers, economists, transmission system operators, scientists and many more. In this interdisciplinary field, the role of Numerical Weather Prediction (NWP) is to reduce the errors and provide an a priori estimate of remaining uncertainties associated with the large share of weather-dependent power sources. For this purpose it is essential to optimize NWP model forecasts with respect to those prognostic variables which are relevant for wind and solar power plants. An improved weather forecast serves as the basis for sophisticated power forecasts. Consequently, well-timed energy trading on the stock market and electrical grid stability can be maintained. The German Weather Service (DWD) is currently involved in two projects concerning research in the field of renewable energy, namely ORKA and EWeLiNE. Whereas the latter is in collaboration with the Fraunhofer Institute (IWES), the project ORKA is led by energy & meteo systems (emsys). Both cooperate with German transmission system operators. The goal of the projects is to improve wind and photovoltaic (PV) power forecasts by combining optimized NWP and enhanced power forecast models. In this context, the German Weather Service aims to improve its model system, including the ensemble forecasting system, by working on data assimilation, model physics and statistical post processing. This presentation is focused on the identification of critical weather situations and the associated errors in the German regional NWP model COSMO-DE. First steps leading to improved physical parameterization schemes within the NWP model are presented. Wind mast measurements reaching up to 200 m height above ground are used for the estimation of the (NWP) wind forecast error at heights relevant for wind energy plants. One particular problem is the daily cycle in wind speed. The transition from stable stratification during

  2. FAMSA: Fast and accurate multiple sequence alignment of huge protein families

    PubMed Central

    Deorowicz, Sebastian; Debudaj-Grabysz, Agnieszka; Gudyś, Adam

    2016-01-01

    Rapid development of modern sequencing platforms has contributed to the unprecedented growth of protein families databases. The abundance of sets containing hundreds of thousands of sequences is a formidable challenge for multiple sequence alignment algorithms. The article introduces FAMSA, a new progressive algorithm designed for fast and accurate alignment of thousands of protein sequences. Its features include the utilization of the longest common subsequence measure for determining pairwise similarities, a novel method of evaluating gap costs, and a new iterative refinement scheme. What matters is that its implementation is highly optimized and parallelized to make the most of modern computer platforms. Thanks to the above, quality indicators, i.e. sum-of-pairs and total-column scores, show FAMSA to be superior to competing algorithms, such as Clustal Omega or MAFFT for datasets exceeding a few thousand sequences. Quality does not compromise on time or memory requirements, which are an order of magnitude lower than those in the existing solutions. For example, a family of 415519 sequences was analyzed in less than two hours and required no more than 8 GB of RAM. FAMSA is available for free at http://sun.aei.polsl.pl/REFRESH/famsa. PMID:27670777
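
    The pairwise similarity measure mentioned above, the longest common subsequence (LCS), can be illustrated with the textbook dynamic program. The sketch below only computes the LCS length in a straightforward way; it makes no claim about how FAMSA itself computes or normalizes the measure, and the function name is hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences.

    Textbook dynamic programming that keeps only two rows, so memory
    is proportional to the shorter sequence.
    """
    # Keep the shorter sequence along the row to minimise memory.
    if len(a) < len(b):
        a, b = b, a
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]
```

    For example, `lcs_length("AGGTAB", "GXTXAYB")` is 4 (the subsequence GTAB). The quadratic cost per pair is why a heavily optimized implementation, as the abstract describes, matters for families with hundreds of thousands of sequences.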

  3. Prediction of Peptide and Protein Propensity for Amyloid Formation.

    PubMed

    Família, Carlos; Dennison, Sarah R; Quintas, Alexandre; Phoenix, David A

    2015-01-01

    Understanding which peptides and proteins have the potential to undergo amyloid formation and what driving forces are responsible for amyloid-like fiber formation and stabilization remains limited. This is mainly because proteins that can undergo structural changes, which lead to amyloid formation, are quite diverse and share no obvious sequence or structural homology, despite the structural similarity found in the fibrils. To address these issues, a novel approach based on recursive feature selection and feed-forward neural networks was undertaken to identify key features highly correlated with the self-assembly problem. This approach allowed the identification of seven physicochemical and biochemical properties of the amino acids highly associated with the self-assembly of peptides and proteins into amyloid-like fibrils (normalized frequency of β-sheet, normalized frequency of β-sheet from LG, weights for β-sheet at the window position of 1, isoelectric point, atom-based hydrophobic moment, helix termination parameter at position j+1 and ΔG° values for peptides extrapolated in 0 M urea). Moreover, these features enabled the development of a new predictor (available at http://cran.r-project.org/web/packages/appnn/index.html) capable of accurately and reliably predicting the amyloidogenic propensity from the polypeptide sequence alone with a prediction accuracy of 84.9% against an external validation dataset of sequences with experimental in vitro evidence of amyloid formation. PMID:26241652

  5. 3D protein structure prediction using Imperialist Competitive algorithm and half sphere exposure prediction.

    PubMed

    Khaji, Erfan; Karami, Masoumeh; Garkani-Nejad, Zahra

    2016-02-21

    Predicting the native structure of proteins based on half-sphere exposure and contact numbers has been studied deeply within recent years. Online predictors of these vectors and secondary structures of amino acid sequences have made it possible to design a function for the folding process. By choosing variant structures and directs for each secondary structure, a random conformation can be generated, and a potential function can then be assigned. Minimizing the potential function utilizing meta-heuristic algorithms is the final step of finding the native structure of a given amino acid sequence. In this work, the Imperialist Competitive algorithm was used in order to accelerate the process of minimization. Moreover, we applied an adaptive procedure to apply revolutionary changes. Finally, we considered a more accurate tool for prediction of secondary structure. The results of the computational experiments on a standard benchmark show the superiority of the new algorithm over the previous methods with a similar potential function. PMID:26718864

  7. Bayesian Markov Random Field analysis for protein function prediction based on network data.

    PubMed

    Kourmpetis, Yiannis A I; van Dijk, Aalt D J; Bink, Marco C A M; van Ham, Roeland C H J; ter Braak, Cajo J F

    2010-02-24

    Inference of protein functions is one of the most important aims of modern biology. To fully exploit the large volumes of genomic data typically produced in modern-day genomic experiments, automated computational methods for protein function prediction are urgently needed. Established methods use sequence or structure similarity to infer functions but those types of data do not suffice to determine the biological context in which proteins act. Current high-throughput biological experiments produce large amounts of data on the interactions between proteins. Such data can be used to infer interaction networks and to predict the biological process that the protein is involved in. Here, we develop a probabilistic approach for protein function prediction using network data, such as protein-protein interaction measurements. We take a Bayesian approach to an existing Markov Random Field method by performing simultaneous estimation of the model parameters and prediction of protein functions. We use an adaptive Markov Chain Monte Carlo algorithm that leads to more accurate parameter estimates and consequently to improved prediction performance compared to the standard Markov Random Fields method. We tested our method using a high quality S. cerevisiae validation network with 1622 proteins against 90 Gene Ontology terms of different levels of abstraction. Compared to three other protein function prediction methods, our approach shows very good prediction performance. Our method can be directly applied to protein-protein interaction or coexpression networks, but also can be extended to use multiple data sources. We apply our method to physical protein interaction data from S. cerevisiae and provide novel predictions, using 340 Gene Ontology terms, for 1170 unannotated proteins and we evaluate the predictions using the available literature.

  8. A simple accurate method to predict time of ponding under variable intensity rainfall

    NASA Astrophysics Data System (ADS)

    Assouline, S.; Selker, J. S.; Parlange, J.-Y.

    2007-03-01

    The prediction of the time to ponding following commencement of rainfall is fundamental to hydrologic prediction of flood, erosion, and infiltration. Most of the studies to date have focused on prediction of ponding resulting from simple rainfall patterns. This approach was suitable to rainfall reported as average values over intervals of up to a day but does not take advantage of knowledge of the complex patterns of actual rainfall now commonly recorded electronically. A straightforward approach to include the instantaneous rainfall record in the prediction of ponding time and excess rainfall using only the infiltration capacity curve is presented. This method is tested against a numerical solution of the Richards equation on the basis of an actual rainfall record. The predicted time to ponding showed mean error ≤7% for a broad range of soils, with and without surface sealing. In contrast, the standard predictions had average errors of 87%, and worst-case errors exceeding a factor of 10. In addition to errors intrinsic in the modeling framework itself, errors that arise from averaging actual rainfall records over reporting intervals were evaluated. Averaging actual rainfall records observed in Israel over periods of as little as 5 min significantly reduced predicted runoff (75% for the sealed sandy loam and 46% for the silty clay loam), while hourly averaging gave complete lack of prediction of ponding in some of the cases.

  9. A machine learning approach to the accurate prediction of multi-leaf collimator positional errors

    NASA Astrophysics Data System (ADS)

    Carlson, Joel N. K.; Park, Jong Min; Park, So-Yeon; In Park, Jong; Choi, Yunseok; Ye, Sung-Joon

    2016-03-01

    Discrepancies between planned and delivered movements of multi-leaf collimators (MLCs) are an important source of errors in dose distributions during radiotherapy. In this work we used machine learning techniques to train models to predict these discrepancies, assessed the accuracy of the model predictions, and examined the impact these errors have on quality assurance (QA) procedures and dosimetry. Predictive leaf motion parameters for the models were calculated from the plan files, such as leaf position and velocity, whether the leaf was moving towards or away from the isocenter of the MLC, and many others. Differences in positions between synchronized DICOM-RT planning files and DynaLog files reported during QA delivery were used as a target response for training of the models. The final model is capable of predicting MLC positions during delivery to a high degree of accuracy. For moving MLC leaves, predicted positions were shown to be significantly closer to delivered positions than were planned positions. By incorporating predicted positions into dose calculations in the TPS, increases were shown in gamma passing rates against measured dose distributions recorded during QA delivery. For instance, head and neck plans with 1%/2 mm gamma criteria had an average increase in passing rate of 4.17% (SD = 1.54%). This indicates that the inclusion of predictions during dose calculation leads to a more realistic representation of plan delivery. To assess impact on the patient, dose volumetric histograms (DVH) using delivered positions were calculated for comparison with planned and predicted DVHs. In all cases, predicted dose volumetric parameters were in closer agreement to the delivered parameters than were the planned parameters, particularly for organs at risk on the periphery of the treatment area. By incorporating the predicted positions into the TPS, the treatment planner is given a more realistic view of the dose distribution as it will truly be

  10. Modeling and fitting protein-protein complexes to predict change of binding energy

    PubMed Central

    Dourado, Daniel F.A.R.; Flores, Samuel Coulbourn

    2016-01-01

    It is possible to accurately and economically predict change in protein-protein interaction energy upon mutation (ΔΔG), when a high-resolution structure of the complex is available. This is of growing usefulness for design of high-affinity or otherwise modified binding proteins for therapeutic, diagnostic, industrial, and basic science applications. Recently the field has begun to pursue ΔΔG prediction for homology modeled complexes, but so far this has worked mostly for cases of high sequence identity. If the interacting proteins have been crystallized in free (uncomplexed) form, in a majority of cases it is possible to find a structurally similar complex which can be used as the basis for template-based modeling. We describe how to use MMB to create such models, and then use them to predict ΔΔG, using a dataset consisting of free target structures, co-crystallized template complexes with sequence identity with respect to the targets as low as 44%, and experimental ΔΔG measurements. We obtain similar results by fitting to a low-resolution Cryo-EM density map. Results suggest that other structural constraints may lead to a similar outcome, making the method even more broadly applicable. PMID:27173910

  11. Prediction of protein-protein interactions with clustered amino acids and weighted sparse representation.

    PubMed

    Huang, Qiaoying; You, Zhuhong; Zhang, Xiaofeng; Zhou, Yong

    2015-01-01

    With the completion of the Human Genome Project, bioscience has entered the era of the genome and proteome. Therefore, protein-protein interactions (PPIs) research is becoming more and more important. Life activities such as DNA synthesis, gene transcription activation, and protein translation are inseparable from protein-protein interactions. Though many methods based on biological experiments and machine learning have been proposed, they are all time-consuming to train and achieve limited accuracy. How to efficiently and accurately predict PPIs is still a big challenge. To take up this challenge, we developed a new predictor by incorporating the reduced amino acid alphabet (RAAA) information into the general form of pseudo-amino acid composition (PseAAC) and combining it with weighted sparse representation-based classification (WSRC). The main advantage of introducing the reduced amino acid alphabet is that it avoids the notorious curse of dimensionality or overfitting problem in statistical prediction. Additionally, experiments have shown that our method achieves good performance in both low- and high-dimensional feature spaces. Among all of the experiments performed on the PPIs data of Saccharomyces cerevisiae, the best one achieved 90.91% accuracy, 94.17% sensitivity, 87.22% precision and an 83.43% Matthews correlation coefficient (MCC) value. In order to evaluate the prediction ability of our method, extensive experiments were performed to compare it with the state-of-the-art technique, support vector machine (SVM). The achieved results show that the proposed approach is very promising for predicting PPIs, and it can be a helpful supplement for PPIs prediction. PMID:25984606
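
    The four performance measures quoted in this abstract (accuracy, sensitivity, precision and MCC) all derive from the same binary confusion matrix. The sketch below uses the standard definitions; the function name and the counts in the usage example are made up for illustration and are not the study's data.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. Ratios with an empty denominator are reported as 0.0.
    """
    total = tp + fp + tn + fn
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "mcc": (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0,
    }


if __name__ == "__main__":
    # Illustrative counts only, not taken from the paper.
    for name, value in classification_metrics(tp=90, fp=10, tn=90, fn=10).items():
        print(f"{name}: {value:.4f}")
```

    Unlike accuracy, the MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy for interaction datasets.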

  12. Developing algorithms for predicting protein-protein interactions of homology modeled proteins.

    SciTech Connect

    Martin, Shawn Bryan; Sale, Kenneth L.; Faulon, Jean-Loup Michel; Roe, Diana C.

    2006-01-01

    The goal of this project was to examine the protein-protein docking problem, especially as it relates to homology-based structures, identify the key bottlenecks in current software tools, and evaluate and prototype new algorithms that may be developed to address these bottlenecks. This report describes the current challenges in the protein-protein docking problem: correctly predicting the binding site for the protein-protein interaction and correctly placing the sidechains. Two different and complementary approaches are taken that can help with the protein-protein docking problem. The first approach is to predict interaction sites prior to docking, and uses bioinformatics studies of protein-protein interactions to predict these interaction sites. The second approach is to improve validation of predicted complexes after docking, and uses an improved scoring function for evaluating proposed docked poses, incorporating a solvation term. This scoring function demonstrates significant improvement over current state-of-the-art functions. Initial studies on both these approaches are promising, and argue for full development of these algorithms.

  13. Structure prediction of magnetosome-associated proteins.

    PubMed

    Nudelman, Hila; Zarivach, Raz

    2014-01-01

    Magnetotactic bacteria (MTB) are Gram-negative bacteria that can navigate along geomagnetic fields. This ability is a result of a unique intracellular organelle, the magnetosome. These organelles are composed of membrane-enclosed magnetite (Fe3O4) or greigite (Fe3S4) crystals ordered into chains along the cell. Magnetosome formation, assembly, and magnetic nano-crystal biomineralization are controlled by magnetosome-associated proteins (MAPs). Most MAP-encoding genes are located in a conserved genomic region - the magnetosome island (MAI). The MAI appears to be conserved in all MTB that were analyzed so far, although the MAI size and organization differ between species. It was shown that MAI deletion leads to a non-magnetic phenotype, further highlighting its important role in magnetosome formation. Today, about 28 proteins are known to be involved in magnetosome formation, but the structures and functions of most MAPs are unknown. To reveal the structure-function relationship of MAPs, we used bioinformatics tools to build homology models as a way to understand their possible role in magnetosome formation. Here we present an overview of predicted 3D structural models for all known Magnetospirillum gryphiswaldense strain MSR-1 MAPs.

  14. Empirical approaches to more accurately predict benthic-pelagic coupling in biogeochemical ocean models

    NASA Astrophysics Data System (ADS)

    Dale, Andy; Stolpovsky, Konstantin; Wallmann, Klaus

    2016-04-01

    The recycling and burial of biogenic material in the sea floor plays a key role in the regulation of ocean chemistry. Proper consideration of these processes in ocean biogeochemical models is becoming increasingly recognized as an important step in model validation and prediction. However, the rate of organic matter remineralization in sediments and the benthic flux of redox-sensitive elements are difficult to predict a priori. In this communication, examples of empirical benthic flux models that can be coupled to earth system models to predict sediment-water exchange in the open ocean are presented. Large uncertainties hindering further progress in this field include knowledge of the reactivity of organic carbon reaching the sediment, the importance of episodic variability in bottom water chemistry and particle rain rates (for both the deep-sea and margins) and the role of benthic fauna. How do we meet the challenge?

  15. An endometrial gene expression signature accurately predicts recurrent implantation failure after IVF

    PubMed Central

    Koot, Yvonne E. M.; van Hooff, Sander R.; Boomsma, Carolien M.; van Leenen, Dik; Groot Koerkamp, Marian J. A.; Goddijn, Mariëtte; Eijkemans, Marinus J. C.; Fauser, Bart C. J. M.; Holstege, Frank C. P.; Macklon, Nick S.

    2016-01-01

    The primary limiting factor for effective IVF treatment is successful embryo implantation. Recurrent implantation failure (RIF) is a condition whereby couples fail to achieve pregnancy despite consecutive embryo transfers. Here we describe the collection of gene expression profiles from mid-luteal phase endometrial biopsies (n = 115) from women experiencing RIF and healthy controls. Using a signature discovery set (n = 81) we identify a signature containing 303 genes predictive of RIF. Independent validation in 34 samples shows that the gene signature predicts RIF with 100% positive predictive value (PPV). The strength of the RIF associated expression signature also stratifies RIF patients into distinct groups with different subsequent implantation success rates. Exploration of the expression changes suggests that RIF is primarily associated with reduced cellular proliferation. The gene signature will be of value in counselling and guiding further treatment of women who fail to conceive upon IVF and suggests new avenues for developing intervention. PMID:26797113

  16. Change in body mass accurately and reliably predicts change in body water after endurance exercise.

    PubMed

    Baker, Lindsay B; Lang, James A; Kenney, W Larry

    2009-04-01

    This study tested the hypothesis that the change in body mass (ΔBM) accurately reflects the change in total body water (ΔTBW) after prolonged exercise. Subjects (4 men, 4 women; 22-36 years; 66 ± 10 kg) completed 2 h of interval running (70% VO₂max) in the heat (30 °C), followed by a run to exhaustion (85% VO₂max), and then sat for a 1 h recovery period. During exercise and recovery, subjects drank fluid or no fluid to maintain their BM, increase BM by 2%, or decrease BM by 2 or 4% in separate trials. Pre- and post-experiment TBW were determined using the deuterium oxide (D₂O) dilution technique and corrected for D₂O lost in urine, sweat, breath vapor, and nonaqueous hydrogen exchange. The average difference between ΔBM and ΔTBW was 0.07 ± 1.07 kg (paired t test, P = 0.29). The slope and intercept of the relation between ΔBM and ΔTBW were not significantly different from 1 and 0, respectively. The intraclass correlation coefficient between ΔBM and ΔTBW was 0.76, which is indicative of excellent reliability between methods. Measuring pre- to post-exercise ΔBM is an accurate and reliable method to assess the ΔTBW.

  17. Accurate prediction of kidney allograft outcome based on creatinine course in the first 6 months posttransplant.

    PubMed

    Fritsche, L; Hoerstrup, J; Budde, K; Reinke, P; Neumayer, H-H; Frei, U; Schlaefer, A

    2005-03-01

    Most attempts to predict early kidney allograft loss are based on the patient and donor characteristics at baseline. We investigated how the early posttransplant creatinine course compares to baseline information in the prediction of kidney graft failure within the first 4 years after transplantation. Two approaches to create a prediction rule for early graft failure were evaluated. First, the whole data set was analysed using decision-tree building software. The software, rpart, builds classification or regression models; the resulting models can be represented as binary trees. In the second approach, a Hill-Climbing algorithm was applied to define cut-off values for the median creatinine level and creatinine slope in the period between day 60 and 180 after transplantation. Of the 497 patients available for analysis, 52 (10.5%) experienced an early graft loss (graft loss within the first 4 years after transplantation). From the rpart algorithm, a single decision criterion emerged: Median creatinine value on days 60 to 180 higher than 3.1 mg/dL predicts early graft failure (accuracy 95.2% but sensitivity = 42.3%). In contrast, the Hill-Climbing algorithm delivered a cut-off of 1.8 mg/dL for the median creatinine level and a cut-off of 0.3 mg/dL per month for the creatinine slope (sensitivity = 69.5% and specificity 79.0%). Prediction rules based on median and slope of creatinine levels in the first half year after transplantation allow early identification of patients who are at risk of losing their graft early after transplantation. These patients may benefit from therapeutic measures tailored for this high-risk setting. PMID:15848516
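
    The two summary quantities used by these rules, the median creatinine level and the creatinine slope between day 60 and day 180, can be sketched as follows. Several details are assumptions not stated in the abstract: the slope is estimated here by ordinary least squares, a month is taken as 30 days, values above a cut-off are treated as the risk direction, and the two Hill-Climbing cut-offs are reported separately because the abstract does not say how they are combined. All function names are hypothetical.

```python
from statistics import median


def creatinine_summary(days, values, start=60, end=180):
    """Median (mg/dL) and slope (mg/dL per month) of creatinine in a day window.

    Slope is an ordinary least-squares fit against day number, scaled by an
    assumed 30-day month. Both choices are illustrative assumptions.
    """
    window = [(d, v) for d, v in zip(days, values) if start <= d <= end]
    if len(window) < 2:
        raise ValueError("need at least two measurements in the window")
    xs = [d for d, _ in window]
    ys = [v for _, v in window]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("measurements must span more than one day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return median(ys), (sxy / sxx) * 30.0


def cutoff_flags(median_creatinine, slope_per_month):
    """Compare the summaries with the cut-offs quoted in the abstract."""
    return {
        "rpart_median_above_3.1": median_creatinine > 3.1,
        "hill_climbing_median_above_1.8": median_creatinine > 1.8,
        "hill_climbing_slope_above_0.3": slope_per_month > 0.3,
    }


if __name__ == "__main__":
    # Made-up measurements for illustration only.
    med, slope = creatinine_summary([60, 90, 120, 150, 180], [1.0, 1.5, 2.0, 2.5, 3.0])
    print(med, slope, cutoff_flags(med, slope))
```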

  18. Accurate Prediction of Transposon-Derived piRNAs by Integrating Various Sequential and Physicochemical Features

    PubMed Central

    Luo, Longqiang; Li, Dingfang; Zhang, Wen; Tu, Shikui; Zhu, Xiaopeng; Tian, Gang

    2016-01-01

    Background: Piwi-interacting RNA (piRNA) is the largest class of small non-coding RNA molecules. The transposon-derived piRNA prediction can enrich the research contents of small ncRNAs as well as help to further understand the mechanism of gamete generation. Methods: In this paper, we attempt to differentiate transposon-derived piRNAs from non-piRNAs based on their sequential and physicochemical features by using machine learning methods. We explore six sequence-derived features, i.e. spectrum profile, mismatch profile, subsequence profile, position-specific scoring matrix, pseudo dinucleotide composition and local structure-sequence triplet elements, and systematically evaluate their performances for transposon-derived piRNA prediction. Finally, we consider two approaches, direct combination and ensemble learning, to integrate useful features and achieve high-accuracy prediction models. Results: We construct three datasets, covering three species: Human, Mouse and Drosophila, and evaluate the performances of prediction models by 10-fold cross validation. In the computational experiments, direct combination models achieve AUC of 0.917, 0.922 and 0.992 on Human, Mouse and Drosophila, respectively; ensemble learning models achieve AUC of 0.922, 0.926 and 0.994 on the three datasets. Conclusions: Compared with other state-of-the-art methods, our methods can lead to better performances. In conclusion, the proposed methods are promising for the transposon-derived piRNA prediction. The source codes and datasets are available in S1 File. PMID:27074043
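
    Of the six sequence-derived features listed, the spectrum profile is the simplest to illustrate: it is commonly defined as the vector of k-mer occurrence frequencies of a sequence. The sketch below follows that common definition; the function name, the default k, the T-to-U conversion and the handling of ambiguous bases are assumptions, and the paper's own implementation (in its S1 File) may differ.

```python
from itertools import product


def spectrum_profile(seq: str, k: int = 2, alphabet: str = "ACGU") -> list:
    """k-mer frequency vector of an RNA sequence, in lexicographic k-mer order.

    Windows containing characters outside the alphabet are skipped, and
    DNA-style T is read as U; both conventions are illustrative assumptions.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper().replace("T", "U")
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


if __name__ == "__main__":
    profile = spectrum_profile("UGAGGUAGUAGGUUGUAUAGUU", k=2)
    print(len(profile), round(sum(profile), 6))
```

    A vector of this kind has 4^k entries regardless of sequence length, which is what allows sequences of different lengths to be fed to the same classifier.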

  19. Structure-Based Prediction of Unstable Regions in Proteins: Applications to Protein Misfolding Diseases

    NASA Astrophysics Data System (ADS)

    Guest, Will; Cashman, Neil; Plotkin, Steven

    2009-03-01

    Protein misfolding is a necessary step in the pathogenesis of many diseases, including Creutzfeldt-Jakob disease (CJD) and familial amyotrophic lateral sclerosis (fALS). Identifying unstable structural elements in their causative proteins elucidates the early events of misfolding and presents targets for inhibition of the disease process. An algorithm was developed to calculate the Gibbs free energy of unfolding for all sequence-contiguous regions of a protein using three methods to parameterize energy changes: a modified Gō model, changes in solvent-accessible surface area, and solution of the Poisson-Boltzmann equation. The entropic effects of disulfide bonds and post-translational modifications are treated analytically. It incorporates a novel method for finding local dielectric constants inside a protein to accurately handle charge effects. We have predicted the unstable parts of prion protein and superoxide dismutase 1, the proteins involved in CJD and fALS respectively, and have used these regions as epitopes to prepare antibodies that are specific to the misfolded conformation and show promise as therapeutic agents.

  20. Unprecedently Large-Scale Kinase Inhibitor Set Enabling the Accurate Prediction of Compound-Kinase Activities: A Way toward Selective Promiscuity by Design?

    PubMed

    Christmann-Franck, Serge; van Westen, Gerard J P; Papadatos, George; Beltran Escudie, Fanny; Roberts, Alexander; Overington, John P; Domine, Daniel

    2016-09-26

    Drug discovery programs frequently target members of the human kinome and try to identify small molecule protein kinase inhibitors, primarily for cancer treatment, additional indications being increasingly investigated. One of the challenges is controlling the inhibitors' degree of selectivity, assessed by in vitro profiling against panels of protein kinases. We manually extracted, compiled, and standardized such profiles published in the literature: we collected 356 908 data points corresponding to 482 protein kinases, 2106 inhibitors, and 661 patents. We then analyzed this data set in terms of kinome coverage, results reproducibility, popularity, and degree of selectivity of both kinases and inhibitors. We used the data set to create robust proteochemometric models capable of predicting kinase activity (the ligand-target space was modeled with an externally validated RMSE of 0.41 ± 0.02 log units and R₀² 0.74 ± 0.03), in order to account for missing or unreliable measurements. The influence on the prediction quality of parameters such as number of measurements, Murcko scaffold frequency or inhibitor type was assessed. Interpretation of the models highlighted inhibitor and kinase properties correlated with higher affinities, and an analysis in the context of kinase crystal structures was performed. Overall, the quality of the models allows the accurate prediction of kinase-inhibitor activities and their structural interpretation, thus paving the way for the rational design of compounds with a targeted selectivity profile. PMID:27482722

  3. Robust and Accurate Modeling Approaches for Migraine Per-Patient Prediction from Ambulatory Data

    PubMed Central

    Pagán, Josué; Irene De Orbe, M.; Gago, Ana; Sobrado, Mónica; Risco-Martín, José L.; Vivancos Mora, J.; Moya, José M.; Ayala, José L.

    2015-01-01

    Migraine is one of the most widespread neurological disorders, and its medical treatment represents a high percentage of the costs of health systems. In some patients, characteristic symptoms that precede the headache appear. However, they are nonspecific, and their prediction horizon is unknown and highly variable; hence, these symptoms are almost useless for prediction, and they cannot be used to time drug intake early enough to be effective and neutralize the pain. To solve this problem, this paper sets up a realistic monitoring scenario where hemodynamic variables from real patients are monitored in ambulatory conditions with a wireless body sensor network (WBSN). The acquired data are used to evaluate the predictive capabilities and robustness against noise and failures in sensors of several modeling approaches. The obtained results encourage the development of per-patient models based on state-space models (N4SID) that are capable of providing average forecast windows of 47 min and a low rate of false positives. PMID:26134103

  4. Revisiting the blind tests in crystal structure prediction: accurate energy ranking of molecular crystals.

    PubMed

    Asmadi, Aldi; Neumann, Marcus A; Kendrick, John; Girard, Pascale; Perrin, Marc-Antoine; Leusen, Frank J J

    2009-12-24

    In the 2007 blind test of crystal structure prediction hosted by the Cambridge Crystallographic Data Centre (CCDC), a hybrid DFT/MM method correctly ranked each of the four experimental structures as having the lowest lattice energy of all the crystal structures predicted for each molecule. The work presented here further validates this hybrid method by optimizing the crystal structures (experimental and submitted) of the first three CCDC blind tests held in 1999, 2001, and 2004. Except for the crystal structures of compound IX, all structures were reminimized and ranked according to their lattice energies. The hybrid method computes the lattice energy of a crystal structure as the sum of the DFT total energy and a van der Waals (dispersion) energy correction. Considering all four blind tests, the crystal structure with the lowest lattice energy corresponds to the experimentally observed structure for 12 out of 14 molecules. Moreover, good geometrical agreement is observed between the structures determined by the hybrid method and those measured experimentally. In comparison with the correct submissions made by the blind test participants, all hybrid optimized crystal structures (apart from compound II) have the smallest calculated root mean squared deviations from the experimentally observed structures. It is predicted that a new polymorph of compound V exists under pressure.

  5. Fast and accurate numerical method for predicting gas chromatography retention time.

    PubMed

    Claumann, Carlos Alberto; Wüst Zibetti, André; Bolzan, Ariovaldo; Machado, Ricardo A F; Pinto, Leonel Teixeira

    2015-08-01

    Predictive modeling for gas chromatography compound retention depends on the retention factor (ki) and on the flow of the mobile phase. Thus, different approaches for determining an analyte ki in column chromatography have been developed. The main one is based on the thermodynamic properties of the component and on the characteristics of the stationary phase. These models can be used to estimate the parameters and to optimize the programming of temperatures, in gas chromatography, for the separation of compounds. Different authors have proposed the use of numerical methods for solving these models, but these methods demand greater computational time. Hence, a new method for solving the predictive modeling of analyte retention time is presented. This algorithm is an alternative to traditional methods because it transforms its attainments into root determination problems within defined intervals. The proposed approach allows for tr calculation, with accuracy determined by the user of the methods, and significant reductions in computational time; it can also be used to evaluate the performance of other prediction methods.
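
    The root-determination idea in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes a constant hold-up time t_m, a linear temperature ramp, and a retention factor of the form ln k = a + b/T. The function names and the values of a and b are hypothetical. The retention time is the root of "fraction of the column travelled = 1", found here by bisection inside a bracketing interval.

```python
import math


def retention_factor(temp_k, a, b):
    """Assumed thermodynamic model: ln k = a + b / T (T in kelvin)."""
    return math.exp(a + b / temp_k)


def migrated_fraction(t_end, t_m, temp0, rate, a, b, steps=2000):
    """Fraction of the column travelled by time t_end (trapezoid rule).

    The analyte moves at 1 / (t_m * (1 + k)) column lengths per unit time,
    with k evaluated at the oven temperature temp0 + rate * t.
    """
    h = t_end / steps
    total = 0.0
    for i in range(steps + 1):
        k = retention_factor(temp0 + rate * i * h, a, b)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight / (t_m * (1.0 + k))
    return total * h


def retention_time(t_m, temp0, rate, a, b, tol=1e-6):
    """Bisection for the time at which the migrated fraction reaches 1.

    Bracket (valid for rate >= 0 and b > 0): elution cannot happen before
    t_m, nor later than the isothermal time at the starting temperature.
    """
    lo = t_m
    hi = t_m * (1.0 + retention_factor(temp0, a, b))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if migrated_fraction(mid, t_m, temp0, rate, a, b) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical parameters: t_m = 1 min, start at 400 K, a = -10, b = 5000 K.
isothermal = retention_time(1.0, 400.0, 0.0, -10.0, 5000.0)
ramped = retention_time(1.0, 400.0, 10.0, -10.0, 5000.0)  # 10 K/min ramp
print(isothermal, ramped)
```

    With a zero ramp the result should agree with the closed-form isothermal value t_m(1 + k), which gives a quick sanity check; the tolerance argument plays the role of the user-chosen accuracy mentioned in the abstract.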

  6. Accurate structure prediction of peptide–MHC complexes for identifying highly immunogenic antigens

    SciTech Connect

    Park, Min-Sun; Park, Sung Yong; Miller, Keith R.; Collins, Edward J.; Lee, Ha Youn

    2013-11-01

    Designing an optimal HIV-1 vaccine faces the challenge of identifying antigens that induce a broad immune capacity. One factor to control the breadth of T cell responses is the surface morphology of a peptide–MHC complex. Here, we present an in silico protocol for predicting peptide–MHC structure. A robust signature of a conformational transition was identified during all-atom molecular dynamics, which results in a model with high accuracy. A large test set was used in constructing our protocol and we went another step further using a blind test with a wild-type peptide and two highly immunogenic mutants, which predicted substantial conformational changes in both mutants. The center residues at position five of the analogs were configured to be accessible to solvent, forming a prominent surface, while the residue of the wild-type peptide was to point laterally toward the side of the binding cleft. We then experimentally determined the structures of the blind test set, using high resolution of X-ray crystallography, which verified predicted conformational changes. Our observation strongly supports a positive association of the surface morphology of a peptide–MHC complex to its immunogenicity. Our study offers the prospect of enhancing immunogenicity of vaccines by identifying MHC binding immunogens.

  7. Revisiting the blind tests in crystal structure prediction: accurate energy ranking of molecular crystals.

    PubMed

    Asmadi, Aldi; Neumann, Marcus A; Kendrick, John; Girard, Pascale; Perrin, Marc-Antoine; Leusen, Frank J J

    2009-12-24

    In the 2007 blind test of crystal structure prediction hosted by the Cambridge Crystallographic Data Centre (CCDC), a hybrid DFT/MM method correctly ranked each of the four experimental structures as having the lowest lattice energy of all the crystal structures predicted for each molecule. The work presented here further validates this hybrid method by optimizing the crystal structures (experimental and submitted) of the first three CCDC blind tests held in 1999, 2001, and 2004. Except for the crystal structures of compound IX, all structures were reminimized and ranked according to their lattice energies. The hybrid method computes the lattice energy of a crystal structure as the sum of the DFT total energy and a van der Waals (dispersion) energy correction. Considering all four blind tests, the crystal structure with the lowest lattice energy corresponds to the experimentally observed structure for 12 out of 14 molecules. Moreover, good geometrical agreement is observed between the structures determined by the hybrid method and those measured experimentally. In comparison with the correct submissions made by the blind test participants, all hybrid optimized crystal structures (apart from compound II) have the smallest calculated root mean squared deviations from the experimentally observed structures. It is predicted that a new polymorph of compound V exists under pressure. PMID:19950907

  8. Protein function prediction using neighbor relativity in protein-protein interaction network.

    PubMed

    Moosavi, Sobhan; Rahgozar, Masoud; Rahimi, Amir

    2013-04-01

    There is a large gap between the number of discovered proteins and the number of functionally annotated ones. Due to the high cost of determining protein function by wet-lab research, function prediction has become a major task for computational biology and bioinformatics. Some studies use protein interaction information to predict function for un-annotated proteins. In this paper, we propose a novel approach called "Neighbor Relativity Coefficient" (NRC), based on interaction network topology, which estimates the functional similarity between two proteins. NRC is calculated for each pair of proteins based on their graph-based features, including distance, common neighbors and the number of paths between them. In order to ascribe function to an un-annotated protein, NRC estimates a weight for each neighbor to transfer its annotation to the unknown protein. Finally, the unknown protein is annotated with the top-scoring transferred functions. We also investigate the effect of using different coefficients for various types of functions. The proposed method has been evaluated on Saccharomyces cerevisiae and Homo sapiens interaction networks. The performance analysis demonstrates that NRC yields better results than previous protein function prediction approaches that use interaction networks.
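
    The general idea of weighted annotation transfer described above can be sketched as follows. The abstract does not give the NRC formula, so this sketch swaps in a simple stand-in weight (Jaccard overlap of neighbor sets) and considers only direct neighbors; it is not NRC itself. The protein names, function labels, and helper names are hypothetical.

```python
from collections import defaultdict


def neighbor_weight(graph, a, b):
    """Stand-in similarity (NOT the NRC formula): Jaccard overlap of the
    two proteins' neighborhoods, each protein counted in its own set."""
    na = graph[a] | {a}
    nb = graph[b] | {b}
    return len(na & nb) / len(na | nb)


def predict_functions(graph, annotations, target, top_k=2):
    """Score each function by the summed weights of the annotated
    neighbors that carry it, then return the top_k functions."""
    scores = defaultdict(float)
    for neighbor in graph[target]:
        weight = neighbor_weight(graph, target, neighbor)
        for function in annotations.get(neighbor, ()):
            scores[function] += weight
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [function for function, _ in ranked[:top_k]]


# Toy undirected interaction network (hypothetical proteins P1..P5).
graph = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1", "P3"},
    "P3": {"P1", "P2"},
    "P4": {"P1", "P5"},
    "P5": {"P4"},
}
annotations = {
    "P2": {"transport"},
    "P3": {"transport", "binding"},
    "P4": {"signaling"},
}
print(predict_functions(graph, annotations, "P1"))
```

    A fuller version in the spirit of the abstract would also fold distance and path counts into the weight and would use different coefficients per function type.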

  9. Size-extensivity-corrected multireference configuration interaction schemes to accurately predict bond dissociation energies of oxygenated hydrocarbons

    SciTech Connect

    Oyeyemi, Victor B.; Krisiloff, David B.; Keith, John A.; Libisch, Florian; Pavone, Michele; Carter, Emily A.

    2014-01-28

    Oxygenated hydrocarbons play important roles in combustion science as renewable fuels and additives, but many details about their combustion chemistry remain poorly understood. Although many methods exist for computing accurate electronic energies of molecules at equilibrium geometries, a consistent description of entire combustion reaction potential energy surfaces (PESs) requires multireference correlated wavefunction theories. Here we use bond dissociation energies (BDEs) as a foundational metric to benchmark methods based on multireference configuration interaction (MRCI) for several classes of oxygenated compounds (alcohols, aldehydes, carboxylic acids, and methyl esters). We compare results from multireference singles and doubles configuration interaction to those utilizing a posteriori and a priori size-extensivity corrections, benchmarked against experiment and coupled cluster theory. We demonstrate that size-extensivity corrections are necessary for chemically accurate BDE predictions even in relatively small molecules and furnish examples of unphysical BDE predictions resulting from using too-small orbital active spaces. We also outline the specific challenges in using MRCI methods for carbonyl-containing compounds. The resulting complete basis set extrapolated, size-extensivity-corrected MRCI scheme produces BDEs generally accurate to within 1 kcal/mol, laying the foundation for this scheme's use on larger molecules and for more complex regions of combustion PESs.

  10. Accurate predictions of dielectrophoretic force and torque on particles with strong mutual field, particle, and wall interactions

    NASA Astrophysics Data System (ADS)

    Liu, Qianlong; Reifsnider, Kenneth

    2012-11-01

    The basis of dielectrophoresis (DEP) is the prediction of the force and torque on particles. The classical approach to this prediction is based on the effective moment method, which, however, is an approximate approach that assumes infinitesimal particles. It is therefore well known that for finite-sized particles the DEP approximation becomes inaccurate as the mutual field, particle, and wall interactions become strong, a situation presently attracting extensive research for applications of practical significance. In the present talk, we provide accurate calculations of the force and torque on the particles from first principles, by directly resolving the local geometry and properties and accurately accounting for the mutual interactions for finite-sized particles with both dielectric polarization and conduction in a sinusoidally steady-state electric field. Since the approach has a significant advantage over other numerical methods in efficiently simulating many closely packed particles, it provides an important, unique, and accurate technique to investigate complex DEP phenomena, for example heterogeneous mixtures containing particle chains, nanoparticle assembly, biological cells, non-spherical effects, etc. This study was supported by the Department of Energy under funding for an EFRC (the HeteroFoaM Center), grant no. DE-SC0001061.

  11. Size-extensivity-corrected multireference configuration interaction schemes to accurately predict bond dissociation energies of oxygenated hydrocarbons

    NASA Astrophysics Data System (ADS)

    Oyeyemi, Victor B.; Krisiloff, David B.; Keith, John A.; Libisch, Florian; Pavone, Michele; Carter, Emily A.

    2014-01-01

    Oxygenated hydrocarbons play important roles in combustion science as renewable fuels and additives, but many details about their combustion chemistry remain poorly understood. Although many methods exist for computing accurate electronic energies of molecules at equilibrium geometries, a consistent description of entire combustion reaction potential energy surfaces (PESs) requires multireference correlated wavefunction theories. Here we use bond dissociation energies (BDEs) as a foundational metric to benchmark methods based on multireference configuration interaction (MRCI) for several classes of oxygenated compounds (alcohols, aldehydes, carboxylic acids, and methyl esters). We compare results from multireference singles and doubles configuration interaction to those utilizing a posteriori and a priori size-extensivity corrections, benchmarked against experiment and coupled cluster theory. We demonstrate that size-extensivity corrections are necessary for chemically accurate BDE predictions even in relatively small molecules and furnish examples of unphysical BDE predictions resulting from using too-small orbital active spaces. We also outline the specific challenges in using MRCI methods for carbonyl-containing compounds. The resulting complete basis set extrapolated, size-extensivity-corrected MRCI scheme produces BDEs generally accurate to within 1 kcal/mol, laying the foundation for this scheme's use on larger molecules and for more complex regions of combustion PESs.

  12. The Compensatory Reserve For Early and Accurate Prediction Of Hemodynamic Compromise: A Review of the Underlying Physiology.

    PubMed

    Convertino, Victor A; Wirt, Michael D; Glenn, John F; Lein, Brian C

    2016-06-01

    Shock is deadly and unpredictable if it is not recognized and treated in early stages of hemorrhage. Unfortunately, measurements of standard vital signs that are displayed on current medical monitors fail to provide accurate or early indicators of shock because of physiological mechanisms that effectively compensate for blood loss. As a result of new insights provided by the latest research on the physiology of shock using human experimental models of controlled hemorrhage, it is now recognized that measurement of the body's reserve to compensate for reduced circulating blood volume is the single most important indicator for early and accurate assessment of shock. We have called this function the "compensatory reserve," which can be accurately assessed by real-time measurements of changes in the features of the arterial waveform. In this paper, the physiology underlying the development and evaluation of a new noninvasive technology that allows for real-time measurement of the compensatory reserve will be reviewed, with its clinical implications for earlier and more accurate prediction of shock. PMID:26950588

  13. Correlation of chemical shifts predicted by molecular dynamics simulations for partially disordered proteins

    PubMed Central

    Karp, Jerome M.; Erylimaz, Ertan

    2015-01-01

    There has been a longstanding interest in being able to accurately predict NMR chemical shifts from structural data. Recent studies have focused on using molecular dynamics (MD) simulation data as input for improved prediction. Here we examine the accuracy of chemical shift prediction for intein systems, which have regions of intrinsic disorder. We find that using MD simulation data as input for chemical shift prediction does not consistently improve prediction accuracy over use of a static X-ray crystal structure. This appears to result from the complex conformational ensemble of the disordered protein segments. We show that using accelerated molecular dynamics (aMD) simulations improves chemical shift prediction, suggesting that methods which better sample the conformational ensemble like aMD are more appropriate tools for use in chemical shift prediction for proteins with disordered regions. Moreover, our study suggests that data accurately reflecting protein dynamics must be used as input for chemical shift prediction in order to correctly predict chemical shifts in systems with disorder. PMID:25416617

  14. A novel method to predict visual field progression more accurately, using intraocular pressure measurements in glaucoma patients

    PubMed Central

    Asaoka, Ryo; Fujino, Yuri; Murata, Hiroshi; Miki, Atsuya; Tanito, Masaki; Mizoue, Shiro; Mori, Kazuhiko; Suzuki, Katsuyoshi; Yamashita, Takehiro; Kashiwagi, Kenji; Shoji, Nobuyuki

    2016-01-01

    Visual field (VF) data were retrospectively obtained from 491 eyes in 317 patients with open angle glaucoma who had undergone ten VF tests (Humphrey Field Analyzer, 24-2, SITA standard). First, mean of total deviation values (mTD) in the tenth VF was predicted using standard linear regression of the first five VFs (VF1-5) through to using all nine preceding VFs (VF1-9). Then an ‘intraocular pressure (IOP)-integrated VF trend analysis’ was carried out by simply using time multiplied by IOP as the independent term in the linear regression model. Prediction errors (absolute prediction error or root mean squared error: RMSE) for predicting mTD and also point wise TD values of the tenth VF were obtained from both approaches. The mTD absolute prediction errors associated with the IOP-integrated VF trend analysis were significantly smaller than those from the standard trend analysis when VF1-6 through to VF1-8 were used (p < 0.05). The point wise RMSEs from the IOP-integrated trend analysis were significantly smaller than those from the standard trend analysis when VF1-5 through to VF1-9 were used (p < 0.05). This was especially the case when IOP was measured more frequently. Thus a significantly more accurate prediction of VF progression is possible using a simple trend analysis that incorporates IOP measurements. PMID:27562553
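
    The two trend analyses compared above can be sketched with ordinary least squares on a single predictor. This is a minimal illustration, not the authors' code: it assumes one IOP reading per visit and forms the predictor as visit time multiplied by that reading, since the abstract does not spell out how the product is built. The function names and the sample numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_next(times, mtd, next_time, iop=None, next_iop=None):
    """Predict mTD at next_time.

    Standard trend analysis regresses mTD on time. The IOP-integrated
    variant regresses mTD on time * IOP (assumed per-visit product).
    """
    if iop is None:
        x, x_next = list(times), next_time
    else:
        x = [t * p for t, p in zip(times, iop)]
        x_next = next_time * next_iop
    slope, intercept = fit_line(x, mtd)
    return intercept + slope * x_next


# Made-up series: years since baseline, mTD in dB, IOP in mmHg.
times = [0.0, 0.5, 1.0, 1.5, 2.0]
mtd = [-2.0, -2.3, -2.9, -3.2, -4.1]
iop = [14.0, 15.0, 18.0, 16.0, 21.0]
print(predict_next(times, mtd, 2.5))
print(predict_next(times, mtd, 2.5, iop=iop, next_iop=19.0))
```

    Comparing the absolute error of each prediction against the measured value at the held-out visit reproduces, in miniature, the evaluation described in the abstract.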

  15. A novel method to predict visual field progression more accurately, using intraocular pressure measurements in glaucoma patients.

    PubMed

    2016-01-01

    Visual field (VF) data were retrospectively obtained from 491 eyes in 317 patients with open angle glaucoma who had undergone ten VF tests (Humphrey Field Analyzer, 24-2, SITA standard). First, mean of total deviation values (mTD) in the tenth VF was predicted using standard linear regression of the first five VFs (VF1-5) through to using all nine preceding VFs (VF1-9). Then an 'intraocular pressure (IOP)-integrated VF trend analysis' was carried out by simply using time multiplied by IOP as the independent term in the linear regression model. Prediction errors (absolute prediction error or root mean squared error: RMSE) for predicting mTD and also point wise TD values of the tenth VF were obtained from both approaches. The mTD absolute prediction errors associated with the IOP-integrated VF trend analysis were significantly smaller than those from the standard trend analysis when VF1-6 through to VF1-8 were used (p < 0.05). The point wise RMSEs from the IOP-integrated trend analysis were significantly smaller than those from the standard trend analysis when VF1-5 through to VF1-9 were used (p < 0.05). This was especially the case when IOP was measured more frequently. Thus a significantly more accurate prediction of VF progression is possible using a simple trend analysis that incorporates IOP measurements. PMID:27562553

  16. Combining multiple regression and principal component analysis for accurate predictions for column ozone in Peninsular Malaysia

    NASA Astrophysics Data System (ADS)

    Rajab, Jasim M.; MatJafri, M. Z.; Lim, H. S.

    2013-06-01

    This study encompasses columnar ozone modelling in Peninsular Malaysia. A data set of eight atmospheric parameters [air surface temperature (AST), carbon monoxide (CO), methane (CH4), water vapour (H2O vapour), skin surface temperature (SSKT), atmosphere temperature (AT), relative humidity (RH), and mean surface pressure (MSP)], retrieved from NASA's Atmospheric Infrared Sounder (AIRS) for the entire period (2003-2008), was employed to develop models to predict the value of columnar ozone (O3) in the study area. A combined method, using multiple regression together with principal component analysis (PCA) modelling, was used to improve the prediction accuracy of columnar ozone. Separate analyses were carried out for the north east monsoon (NEM) and south west monsoon (SWM) seasons. O3 was negatively correlated with CH4, H2O vapour, RH, and MSP, whereas it was positively correlated with CO, AST, SSKT, and AT during both the NEM and SWM seasons. Multiple regression analysis was used to fit the columnar ozone data using the atmospheric parameters as predictors. A variable selection method based on high loadings of varimax-rotated principal components was used to obtain subsets of the predictor variables to be included in the linear regression model. It was found that an increase in columnar O3 is associated with an increase in the values of AST, SSKT, AT, and CO and with a drop in the levels of CH4, H2O vapour, RH, and MSP. Fitting the best models for columnar O3 using eight of the independent variables gave about the same values of R (≈0.93) and R² (≈0.86) for both the NEM and SWM seasons. The common variables that appeared in both regression equations were SSKT, CH4 and RH, and the principal precursor of the columnar O3 value in both the NEM and SWM seasons was SSKT.

  17. Prognostic breast cancer signature identified from 3D culture model accurately predicts clinical outcome across independent datasets

    SciTech Connect

    Martin, Katherine J.; Patrick, Denis R.; Bissell, Mina J.; Fournier, Marcia V.

    2008-10-20

    One of the major tenets in breast cancer research is that early detection is vital for patient survival by increasing treatment options. To that end, we have previously used a novel unsupervised approach to identify a set of genes whose expression predicts prognosis of breast cancer patients. The predictive genes were selected in a well-defined three-dimensional (3D) cell culture model of non-malignant human mammary epithelial cell morphogenesis as down-regulated during breast epithelial cell acinar formation and cell cycle arrest. Here we examine the ability of this gene signature (3D-signature) to predict prognosis in three independent breast cancer microarray datasets having 295, 286, and 118 samples, respectively. Our results show that the 3D-signature accurately predicts prognosis in three unrelated patient datasets. At 10 years, the probability of positive outcome was 52, 51, and 47 percent in the group with a poor-prognosis signature and 91, 75, and 71 percent in the group with a good-prognosis signature for the three datasets, respectively (Kaplan-Meier survival analysis, p<0.05). Hazard ratios for poor outcome were 5.5 (95% CI 3.0 to 12.2, p<0.0001), 2.4 (95% CI 1.6 to 3.6, p<0.0001) and 1.9 (95% CI 1.1 to 3.2, p = 0.016) and remained significant for the two larger datasets when corrected for estrogen receptor (ER) status. Hence the 3D-signature accurately predicts breast cancer outcome in both ER-positive and ER-negative tumors, though individual genes differed in their prognostic ability in the two subtypes. Genes that were prognostic in ER+ patients are AURKA, CEP55, RRM2, EPHA2, FGFBP1, and VRK1, while genes prognostic in ER− patients include ACTB, FOXM1 and SERPINE2 (Kaplan-Meier p<0.05). Multivariable Cox regression analysis in the largest dataset showed that the 3D-signature was a strong independent factor in predicting breast cancer outcome. The 3D-signature accurately predicts breast cancer outcome across multiple datasets and holds prognostic

  18. nuMap: a web platform for accurate prediction of nucleosome positioning.

    PubMed

    Alharbi, Bader A; Alshammari, Thamir H; Felton, Nathan L; Zhurkin, Victor B; Cui, Feng

    2014-10-01

    Nucleosome positioning is critical for gene expression and of major biological interest. The high cost of experimentally mapping nucleosomal arrangement signifies the need for computational approaches to predict nucleosome positions at high resolution. Here, we present a web-based application to fulfill this need by implementing two models, YR and W/S schemes, for the translational and rotational positioning of nucleosomes, respectively. Our methods are based on sequence-dependent anisotropic bending that dictates how DNA is wrapped around a histone octamer. This application allows users to specify a number of options such as schemes and parameters for threading calculation and provides multiple layout formats. The nuMap is implemented in Java/Perl/MySQL and is freely available for public use at http://numap.rit.edu. The user manual, implementation notes, description of the methodology and examples are available at the site. PMID:25220945

  19. A Foundation for the Accurate Prediction of the Soft Error Vulnerability of Scientific Applications

    SciTech Connect

    Bronevetsky, G; de Supinski, B; Schulz, M

    2009-02-13

    Understanding the soft error vulnerability of supercomputer applications is critical as these systems are using ever larger numbers of devices that have decreasing feature sizes and, thus, increasing frequency of soft errors. As many large scale parallel scientific applications use BLAS and LAPACK linear algebra routines, the soft error vulnerability of these methods constitutes a large fraction of an application's overall vulnerability. This paper analyzes the vulnerability of these routines to soft errors by characterizing how their outputs are affected by injected errors and by evaluating several techniques for predicting how errors propagate from the input to the output of each routine. The resulting error profiles can be used to understand the fault vulnerability of full applications that use these routines.

  20. Simplified versus geometrically accurate models of forefoot anatomy to predict plantar pressures: A finite element study.

    PubMed

    Telfer, Scott; Erdemir, Ahmet; Woodburn, James; Cavanagh, Peter R

    2016-01-25

    Integration of patient-specific biomechanical measurements into the design of therapeutic footwear has been shown to improve clinical outcomes in patients with diabetic foot disease. The addition of numerical simulations intended to optimise intervention design may help to build on these advances, however at present the time and labour required to generate and run personalised models of foot anatomy restrict their routine clinical utility. In this study we developed second-generation personalised simple finite element (FE) models of the forefoot with varying geometric fidelities. Plantar pressure predictions from barefoot, shod, and shod with insole simulations using simplified models were compared to those obtained from CT-based FE models incorporating more detailed representations of bone and tissue geometry. A simplified model including representations of metatarsals based on simple geometric shapes, embedded within a contoured soft tissue block with outer geometry acquired from a 3D surface scan was found to provide pressure predictions closest to the more complex model, with mean differences of 13.3kPa (SD 13.4), 12.52kPa (SD 11.9) and 9.6kPa (SD 9.3) for barefoot, shod, and insole conditions respectively. The simplified model design could be produced in <1h compared to >3h in the case of the more detailed model, and solved on average 24% faster. FE models of the forefoot based on simplified geometric representations of the metatarsal bones and soft tissue surface geometry from 3D surface scans may potentially provide a simulation approach with improved clinical utility, however further validity testing around a range of therapeutic footwear types is required.

  1. Protein-Based Urine Test Predicts Kidney Transplant Outcomes

    MedlinePlus

    ... News Releases News Release Thursday, August 22, 2013 Protein-based urine test predicts kidney transplant outcomes NIH- ... supporting development of noninvasive tests. Levels of a protein in the urine of kidney transplant recipients can ...

  2. Nonempirically Tuned Range-Separated DFT Accurately Predicts Both Fundamental and Excitation Gaps in DNA and RNA Nucleobases

    PubMed Central

    2012-01-01

    Using a nonempirically tuned range-separated DFT approach, we study both the quasiparticle properties (HOMO–LUMO fundamental gaps) and excitation energies of DNA and RNA nucleobases (adenine, thymine, cytosine, guanine, and uracil). Our calculations demonstrate that a physically motivated, first-principles tuned DFT approach accurately reproduces results from both experimental benchmarks and more computationally intensive techniques such as many-body GW theory. Furthermore, in the same set of nucleobases, we show that the nonempirical range-separated procedure also leads to significantly improved results for excitation energies compared to conventional DFT methods. The present results emphasize the importance of a nonempirically tuned range-separation approach for accurately predicting both fundamental and excitation gaps in DNA and RNA nucleobases. PMID:22904693

  3. Documentation of an Imperative To Improve Methods for Predicting Membrane Protein Stability.

    PubMed

    Kroncke, Brett M; Duran, Amanda M; Mendenhall, Jeffrey L; Meiler, Jens; Blume, Jeffrey D; Sanders, Charles R

    2016-09-13

    There is a compelling and growing need to accurately predict the impact of amino acid mutations on protein stability for problems in personalized medicine and other applications. Here the ability of 10 computational tools to accurately predict mutation-induced perturbation of folding stability (ΔΔG) for membrane proteins of known structure was assessed. All methods for predicting ΔΔG values performed significantly worse when applied to membrane proteins than when applied to soluble proteins, yielding estimated concordance, Pearson, and Spearman correlation coefficients of <0.4 for membrane proteins. Rosetta and PROVEAN showed a modest ability to classify mutations as destabilizing (ΔΔG < -0.5 kcal/mol), with a 7 in 10 chance of correctly discriminating a randomly chosen destabilizing variant from a randomly chosen stabilizing variant. However, even this performance is significantly worse than for soluble proteins. This study highlights the need for further development of reliable and reproducible methods for predicting thermodynamic folding stability in membrane proteins. PMID:27564391
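
    The "7 in 10 chance" quoted above is the usual probabilistic reading of the area under the ROC curve: the fraction of (destabilizing, stabilizing) pairs that a predictor orders correctly. A minimal sketch of that pairwise computation follows. It assumes, per the sign convention in the abstract, that a more negative predicted ΔΔG means more destabilizing; the function name and the scores are made up for illustration.

```python
def discrimination_probability(destabilizing_scores, stabilizing_scores):
    """Probability that a randomly chosen destabilizing variant gets a
    lower (more negative) predicted ddG than a randomly chosen
    stabilizing variant. Ties count as half a correct ordering."""
    correct = 0.0
    for d in destabilizing_scores:
        for s in stabilizing_scores:
            if d < s:
                correct += 1.0
            elif d == s:
                correct += 0.5
    return correct / (len(destabilizing_scores) * len(stabilizing_scores))


# Hypothetical predicted ddG values in kcal/mol.
destabilizing = [-2.0, -1.0, 0.5]
stabilizing = [-1.5, 0.0, 1.0]
print(discrimination_probability(destabilizing, stabilizing))
```

    A value of 0.5 corresponds to random guessing and 1.0 to perfect separation, so 0.7 sits closer to chance than to a reliable classifier.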

  4. Lateral impact validation of a geometrically accurate full body finite element model for blunt injury prediction.

    PubMed

    Vavalle, Nicholas A; Moreno, Daniel P; Rhyne, Ashley C; Stitzel, Joel D; Gayzik, F Scott

    2013-03-01

    This study presents four validation cases of a mid-sized male (M50) full human body finite element model: two lateral sled tests at 6.7 m/s, one sled test at 8.9 m/s, and a lateral drop test. Model results were compared to transient force curves, peak force, chest compression, and number of fractures from the studies. For one of the 6.7 m/s impacts (flat wall impact), the peak thoracic, abdominal and pelvic loads were 8.7, 3.1 and 14.9 kN for the model and 5.2 ± 1.1 kN, 3.1 ± 1.1 kN, and 6.3 ± 2.3 kN for the tests. For the same test setup in the 8.9 m/s case, they were 12.6, 6, and 21.9 kN for the model and 9.1 ± 1.5 kN, 4.9 ± 1.1 kN, and 17.4 ± 6.8 kN for the experiments. The combined torso load and the pelvis load simulated in a second rigid wall impact at 6.7 m/s were 11.4 and 15.6 kN, respectively, compared to 8.5 ± 0.2 kN and 8.3 ± 1.8 kN experimentally. The peak thorax load in the drop test was 6.7 kN for the model, within the range in the cadavers, 5.8-7.4 kN. When analyzing rib fractures, the model predicted Abbreviated Injury Scale scores within the reported range in three of four cases. Objective comparison methods were used to quantitatively compare the model results to the literature studies. The results show a good match in the thorax and abdomen regions, while the pelvis results overpredicted the reaction loads from the literature studies. These results are an important milestone in the development and validation of this globally developed average male FEA model in lateral impact.

  5. Prediction of thermodynamic instabilities of protein solutions from simple protein-protein interactions

    NASA Astrophysics Data System (ADS)

    D'Agostino, Tommaso; Solana, José Ramón; Emanuele, Antonio

    2013-10-01

    Statistical thermodynamics of protein solutions is often studied in terms of simple, microscopic models of particles interacting via pairwise potentials. Such modelling can reproduce the short range structure of protein solutions at equilibrium and predict thermodynamic instabilities of these systems. We introduce a square well model of effective protein-protein interaction that embeds the solvent’s action. We modify an existing model [45] by considering a well depth having an explicit dependence on temperature, i.e. an explicit free energy character, thus encompassing the statistically relevant configurations of solvent molecules around proteins. We choose protein solutions exhibiting demixing upon temperature decrease (lysozyme, enthalpy driven) and upon temperature increase (haemoglobin, entropy driven). We obtain satisfactory fits of spinodal curves for both proteins without adding any mean field term, thus extending the validity of the original model. Our results underline the solvent role in modulating or stretching the interaction potential.

  6. Prediction of β-turn types in protein by using composite vector.

    PubMed

    Shi, Xiaobo; Hu, Xiuzhen; Li, Shaobo; Liu, Xingxing

    2011-10-01

    Protein secondary structure prediction is an intermediate step in the overall process of tertiary structure prediction. β-turns are important components of the secondary structure of a protein. Development of an accurate method of prediction of β-turn types would be helpful for predicting the overall tertiary structure of proteins. In this work, we constructed a database of 2805 protein chains. Our work improved the previous input parameters and used the support vector machine algorithm to predict the β-turn types; we obtained overall prediction accuracies of 98.1%, 96.0%, 96.1%, 98.7%, 99.1%, 86.8%, 99.2% and 73.2%, with Matthews correlation coefficient values of 0.398, 0.460, 0.043, 0.463, 0.355, 0.172, 0.109 and 0.247, for types I, II, VIII, I', II', IV, VI and non-β-turn, respectively. In addition, we also used the same method to predict the β-turn types in three databases of 426, 547 and 823 protein chains and found that our prediction results were better than other predictions.
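
    The gap between accuracies above 95% and Matthews correlation coefficients near zero is what one would expect when a class is rare: a classifier that almost always answers "not this turn type" is usually right, yet carries little information. A small Python sketch of both metrics follows. The confusion-matrix counts are hypothetical and chosen only to show the effect; they are not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when a marginal total is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 1000 residues, only 30 belong to the rare turn type.
tp, fn, fp, tn = 2, 28, 10, 960

acc = accuracy(tp, tn, fp, fn)
mcc = matthews_cc(tp, tn, fp, fn)
print(f"accuracy = {acc:.3f}, MCC = {mcc:.3f}")
```

    With these counts the accuracy exceeds 0.95 while the MCC stays below 0.1, which is why the two columns of numbers in the abstract should be read together.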

  7. Accurate prediction of the refractive index of polymers using first principles and data modeling

    NASA Astrophysics Data System (ADS)

    Afzal, Mohammad Atif Faiz; Cheng, Chong; Hachmann, Johannes

    Organic polymers with a high refractive index (RI) have recently attracted considerable interest due to their potential application in optical and optoelectronic devices. The ability to tailor the molecular structure of polymers is the key to increasing the accessible RI values. Our work concerns the creation of predictive in silico models for the optical properties of organic polymers, the screening of large-scale candidate libraries, and the mining of the resulting data to extract the underlying design principles that govern their performance. This work was set up to guide our experimentalist partners and allow them to target the most promising candidates. Our model is based on the Lorentz-Lorenz equation and thus includes the polarizability and number density values for each candidate. For the former, we performed a detailed benchmark study of different density functionals, basis sets, and the extrapolation scheme towards the polymer limit. For the number density we devised an exceedingly efficient machine learning approach to correlate the polymer structure and the packing fraction in the bulk material. We validated the proposed RI model against the experimentally known RI values of 112 polymers. We could show that the proposed combination of physical and data modeling is both successful and highly economical to characterize a wide range of organic polymers, which is a prerequisite for virtual high-throughput screening.
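
    The Lorentz-Lorenz relation mentioned above links the refractive index n to the polarizability α and the number density N. In Gaussian units it reads (n² - 1)/(n² + 2) = (4π/3)·N·α. The Python sketch below solves that relation for n. The function name, the choice of Å³ units, and the input values are assumptions made for illustration; they are not parameters from the study.

```python
import math


def refractive_index(alpha_A3, number_density_per_A3):
    """Solve the Lorentz-Lorenz relation for n.

    alpha_A3: polarizability volume per repeat unit, in cubic angstroms.
    number_density_per_A3: repeat units per cubic angstrom.
    """
    x = 4.0 * math.pi / 3.0 * number_density_per_A3 * alpha_A3
    if not 0.0 <= x < 1.0:
        raise ValueError("(4*pi/3)*N*alpha must lie in [0, 1)")
    # (n^2 - 1) / (n^2 + 2) = x  =>  n^2 = (1 + 2x) / (1 - x)
    return math.sqrt((1.0 + 2.0 * x) / (1.0 - x))


# Illustrative inputs only (not values from the paper).
n = refractive_index(alpha_A3=10.0, number_density_per_A3=0.008)
print(round(n, 3))
```

    Because n depends on the product N·α, raising either the polarizability or the packing density raises the predicted refractive index, which matches the two-part modeling strategy described in the abstract.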

  8. Accurate predictions of C-SO2R bond dissociation enthalpies using density functional theory methods.

    PubMed

    Yu, Hai-Zhu; Fu, Fang; Zhang, Liang; Fu, Yao; Dang, Zhi-Min; Shi, Jing

    2014-10-14

    The dissociation of the C-SO2R bond is frequently involved in organic and bio-organic reactions, and the C-SO2R bond dissociation enthalpies (BDEs) are potentially important for understanding the related mechanisms. The primary goal of the present study is to provide a reliable calculation method to predict the different C-SO2R bond dissociation enthalpies (BDEs). Comparing the accuracies of 13 different density functional theory (DFT) methods (such as B3LYP, TPSS, and M05 etc.), and different basis sets (such as 6-31G(d) and 6-311++G(2df,2p)), we found that M06-2X/6-31G(d) gives the best performance in reproducing the various C-S BDEs (and especially the C-SO2R BDEs). As an example for understanding the mechanisms with the aid of C-SO2R BDEs, some primary mechanistic studies were carried out on the chemoselective coupling (in the presence of a Cu-catalyst) or desulfinative coupling reactions (in the presence of a Pd-catalyst) between sulfinic acid salts and boryl/sulfinic acid salts.

  9. Towards Accurate Prediction of Turbulent, Three-Dimensional, Recirculating Flows with the NCC

    NASA Technical Reports Server (NTRS)

    Iannetti, A.; Tacina, R.; Jeng, S.-M.; Cai, J.

    2001-01-01

    The National Combustion Code (NCC) was used to calculate the steady state, nonreacting flow field of a prototype Lean Direct Injection (LDI) swirler. This configuration used nine groups of eight holes drilled at a thirty-five degree angle to induce swirl. These nine groups created swirl in the same direction, or a corotating pattern. The static pressure drop across the holes was fixed at approximately four percent. Computations were performed on one quarter of the geometry, because the geometry is considered rotationally periodic every ninety degrees. The final computational grid used was approximately 2.26 million tetrahedral cells, and a cubic nonlinear k - epsilon model was used to model turbulence. The NCC results were then compared to time averaged Laser Doppler Velocimetry (LDV) data. The LDV measurements were performed on the full geometry, but four ninths of the geometry was measured. One-, two-, and three-dimensional representations of both flow fields are presented. The NCC computations compare both qualitatively and quantitatively well to the LDV data, but differences exist downstream. The comparison is encouraging, and shows that NCC can be used for future injector design studies. To improve the flow prediction accuracy of turbulent, three-dimensional, recirculating flow fields with the NCC, recommendations are given.

  10. An improved method for accurate prediction of mass flows through combustor liner holes

    SciTech Connect

    Adkins, R.C.; Gueroui, D.

    1986-01-01

    The objective of this paper is to present a simple approach to the solution of flow through combustor liner holes which can be used by practicing combustor engineers as well as providing the specialist modeler with a convenient boundary condition. For modeling, suppose that all relevant details of the incoming jets can be readily predicted, then the computational boundary can be limited to the inner wall of the liner and to the jets themselves. The scope of this paper is limited to the derivation of a simple analysis, the development of a reliable test technique, and to the correlation of data for plane holes having a diameter which is large when compared to the liner wall thickness. The effect of internal liner flow on the performance of the holes is neglected; this is considered to be justifiable because the analysis terminates at a short distance downstream of the hole and the significantly lower velocities inside the combustor have had little opportunity to have taken any effect. It is intended to extend the procedure to more complex hole forms and flow configurations in later papers.

  11. Neural network approach to quantum-chemistry data: Accurate prediction of density functional theory energies

    NASA Astrophysics Data System (ADS)

    Balabin, Roman M.; Lomakina, Ekaterina I.

    2009-08-01

    An artificial neural network (ANN) approach has been applied to estimate the density functional theory (DFT) energy with a large basis set using lower-level energy values and molecular descriptors. A total of 208 different molecules were used for the ANN training, cross validation, and testing by applying BLYP, B3LYP, and BMK density functionals. Hartree-Fock results were reported for comparison. Furthermore, constitutional molecular descriptors (CD) and quantum-chemical molecular descriptors (QD) were used for building the calibration model. The neural network structure optimization, leading to four to five hidden neurons, was also carried out. The usage of several low-level energy values was found to greatly reduce the prediction error. The expected error (mean absolute deviation) for the ANN approximation to DFT energies was 0.6±0.2 kcal mol^-1. In addition, the comparison of the different density functionals with the basis sets and the comparison of multiple linear regression results were also provided. The CDs were found to overcome the limitation of the QD. Furthermore, the effective ANN model for DFT/6-311G(3df,3pd) and DFT/6-311G(2df,2pd) energy estimation was developed, and the benchmark results were provided.

  12. Line Shape Parameters for CO_2 Transitions: Accurate Predictions from Complex Robert-Bonamy Calculations

    NASA Astrophysics Data System (ADS)

    Lamouroux, Julien; Gamache, Robert R.

    2013-06-01

    A model for the prediction of the vibrational dependence of CO_2 half-widths and line shifts for several broadeners, based on a modification of the model proposed by Gamache and Hartmann, is presented. This model allows the half-widths and line shifts for a ro-vibrational transition to be expressed in terms of the number of vibrational quanta exchanged in the transition raised to a power p and a reference ro-vibrational transition. Complex Robert-Bonamy calculations were made for 24 bands for lower rotational quantum numbers J'' from 0 to 160 for N_2-, O_2-, air-, and self-collisions with CO_2. In the model a Quantum Coordinate is defined by (c_1 Δν_1 + c_2 Δν_2 + c_3 Δν_3)^p where a linear least-squares fit to the data by the model expression is made. The model allows the determination of the slope and intercept as a function of rotational transition, broadening gas, and temperature. From these fit data, the half-width, line shift, and the temperature dependence of the half-width can be estimated for any ro-vibrational transition, allowing spectroscopic CO_2 databases to have complete information for the line shape parameters. R. R. Gamache, J.-M. Hartmann, J. Quant. Spectrosc. Radiat. Transfer 83 (2004), 119. R. R. Gamache, J. Lamouroux, J. Quant. Spectrosc. Radiat. Transfer 117 (2013), 93.

  13. The development and verification of a highly accurate collision prediction model for automated noncoplanar plan delivery

    PubMed Central

    Yu, Victoria Y.; Tran, Angelia; Nguyen, Dan; Cao, Minsong; Ruan, Dan; Low, Daniel A.; Sheng, Ke

    2015-01-01

    attributed to phantom setup errors due to the slightly deformable and flexible phantom extremities. The estimated site-specific safety buffer distance with 0.001% probability of collision for (gantry-to-couch, gantry-to-phantom) was (1.23 cm, 3.35 cm), (1.01 cm, 3.99 cm), and (2.19 cm, 5.73 cm) for treatment to the head, lung, and prostate, respectively. Automated delivery to all three treatment sites was completed in 15 min and collision free using a digital Linac. Conclusions: An individualized collision prediction model for the purpose of noncoplanar beam delivery was developed and verified. With the model, the study has demonstrated the feasibility of predicting deliverable beams for an individual patient and then guiding fully automated noncoplanar treatment delivery. This work motivates development of clinical workflows and quality assurance procedures to allow more extensive use and automation of noncoplanar beam geometries. PMID:26520735

  14. The development and verification of a highly accurate collision prediction model for automated noncoplanar plan delivery

    SciTech Connect

    Yu, Victoria Y.; Tran, Angelia; Nguyen, Dan; Cao, Minsong; Ruan, Dan; Low, Daniel A.; Sheng, Ke

    2015-11-15

    attributed to phantom setup errors due to the slightly deformable and flexible phantom extremities. The estimated site-specific safety buffer distance with 0.001% probability of collision for (gantry-to-couch, gantry-to-phantom) was (1.23 cm, 3.35 cm), (1.01 cm, 3.99 cm), and (2.19 cm, 5.73 cm) for treatment to the head, lung, and prostate, respectively. Automated delivery to all three treatment sites was completed in 15 min and collision free using a digital Linac. Conclusions: An individualized collision prediction model for the purpose of noncoplanar beam delivery was developed and verified. With the model, the study has demonstrated the feasibility of predicting deliverable beams for an individual patient and then guiding fully automated noncoplanar treatment delivery. This work motivates development of clinical workflows and quality assurance procedures to allow more extensive use and automation of noncoplanar beam geometries.

  16. Accurate single-day titration of adenovirus vectors based on equivalence of protein VII nuclear dots and infectious particles.

    PubMed

    Walkiewicz, Marcin P; Morral, Nuria; Engel, Daniel A

    2009-08-01

    Protein VII is an abundant component of adenovirus particles and is tightly associated with the viral DNA. It enters the nucleus along with the infecting viral genome and remains bound throughout early phase. Protein VII can be visualized by immunofluorescent staining as discrete dots in the infected cell nucleus. Comparison between protein VII staining and expression of the 72kDa DNA-binding protein revealed a one-to-one correspondence between protein VII dots and infectious viral genomes. A similar relationship was observed for a helper-dependent adenovirus vector expressing green fluorescent protein. This relationship allowed accurate titration of adenovirus preparations, including wild-type and helper-dependent vectors, using a 1-day immunofluorescence method. The method can be applied to any adenovirus vector and gives results equivalent to the standard plaque assay. PMID:19406166

  17. How Accurate Are the Anthropometry Equations in Iranian Military Men in Predicting Body Composition?

    PubMed Central

    Shakibaee, Abolfazl; Faghihzadeh, Soghrat; Alishiri, Gholam Hossein; Ebrahimpour, Zeynab; Faradjzadeh, Shahram; Sobhani, Vahid; Asgari, Alireza

    2015-01-01

    Background: Body composition varies with lifestyle (i.e., caloric intake and caloric expenditure). Therefore, it is wise to record military personnel’s body composition periodically and encourage those who abide by the regulations. Different methods, both invasive and non-invasive, have been introduced for body composition assessment. Amongst them, the Jackson and Pollock equation is the most popular. Objectives: The recommended anthropometric prediction equations for assessing men’s body composition were compared with the dual-energy X-ray absorptiometry (DEXA) gold standard to develop a modified equation to assess body composition and obesity quantitatively among Iranian military men. Patients and Methods: A total of 101 military men aged 23-52 years with a mean age of 35.5 years were recruited and evaluated in the present study (average height, 173.9 cm and weight, 81.5 kg). The body-fat percentages of subjects were assessed both with anthropometric assessment and DEXA scan. The data obtained from these two methods were then compared using multiple regression analysis. Results: The mean and standard deviation of body fat percentage of the DEXA assessment was 21.2 ± 4.3 and body fat percentage obtained from three Jackson and Pollock 3-, 4- and 7-site equations were 21.1 ± 5.8, 22.2 ± 6.0 and 20.9 ± 5.7, respectively. There was a strong correlation between these three equations and DEXA (R² = 0.98). Conclusions: The mean percentage of body fat obtained from the three equations of Jackson and Pollock was very close to that of body fat obtained from DEXA; however, we suggest using a modified Jackson-Pollock 3-site equation for volunteer military men because the 3-site equation analysis method is simpler and faster than other methods. PMID:26715964

  18. Industrial Compositional Streamline Simulation for Efficient and Accurate Prediction of Gas Injection and WAG Processes

    SciTech Connect

    Margot Gerritsen

    2008-10-31

    Gas-injection processes are widely and increasingly used for enhanced oil recovery (EOR). In the United States, for example, EOR production by gas injection accounts for approximately 45% of total EOR production and has tripled since 1986. The understanding of the multiphase, multicomponent flow taking place in any displacement process is essential for successful design of gas-injection projects. Due to complex reservoir geometry, reservoir fluid properties and phase behavior, the design of accurate and efficient numerical simulations for the multiphase, multicomponent flow governing these processes is nontrivial. In this work, we developed, implemented and tested a streamline based solver for gas injection processes that is computationally very attractive: as compared to traditional Eulerian solvers in use by industry it computes solutions with a computational speed orders of magnitude higher and a comparable accuracy provided that cross-flow effects do not dominate. We contributed to the development of compositional streamline solvers in three significant ways: improvement of the overall framework allowing improved streamline coverage and partial streamline tracing, amongst others; parallelization of the streamline code, which significantly improves wall clock time; and development of new compositional solvers that can be implemented along streamlines as well as in existing Eulerian codes used by industry. We designed several novel ideas in the streamline framework. First, we developed an adaptive streamline coverage algorithm. Adding streamlines locally can reduce computational costs by concentrating computational efforts where needed, and reduce mapping errors. Adapting streamline coverage effectively controls mass balance errors that mostly result from the mapping from streamlines to pressure grid. We also introduced the concept of partial streamlines: streamlines that do not necessarily start and/or end at wells. This allows more efficient coverage and avoids

  19. Deformation, Failure, and Fatigue Life of SiC/Ti-15-3 Laminates Accurately Predicted by MAC/GMC

    NASA Technical Reports Server (NTRS)

    Bednarcyk, Brett A.; Arnold, Steven M.

    2002-01-01

    NASA Glenn Research Center's Micromechanics Analysis Code with Generalized Method of Cells (MAC/GMC) (ref.1) has been extended to enable fully coupled macro-micro deformation, failure, and fatigue life predictions for advanced metal matrix, ceramic matrix, and polymer matrix composites. Because of the multiaxial nature of the code's underlying micromechanics model, GMC--which allows the incorporation of complex local inelastic constitutive models--MAC/GMC finds its most important application in metal matrix composites, like the SiC/Ti-15-3 composite examined here. Furthermore, since GMC predicts the microscale fields within each constituent of the composite material, submodels for local effects such as fiber breakage, interfacial debonding, and matrix fatigue damage can and have been built into MAC/GMC. The present application of MAC/GMC highlights the combination of these features, which has enabled the accurate modeling of the deformation, failure, and life of titanium matrix composites.

  20. PSI/TM-Coffee: a web server for fast and accurate multiple sequence alignments of regular and transmembrane proteins using homology extension on reduced databases

    PubMed Central

    Floden, Evan W.; Tommaso, Paolo D.; Chatzou, Maria; Magis, Cedrik; Notredame, Cedric; Chang, Jia-Ming

    2016-01-01

    The PSI/TM-Coffee web server performs multiple sequence alignment (MSA) of proteins by combining homology extension with a consistency based alignment approach. Homology extension is performed with Position Specific Iterative (PSI) BLAST searches against a choice of redundant and non-redundant databases. The main novelty of this server is to allow databases of reduced complexity to rapidly perform homology extension. This server also gives the possibility to use transmembrane proteins (TMPs) reference databases to allow even faster homology extension on this important category of proteins. Aside from an MSA, the server also outputs topological prediction of TMPs using the HMMTOP algorithm. Previous benchmarking of the method has shown this approach outperforms the most accurate alignment methods such as MSAProbs, Kalign, PROMALS, MAFFT, ProbCons and PRALINE™. The web server is available at http://tcoffee.crg.cat/tmcoffee. PMID:27106060

  1. PSI/TM-Coffee: a web server for fast and accurate multiple sequence alignments of regular and transmembrane proteins using homology extension on reduced databases.

    PubMed

    Floden, Evan W; Tommaso, Paolo D; Chatzou, Maria; Magis, Cedrik; Notredame, Cedric; Chang, Jia-Ming

    2016-07-01

    The PSI/TM-Coffee web server performs multiple sequence alignment (MSA) of proteins by combining homology extension with a consistency based alignment approach. Homology extension is performed with Position Specific Iterative (PSI) BLAST searches against a choice of redundant and non-redundant databases. The main novelty of this server is to allow databases of reduced complexity to rapidly perform homology extension. This server also gives the possibility to use transmembrane proteins (TMPs) reference databases to allow even faster homology extension on this important category of proteins. Aside from an MSA, the server also outputs topological prediction of TMPs using the HMMTOP algorithm. Previous benchmarking of the method has shown this approach outperforms the most accurate alignment methods such as MSAProbs, Kalign, PROMALS, MAFFT, ProbCons and PRALINE™. The web server is available at http://tcoffee.crg.cat/tmcoffee.

  2. Absolute Measurements of Macrophage Migration Inhibitory Factor and Interleukin-1-β mRNA Levels Accurately Predict Treatment Response in Depressed Patients

    PubMed Central

    Ferrari, Clarissa; Uher, Rudolf; Bocchio-Chiavetto, Luisella; Riva, Marco Andrea; Pariante, Carmine M.

    2016-01-01

    Background: Increased levels of inflammation have been associated with a poorer response to antidepressants in several clinical samples, but these findings have been limited by low reproducibility of biomarker assays across laboratories, difficulty in predicting response probability on an individual basis, and unclear molecular mechanisms. Methods: Here we measured absolute mRNA values (a reliable quantitation of number of molecules) of Macrophage Migration Inhibitory Factor and interleukin-1β in a previously published sample from a randomized controlled trial comparing escitalopram vs nortriptyline (GENDEP) as well as in an independent, naturalistic replication sample. We then used linear discriminant analysis to calculate mRNA value cutoffs that best discriminated between responders and nonresponders after 12 weeks of antidepressants. As Macrophage Migration Inhibitory Factor and interleukin-1β might be involved in different pathways, we constructed a protein-protein interaction network by the Search Tool for the Retrieval of Interacting Genes/Proteins. Results: We identified cutoff values for the absolute mRNA measures that accurately predicted response probability on an individual basis, with positive predictive values and specificity for nonresponders of 100% in both samples (negative predictive value=82% to 85%, sensitivity=52% to 61%). Using network analysis, we identified different clusters of targets for these 2 cytokines, with Macrophage Migration Inhibitory Factor interacting predominantly with pathways involved in neurogenesis, neuroplasticity, and cell proliferation, and interleukin-1β interacting predominantly with pathways involved in the inflammasome complex, oxidative stress, and neurodegeneration. Conclusion: We believe that these data provide a clinically suitable approach to the personalization of antidepressant therapy: patients who have absolute mRNA values above the suggested cutoffs could be directed toward earlier access to more

  3. The identification of complete domains within protein sequences using accurate E-values for semi-global alignment

    PubMed Central

    Kann, Maricel G.; Sheetlin, Sergey L.; Park, Yonil; Bryant, Stephen H.; Spouge, John L.

    2007-01-01

    The sequencing of complete genomes has created a pressing need for automated annotation of gene function. Because domains are the basic units of protein function and evolution, a gene can be annotated from a domain database by aligning domains to the corresponding protein sequence. Ideally, complete domains are aligned to protein subsequences, in a ‘semi-global alignment’. Local alignment, which aligns pieces of domains to subsequences, is common in high-throughput annotation applications, however. It is a mature technique, with the heuristics and accurate E-values required for screening large databases and evaluating the screening results. Hidden Markov models (HMMs) provide an alternative theoretical framework for semi-global alignment, but their use is limited because they lack heuristic acceleration and accurate E-values. Our new tool, GLOBAL, overcomes some limitations of previous semi-global HMMs: it has accurate E-values and the possibility of the heuristic acceleration required for high-throughput applications. Moreover, according to a standard of truth based on protein structure, two semi-global HMM alignment tools (GLOBAL and HMMer) had comparable performance in identifying complete domains, but distinctly outperformed two tools based on local alignment. When searching for complete protein domains, therefore, GLOBAL avoids disadvantages commonly associated with HMMs, yet maintains their superior retrieval performance. PMID:17596268

  4. A cross-race effect in metamemory: Predictions of face recognition are more accurate for members of our own race.

    PubMed

    Hourihan, Kathleen L; Benjamin, Aaron S; Liu, Xiping

    2012-09-01

    The Cross-Race Effect (CRE) in face recognition is the well-replicated finding that people are better at recognizing faces from their own race, relative to other races. The CRE reveals systematic limitations on eyewitness identification accuracy and suggests that some caution is warranted in evaluating cross-race identification. The CRE is a problem because jurors value eyewitness identification highly in verdict decisions. In the present paper, we explore how accurate people are in predicting their ability to recognize own-race and other-race faces. Caucasian and Asian participants viewed photographs of Caucasian and Asian faces, and made immediate judgments of learning during study. An old/new recognition test replicated the CRE: both groups displayed superior discriminability of own-race faces, relative to other-race faces. Importantly, relative metamnemonic accuracy was also greater for own-race faces, indicating that the accuracy of predictions about face recognition is influenced by race. This result indicates another source of concern when eliciting or evaluating eyewitness identification: people are less accurate in judging whether they will or will not recognize a face when that face is of a different race than they are. This new result suggests that a witness's claim of being likely to recognize a suspect from a lineup should be interpreted with caution when the suspect is of a different race than the witness.

  6. A Weibull statistics-based lignocellulose saccharification model and a built-in parameter accurately predict lignocellulose hydrolysis performance.

    PubMed

    Wang, Mingyu; Han, Lijuan; Liu, Shasha; Zhao, Xuebing; Yang, Jinghua; Loh, Soh Kheang; Sun, Xiaomin; Zhang, Chenxi; Fang, Xu

    2015-09-01

    Renewable energy from lignocellulosic biomass has been deemed an alternative to depleting fossil fuels. In order to improve this technology, we aim to develop robust mathematical models for the enzymatic lignocellulose degradation process. By analyzing 96 groups of previously published and newly obtained lignocellulose saccharification results and fitting them to a Weibull distribution, we discovered that Weibull statistics can accurately predict lignocellulose saccharification data, regardless of the type of substrates, enzymes and saccharification conditions. A mathematical model for enzymatic lignocellulose degradation was subsequently constructed based on Weibull statistics. Further analysis of the mathematical structure of the model and experimental saccharification data showed the significance of the two parameters in this model. In particular, the λ value, defined as the characteristic time, represents the overall performance of the saccharification system. This suggestion was further supported by statistical analysis of experimental saccharification data and analysis of the glucose production levels when λ and n values change. In conclusion, the constructed Weibull statistics-based model can accurately predict lignocellulose hydrolysis behavior and we can use the λ parameter to assess the overall performance of enzymatic lignocellulose degradation. Advantages and potential applications of the model and the λ value in saccharification performance assessment were discussed. PMID:26121186
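
    The abstract does not give the model's functional form. A plausible reading, assumed here, is the standard Weibull cumulative form Y(t) = Y_max·(1 - exp(-(t/λ)^n)). Under that assumption λ is the time at which the yield reaches 1 - 1/e (about 63.2%) of its maximum regardless of n, which is one way to see why λ can summarize overall performance. The Python sketch below uses hypothetical parameter values and a hypothetical function name.

```python
import math


def weibull_yield(t, lam, n, y_max=1.0):
    """Assumed Weibull-type saccharification curve.

    t: hydrolysis time; lam: characteristic time (same unit as t);
    n: shape parameter; y_max: maximum attainable yield.
    """
    if t < 0 or lam <= 0 or n <= 0:
        raise ValueError("t must be >= 0; lam and n must be > 0")
    return y_max * (1.0 - math.exp(-((t / lam) ** n)))


# At t = lam the yield fraction is 1 - 1/e for every shape parameter n.
for shape in (0.5, 1.0, 2.0):
    print(shape, round(weibull_yield(24.0, lam=24.0, n=shape), 4))

# A smaller characteristic time means a faster system at a fixed time point.
fast = weibull_yield(12.0, lam=10.0, n=1.0)
slow = weibull_yield(12.0, lam=40.0, n=1.0)
```

    In this form a lower λ corresponds to a system that reaches a given conversion sooner, so comparing λ across substrates or enzyme loadings gives a single-number performance ranking.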

  7. Why don't we learn to accurately forecast feelings? How misremembering our predictions blinds us to past forecasting errors.

    PubMed

    Meyvis, Tom; Ratner, Rebecca K; Levav, Jonathan

    2010-11-01

    Why do affective forecasting errors persist in the face of repeated disconfirming evidence? Five studies demonstrate that people misremember their forecasts as consistent with their experience and thus fail to perceive the extent of their forecasting error. As a result, people do not learn from past forecasting errors and fail to adjust subsequent forecasts. In the context of a Super Bowl loss (Study 1), a presidential election (Studies 2 and 3), an important purchase (Study 4), and the consumption of candies (Study 5), individuals mispredicted their affective reactions to these experiences and subsequently misremembered their predictions as more accurate than they actually had been. The findings indicate that this recall error results from people's tendency to anchor on their current affective state when trying to recall their affective forecasts. Further, those who showed larger recall errors were less likely to learn to adjust their subsequent forecasts and reminding people of their actual forecasts enhanced learning. These results suggest that a failure to accurately recall one's past predictions contributes to the perpetuation of forecasting errors.

  8. System and methods for predicting transmembrane domains in membrane proteins and mining the genome for recognizing G-protein coupled receptors

    DOEpatents

    Trabanino, Rene J; Vaidehi, Nagarajan; Hall, Spencer E; Goddard, William A; Floriano, Wely

    2013-02-05

    The invention provides computer-implemented methods and apparatus implementing a hierarchical protocol using multiscale molecular dynamics and molecular modeling methods to predict the presence of transmembrane regions in proteins, such as G-Protein Coupled Receptors (GPCR), and protein structural models generated according to the protocol. The protocol features a coarse grain sampling method, such as hydrophobicity analysis, to provide a fast and accurate procedure for predicting transmembrane regions. Methods and apparatus of the invention are useful to screen protein or polynucleotide databases for encoded proteins with transmembrane regions, such as GPCRs.
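
    The record names hydrophobicity analysis as the coarse-grain step but does not specify a scale or window. The sketch below therefore assumes the Kyte-Doolittle hydropathy scale, a 19-residue sliding window, and a threshold of 1.6, which are common textbook choices rather than details of the patented protocol. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along the sequence.

    Raises KeyError on non-standard residue codes (for example 'X').
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        # Slide the window by one residue: add the new one, drop the oldest.
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


def candidate_tm_windows(sequence, window=19, threshold=1.6):
    """Start indices (0-based) of windows whose mean hydropathy meets the threshold."""
    profile = hydropathy_profile(sequence, window)
    return [i for i, value in enumerate(profile) if value >= threshold]


if __name__ == "__main__":
    # Hypothetical toy sequence: a leucine stretch flanked by charged residues.
    toy = "D" * 10 + "L" * 19 + "D" * 10
    print(candidate_tm_windows(toy))
```

    A screen of this kind is cheap enough to run over a whole sequence database, with slower modeling reserved for the sequences it flags.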

  9. Predicting protein-protein interactions in unbalanced data using the primary structure of proteins

    PubMed Central

    2010-01-01

    Background: Elucidating protein-protein interactions (PPIs) is essential to constructing protein interaction networks and facilitating our understanding of the general principles of biological systems. Previous studies have revealed that interacting protein pairs can be predicted by their primary structure. Most of these approaches have achieved satisfactory performance on datasets comprising equal numbers of interacting and non-interacting protein pairs. However, this ratio is highly unbalanced in nature, and these techniques have not been comprehensively evaluated with respect to the effect of the large number of non-interacting pairs in realistic datasets. Moreover, since highly unbalanced distributions usually lead to large datasets, more efficient predictors are desired when handling such challenging tasks. Results: This study presents a method for PPI prediction based only on sequence information, which contributes in three aspects. First, we propose a probability-based mechanism for transforming protein sequences into feature vectors. Second, the proposed predictor is designed with an efficient classification algorithm, where the efficiency is essential for handling highly unbalanced datasets. Third, the proposed PPI predictor is assessed with several unbalanced datasets with different positive-to-negative ratios (from 1:1 to 1:15). This analysis provides solid evidence that the degree of dataset imbalance is important to PPI predictors. Conclusions: Dealing with data imbalance is a key issue in PPI prediction since there are far fewer interacting protein pairs than non-interacting ones. This article provides a comprehensive study on this issue and develops a practical tool that achieves both good prediction performance and efficiency using only protein sequence information. PMID:20361868
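
    To illustrate why the positive-to-negative ratio matters, the sketch below holds a classifier's sensitivity and specificity fixed and computes the resulting precision at several ratios in the 1:1 to 1:15 range mentioned above. The sensitivity and specificity values are hypothetical and are not results from the study.

```python
def precision_at_ratio(sensitivity, specificity, negatives_per_positive):
    """Expected precision when each positive pair is matched by k negative pairs.

    Per positive pair, the classifier yields `sensitivity` true positives and
    (1 - specificity) * k false positives on average.
    """
    true_pos = sensitivity
    false_pos = (1.0 - specificity) * negatives_per_positive
    if true_pos + false_pos == 0.0:
        raise ValueError("classifier makes no positive predictions")
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical classifier: 90% sensitivity, 90% specificity.
    for k in (1, 5, 10, 15):
        print(f"1:{k}", round(precision_at_ratio(0.9, 0.9, k), 3))
```

    With these assumed rates, precision falls from 0.9 at 1:1 to 0.375 at 1:15 even though the classifier itself is unchanged, which is why results on balanced datasets can overstate performance on realistic ones.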

  10. A Survey of Computational Intelligence Techniques in Protein Function Prediction

    PubMed Central

    Tiwari, Arvind Kumar; Srivastava, Rajeev

    2014-01-01

    In the past, knowledge of unknown proteins grew massively with the advancement of high-throughput microarray technologies. Protein function prediction is the most challenging problem in bioinformatics. In the past, homology-based approaches were used to predict protein function, but they failed when a new protein was different from the previous ones. Therefore, to alleviate the problems associated with traditional homology-based approaches, numerous computational intelligence techniques have been proposed in the recent past. This paper presents a state-of-the-art comprehensive review of various computational intelligence techniques for protein function prediction using sequence, structure, protein-protein interaction network, and gene expression data, used in wide areas of application such as prediction of DNA and RNA binding sites, subcellular localization, enzyme functions, signal peptides, catalytic residues, nuclear/G-protein coupled receptors, membrane proteins, and pathway analysis from gene expression datasets. This paper also summarizes the results obtained by many researchers who have applied computational intelligence techniques to these problems with appropriate datasets to improve prediction performance. The summary shows that ensemble classifiers and integration of multiple heterogeneous data are useful for protein function prediction. PMID:25574395

  11. Different combinations of atomic interactions predict protein-small molecule and protein-DNA/RNA affinities with similar accuracy.

    PubMed

    Dias, Raquel; Kolaczkowski, Bryan

    2015-11-01

    Interactions between proteins and other molecules play essential roles in all biological processes. Although it is widely held that a protein's ligand specificity is determined primarily by its three-dimensional structure, the general principles by which structure determines ligand binding remain poorly understood. Here we use statistical analyses of a large number of protein-ligand complexes with associated binding-affinity measurements to quantitatively characterize how combinations of atomic interactions contribute to ligand affinity. We find that there are significant differences in how atomic interactions determine ligand affinity for proteins that bind small chemical ligands, those that bind DNA/RNA and those that interact with other proteins. Although protein-small molecule and protein-DNA/RNA binding affinities can be accurately predicted from structural data, models predicting one type of interaction perform poorly on the others. Additionally, the particular combinations of atomic interactions required to predict binding affinity differed between small-molecule and DNA/RNA data sets, consistent with the conclusion that the structural bases determining ligand affinity differ among interaction types. In contrast to what we observed for small-molecule and DNA/RNA interactions, no statistical models were capable of predicting protein-protein affinity with >60% correlation. We demonstrate the potential usefulness of protein-DNA/RNA binding prediction as a possible tool for high-throughput virtual screening to guide laboratory investigations, suggesting that quantitative characterization of diverse molecular interactions may have practical applications as well as fundamentally advancing our understanding of how molecular structure translates into function.

  12. All-atom 3D structure prediction of transmembrane β-barrel proteins from sequences

    PubMed Central

    Hayat, Sikander; Sander, Chris; Marks, Debora S.

    2015-01-01

    Transmembrane β-barrels (TMBs) carry out major functions in substrate transport and protein biogenesis but experimental determination of their 3D structure is challenging. Encouraged by successful de novo 3D structure prediction of globular and α-helical membrane proteins from sequence alignments alone, we developed an approach to predict the 3D structure of TMBs. The approach combines the maximum-entropy evolutionary coupling method for predicting residue contacts (EVfold) with a machine-learning approach (boctopus2) for predicting β-strands in the barrel. In a blinded test for 19 TMB proteins of known structure that have a sufficient number of diverse homologous sequences available, this combined method (EVfold_bb) predicts hydrogen-bonded residue pairs between adjacent β-strands at an accuracy of ∼70%. This accuracy is sufficient for the generation of all-atom 3D models. In the transmembrane barrel region, the average 3D structure accuracy [template-modeling (TM) score] of top-ranked models is 0.54 (ranging from 0.36 to 0.85), with a higher (44%) number of residue pairs in correct strand–strand registration than in earlier methods (18%). Although the nonbarrel regions are predicted less accurately overall, the evolutionary couplings identify some highly constrained loop residues and, for FecA protein, the barrel including the structure of a plug domain can be accurately modeled (TM score = 0.68). Lower prediction accuracy tends to be associated with insufficient sequence information and we therefore expect increasing numbers of β-barrel families to become accessible to accurate 3D structure prediction as the number of available sequences increases. PMID:25858953
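
    For readers unfamiliar with the TM score quoted above, the sketch below computes the standard per-superposition form of the score from residue-to-residue distances, using the usual length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8. The full score is the maximum of this quantity over all superpositions, which the sketch does not search for; it assumes the distances come from an already chosen superposition. The function names and the lower bound on d0 for short chains are assumptions of this sketch.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in angstroms) used by the TM score.

    For short chains the formula gives very small or undefined values, so this
    sketch clamps d0 to 0.5 (an assumption made here for robustness).
    """
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length):
    """TM score for one fixed superposition.

    `distances` holds the distance in angstroms for each aligned residue pair;
    target residues with no aligned partner contribute nothing to the sum but
    still count in the normalization by `target_length`.
    """
    if target_length <= 0 or len(distances) > target_length:
        raise ValueError("need 0 < len(distances) <= target_length")
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


if __name__ == "__main__":
    # Hypothetical model: 80 of 100 residues aligned, each 2 angstroms off.
    print(tm_score([2.0] * 80, 100))
```

    The score lies between 0 and 1, and values above roughly 0.5 are commonly read as indicating the same overall fold, which gives context for the reported average of 0.54.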

  13. Binding affinity prediction for protein-ligand complexes based on β contacts and B factor.

    PubMed

    Liu, Qian; Kwoh, Chee Keong; Li, Jinyan

    2013-11-25

    Accurate determination of protein-ligand binding affinity is a fundamental problem in biochemistry useful for many applications including drug design and protein-ligand docking. A number of scoring functions have been proposed for the prediction of protein-ligand binding affinity. However, accurate prediction is still a challenging problem because poor performance is often seen in the evaluation under the leave-one-cluster-out cross-validation (LCOCV). We introduce a new scoring function named B2BScore to improve the prediction performance. B2BScore integrates two physicochemical properties for protein-ligand binding affinity prediction. One is the property of β contacts. A β contact between two atoms requires no other atoms to interrupt the atomic contact and assumes that the two atoms should have enough direct contact area. The other is the property of B factor to capture the atomic mobility in the dynamic protein-ligand binding process. Tested on the PDBBind2009 data set, B2BScore shows superior prediction performance to existing methods on independent test data as well as under the LCOCV evaluation framework. In particular, B2BScore achieves a significant LCOCV improvement across 26 protein clusters: a big increase of the averaged Pearson's correlation coefficients from 0.418 to 0.518 and a significant decrease of the standard deviation of the coefficients from 0.352 to 0.196. We also identified several important and intuitive contact descriptors of protein-ligand binding through the random forest learning in B2BScore. Some of these descriptors are closely related to contacts between carbon atoms without covalent-bond oxygen/nitrogen, preferred contacts of metal ions, interfacial backbone atoms from proteins, or π rings. Some others are negative descriptors relating to those contacts with nitrogen atoms without covalent-bond hydrogens or nonpreferred contacts of metal ions. These descriptors can be directly used to guide protein-ligand docking.

  14. The 82-plex plasma protein signature that predicts increasing inflammation

    PubMed Central

    Tepel, Martin; Beck, Hans C.; Tan, Qihua; Borst, Christoffer; Rasmussen, Lars M.

    2015-01-01

    The objective of the study was to define the specific plasma protein signature that predicts the increase of the inflammation marker C-reactive protein from index day to next-day using proteome analysis and novel bioinformatics tools. We performed a prospective study of 91 incident kidney transplant recipients and quantified 359 plasma proteins simultaneously using nano-Liquid-Chromatography-Tandem Mass-Spectrometry in individual samples and plasma C-reactive protein on the index day and the next day. Next-day C-reactive protein increased in 59 patients whereas it decreased in 32 patients. The prediction model selected and validated 82 plasma proteins which determined increased next-day C-reactive protein (area under receiver-operator-characteristics curve, 0.772; 95% confidence interval, 0.669 to 0.876; P < 0.0001). Multivariable logistic regression showed that the 82-plex protein signature (P < 0.001) was associated with observed increased next-day C-reactive protein. The 82-plex protein signature outperformed routine clinical procedures. The category-free net reclassification index improved with the 82-plex plasma protein signature (total net reclassification index, 88.3%). Using the 82-plex plasma protein signature increased the net reclassification index with a clinically meaningful 10% increase of risk, mainly by the improvement of reclassification of subjects in the event group. An 82-plex plasma protein signature predicts an increase of the inflammatory marker C-reactive protein. PMID:26445912

  15. Functional prediction of hypothetical proteins in human adenoviruses.

    PubMed

    Dorden, Shane; Mahadevan, Padmanabhan

    2015-01-01

    Assigning functional information to hypothetical proteins in virus genomes is crucial for gaining insight into their proteomes. Human adenoviruses are medium sized viruses that cause a range of diseases. Their genomes possess proteins with uncharacterized function known as hypothetical proteins. Using a wide range of protein function prediction servers, functional information was obtained about these hypothetical proteins. A comparison of functional information obtained from these servers revealed that some of them produced functional information, while others provided little functional information about these human adenovirus hypothetical proteins. The PFP, ESG, PSIPRED, 3d2GO, and ProtFun servers produced the most functional information regarding these hypothetical proteins. PMID:26664031

  16. A graph-theoretic approach for classification and structure prediction of transmembrane β-barrel proteins

    PubMed Central

    2012-01-01

    Background: Transmembrane β-barrel proteins are a special class of transmembrane proteins which play several key roles in the human body and diseases. Due to experimental difficulties, the number of transmembrane β-barrel proteins with known structures is very small. Over the years, a number of learning-based methods have been introduced for recognition and structure prediction of transmembrane β-barrel proteins. Most of these methods emphasize homology search rather than any biological or chemical basis. Results: We present a novel graph-theoretic model for classification and structure prediction of transmembrane β-barrel proteins. This model folds proteins based on energy minimization rather than a homology search, avoiding any assumption on availability of training dataset. The ab initio model presented in this paper is the first method to allow for permutations in the structure of transmembrane proteins and provides more structural information than any known algorithm. The model is also able to recognize β-barrels by assessing the pseudo free energy. We assess the structure prediction on 41 proteins gathered from existing databases on experimentally validated transmembrane β-barrel proteins. We show that our approach is quite accurate with over 90% F-score on strands and over 74% F-score on residues. The results are comparable to other algorithms suggesting that our pseudo-energy model is close to the actual physical model. We test our classification approach and show that it is able to reject α-helical bundles with 100% accuracy and β-barrel lipocalins with 97% accuracy. Conclusions: We show that it is possible to design models for classification and structure prediction for transmembrane β-barrel proteins which do not depend essentially on training sets but on combinatorial properties of the structures to be proved. These models are fairly accurate, robust and can be run very efficiently on PC-like computers. Such models are useful for the genome

  17. Blind predictions of protein interfaces by docking calculations in CAPRI.

    PubMed

    Lensink, Marc F; Wodak, Shoshana J

    2010-11-15

    Reliable prediction of the amino acid residues involved in protein-protein interfaces can provide valuable insight into protein function, and inform mutagenesis studies, and drug design applications. A fast-growing number of methods are being proposed for predicting protein interfaces, using structural information, energetic criteria, or sequence conservation or by integrating multiple criteria and approaches. Overall however, their performance remains limited, especially when applied to nonobligate protein complexes, where the individual components are also stable on their own. Here, we evaluate interface predictions derived from protein-protein docking calculations. To this end we measure the overlap between the interfaces in models of protein complexes submitted by 76 participants in CAPRI (Critical Assessment of Predicted Interactions) and those of 46 observed interfaces in 20 CAPRI targets corresponding to nonobligate complexes. Our evaluation considers multiple models for each target interface, submitted by different participants, using a variety of docking methods. Although this results in a substantial variability in the prediction performance across participants and targets, clear trends emerge. Docking methods that perform best in our evaluation predict interfaces with average recall and precision levels of about 60%, for a small majority (60%) of the analyzed interfaces. These levels are significantly higher than those obtained for nonobligate complexes by most extant interface prediction methods. We find furthermore that a sizable fraction (24%) of the interfaces in models ranked as incorrect in the CAPRI assessment are actually correctly predicted (recall and precision ≥50%), and that these models contribute to 70% of the correct docking-based interface predictions overall. Our analysis proves that docking methods are much more successful in identifying interfaces than in predicting complexes, and suggests that these methods have an excellent

  18. Are current atomistic force fields accurate enough to study proteins in crowded environments?

    PubMed

    Petrov, Drazen; Zagrovic, Bojan

    2014-05-01

    The high concentration of macromolecules in the crowded cellular interior influences different thermodynamic and kinetic properties of proteins, including their structural stabilities, intermolecular binding affinities and enzymatic rates. Moreover, various structural biology methods, such as NMR or different spectroscopies, typically involve samples with relatively high protein concentration. Due to large sampling requirements, however, the accuracy of classical molecular dynamics (MD) simulations in capturing protein behavior at high concentration still remains largely untested. Here, we use explicit-solvent MD simulations and a total of 6.4 µs of simulated time to study wild-type (folded) and oxidatively damaged (unfolded) forms of villin headpiece at 6 mM and 9.2 mM protein concentration. We first perform an exhaustive set of simulations with multiple protein molecules in the simulation box using GROMOS 45a3 and 54a7 force fields together with different types of electrostatics treatment and solution ionic strengths. Surprisingly, the two villin headpiece variants exhibit similar aggregation behavior, despite the fact that their estimated aggregation propensities markedly differ. Importantly, regardless of the simulation protocol applied, wild-type villin headpiece consistently aggregates even under conditions at which it is experimentally known to be soluble. We demonstrate that aggregation is accompanied by a large decrease in the total potential energy, with not only hydrophobic, but also polar residues and backbone contributing substantially. The same effect is directly observed for two other major atomistic force fields (AMBER99SB-ILDN and CHARMM22-CMAP) as well as indirectly shown for additional two (AMBER94, OPLS-AAL), and is possibly due to a general overestimation of the potential energy of protein-protein interactions at the expense of water-water and water-protein interactions. Overall, our results suggest that current MD force fields may distort the

  19. Are Current Atomistic Force Fields Accurate Enough to Study Proteins in Crowded Environments?

    PubMed Central

    Petrov, Drazen; Zagrovic, Bojan

    2014-01-01

    The high concentration of macromolecules in the crowded cellular interior influences different thermodynamic and kinetic properties of proteins, including their structural stabilities, intermolecular binding affinities and enzymatic rates. Moreover, various structural biology methods, such as NMR or different spectroscopies, typically involve samples with relatively high protein concentration. Due to large sampling requirements, however, the accuracy of classical molecular dynamics (MD) simulations in capturing protein behavior at high concentration still remains largely untested. Here, we use explicit-solvent MD simulations and a total of 6.4 µs of simulated time to study wild-type (folded) and oxidatively damaged (unfolded) forms of villin headpiece at 6 mM and 9.2 mM protein concentration. We first perform an exhaustive set of simulations with multiple protein molecules in the simulation box using GROMOS 45a3 and 54a7 force fields together with different types of electrostatics treatment and solution ionic strengths. Surprisingly, the two villin headpiece variants exhibit similar aggregation behavior, despite the fact that their estimated aggregation propensities markedly differ. Importantly, regardless of the simulation protocol applied, wild-type villin headpiece consistently aggregates even under conditions at which it is experimentally known to be soluble. We demonstrate that aggregation is accompanied by a large decrease in the total potential energy, with not only hydrophobic, but also polar residues and backbone contributing substantially. The same effect is directly observed for two other major atomistic force fields (AMBER99SB-ILDN and CHARMM22-CMAP) as well as indirectly shown for additional two (AMBER94, OPLS-AAL), and is possibly due to a general overestimation of the potential energy of protein-protein interactions at the expense of water-water and water-protein interactions. Overall, our results suggest that current MD force fields may distort the

  1. Fast and accurate resonance assignment of small-to-large proteins by combining automated and manual approaches.

    PubMed

    Niklasson, Markus; Ahlner, Alexandra; Andresen, Cecilia; Marsh, Joseph A; Lundström, Patrik

    2015-01-01

    The process of resonance assignment is fundamental to most NMR studies of protein structure and dynamics. Unfortunately, the manual assignment of residues is tedious and time-consuming, and can represent a significant bottleneck for further characterization. Furthermore, while automated approaches have been developed, they are often limited in their accuracy, particularly for larger proteins. Here, we address this by introducing the software COMPASS, which, by combining automated resonance assignment with manual intervention, is able to achieve accuracy approaching that from manual assignments at greatly accelerated speeds. Moreover, by including the option to compensate for isotope shift effects in deuterated proteins, COMPASS is far more accurate for larger proteins than existing automated methods. COMPASS is an open-source project licensed under GNU General Public License and is available for download from http://www.liu.se/forskning/foass/tidigare-foass/patrik-lundstrom/software?l=en. Source code and binaries for Linux, Mac OS X and Microsoft Windows are available.

  2. Fast and Accurate Resonance Assignment of Small-to-Large Proteins by Combining Automated and Manual Approaches

    PubMed Central

    Niklasson, Markus; Ahlner, Alexandra; Andresen, Cecilia; Marsh, Joseph A.; Lundström, Patrik

    2015-01-01

    The process of resonance assignment is fundamental to most NMR studies of protein structure and dynamics. Unfortunately, the manual assignment of residues is tedious and time-consuming, and can represent a significant bottleneck for further characterization. Furthermore, while automated approaches have been developed, they are often limited in their accuracy, particularly for larger proteins. Here, we address this by introducing the software COMPASS, which, by combining automated resonance assignment with manual intervention, is able to achieve accuracy approaching that from manual assignments at greatly accelerated speeds. Moreover, by including the option to compensate for isotope shift effects in deuterated proteins, COMPASS is far more accurate for larger proteins than existing automated methods. COMPASS is an open-source project licensed under GNU General Public License and is available for download from http://www.liu.se/forskning/foass/tidigare-foass/patrik-lundstrom/software?l=en. Source code and binaries for Linux, Mac OS X and Microsoft Windows are available. PMID:25569628

  3. Fast and accurate resonance assignment of small-to-large proteins by combining automated and manual approaches.

    PubMed

    Niklasson, Markus; Ahlner, Alexandra; Andresen, Cecilia; Marsh, Joseph A; Lundström, Patrik

    2015-01-01

    The process of resonance assignment is fundamental to most NMR studies of protein structure and dynamics. Unfortunately, the manual assignment of residues is tedious and time-consuming, and can represent a significant bottleneck for further characterization. Furthermore, while automated approaches have been developed, they are often limited in their accuracy, particularly for larger proteins. Here, we address this by introducing the software COMPASS, which, by combining automated resonance assignment with manual intervention, is able to achieve accuracy approaching that from manual assignments at greatly accelerated speeds. Moreover, by including the option to compensate for isotope shift effects in deuterated proteins, COMPASS is far more accurate for larger proteins than existing automated methods. COMPASS is an open-source project licensed under GNU General Public License and is available for download from http://www.liu.se/forskning/foass/tidigare-foass/patrik-lundstrom/software?l=en. Source code and binaries for Linux, Mac OS X and Microsoft Windows are available. PMID:25569628

  4. NOXclass: prediction of protein-protein interaction types

    PubMed Central

    Zhu, Hongbo; Domingues, Francisco S; Sommer, Ingolf; Lengauer, Thomas

    2006-01-01

    Background: Structural models determined by X-ray crystallography play a central role in understanding protein-protein interactions at the molecular level. Interpretation of these models requires the distinction between non-specific crystal packing contacts and biologically relevant interactions. This has been investigated previously and classification approaches have been proposed. However, less attention has been devoted to distinguishing different types of biological interactions. These interactions are classified as obligate and non-obligate according to the effect of the complex formation on the stability of the protomers. So far no automatic classification methods for distinguishing obligate, non-obligate and crystal packing interactions have been made available. Results: Six interface properties have been investigated on a dataset of 243 protein interactions. The six properties have been combined using a support vector machine algorithm, resulting in NOXclass, a classifier for distinguishing obligate, non-obligate and crystal packing interactions. We achieve an accuracy of 91.8% for the classification of these three types of interactions using a leave-one-out cross-validation procedure. Conclusion: NOXclass allows the interpretation and analysis of protein quaternary structures. In particular, it generates testable hypotheses regarding the nature of protein-protein interactions, when experimental results are not available. We expect this server will benefit the users of protein structural models, as well as protein crystallographers and NMR spectroscopists. A web server based on the method and the datasets used in this study are available at . PMID:16423290

  5. A simple yet accurate correction for winner's curse can predict signals discovered in much larger genome scans

    PubMed Central

    Bigdeli, T. Bernard; Lee, Donghyung; Webb, Bradley Todd; Riley, Brien P.; Vladimirov, Vladimir I.; Fanous, Ayman H.; Kendler, Kenneth S.; Bacanu, Silviu-Alin

    2016-01-01

    Motivation: For genetic studies, statistically significant variants explain far less trait variance than ‘sub-threshold’ association signals. To dimension follow-up studies, researchers need to accurately estimate ‘true’ effect sizes at each SNP, e.g. the true mean of odds ratios (ORs)/regression coefficients (RRs) or Z-score noncentralities. Naïve estimates of effect sizes incur winner’s curse biases, which are reduced only by laborious winner’s curse adjustments (WCAs). Given that Z-score estimates can be theoretically translated to other scales, we propose a simple method to compute WCA for Z-scores, i.e. their true means/noncentralities. Results: WCA of Z-scores shrinks these towards zero while, on the P-value scale, multiple testing adjustment (MTA) shrinks P-values toward one, which corresponds to the zero Z-score value. Thus, WCA on the Z-score scale is a proxy for MTA on the P-value scale. Therefore, to estimate Z-score noncentralities for all SNPs in genome scans, we propose FDR Inverse Quantile Transformation (FIQT). It (i) performs the simpler MTA of P-values using FDR and (ii) obtains noncentralities by back-transforming MTA P-values to the Z-score scale. When compared to competitors, realistic simulations suggest that FIQT is more (i) accurate and (ii) computationally efficient by orders of magnitude. Practical application of FIQT to the Psychiatric Genetic Consortium schizophrenia cohort predicts a non-trivial fraction of sub-threshold signals which become significant in much larger supersamples. Conclusions: FIQT is a simple, yet accurate, WCA method for Z-scores (and ORs/RRs, via simple transformations). Availability and Implementation: A 10-line R function implementation is available at https://github.com/bacanusa/FIQT. Contact: sabacanu@vcu.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27187203
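
    The two FIQT steps described above can be sketched as follows. This is an illustration written from the description in the record, not the authors' R function: it assumes two-sided P-values, the Benjamini-Hochberg procedure as the FDR adjustment, and a floor on very small P-values (the `min_p` parameter, a hypothetical name) to avoid numerical underflow.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fiqt(z_scores, min_p=1e-300):
    """Shrink Z-scores toward zero via FDR adjustment and back-transformation."""
    # Step (i): two-sided P-values, then FDR (BH) adjustment.
    pvals = [max(2.0 * _STD_NORMAL.cdf(-abs(z)), min_p) for z in z_scores]
    adjusted = bh_adjust(pvals)
    # Step (ii): map each adjusted P-value back to the Z scale, keeping the sign.
    shrunk = []
    for z, p in zip(z_scores, adjusted):
        magnitude = -_STD_NORMAL.inv_cdf(p / 2.0)
        shrunk.append(math.copysign(abs(magnitude), z))
    return shrunk


if __name__ == "__main__":
    # Hypothetical Z-scores from a small scan.
    print(fiqt([5.0, -3.0, 1.0, 0.2]))
```

    Because an adjusted P-value is never smaller than the raw one, each returned Z-score has a magnitude no larger than the input, which is the shrinkage toward zero the abstract describes.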

  6. Small-scale field experiments accurately scale up to predict density dependence in reef fish populations at large scales.

    PubMed

    Steele, Mark A; Forrester, Graham E

    2005-09-20

    Field experiments provide rigorous tests of ecological hypotheses but are usually limited to small spatial scales. It is thus unclear whether these findings extrapolate to larger scales relevant to conservation and management. We show that the results of experiments detecting density-dependent mortality of reef fish on small habitat patches scale up to have similar effects on much larger entire reefs that are the size of small marine reserves and approach the scale at which some reef fisheries operate. We suggest that accurate scaling is due to the type of species interaction causing local density dependence and the fact that localized events can be aggregated to describe larger-scale interactions with minimal distortion. Careful extrapolation from small-scale experiments identifying species interactions and their effects should improve our ability to predict the outcomes of alternative management strategies for coral reef fishes and their habitats.

  7. Effects of the inlet conditions and blood models on accurate prediction of hemodynamics in the stented coronary arteries

    NASA Astrophysics Data System (ADS)

    Jiang, Yongfei; Zhang, Jun; Zhao, Wanhua

    2015-05-01

    Hemodynamics altered by stent implantation is well known to be closely related to in-stent restenosis. Computational fluid dynamics (CFD) has been used to investigate the hemodynamics in stented arteries in detail and to help analyze the performance of stents. In this study, blood models with Newtonian or non-Newtonian properties were numerically investigated at steady or pulsatile inlet conditions, using CFD based on the finite volume method. The results showed that the non-Newtonian blood model decreased the area of low wall shear stress (WSS) compared with the Newtonian blood model, and that the magnitude of WSS varied with the magnitude and waveform of the inlet velocity. The study indicates that both the inlet conditions and the blood model are important for accurately predicting the hemodynamics. This will be beneficial for estimating the performance of stents and will also help clinicians select the proper stents for their patients.

  8. Computational Prediction of RNA-Binding Proteins and Binding Sites.

    PubMed

    Si, Jingna; Cui, Jing; Cheng, Jin; Wu, Rongling

    2015-01-01

    Protein-RNA interactions play vital roles in many cellular processes such as protein synthesis, sequence encoding, RNA transfer, and gene regulation at the transcriptional and post-transcriptional levels. Approximately 6%-8% of all proteins are RNA-binding proteins (RBPs). Distinguishing these RBPs or their binding residues is a major aim of structural biology. Previously, a number of experimental methods were developed for the determination of protein-RNA interactions. However, these experimental methods are expensive, time-consuming, and labor-intensive. Alternatively, researchers have developed many computational approaches to predict RBPs and protein-RNA binding sites by combining various machine learning methods with abundant sequence and/or structural features. There are three kinds of computational approaches: prediction from protein sequence, prediction from protein structure, and protein-RNA docking. In this paper, we review all existing studies on the prediction of RNA-binding sites, RBPs, and complexes, including data sets used in different approaches, sequence and structural features used in several predictors, prediction method classifications, performance comparisons, evaluation methods, and future directions.

  9. Characterization and Prediction of Protein Flexibility Based on Structural Alphabets

    PubMed Central

    Liu, Bin

    2016-01-01

    Motivation. To assist efforts in determining and exploring the functional properties of proteins, it is desirable to characterize and predict protein flexibilities. Results. In this study, the conformational entropy is used as an indicator of the protein flexibility. We first explore whether the conformational change can capture the protein flexibility. The well-defined decoy structures are converted into one-dimensional series of letters from a structural alphabet. Four different structure alphabets, including the secondary structure in 3-class and 8-class, the PB structure alphabet (16-letter), and the DW structure alphabet (28-letter), are investigated. The conformational entropy is then calculated from the structure alphabet letters. Some of the proteins show high correlation between the conformational entropy and the protein flexibility. We then predict the protein flexibility from the basic amino acid sequence. The local structures are predicted by the dual-layer model and the conformational entropy of the predicted class distribution is then calculated. The results show that the conformational entropy is a good indicator of the protein flexibility, but false positives remain a problem. The DW structure alphabet performs the best, which means that more subtle local structures can be captured by a large number of structure alphabet letters. Overall this study provides a simple and efficient method for the characterization and prediction of the protein flexibility. PMID:27660756
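
    The idea of computing a conformational entropy from structural-alphabet letters can be illustrated with a short Python sketch. The abstract does not give the exact formula, so the Shannon entropy used here is an assumption, and the function names and the toy 3-class strings are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(letters):
    """Shannon entropy (in bits) of the letter distribution in `letters`."""
    counts = Counter(letters)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def per_position_entropy(encoded_structures):
    """Entropy at each residue position across equally long encoded decoys.

    Each element of `encoded_structures` is one decoy structure written as a
    string of structural-alphabet letters, one letter per residue.
    """
    if len({len(s) for s in encoded_structures}) != 1:
        raise ValueError("all encoded structures must have the same length")
    # zip(*...) yields one column of letters per residue position.
    return [shannon_entropy(column) for column in zip(*encoded_structures)]


# Toy example with a 3-class secondary-structure alphabet (H, E, C).
decoys = ["HHE", "HCE", "HEE", "HCE"]
print(per_position_entropy(decoys))
```

    In this toy input the first and last positions carry the same letter in every decoy (entropy 0), while the middle position varies, so it would be flagged as the more flexible one under this reading.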

  10. Characterization and Prediction of Protein Flexibility Based on Structural Alphabets

    PubMed Central

    Liu, Bin

    2016-01-01

    Motivation. To assist efforts in determining and exploring the functional properties of proteins, it is desirable to characterize and predict protein flexibilities. Results. In this study, the conformational entropy is used as an indicator of the protein flexibility. We first explore whether the conformational change can capture the protein flexibility. The well-defined decoy structures are converted into one-dimensional series of letters from a structural alphabet. Four different structure alphabets, including the secondary structure in 3-class and 8-class, the PB structure alphabet (16-letter), and the DW structure alphabet (28-letter), are investigated. The conformational entropy is then calculated from the structure alphabet letters. Some of the proteins show high correlation between the conformational entropy and the protein flexibility. We then predict the protein flexibility from the basic amino acid sequence. The local structures are predicted by the dual-layer model and the conformational entropy of the predicted class distribution is then calculated. The results show that the conformational entropy is a good indicator of the protein flexibility, but false positives remain a problem. The DW structure alphabet performs the best, which means that more subtle local structures can be captured by a large number of structure alphabet letters. Overall this study provides a simple and efficient method for the characterization and prediction of the protein flexibility.

  11. Scalable prediction of compound-protein interactions using minwise hashing.

    PubMed

    Tabei, Yasuo; Yamanishi, Yoshihiro

    2013-01-01

    The identification of compound-protein interactions plays a key role in drug development, supporting the discovery of new drug leads and new therapeutic protein targets. There is therefore a strong incentive to develop new efficient methods for predicting compound-protein interactions on a genome-wide scale. In this paper we develop a novel chemogenomic method to make a scalable prediction of compound-protein interactions from heterogeneous biological data using minwise hashing. The proposed method mainly consists of two steps: 1) construction of new compact fingerprints for compound-protein pairs by an improved minwise hashing algorithm, and 2) application of a sparsity-induced classifier to the compact fingerprints. We test the proposed method on its ability to make a large-scale prediction of compound-protein interactions from compound substructure fingerprints and protein domain fingerprints, and show superior performance of the proposed method compared with the previous chemogenomic methods in terms of prediction accuracy, computational efficiency, and interpretability of the predictive model. None of the previously developed methods is computationally feasible for the full dataset, which consists of about 200 million compound-protein pairs. The proposed method is expected to be useful for virtual screening of a huge number of compounds against many protein targets.
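
    To make the notion of a compact fingerprint concrete, the following Python sketch shows plain minwise hashing (MinHash) over a set of feature strings. It is not the improved algorithm of the paper, which the abstract does not specify. The function names, the use of salted BLAKE2b as the hash family, and the feature strings are assumptions for illustration.

```python
import hashlib


def minhash_signature(features, num_hashes=128):
    """Return a MinHash signature: one minimum hash value per hash function.

    `features` is a non-empty set of strings (for example, identifiers of
    the substructure and domain features that are present for a pair).
    """
    signature = []
    for i in range(num_hashes):
        salt = i.to_bytes(8, "little")  # a different salt per hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f.encode("utf-8"), digest_size=8,
                                salt=salt).digest(),
                "big",
            )
            for f in features
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


# Two hypothetical feature sets sharing 50 of their 150 distinct features.
set_a = {str(i) for i in range(100)}
set_b = {str(i) for i in range(50, 150)}
sig_a = minhash_signature(set_a, num_hashes=256)
sig_b = minhash_signature(set_b, num_hashes=256)
print(estimated_jaccard(sig_a, sig_b))  # should be close to 50/150
```

    The signature has a fixed length regardless of how many features a pair has, which is what makes this kind of fingerprint attractive when the number of pairs is very large.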

  12. Protein flexibility predictions using graph theory.

    PubMed

    Jacobs, D J; Rader, A J; Kuhn, L A; Thorpe, M F

    2001-08-01

    Techniques from graph theory are applied to analyze the bond networks in proteins and identify the flexible and rigid regions. The bond network consists of distance constraints defined by the covalent and hydrogen bonds and salt bridges in the protein, identified by geometric and energetic criteria. We use an algorithm that counts the degrees of freedom within this constraint network and that identifies all the rigid and flexible substructures in the protein, including overconstrained regions (with more crosslinking bonds than are needed to rigidify the region) and underconstrained or flexible regions, in which dihedral bond rotations can occur. The number of extra constraints or remaining degrees of bond-rotational freedom within a substructure quantifies its relative rigidity/flexibility and provides a flexibility index for each bond in the structure. This novel computational procedure, first used in the analysis of glassy materials, is approximately a million times faster than molecular dynamics simulations and captures the essential conformational flexibility of the protein main and side-chains from analysis of a single, static three-dimensional structure. This approach is demonstrated by comparison with experimental measures of flexibility for three proteins in which hinge and loop motion are essential for biological function: HIV protease, adenylate kinase, and dihydrofolate reductase.

  13. A novel fibrosis index comprising a non-cholesterol sterol accurately predicts HCV-related liver cirrhosis.

    PubMed

    Ydreborg, Magdalena; Lisovskaja, Vera; Lagging, Martin; Brehm Christensen, Peer; Langeland, Nina; Buhl, Mads Rauning; Pedersen, Court; Mørch, Kristine; Wejstål, Rune; Norkrans, Gunnar; Lindh, Magnus; Färkkilä, Martti; Westin, Johan

    2014-01-01

    Diagnosis of liver cirrhosis is essential in the management of chronic hepatitis C virus (HCV) infection. Liver biopsy is invasive and thus entails a risk of complications as well as a potential risk of sampling error. Therefore, non-invasive diagnostic tools are preferable. The aim of the present study was to create a model for accurate prediction of liver cirrhosis based on patient characteristics and biomarkers of liver fibrosis, including a panel of non-cholesterol sterols reflecting cholesterol synthesis and absorption and secretion. We evaluated variables with potential predictive significance for liver fibrosis in 278 patients originally included in a multicenter phase III treatment trial for chronic HCV infection. A stepwise multivariate logistic model selection was performed with liver cirrhosis, defined as Ishak fibrosis stage 5-6, as the outcome variable. A new index, referred to as Nordic Liver Index (NoLI) in the paper, was based on the model: Log-odds (predicting cirrhosis) = -12.17 + (age × 0.11) + (BMI (kg/m²) × 0.23) + (D7-lathosterol (μg/100 mg cholesterol) × (-0.013)) + (platelet count (×10⁹/L) × (-0.018)) + (prothrombin-INR × 3.69). The area under the ROC curve (AUROC) for prediction of cirrhosis was 0.91 (95% CI 0.86-0.96). The index was validated in a separate cohort of 83 patients and the AUROC for this cohort was similar (0.90; 95% CI: 0.82-0.98). In conclusion, the new index may complement other methods in diagnosing cirrhosis in patients with chronic HCV infection.
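
    The log-odds model quoted in the abstract above can be evaluated directly. The Python sketch below uses only the coefficients given there. The function and parameter names are hypothetical, the conversion to a probability is the standard logistic transform, and the input values are made up for illustration.

```python
import math


def noli_log_odds(age_years, bmi, d7_lathosterol, platelets, inr):
    """Log-odds of cirrhosis using the coefficients quoted in the abstract.

    Units follow the abstract: BMI in kg/m^2, D7-lathosterol in
    ug/100 mg cholesterol, platelet count in 10^9/L, prothrombin as INR.
    """
    return (
        -12.17
        + 0.11 * age_years
        + 0.23 * bmi
        - 0.013 * d7_lathosterol
        - 0.018 * platelets
        + 3.69 * inr
    )


def noli_probability(age_years, bmi, d7_lathosterol, platelets, inr):
    """Convert the log-odds to a probability with the logistic function."""
    log_odds = noli_log_odds(age_years, bmi, d7_lathosterol, platelets, inr)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Made-up input values, for illustration only.
print(noli_log_odds(50, 25.0, 100.0, 200.0, 1.0))
print(noli_probability(50, 25.0, 100.0, 200.0, 1.0))
```

    The signs of the coefficients show the direction of each effect: higher age, BMI and INR raise the predicted log-odds, while higher D7-lathosterol and platelet count lower it.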

  14. Prediction of Protein Structure Using Surface Accessibility Data

    PubMed Central

    Hartlmüller, Christoph; Göbl, Christoph

    2016-01-01

    An approach to the de novo structure prediction of proteins is described that relies on surface accessibility data from NMR paramagnetic relaxation enhancements by a soluble paramagnetic compound (sPRE). This method exploits the distance‐to‐surface information encoded in the sPRE data in the chemical shift‐based CS‐Rosetta de novo structure prediction framework to generate reliable structural models. For several proteins, it is demonstrated that surface accessibility data is an excellent measure of the correct protein fold in the early stages of the computational folding algorithm and significantly improves accuracy and convergence of the standard Rosetta structure prediction approach. PMID:27560616

  15. Prediction of Protein Structure Using Surface Accessibility Data.

    PubMed

    Hartlmüller, Christoph; Göbl, Christoph; Madl, Tobias

    2016-09-19

    An approach to the de novo structure prediction of proteins is described that relies on surface accessibility data from NMR paramagnetic relaxation enhancements by a soluble paramagnetic compound (sPRE). This method exploits the distance-to-surface information encoded in the sPRE data in the chemical shift-based CS-Rosetta de novo structure prediction framework to generate reliable structural models. For several proteins, it is demonstrated that surface accessibility data is an excellent measure of the correct protein fold in the early stages of the computational folding algorithm and significantly improves accuracy and convergence of the standard Rosetta structure prediction approach.

  16. Prediction of Protein Structure Using Surface Accessibility Data.

    PubMed

    Hartlmüller, Christoph; Göbl, Christoph; Madl, Tobias

    2016-09-19

    An approach to the de novo structure prediction of proteins is described that relies on surface accessibility data from NMR paramagnetic relaxation enhancements by a soluble paramagnetic compound (sPRE). This method exploits the distance-to-surface information encoded in the sPRE data in the chemical shift-based CS-Rosetta de novo structure prediction framework to generate reliable structural models. For several proteins, it is demonstrated that surface accessibility data is an excellent measure of the correct protein fold in the early stages of the computational folding algorithm and significantly improves accuracy and convergence of the standard Rosetta structure prediction approach. PMID:27560616

  17. DeepCNF-D: Predicting Protein Order/Disorder Regions by Weighted Deep Convolutional Neural Fields.

    PubMed

    Wang, Sheng; Weng, Shunyan; Ma, Jianzhu; Tang, Qingming

    2015-01-01

    Intrinsically disordered proteins or protein regions are involved in key biological processes including regulation of transcription, signal transduction, and alternative splicing. Accurately predicting order/disorder regions ab initio from the protein sequence is a prerequisite step for further analysis of functions and mechanisms for these disordered regions. This work presents a learning method, weighted DeepCNF (Deep Convolutional Neural Fields), to improve the accuracy of order/disorder prediction by exploiting the long-range sequential information and the interdependency between adjacent order/disorder labels and by assigning different weights for each label during training and prediction to solve the label imbalance issue. Evaluated by the CASP9 and CASP10 targets, our method obtains 0.855 and 0.898 AUC values, which are higher than the state-of-the-art single ab initio predictors.

  18. Identifying the singleplex and multiplex proteins based on transductive learning for protein subcellular localization prediction.

    PubMed

    Cao, Junzhe; Liu, Wenqi; He, Jianjun; Gu, Hong

    2013-07-01

    A new method is proposed to identify whether a query protein is singleplex or multiplex, in order to improve the quality of protein subcellular localization prediction. Based on the transductive learning technique, this approach utilizes information from both the query proteins and known proteins to estimate the number of subcellular locations of every query protein, so that singleplex and multiplex proteins can be recognized and distinguished. Each query protein is then handled by a targeted single-label or multi-label predictor to achieve a high-accuracy prediction result. We assess the performance of the proposed approach by applying it to three groups of protein sequence datasets. Simulation experiments show that the proposed approach can effectively identify singleplex and multiplex proteins. A comparison also verifies the reliability of this method for enhancing the power of protein subcellular localization prediction.

  19. iStable: off-the-shelf predictor integration for predicting protein stability changes

    PubMed Central

    2013-01-01

    Background: Mutation of a single amino acid residue can cause changes in a protein, which could then lead to a loss of protein function. Predicting protein stability changes can provide several possible candidates for novel protein design. Although many prediction tools are available, conflicting prediction results from different tools can confuse users. Results: We propose an integrated predictor, iStable, with a grid computing architecture, constructed using sequence information and prediction results from different element predictors. In the learning model, several machine learning methods were evaluated and the support vector machine was adopted as the integrator, rather than simply choosing the majority answer given by the element predictors. Furthermore, the role that sequence information played in our model was analyzed, and a window size of 11 was determined. iStable accepts two different input types: structural and sequential. After training and cross-validation, iStable performs better than all of the element predictors on several datasets. Under different classifications and conditions for validation, this study also shows better overall performance across different types of secondary structure, relative solvent accessibility, protein membership in different superfamilies, and experimental conditions. Conclusions: The trained and validated version of iStable provides an accurate approach for predicting protein stability changes. iStable is freely available online at: http://predictor.nchu.edu.tw/iStable. PMID:23369171

  20. A scalable and accurate method for classifying protein-ligand binding geometries using a MapReduce approach.

    PubMed

    Estrada, T; Zhang, B; Cicotti, P; Armen, R S; Taufer, M

    2012-07-01

    We present a scalable and accurate method for classifying protein-ligand binding geometries in molecular docking. Our method is a three-step process: the first step encodes the geometry of a three-dimensional (3D) ligand conformation into a single 3D point in the space; the second step builds an octree by assigning an octant identifier to every single point in the space under consideration; and the third step performs an octree-based clustering on the reduced conformation space and identifies the most dense octant. We adapt our method for MapReduce and implement it in Hadoop. The load-balancing, fault-tolerance, and scalability in MapReduce allow screening of very large conformation spaces not approachable with traditional clustering methods. We analyze results for docking trials for 23 protein-ligand complexes for HIV protease, 21 protein-ligand complexes for Trypsin, and 12 protein-ligand complexes for P38alpha kinase. We also analyze cross docking trials for 24 ligands, each docking into 24 protein conformations of the HIV protease, and receptor ensemble docking trials for 24 ligands, each docking in a pool of HIV protease receptors. Our method demonstrates significant improvement over energy-only scoring for the accurate identification of native ligand geometries in all these docking assessments. The advantages of our clustering approach make it attractive for complex applications in real-world drug design efforts. We demonstrate that our method is particularly useful for clustering docking results using a minimal ensemble of representative protein conformational states (receptor ensemble docking), which is now a common strategy to address protein flexibility in molecular docking. PMID:22658682
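
    The second and third steps described above (assigning octant identifiers and finding the densest octant) can be sketched in Python. This is a simplified, single-machine illustration at a fixed subdivision depth, not the authors' Hadoop implementation. The function names, the digit-string form of the identifier, and the sample points are assumptions. The first step, encoding each ligand conformation as one 3D point, is taken as already done.

```python
from collections import Counter


def octant_id(point, lower, upper, depth):
    """Octant identifier of `point` after `depth` subdivisions of a box.

    Each level contributes one digit 0-7: bit 0 is set if x lies in the
    upper half of the current box, bit 1 for y, bit 2 for z.
    """
    lower, upper = list(lower), list(upper)
    digits = []
    for _ in range(depth):
        code = 0
        for axis in range(3):
            mid = (lower[axis] + upper[axis]) / 2.0
            if point[axis] >= mid:
                code |= 1 << axis
                lower[axis] = mid   # descend into the upper half
            else:
                upper[axis] = mid   # descend into the lower half
        digits.append(str(code))
    return "".join(digits)


def densest_octant(points, lower, upper, depth):
    """Return (octant identifier, count) for the most populated octant."""
    counts = Counter(octant_id(p, lower, upper, depth) for p in points)
    return counts.most_common(1)[0]


# Hypothetical encoded conformations inside the box [0, 8]^3.
points = [(1.0, 1.0, 1.0), (1.2, 1.1, 0.9), (0.8, 1.3, 1.1),
          (7.0, 7.0, 7.0), (5.0, 1.0, 6.0)]
print(densest_octant(points, (0, 0, 0), (8, 8, 8), depth=2))
```

    Computing an identifier per point and then counting points per identifier is a key-count aggregation, which is the kind of workload that fits a map step followed by a reduce step.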

  1. Accurate retention time determination of co-eluting proteins in analytical chromatography by means of spectral data.

    PubMed

    Dismer, Florian; Hansen, Sigrid; Oelmeier, Stefan Alexander; Hubbuch, Jürgen

    2013-03-01

    Chromatography is the method of choice for the separation of proteins, at both analytical and preparative scale. Orthogonal purification strategies for industrial use can easily be implemented by combining different modes of adsorption. Nevertheless, with flexibility comes the freedom of choice, and optimal conditions for consecutive steps need to be identified in a robust and reproducible fashion. One way to address this issue is the use of mathematical models that allow for an in silico process optimization. Although this has been shown to work, model parameter estimation for complex feedstocks becomes the bottleneck in process development. An integral part of parameter assessment is the accurate measurement of retention times in a series of isocratic or gradient elution experiments. As high-resolution analytics that can differentiate between proteins are often not readily available, pure protein is mandatory for parameter determination. In this work, we present an approach that has the potential to solve this problem. Based on the uniqueness of UV absorption spectra of proteins, we were able to accurately measure retention times in systems of up to four co-eluting compounds. The presented approach is calibration-free, meaning that prior knowledge of pure component absorption spectra is not required. In fact, pure protein spectra can be determined from co-eluting proteins as part of the methodology. The approach was tested for size-exclusion chromatograms of 38 mixtures of co-eluting proteins. Retention times were determined with an average error of 0.6 s (1.6% of average peak width); approximated and measured pure component spectra showed an average coefficient of correlation of 0.992.

  2. Protein side chain conformation predictions with an MMGBSA energy function.

    PubMed

    Gaillard, Thomas; Panel, Nicolas; Simonson, Thomas

    2016-06-01

    The prediction of protein side chain conformations from backbone coordinates is an important task in structural biology, with applications in structure prediction and protein design. It is a difficult problem due to its combinatorial nature. We study the performance of an "MMGBSA" energy function, implemented in our protein design program Proteus, which combines molecular mechanics terms, a Generalized Born and Surface Area (GBSA) solvent model, with approximations that make the model pairwise additive. Proteus is not a competitor to specialized side chain prediction programs due to its cost, but it allows protein design applications, where side chain prediction is an important step and MMGBSA an effective energy model. We predict the side chain conformations for 18 proteins. The side chains are first predicted individually, with the rest of the protein in its crystallographic conformation. Next, all side chains are predicted together. The contributions of individual energy terms are evaluated and various parameterizations are compared. We find that the GB and SA terms, with an appropriate choice of the dielectric constant and surface energy coefficients, are beneficial for single side chain predictions. For the prediction of all side chains, however, errors due to the pairwise additive approximation overcome the improvement brought by these terms. We also show the crucial contribution of side chain minimization to alleviate the rigid rotamer approximation. Even without GB and SA terms, we obtain accuracies comparable to SCWRL4, a specialized side chain prediction program. In particular, we obtain a better RMSD than SCWRL4 for core residues (at a higher cost), despite our simpler rotamer library. Proteins 2016; 84:803-819. © 2016 Wiley Periodicals, Inc.

  3. Affinity regression predicts the recognition code of nucleic acid binding proteins

    PubMed Central

    Pelossof, Raphael; Singh, Irtisha; Yang, Julie L.; Weirauch, Matthew T.; Hughes, Timothy R.; Leslie, Christina S.

    2016-01-01

    Predicting the affinity profiles of nucleic acid-binding proteins directly from the protein sequence is a major unsolved problem. We present a statistical approach for learning the recognition code of a family of transcription factors (TFs) or RNA-binding proteins (RBPs) from high-throughput binding assays. Our method, called affinity regression, trains on protein binding microarray (PBM) or RNAcompete experiments to learn an interaction model between proteins and nucleic acids, using only protein domain and probe sequences as inputs. By training on mouse homeodomain PBM profiles, our model correctly identifies residues that confer DNA-binding specificity and accurately predicts binding motifs for an independent set of divergent homeodomains. Similarly, learning from RNAcompete profiles for diverse RBPs, our model can predict the binding affinities of held-out proteins and identify key RNA-binding residues. More broadly, we envision applying our method to model and predict biological interactions in any setting where there is a high-throughput ‘affinity’ readout. PMID:26571099

  4. Accurate design of megadalton-scale two-component icosahedral protein complexes.

    PubMed

    Bale, Jacob B; Gonen, Shane; Liu, Yuxi; Sheffler, William; Ellis, Daniel; Thomas, Chantz; Cascio, Duilio; Yeates, Todd O; Gonen, Tamir; King, Neil P; Baker, David

    2016-07-22

    Nature provides many examples of self- and co-assembling protein-based molecular machines, including icosahedral protein cages that serve as scaffolds, enzymes, and compartments for essential biochemical reactions and icosahedral virus capsids, which encapsidate and protect viral genomes and mediate entry into host cells. Inspired by these natural materials, we report the computational design and experimental characterization of co-assembling, two-component, 120-subunit icosahedral protein nanostructures with molecular weights (1.8 to 2.8 megadaltons) and dimensions (24 to 40 nanometers in diameter) comparable to those of small viral capsids. Electron microscopy, small-angle x-ray scattering, and x-ray crystallography show that 10 designs spanning three distinct icosahedral architectures form materials closely matching the design models. In vitro assembly of icosahedral complexes from independently purified components occurs rapidly, at rates comparable to those of viral capsids, and enables controlled packaging of molecular cargo through charge complementarity. The ability to design megadalton-scale materials with atomic-level accuracy and controllable assembly opens the door to a new generation of genetically programmable protein-based molecular machines. PMID:27463675

  5. Prediction of three-dimensional transmembrane helical protein structures

    NASA Astrophysics Data System (ADS)

    Barth, Patrick

    Membrane proteins are critical to living cells and their dysfunction can lead to serious diseases. High-resolution structures of these proteins would provide very valuable information for designing efficient therapies, but membrane protein crystallization is a major bottleneck. As an important alternative approach, methods for predicting membrane protein structures have been developed in recent years. This chapter focuses on the problem of modeling the structure of transmembrane helical proteins, and describes recent advancements, current limitations, and future challenges facing de novo modeling, modeling with experimental constraints, and high-resolution comparative modeling of these proteins. Abbreviations: MP, membrane protein; SP, water-soluble protein; RMSD, root-mean square deviation; Cα RMSD, root-mean square deviation over Cα atoms; TM, transmembrane; TMH, transmembrane helix; GPCR, G protein-coupled receptor; 3D, three dimensional; NMR, nuclear magnetic resonance spectroscopy; EPR, electron paramagnetic resonance spectroscopy; FTIR, Fourier transform infrared spectroscopy.

  6. Template-based prediction of protein function.

    PubMed

    Petrey, Donald; Chen, T Scott; Deng, Lei; Garzon, Jose Ignacio; Hwang, Howook; Lasso, Gorka; Lee, Hunjoong; Silkov, Antonina; Honig, Barry

    2015-06-01

    We discuss recent approaches for structure-based protein function annotation. We focus on template-based methods where the function of a query protein is deduced from that of a template for which both the structure and function are known. We describe the different ways of identifying a template. These are typically based on sequence analysis but new methods based on purely structural similarity are also being developed that allow function annotation based on structural relationships that cannot be recognized by sequence. The growing number of available structures of known function, improved homology modeling techniques and new developments in the use of structure allow template-based methods to be applied on a proteome-wide scale and in many different biological contexts. This progress significantly expands the range of applicability of structural information in function annotation to a level that previously was only achievable by sequence comparison.

  7. Accurate electrical prediction of memory array through SEM-based edge-contour extraction using SPICE simulation

    NASA Astrophysics Data System (ADS)

    Shauly, Eitan; Rotstein, Israel; Peltinov, Ram; Latinski, Sergei; Adan, Ofer; Levi, Shimon; Menadeva, Ovadya

    2009-03-01

    Continued transistor scaling efforts, aimed at smaller devices, similar (or larger) drive current/µm and faster devices, increase the challenge of predicting and controlling the transistor off-state current. Typically, electrical simulators like SPICE use the design intent (as-drawn GDS data). In more sophisticated cases, the simulators are fed with the pattern after lithography and etch process simulations. As the importance of electrical simulation accuracy increases and leakage becomes more dominant, there is a need to feed these simulators with more accurate information extracted from physical on-silicon transistors. Our methodology for predicting changes in device performance due to systematic lithography and etch effects was used in this paper. In general, the methodology consists of using OPCCmax™ for systematic Edge-Contour-Extraction (ECE) from transistors, taken during manufacturing and including any image distortions such as line-end shortening, corner rounding and line-edge roughness. These measurements are used for SPICE modeling. A possible application of this new metrology is to provide physical and electrical statistical data ahead of time, improving time to market. In this work, we applied our methodology to analyze small and large arrays of 2.14 µm² 6T-SRAM, manufactured using the Tower Standard Logic for General Purposes Platform. Four of the six transistors used "U-Shape AA", known to have higher variability. The predicted electrical performance of the transistors (drive current and leakage current), in terms of nominal values and variability, is presented. We also used the methodology to analyze an entire SRAM block array. A study of isolation leakage and variability is presented.

  8. Predicting bovine milk protein composition based on Fourier transform infrared spectra.

    PubMed

    Rutten, M J M; Bovenhuis, H; Heck, J M L; van Arendonk, J A M

    2011-11-01

    Phenotypic information on individual protein composition of cows is important for many aspects of dairy processing with cheese production as the center of gravity. However, measuring individual protein composition is expensive and time consuming. In this study, we investigated whether protein composition can be predicted based on inexpensive and routinely measured milk Fourier transform infrared (FTIR) spectra. Based on 900 calibration and 900 validation samples that had both capillary zone electrophoresis (CZE)-determined protein composition and FTIR spectra available, low to moderate validation R² values were reached (from 0.18 for αS1-casein to 0.56 for β-lactoglobulin). The potential usefulness of this model on the phenotypic level was investigated by means of achieved selection differentials for 25% of the best animals. For α-lactalbumin (R² = 0.20), the selection differential amounted to 0.18 g/100 g and for casein index (R² = 0.50) to 1.24 g/100 g. We concluded that predictions of protein composition were not accurate enough to enable selection of individual animals. However, for specific purposes when, for example, groups of animals that meet a certain threshold are to be selected, the presented model could be useful in practice on the phenotypic level. The potential usefulness of this model on the genetic level was investigated by means of genetic correlations between CZE-determined and FTIR-predicted protein composition traits. The genetic correlations ranged from 0.62 (β-casein) to 0.97 (whey). Thus, predictions of protein composition, when used as input to estimate breeding values, provide an excellent means for genetic improvement of protein composition. In addition, estimated repeatabilities based on 3 repeated observations of predicted protein composition showed that a considerable amount of prediction error can be removed using repeated observations.

  9. Protein Structure and Function Prediction Using I-TASSER.

    PubMed

    Yang, Jianyi; Zhang, Yang

    2015-01-01

    I-TASSER is a hierarchical protocol for automated protein structure prediction and structure-based function annotation. Starting from the amino acid sequence of target proteins, I-TASSER first generates full-length atomic structural models from multiple threading alignments and iterative structural assembly simulations followed by atomic-level structure refinement. The biological functions of the protein, including ligand-binding sites, enzyme commission number, and gene ontology terms, are then inferred from known protein function databases based on sequence and structure profile comparisons. I-TASSER is freely available as both an on-line server and a stand-alone package. This unit describes how to use the I-TASSER protocol to generate structure and function prediction and how to interpret the prediction results, as well as alternative approaches for further improving the I-TASSER modeling quality for distant-homologous and multi-domain protein targets.

  10. Effect of Using Suboptimal Alignments in Template-Based Protein Structure Prediction

    PubMed Central

    Chen, Hao; Kihara, Daisuke

    2010-01-01

    Computational protein structure prediction remains a challenging task in protein bioinformatics. In recent years, the importance of template-based structure prediction has increased due to the growing number of protein structures solved by the structural genomics projects. To capitalize on the significant efforts and investments made in the structural genomics projects, it is urgent to establish effective ways to use the solved structures as templates by developing methods for exploiting remotely related proteins that cannot be simply identified by homology. In this work, we examine the effect of employing suboptimal alignments in template-based protein structure prediction. We show that suboptimal alignments are often more accurate than the optimal one, and such accurate suboptimal alignments can occur even at a very low rank of the alignment score. Suboptimal alignments contain a significant number of correct amino acid residue contacts. Moreover, suboptimal alignments can improve template-based models when used as input to Modeller. Finally, we employ suboptimal alignments for handling a contact potential in a probabilistic way in a threading program, SUPRB. The probabilistic contacts strategy outperforms the partly thawed approach, which only uses the optimal alignment in defining residue contacts, and also the reranking strategy, which uses the contact potential in reranking alignments. The comparison with existing methods in the template-recognition test shows that SUPRB is very competitive and outperforms existing methods. PMID:21058297

  11. Measurements of accurate x-ray scattering data of protein solutions using small stationary sample cells

    SciTech Connect

    Hong Xinguo; Hao Quan

    2009-01-15

    In this paper, we report a method of precise in situ x-ray scattering measurements on protein solutions using small stationary sample cells. Although reduction in the radiation damage induced by intense synchrotron radiation sources is indispensable for the correct interpretation of scattering data, there is still a lack of effective methods to overcome radiation-induced aggregation and extract scattering profiles free from chemical or structural damage. It is found that radiation-induced aggregation mainly begins on the surface of the sample cell and grows along the beam path; the diameter of the damaged region is comparable to the x-ray beam size. Radiation-induced aggregation can be effectively avoided by using a two-dimensional scan (2D mode), with an interval as small as 1.5 times the beam size, at low temperature (e.g., 4 °C). A radiation sensitive protein, bovine hemoglobin, was used to test the method. A standard deviation of less than 5% in the small angle region was observed from a series of nine spectra recorded in 2D mode, in contrast to the intensity variation seen using the conventional stationary technique, which can exceed 100%. Wide-angle x-ray scattering data were collected at a standard macromolecular diffraction station using the same data collection protocol and showed a good signal/noise ratio (better than the reported data on the same protein using a flow cell). The results indicate that this method is an effective approach for obtaining precise measurements of protein solution scattering.

  12. Accurate Estimation of Protein Folding and Unfolding Times: Beyond Markov State Models.

    PubMed

    Suárez, Ernesto; Adelman, Joshua L; Zuckerman, Daniel M

    2016-08-01

    Because standard molecular dynamics (MD) simulations are unable to access time scales of interest in complex biomolecular systems, it is common to "stitch together" information from multiple shorter trajectories using approximate Markov state model (MSM) analysis. However, MSMs may require significant tuning and can yield biased results. Here, by analyzing some of the longest protein MD data sets available (>100 μs per protein), we show that estimators constructed based on exact non-Markovian (NM) principles can yield significantly improved mean first-passage times (MFPTs) for protein folding and unfolding. In some cases, MSM bias of more than an order of magnitude can be corrected when identical trajectory data are reanalyzed by non-Markovian approaches. The NM analysis includes "history" information, higher order time correlations compared to MSMs, that is available in every MD trajectory. The NM strategy is insensitive to fine details of the states used and works well when a fine time-discretization (i.e., small "lag time") is used. PMID:27340835
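
    The quantity discussed in the abstract above, the mean first-passage time (MFPT), can be illustrated with a naive direct count over a single discretized trajectory. This is only a sketch of what an MFPT is, not the non-Markovian estimator of the paper; the function name, the state labels "U", "I" and "F", and the frame spacing dt are all assumptions made for illustration.

```python
def mean_first_passage_time(states, source, target, dt=1.0):
    """Naive MFPT from `source` to `target` along one discrete trajectory.

    A passage starts at the first frame in `source` (since the last visit
    to `target`) and ends at the first later frame in `target`.
    `dt` is the assumed time between consecutive frames.
    """
    start = None          # frame index where the current passage began
    durations = []        # completed passage lengths, in frames
    for frame, state in enumerate(states):
        if state == source and start is None:
            start = frame
        elif state == target and start is not None:
            durations.append(frame - start)
            start = None
    if not durations:
        raise ValueError("no complete source-to-target passage in trajectory")
    return dt * sum(durations) / len(durations)


# Hypothetical toy trajectory: U = unfolded, I = intermediate, F = folded.
toy_trajectory = list("UIIFIUIF")
folding_mfpt = mean_first_passage_time(toy_trajectory, "U", "F", dt=2.0)
print(folding_mfpt)  # two passages of 3 and 2 frames, so 2.5 frames * 2.0 = 5.0
```

    A direct count like this needs many complete folding events in the data, which is why trajectories are usually combined through a model-based analysis instead.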

  13. Toxicological relationships between proteins obtained from protein target predictions of large toxicity databases

    SciTech Connect

    Nigsch, Florian; Mitchell, John B.O.

    2008-09-01

    The combination of models for protein target prediction with large databases containing toxicological information for individual molecules allows the derivation of 'toxicological' profiles, i.e., the extent to which molecules of known toxicity are predicted to interact with a set of protein targets. To predict protein targets of drug-like and toxic molecules, we built a computational multiclass model using the Winnow algorithm based on a dataset of protein targets derived from the MDL Drug Data Report. A 15-fold Monte Carlo cross-validation using 50% of each class for training, and the remaining 50% for testing, provided an assessment of the accuracy of that model. We retained the 3 top-ranking predictions and found that in 82% of all cases the correct target was predicted within these three predictions. The first prediction was the correct one in almost 70% of cases. A model built on the whole protein target dataset was then used to predict the protein targets for 150 000 molecules from the MDL Toxicity Database. We analysed the frequency of the predictions across the panel of protein targets for experimentally determined toxicity classes of all molecules. This allowed us to identify clusters of proteins related by their toxicological profiles, as well as toxicities that are related. Literature-based evidence is provided for some specific clusters to show the relevance of the relationships identified.
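
    The "correct target within the top three predictions" figure reported above is a top-k accuracy. A minimal sketch of that metric follows; the function name and the toy target labels are hypothetical and are not taken from the paper or from the MDL databases.

```python
def top_k_accuracy(ranked_predictions, true_labels, k=3):
    """Fraction of cases whose true label is among the first k ranked predictions."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("need one ranked list per true label")
    if not true_labels:
        raise ValueError("no cases to score")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical ranked target predictions for four molecules (best first).
ranked = [
    ["kinase", "protease", "GPCR"],
    ["GPCR", "kinase", "ion channel"],
    ["protease", "ion channel", "kinase"],
    ["ion channel", "GPCR", "protease"],
]
truth = ["kinase", "kinase", "GPCR", "protease"]

print(top_k_accuracy(ranked, truth, k=1))  # 0.25
print(top_k_accuracy(ranked, truth, k=3))  # 0.75
```

    In a Monte Carlo cross-validation, this score would be computed on each random test split and then averaged over the repetitions.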

  14. A Prediction Model for Membrane Proteins Using Moments Based Features.

    PubMed

    Butt, Ahmad Hassan; Khan, Sher Afzal; Jamil, Hamza; Rasool, Nouman; Khan, Yaser Daanial

    2016-01-01

    The most expedient unit of the human body is its cell. Encapsulated within the cell are many infinitesimal entities and molecules which are protected by a cell membrane. The proteins that are associated with this lipid-based bilayer cell membrane are known as membrane proteins and are considered to play a significant role. These membrane proteins exhibit their effect in cellular activities inside and outside of the cell. According to scientists in pharmaceutical organizations, these membrane proteins perform key tasks in drug interactions. In this study, a technique is presented that is based on various computationally intelligent methods used for the prediction of membrane proteins without the experimental use of mass spectrometry. Statistical moments were used to extract features, and a Multilayer Neural Network was trained using backpropagation for the prediction of membrane proteins. Results show that the proposed technique performs better than existing methodologies.

  15. A Prediction Model for Membrane Proteins Using Moments Based Features

    PubMed Central

    Butt, Ahmad Hassan; Khan, Sher Afzal; Jamil, Hamza; Rasool, Nouman; Khan, Yaser Daanial

    2016-01-01

    The most expedient unit of the human body is its cell. Encapsulated within the cell are many infinitesimal entities and molecules which are protected by a cell membrane. The proteins that are associated with this lipid-based bilayer cell membrane are known as membrane proteins and are considered to play a significant role. These membrane proteins exhibit their effect in cellular activities inside and outside of the cell. According to scientists in pharmaceutical organizations, these membrane proteins perform key tasks in drug interactions. In this study, a technique is presented that is based on various computationally intelligent methods used for the prediction of membrane proteins without the experimental use of mass spectrometry. Statistical moments were used to extract features, and a Multilayer Neural Network was trained using backpropagation for the prediction of membrane proteins. Results show that the proposed technique performs better than existing methodologies. PMID:26966690

  16. PredictSNP2: A Unified Platform for Accurately Evaluating SNP Effects by Exploiting the Different Characteristics of Variants in Distinct Genomic Regions.

    PubMed

    Bendl, Jaroslav; Musil, Miloš; Štourač, Jan; Zendulka, Jaroslav; Damborský, Jiří; Brezovský, Jan

    2016-05-01

    An important message taken from human genome sequencing projects is that the human population exhibits approximately 99.9% genetic similarity. Variations in the remaining parts of the genome determine our identity, trace our history and reveal our heritage. The precise delineation of phenotypically causal variants plays a key role in providing accurate personalized diagnosis, prognosis, and treatment of inherited diseases. Several computational methods for achieving such delineation have been reported recently. However, their ability to pinpoint potentially deleterious variants is limited by the fact that their mechanisms of prediction do not account for the existence of different categories of variants. Consequently, their output is biased towards the variant categories that are most strongly represented in the variant databases. Moreover, most such methods provide numeric scores but not binary predictions of the deleteriousness of variants or confidence scores that would be more easily understood by users. We have constructed three datasets covering different types of disease-related variants, which were divided across five categories: (i) regulatory, (ii) splicing, (iii) missense, (iv) synonymous, and (v) nonsense variants. These datasets were used to develop category-optimal decision thresholds and to evaluate six tools for variant prioritization: CADD, DANN, FATHMM, FitCons, FunSeq2 and GWAVA. This evaluation revealed some important advantages of the category-based approach. The results obtained with the five best-performing tools were then combined into a consensus score. Additional comparative analyses showed that in the case of missense variations, protein-based predictors perform better than DNA sequence-based predictors. A user-friendly web interface was developed that provides easy access to the five tools' predictions, and their consensus scores, in a user-understandable format tailored to the specific features of different categories of variations. 

  17. PredictSNP2: A Unified Platform for Accurately Evaluating SNP Effects by Exploiting the Different Characteristics of Variants in Distinct Genomic Regions

    PubMed Central

    Brezovský, Jan

    2016-01-01

    An important message taken from human genome sequencing projects is that the human population exhibits approximately 99.9% genetic similarity. Variations in the remaining parts of the genome determine our identity, trace our history and reveal our heritage. The precise delineation of phenotypically causal variants plays a key role in providing accurate personalized diagnosis, prognosis, and treatment of inherited diseases. Several computational methods for achieving such delineation have been reported recently. However, their ability to pinpoint potentially deleterious variants is limited by the fact that their mechanisms of prediction do not account for the existence of different categories of variants. Consequently, their output is biased towards the variant categories that are most strongly represented in the variant databases. Moreover, most such methods provide numeric scores but not binary predictions of the deleteriousness of variants or confidence scores that would be more easily understood by users. We have constructed three datasets covering different types of disease-related variants, which were divided across five categories: (i) regulatory, (ii) splicing, (iii) missense, (iv) synonymous, and (v) nonsense variants. These datasets were used to develop category-optimal decision thresholds and to evaluate six tools for variant prioritization: CADD, DANN, FATHMM, FitCons, FunSeq2 and GWAVA. This evaluation revealed some important advantages of the category-based approach. The results obtained with the five best-performing tools were then combined into a consensus score. Additional comparative analyses showed that in the case of missense variations, protein-based predictors perform better than DNA sequence-based predictors. A user-friendly web interface was developed that provides easy access to the five tools’ predictions, and their consensus scores, in a user-understandable format tailored to the specific features of different categories of variations

  18. Simple Learned Weighted Sums of Inferior Temporal Neuronal Firing Rates Accurately Predict Human Core Object Recognition Performance.

    PubMed

    Majaj, Najib J; Hong, Ha; Solomon, Ethan A; DiCarlo, James J

    2015-09-30

    database of images for evaluating object recognition performance. We used multielectrode arrays to characterize hundreds of neurons in the visual ventral stream of nonhuman primates and measured the object recognition performance of >100 human observers. Remarkably, we found that simple learned weighted sums of firing rates of neurons in monkey inferior temporal (IT) cortex accurately predicted human performance. Although previous work led us to expect that IT would outperform V4, we were surprised by the quantitative precision with which simple IT-based linking hypotheses accounted for human behavior. PMID:26424887

  19. Simple Learned Weighted Sums of Inferior Temporal Neuronal Firing Rates Accurately Predict Human Core Object Recognition Performance

    PubMed Central

    Hong, Ha; Solomon, Ethan A.; DiCarlo, James J.

    2015-01-01

    database of images for evaluating object recognition performance. We used multielectrode arrays to characterize hundreds of neurons in the visual ventral stream of nonhuman primates and measured the object recognition performance of >100 human observers. Remarkably, we found that simple learned weighted sums of firing rates of neurons in monkey inferior temporal (IT) cortex accurately predicted human performance. Although previous work led us to expect that IT would outperform V4, we were surprised by the quantitative precision with which simple IT-based linking hypotheses accounted for human behavior. PMID:26424887

  1. Prediction of Membrane Transport Proteins and Their Substrate Specificities Using Primary Sequence Information

    PubMed Central

    Mishra, Nitish K.; Chang, Junil; Zhao, Patrick X.

    2014-01-01

    accurate predictions for the substrate specificity of membrane transport proteins. TrSSP: The Transporter Substrate Specificity Prediction Server, a web server that implements the SVM models developed in this paper, is freely available at http://bioinfo.noble.org/TrSSP. PMID:24968309

  2. The NIP7 protein is required for accurate pre-rRNA processing in human cells.

    PubMed

    Morello, Luis G; Hesling, Cédric; Coltri, Patrícia P; Castilho, Beatriz A; Rimokh, Ruth; Zanchin, Nilson I T

    2011-01-01

    Eukaryotic ribosome biogenesis requires the function of a large number of trans-acting factors which interact transiently with the nascent pre-rRNA and dissociate as the ribosomal subunits proceed to maturation and export to the cytoplasm. Loss-of-function mutations in human trans-acting factors or ribosome components may lead to genetic syndromes. In a previous study, we have shown association between the SBDS (Shwachman-Bodian-Diamond syndrome) and NIP7 proteins and that downregulation of SBDS in HEK293 affects gene expression at the transcriptional and translational levels. In this study, we show that downregulation of NIP7 affects pre-rRNA processing, causing an imbalance of the 40S/60S subunit ratio. We also identified defects at the pre-rRNA processing level with a decrease of the 34S pre-rRNA concentration and an increase of the 26S and 21S pre-rRNA concentrations, indicating that processing at site 2 is particularly slower in NIP7-depleted cells and showing that NIP7 is required for maturation of the 18S rRNA. The NIP7 protein is restricted to the nuclear compartment and co-sediments with complexes with molecular masses in the range of 40S-80S, suggesting an association with nucleolar pre-ribosomal particles. Downregulation of NIP7 affects cell proliferation, consistent with an important role for NIP7 in rRNA biosynthesis in human cells.

  3. Protein function prediction using guilty by association from interaction networks.

    PubMed

    Piovesan, Damiano; Giollo, Manuel; Ferrari, Carlo; Tosatto, Silvio C E

    2015-12-01

    Protein function prediction from sequence using the Gene Ontology (GO) classification is useful in many biological problems. It has recently attracted increasing interest, thanks in part to the Critical Assessment of Function Annotation (CAFA) challenge. In this paper, we introduce Guilty by Association on STRING (GAS), a tool to predict protein function exploiting protein-protein interaction networks without sequence similarity. The assumption is that whenever a protein interacts with other proteins, it is part of the same biological process and located in the same cellular compartment. GAS retrieves interaction partners of a query protein from the STRING database and measures enrichment of the associated functional annotations to generate a sorted list of putative functions. A performance evaluation based on CAFA metrics and a fair comparison with optimized BLAST similarity searches is provided. The consensus of GAS and BLAST is shown to improve overall performance. The PPI approach is shown to outperform similarity searches for biological process and cellular compartment GO predictions. Moreover, an analysis of the best practices to exploit protein-protein interaction networks is also provided.
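
    The guilt-by-association idea described above (collect the interaction partners of a query protein, then rank the functions they carry) can be sketched in a few lines. This is a simplified illustration, not the GAS implementation: it ranks terms by the plain fraction of partners annotated with them rather than by a statistical enrichment measure, it does not query STRING, and every protein identifier and term label below is made up.

```python
from collections import Counter


def rank_functions(partners, annotations):
    """Rank functional terms by the fraction of interaction partners carrying them.

    partners    -- identifiers of proteins that interact with the query
    annotations -- mapping from protein identifier to its known terms
    Returns (term, fraction) pairs, highest fraction first, ties by term name.
    """
    if not partners:
        return []
    counts = Counter()
    for partner in partners:
        # set() so a term listed twice for one partner is counted once
        counts.update(set(annotations.get(partner, ())))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(term, count / len(partners)) for term, count in ranked]


# Hypothetical partners of a query protein and their annotations.
toy_partners = ["P1", "P2", "P3"]
toy_annotations = {
    "P1": ["term_A", "term_B"],
    "P2": ["term_A"],
    "P3": ["term_C", "term_A"],
    "P4": ["term_B"],  # not a partner, so it is ignored
}
putative = rank_functions(toy_partners, toy_annotations)
print(putative[0])  # ('term_A', 1.0)
```

    A real enrichment score would also compare each term's frequency among the partners with its background frequency across the whole proteome.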

  4. Improving protein-protein interactions prediction accuracy using protein evolutionary information and relevance vector machine model.

    PubMed

    An, Ji-Yong; Meng, Fan-Rong; You, Zhu-Hong; Chen, Xing; Yan, Gui-Ying; Hu, Ji-Pu

    2016-10-01

    Predicting protein-protein interactions (PPIs) is a challenging task and is essential for constructing protein interaction networks, which are important for facilitating our understanding of the mechanisms of biological systems. Although a number of high-throughput technologies have been proposed to predict PPIs, they have unavoidable shortcomings, including high cost, time intensity, and inherently high false positive rates. For these reasons, many computational methods have been proposed for predicting PPIs. However, the problem is still far from being solved. In this article, we propose a novel computational method called RVM-BiGP that combines the relevance vector machine (RVM) model and Bi-gram Probabilities (BiGP) for PPI detection from protein sequences. The major improvements include: (1) protein sequences are represented using the Bi-gram probabilities (BiGP) feature representation on a Position Specific Scoring Matrix (PSSM), in which the protein evolutionary information is contained; (2) to reduce the influence of noise, the Principal Component Analysis (PCA) method is used to reduce the dimension of the BiGP vector; (3) the powerful and robust Relevance Vector Machine (RVM) algorithm is used for classification. Five-fold cross-validation experiments executed on yeast and Helicobacter pylori datasets achieved very high accuracies of 94.57 and 90.57%, respectively. Experimental results are significantly better than those of previous methods. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the yeast dataset. The experimental results demonstrate that our RVM-BiGP method is significantly better than the SVM-based method. In addition, we achieved 97.15% accuracy on the imbalanced yeast dataset, which is higher than that on the balanced yeast dataset. The promising experimental results show the efficiency and robustness of the proposed method, which can be an automatic decision support tool for future
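
    The abstract above does not spell out how the bi-gram feature is computed. One common formulation, assumed here, sums over consecutive sequence positions the products of PSSM column values, so a PSSM with 20 columns gives a 400-dimensional vector. The sketch below uses plain Python lists and a toy two-column matrix; the function name is hypothetical, and any normalization of the PSSM is left to the caller.

```python
def bigram_features(pssm):
    """Bi-gram feature vector from a PSSM given as a list of rows.

    Entry (m, n) is the sum over consecutive positions i of
    pssm[i][m] * pssm[i + 1][n]; the matrix is returned flattened row by row.
    """
    if not pssm:
        raise ValueError("empty PSSM")
    width = len(pssm[0])
    feats = [[0.0] * width for _ in range(width)]
    # Pair each sequence position with the position that follows it.
    for row, next_row in zip(pssm, pssm[1:]):
        for m, p in enumerate(row):
            for n, q in enumerate(next_row):
                feats[m][n] += p * q
    return [value for feat_row in feats for value in feat_row]


# Toy PSSM: 3 sequence positions, 2 columns (a real PSSM has 20 columns).
toy_pssm = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
print(bigram_features(toy_pssm))  # [0.5, 1.0, 0.0, 0.5]
```

    Because the output length depends only on the number of columns, proteins of different lengths map to vectors of the same size, which is what a fixed-input classifier needs.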

  6. Computational finite element bone mechanics accurately predicts mechanical competence in the human radius of an elderly population.

    PubMed

    Mueller, Thomas L; Christen, David; Sandercott, Steve; Boyd, Steven K; van Rietbergen, Bert; Eckstein, Felix; Lochmüller, Eva-Maria; Müller, Ralph; van Lenthe, G Harry

    2011-06-01

    High-resolution peripheral quantitative computed tomography (HR-pQCT) is clinically available today and provides a non-invasive measure of 3D bone geometry and micro-architecture with unprecedented detail. In combination with microarchitectural finite element (μFE) models it can be used to determine bone strength using a strain-based failure criterion. Yet, images from only a relatively small part of the radius are acquired and it is not known whether the region recommended for clinical measurements does predict forearm fracture load best. Furthermore, it is questionable whether the currently used failure criterion is optimal because of improvements in image resolution, changes in the clinically measured volume of interest, and because the failure criterion depends on the amount of bone present. Hence, we hypothesized that bone strength estimates would improve by measuring a region closer to the subchondral plate, and by defining a failure criterion that would be independent of the measured volume of interest. To answer our hypotheses, 20% of the distal forearm length from 100 cadaveric but intact human forearms was measured using HR-pQCT. μFE bone strength was analyzed for different subvolumes, as well as for the entire 20% of the distal radius length. Specifically, failure criteria were developed that provided accurate estimates of bone strength as assessed experimentally. It was shown that distal volumes were better in predicting bone strength than more proximal ones. Clinically speaking, this would argue to move the volume of interest for the HR-pQCT measurements even more distally than currently recommended by the manufacturer. Furthermore, new parameter settings using the strain-based failure criterion are presented providing better accuracy for bone strength estimates.

  7. Multi-reference-based multiple alignment statistics enables accurate protein-particle pickup from noisy images.

    PubMed

    Kawata, Masaaki; Sato, Chikara

    2013-04-01

    Data mining from noisy data/images is one of the most important themes in modern science and technology. Statistical image processing is a promising technique for analysing such data. Automation of particle pickup from noisy electron micrographs is essential, especially when improvement of the resolution of single particle analysis requires a huge number of particle images. For such a purpose, reference-based matching using primary three-dimensional (3D) model projections is mainly adopted. In the matching, however, the highest peaks of the correlation may not accurately indicate particles when the image is very noisy. In contrast, the density and the heights of the peaks should reflect the probability distribution of the particles. To statistically determine the particle positions from the peak distributions, we have developed a density-based peak search followed by a peak selection based on average peak height, using multi-reference alignment (MRA). Its extension, using multi-reference multiple alignment (MRMA), was found to enable particle pickup at higher accuracy even from extremely noisy images with a signal-to-noise ratio of 0.001. We refer to these new methods as stochastic pickup with MRA (MRA-StoPICK) or with MRMA (MRMA-StoPICK). MRMA-StoPICK has a higher pickup accuracy and furthermore, is almost independent of parameter settings. They were successfully applied to cryo-electron micrographs of Rice dwarf virus. Because current computational resources and parallel data processing environments allow somewhat CPU-intensive MRA-StoPICK and MRMA-StoPICK to be performed in a short period, these methods are expected to allow high-resolution analysis of the 3D structure of particles.

  8. A Consensus Method for the Prediction of ‘Aggregation-Prone’ Peptides in Globular Proteins

    PubMed Central

    Tsolis, Antonios C.; Papandreou, Nikos C.; Iconomidou, Vassiliki A.; Hamodrakas, Stavros J.

    2013-01-01

    The purpose of this work was to construct a consensus prediction algorithm of ‘aggregation-prone’ peptides in globular proteins, combining existing tools. This allows comparison of the different algorithms and the production of more objective and accurate results. Eleven (11) individual methods are combined to produce AMYLPRED2, a web tool freely available to academic users (http://biophysics.biol.uoa.gr/AMYLPRED2), for the consensus prediction of amyloidogenic determinants/‘aggregation-prone’ peptides in proteins, from sequence alone. The performance of AMYLPRED2 indicates that it functions better than individual aggregation-prediction algorithms, as perhaps expected. AMYLPRED2 is a useful tool for identifying amyloid-forming regions in proteins that are associated with several conformational diseases, called amyloidoses, such as Alzheimer's, Parkinson's, prion diseases and type II diabetes. It may also be useful for understanding the properties of protein folding and misfolding and for helping to control protein aggregation/solubility in biotechnology (recombinant proteins forming bacterial inclusion bodies) and biotherapeutics (monoclonal antibodies and biopharmaceutical proteins). PMID:23326595

  9. Accurate prediction of secreted substrates and identification of a conserved putative secretion signal for type III secretion systems

    SciTech Connect

    Samudrala, Ram; Heffron, Fred; McDermott, Jason E.

    2009-04-24

    The type III secretion system is an essential component for virulence in many Gram-negative bacteria. Though components of the secretion system apparatus are conserved, its substrates, effector proteins, are not. We have used a machine learning approach to identify new secreted effectors. The method integrates evolutionary measures, such as the pattern of homologs in a range of other organisms, and sequence-based features, such as G+C content, amino acid composition and the N-terminal 30 residues of the protein sequence. The method was trained on known effectors from Salmonella typhimurium and validated on a corresponding set of effectors from Pseudomonas syringae, after eliminating effectors with detectable sequence similarity. The method was able to identify all of the known effectors in P. syringae with a specificity of 84% and sensitivity of 82%. The reciprocal validation, training on P. syringae and validating on S. typhimurium, gave similar results with a specificity of 86% when the sensitivity level was 87%. These results show that type III effectors in disparate organisms share common features. We found that maximal performance is attained by including an N-terminal sequence of only 30 residues, which agrees with previous studies indicating that this region contains the secretion signal. We then used the method to define the most important residues in this putative secretion signal. Finally, we present novel predictions of secreted effectors in S. typhimurium, some of which have been experimentally validated, and apply the method to predict secreted effectors in the genetically intractable human pathogen Chlamydia trachomatis. This approach is a novel and effective way to identify secreted effectors in a broad range of pathogenic bacteria for further experimental characterization and provides insight into the nature of the type III secretion signal.
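
    Two of the sequence-based features named above, G+C content and the amino acid composition of the N-terminal 30 residues, are simple to compute. The sketch below is only an illustration under assumed inputs (plain strings for the gene and protein sequences); the function names are hypothetical and the authors' actual feature encoding is not given in the abstract.

```python
from collections import Counter
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def n_terminal_composition(protein: str, n: int = 30) -> List[float]:
    """Fraction of each standard amino acid within the first n residues."""
    window = protein[:n].upper()
    counts = Counter(window)
    total = len(window) or 1  # avoid dividing by zero on an empty sequence
    return [counts[aa] / total for aa in AMINO_ACIDS]


def gc_content(dna: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    dna = dna.upper()
    if not dna:
        return 0.0
    return (dna.count("G") + dna.count("C")) / len(dna)


# Toy inputs, not real effector sequences.
composition = n_terminal_composition("MKKL")
gc = gc_content("ATGC")
```

    Concatenating such vectors with the evolutionary measures would give one feature row per candidate protein for a classifier.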

  10. WeFold: A Coopetition for Protein Structure Prediction

    PubMed Central

    Khoury, George A.; Liwo, Adam; Khatib, Firas; Zhou, Hongyi; Chopra, Gaurav; Bacardit, Jaume; Bortot, Leandro O.; Faccioli, Rodrigo A.; Deng, Xin; He, Yi; Krupa, Pawel; Li, Jilong; Mozolewska, Magdalena A.; Sieradzan, Adam K.; Smadbeck, James; Wirecki, Tomasz; Cooper, Seth; Flatten, Jeff; Xu, Kefan; Baker, David; Cheng, Jianlin; Delbem, Alexandre C. B.; Floudas, Christodoulos A.; Keasar, Chen; Levitt, Michael; Popović, Zoran; Scheraga, Harold A.; Skolnick, Jeffrey; Crivelli, Silvia N.; Players, Foldit

    2014-01-01

    The protein structure prediction problem continues to elude scientists. Despite the introduction of many methods, only modest gains were made over the last decade for certain classes of prediction targets. To address this challenge, a social-media based worldwide collaborative effort, named WeFold, was undertaken by thirteen labs. During the collaboration, the labs were simultaneously competing with each other. Here, we present the first attempt at “coopetition” in scientific research applied to the protein structure prediction and refinement problems. The coopetition was made possible by allowing the participating labs to contribute different components of their protein structure prediction pipelines and create new hybrid pipelines that they tested during CASP10. This manuscript describes both successes and areas needing improvement as identified throughout the first WeFold experiment and discusses the efforts that are underway to advance this initiative. A footprint of all contributions and structures is publicly accessible at http://www.wefold.org. PMID:24677212

  11. GSAFold: a new application of GSA to protein structure prediction.

    PubMed

    Melo, Marcelo C R; Bernardi, Rafael C; Fernandes, Tácio V A; Pascutti, Pedro G

    2012-08-01

    The folding process defines three-dimensional protein structures from their amino acid chains. A protein's structure determines its activity and properties; thus knowing such conformation on an atomic level is essential for both basic and applied studies of protein function and dynamics. However, the acquisition of such structures by experimental methods is slow and expensive, and current computational methods mostly depend on previously known structures to determine new ones. Here we present a new software called GSAFold that applies the generalized simulated annealing (GSA) algorithm on ab initio protein structure prediction. The GSA is a stochastic search algorithm employed in energy minimization and used in global optimization problems, especially those that depend on long-range interactions, such as gravity models and conformation optimization of small molecules. This new implementation applies, for the first time in ab initio protein structure prediction, an analytical inverse for the Visitation function of GSA. It also employs the broadly used NAMD Molecular Dynamics package to carry out energy calculations, allowing the user to select different force fields and parameterizations. Moreover, the software also allows the execution of several simulations simultaneously. Applications that depend on protein structures include rational drug design and structure-based protein function prediction. Applying GSAFold in a test peptide, it was possible to predict the structure of mastoparan-X to a root mean square deviation of 3.00 Å. PMID:22622959
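
    The root mean square deviation quoted above measures the average distance between matched atoms of a predicted and a reference structure. A minimal sketch follows, assuming both structures are given as equal-length lists of (x, y, z) coordinates in ångströms that have already been superimposed; the optimal superposition step used in practice is deliberately left out.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """RMSD between two pre-aligned coordinate sets with matched atom order."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy example: the second structure is the first shifted by 3 units along z.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
deviation = rmsd(predicted, reference)
```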

  12. A Support Vector Machine model for the prediction of proteotypic peptides for accurate mass and time proteomics

    SciTech Connect

    Webb-Robertson, Bobbie-Jo M.; Cannon, William R.; Oehmen, Christopher S.; Shah, Anuj R.; Gurumoorthi, Vidhya; Lipton, Mary S.; Waters, Katrina M.

    2008-07-01

    Motivation: The standard approach to identifying peptides based on accurate mass and elution time (AMT) compares these profiles obtained from a high resolution mass spectrometer to a database of peptides previously identified from tandem mass spectrometry (MS/MS) studies. It would be advantageous, with respect to both accuracy and cost, to only search for those peptides that are detectable by MS (proteotypic). Results: We present a Support Vector Machine (SVM) model that uses a simple descriptor space based on 35 properties of amino acid content, charge, hydrophilicity, and polarity for the quantitative prediction of proteotypic peptides. Using three independently derived AMT databases (Shewanella oneidensis, Salmonella typhimurium, Yersinia pestis) for training and validation within and across species, the SVM resulted in an average accuracy measure of ~0.8 with a standard deviation of less than 0.025. Furthermore, we demonstrate that these results are achievable with a small set of 12 variables and can achieve high proteome coverage. Availability: http://omics.pnl.gov/software/STEPP.php

  13. Computational Prediction of Effector Proteins in Fungi: Opportunities and Challenges.

    PubMed

    Sonah, Humira; Deshmukh, Rupesh K; Bélanger, Richard R

    2016-01-01

    Effector proteins are mostly secretory proteins that stimulate plant infection by manipulating the host response. Identifying fungal effector proteins and understanding their function is of great importance in efforts to curb losses to plant diseases. Recent advances in high-throughput sequencing technologies have facilitated the availability of several fungal genomes and 1000s of transcriptomes. As a result, the growing amount of genomic information has provided great opportunities to identify putative effector proteins in different fungal species. There is little consensus over the annotation and functionality of effector proteins, and mostly small secretory proteins are considered as effector proteins, a concept that tends to overestimate the number of proteins involved in a plant-pathogen interaction. With the characterization of Avr genes, criteria for computational prediction of effector proteins are becoming more efficient. There are 100s of tools available for the identification of conserved motifs, signature sequences and structural features in the proteins. Many pipelines and online servers, which combine several tools, are made available to perform genome-wide identification of effector proteins. In this review, available tools and pipelines, their strength and limitations for effective identification of fungal effector proteins are discussed. We also present an exhaustive list of classically secreted proteins along with their key conserved motifs found in 12 common plant pathogens (11 fungi and one oomycete) through an analytical pipeline. PMID:26904083

  14. Computational Prediction of Effector Proteins in Fungi: Opportunities and Challenges

    PubMed Central

    Sonah, Humira; Deshmukh, Rupesh K.; Bélanger, Richard R.

    2016-01-01

    Effector proteins are mostly secretory proteins that stimulate plant infection by manipulating the host response. Identifying fungal effector proteins and understanding their function is of great importance in efforts to curb losses to plant diseases. Recent advances in high-throughput sequencing technologies have facilitated the availability of several fungal genomes and 1000s of transcriptomes. As a result, the growing amount of genomic information has provided great opportunities to identify putative effector proteins in different fungal species. There is little consensus over the annotation and functionality of effector proteins, and mostly small secretory proteins are considered as effector proteins, a concept that tends to overestimate the number of proteins involved in a plant–pathogen interaction. With the characterization of Avr genes, criteria for computational prediction of effector proteins are becoming more efficient. There are 100s of tools available for the identification of conserved motifs, signature sequences and structural features in the proteins. Many pipelines and online servers, which combine several tools, are made available to perform genome-wide identification of effector proteins. In this review, available tools and pipelines, their strength and limitations for effective identification of fungal effector proteins are discussed. We also present an exhaustive list of classically secreted proteins along with their key conserved motifs found in 12 common plant pathogens (11 fungi and one oomycete) through an analytical pipeline. PMID:26904083

  15. Predicting Protein-Protein Interactions from the Molecular to the Proteome Level.

    PubMed

    Keskin, Ozlem; Tuncbag, Nurcan; Gursoy, Attila

    2016-04-27

    Identification of protein-protein interactions (PPIs) is at the center of molecular biology considering the unquestionable role of proteins in cells. Combinatorial interactions result in a repertoire of multiple functions; hence, knowledge of PPIs and binding regions naturally serves functional proteomics and drug discovery. Given experimental limitations to find all interactions in a proteome, computational prediction/modeling of protein interactions is a prerequisite for completing the set of interactions at the proteome level. This review aims to provide a background on PPIs and their types. Computational methods for PPI predictions can use a variety of biological data including sequence-, evolution-, expression-, and structure-based data. Physical and statistical modeling are commonly used to integrate these data and infer PPI predictions. We review and list the state-of-the-art methods, servers, databases, and tools for protein-protein interaction prediction. PMID:27074302

  16. PredictProtein—an open resource for online prediction of protein structural and functional features

    PubMed Central

    Yachdav, Guy; Kloppmann, Edda; Kajan, Laszlo; Hecht, Maximilian; Goldberg, Tatyana; Hamp, Tobias; Hönigschmid, Peter; Schafferhans, Andrea; Roos, Manfred; Bernhofer, Michael; Richter, Lothar; Ashkenazy, Haim; Punta, Marco; Schlessinger, Avner; Bromberg, Yana; Schneider, Reinhard; Vriend, Gerrit; Sander, Chris; Ben-Tal, Nir; Rost, Burkhard

    2014-01-01

    PredictProtein is a meta-service for sequence analysis that has been predicting structural and functional features of proteins since 1992. Queried with a protein sequence it returns: multiple sequence alignments, predicted aspects of structure (secondary structure, solvent accessibility, transmembrane helices (TMSEG) and strands, coiled-coil regions, disulfide bonds and disordered regions) and function. The service incorporates analysis methods for the identification of functional regions (ConSurf), homology-based inference of Gene Ontology terms (metastudent), comprehensive subcellular localization prediction (LocTree3), protein–protein binding sites (ISIS2), protein–polynucleotide binding sites (SomeNA) and predictions of the effect of point mutations (non-synonymous SNPs) on protein function (SNAP2). Our goal has always been to develop a system optimized to meet the demands of experimentalists not highly experienced in bioinformatics. To this end, the PredictProtein results are presented as both text and a series of intuitive, interactive and visually appealing figures. The web server and sources are available at http://ppopen.rostlab.org. PMID:24799431

  17. A time accurate prediction of the viscous flow in a turbine stage including a rotor in motion

    NASA Astrophysics Data System (ADS)

    Shavalikul, Akamol

    accurate flow characteristics in the NGV domain and the rotor domain with less computational time and computer memory requirements. In contrast, the time accurate flow simulation can predict all unsteady flow characteristics occurring in the turbine stage, but with high computational resource requirements. (Abstract shortened by UMI.)

  18. Efficient Prediction of Co-Complexed Proteins Based on Coevolution

    PubMed Central

    de Vienne, Damien M.; Azé, Jérôme

    2012-01-01

    The prediction of the network of protein-protein interactions (PPI) of an organism is crucial for the understanding of biological processes and for the development of new drugs. Machine learning methods have been successfully applied to the prediction of PPI in yeast by the integration of multiple direct and indirect biological data sources. However, experimental data are not available for most organisms. We propose here an ensemble machine learning approach for the prediction of PPI that depends solely on features independent from experimental data. We developed new estimators of the coevolution between proteins and combined them in an ensemble learning procedure. We applied this method to a dataset of known co-complexed proteins in Escherichia coli and compared it to previously published methods. We show that our method allows prediction of PPI with an unprecedented precision of 95.5% for the first 200 sorted pairs of proteins compared to 28.5% on the same dataset with the previous best method. A close inspection of the best predicted pairs allowed us to detect new or recently discovered interactions between chemotactic components, the flagellar apparatus and RNA polymerase complexes in E. coli. PMID:23152796
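
    The figure "precision for the first 200 sorted pairs" is a precision-at-k measure: pairs are ranked by predicted score and the fraction of true interactions among the top k is reported. The sketch below shows the calculation on toy data; the pair names, scores and function name are hypothetical and unrelated to the E. coli dataset.

```python
from typing import FrozenSet, Iterable, Set, Tuple

ScoredPair = Tuple[Tuple[str, str], float]


def precision_at_k(scored_pairs: Iterable[ScoredPair],
                   known_positives: Set[FrozenSet[str]],
                   k: int) -> float:
    """Fraction of the k highest-scoring pairs that are known interactions."""
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)[:k]
    if not ranked:
        return 0.0
    # frozenset makes the pair order irrelevant: (a, b) matches (b, a).
    hits = sum(1 for pair, _ in ranked if frozenset(pair) in known_positives)
    return hits / len(ranked)


# Toy scores for four candidate pairs; two are "known" interactions.
scored = [(("a", "b"), 0.9), (("c", "d"), 0.8), (("a", "c"), 0.4), (("b", "d"), 0.1)]
positives = {frozenset(("a", "b")), frozenset(("c", "a"))}
top2 = precision_at_k(scored, positives, k=2)
top3 = precision_at_k(scored, positives, k=3)
```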

  19. Multi-level machine learning prediction of protein–protein interactions in Saccharomyces cerevisiae

    PubMed Central

    Zubek, Julian; Tatjewski, Marcin; Boniecki, Adam; Mnich, Maciej; Basu, Subhadip

    2015-01-01

    Accurate identification of protein–protein interactions (PPI) is the key step in understanding proteins’ biological functions, which are typically context-dependent. Many existing PPI predictors rely on aggregated features from protein sequences, however only a few methods exploit local information about specific residue contacts. In this work we present a two-stage machine learning approach for prediction of protein–protein interactions. We start with the carefully filtered data on protein complexes available for Saccharomyces cerevisiae in the Protein Data Bank (PDB) database. First, we build linear descriptions of interacting and non-interacting sequence segment pairs based on their inter-residue distances. Secondly, we train machine learning classifiers to predict binary segment interactions for any two short sequence fragments. The final prediction of the protein–protein interaction is done using the 2D matrix representation of all-against-all possible interacting sequence segments of both analysed proteins. The level-I predictor achieves 0.88 AUC for micro-scale, i.e., residue-level prediction. The level-II predictor improves the results further by a more complex learning paradigm. We perform 30-fold macro-scale, i.e., protein-level cross-validation experiment. The level-II predictor using PSIPRED-predicted secondary structure reaches 0.70 precision, 0.68 recall, and 0.70 AUC, whereas other popular methods provide results below 0.6 threshold (recall, precision, AUC). Our results demonstrate that multi-scale sequence features aggregation procedure is able to improve the machine learning results by more than 10% as compared to other sequence representations. Prepared datasets and source code for our experimental pipeline are freely available for download from: http://zubekj.github.io/mlppi/ (open source Python implementation, OS independent). PMID:26157620

  20. DPROT: prediction of disordered proteins using evolutionary information.

    PubMed

    Sethi, Deepti; Garg, Aarti; Raghava, G P S

    2008-10-01

    The association of structurally disordered proteins with a number of diseases has engendered enormous interest and therefore demands a prediction method that would facilitate their expeditious study at molecular level. The present study describes the development of a computational method for predicting disordered proteins using sequence and profile compositions as input features for the training of SVM models. First, we developed the amino acid and dipeptide compositions based SVM modules which yielded sensitivities of 75.6 and 73.2% along with Matthews correlation coefficient (MCC) values of 0.75 and 0.60, respectively. In addition, the use of predicted secondary structure content (coil, sheet and helices) in the form of composition values attained a sensitivity of 76.8% and MCC value of 0.77. Finally, the training of SVM models using evolutionary information hidden in the multiple sequence alignment profile improved the prediction performance by achieving a sensitivity value of 78% and MCC of 0.78. Furthermore, when evaluated on an independent dataset of partially disordered proteins, the same SVM module provided a correct prediction rate of 86.6%. Based on the above study, a web server ("DPROT") was developed for the prediction of disordered proteins, which is available at http://www.imtech.res.in/raghava/dprot/.
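
    For readers unfamiliar with the MCC values reported above, the coefficient is computed from the four cells of a binary confusion matrix as (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal sketch follows; the counts in the example are made up and are not taken from the DPROT study.

```python
import math


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when the denominator vanishes."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for illustration only.
mcc_example = matthews_cc(tp=8, tn=7, fp=3, fn=2)
```

    The value ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which makes it more informative than sensitivity alone on unbalanced datasets.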

  1. Prediction of disease-related mutations affecting protein localization

    PubMed Central

    Laurila, Kirsti; Vihinen, Mauno

    2009-01-01

    Background Eukaryotic cells contain numerous compartments, which have different protein constituents. Proteins are typically directed to compartments by short peptide sequences that act as targeting signals. Translocation to the proper compartment allows a protein to form the necessary interactions with its partners and take part in biological networks such as signalling and metabolic pathways. If a protein is not transported to the correct intracellular compartment either the reaction performed or information carried by the protein does not reach the proper site, causing either inactivation of central reactions or misregulation of signalling cascades, or the mislocalized active protein has harmful effects by acting in the wrong place. Results Numerous methods have been developed to predict protein subcellular localization with quite high accuracy. We applied bioinformatics methods to investigate the effects of known disease-related mutations on protein targeting and localization by analyzing over 22,000 missense mutations in more than 1,500 proteins with two complementary prediction approaches. Several hundred putative localization affecting mutations were identified and investigated statistically. Conclusion Although alterations to localization signals are rare, these effects should be taken into account when analyzing the consequences of disease-related mutations. PMID:19309509

  2. A Systematic Review of Predictions of Survival in Palliative Care: How Accurate Are Clinicians and Who Are the Experts?

    PubMed Central

    Harris, Adam; Harries, Priscilla

    2016-01-01

    overall accuracy being reported. Data were extracted using a standardised tool, by one reviewer, which could have introduced bias. Devising search terms for prognostic studies is challenging. Every attempt was made to devise search terms that were sufficiently sensitive to detect all prognostic studies; however, it remains possible that some studies were not identified. Conclusion Studies of prognostic accuracy in palliative care are heterogeneous, but the evidence suggests that clinicians’ predictions are frequently inaccurate. No sub-group of clinicians was consistently shown to be more accurate than any other. Implications of Key Findings Further research is needed to understand how clinical predictions are formulated and how their accuracy can be improved. PMID:27560380

  3. PPCM: Combing Multiple Classifiers to Improve Protein-Protein Interaction Prediction

    DOE PAGES

    Yao, Jianzhuang; Guo, Hong; Yang, Xiaohan

    2015-01-01

    Determining protein-protein interaction (PPI) in biological systems is of considerable importance, and prediction of PPI has become a popular research area. Although different classifiers have been developed for PPI prediction, no single classifier seems to be able to predict PPI with high confidence. We postulated that by combining individual classifiers the accuracy of PPI prediction could be improved. We developed a method called protein-protein interaction prediction classifiers merger (PPCM), and this method combines output from two PPI prediction tools, GO2PPI and Phyloprof, using Random Forests algorithm. The performance of PPCM was tested by area under the curve (AUC) using an assembled Gold Standard database that contains both positive and negative PPI pairs. Our AUC test showed that PPCM significantly improved the PPI prediction accuracy over the corresponding individual classifiers. We found that additional classifiers incorporated into PPCM could lead to further improvement in the PPI prediction accuracy. Furthermore, cross species PPCM could achieve competitive and even better prediction accuracy compared to the single species PPCM. This study established a robust pipeline for PPI prediction by integrating multiple classifiers using Random Forests algorithm. This pipeline will be useful for predicting PPI in nonmodel species.

  4. SAM-T08, HMM-based protein structure prediction

    PubMed Central

    Karplus, Kevin

    2009-01-01

    The SAM-T08 web server is a protein structure prediction server that provides several useful intermediate results in addition to the final predicted 3D structure: three multiple sequence alignments of putative homologs using different iterated search procedures, prediction of local structure features including various backbone and burial properties, calibrated E-values for the significance of template searches of PDB and residue–residue contact predictions. The server has been validated as part of the CASP8 assessment of structure prediction as having good performance across all classes of predictions. The SAM-T08 server is available at http://compbio.soe.ucsc.edu/SAM_T08/T08-query.html PMID:19483096

  5. Predicting protein concentrations with ELISA microarray assays, monotonic splines and Monte Carlo simulation

    SciTech Connect

    Daly, Don S.; Anderson, Kevin K.; White, Amanda M.; Gonzalez, Rachel M.; Varnum, Susan M.; Zangar, Richard C.

    2008-07-14

    Background: A microarray of enzyme-linked immunosorbent assays, or ELISA microarray, predicts simultaneously the concentrations of numerous proteins in a small sample. These predictions, however, are uncertain due to processing error and biological variability. Making sound biological inferences as well as improving the ELISA microarray process require both concentration predictions and credible estimates of their errors. Methods: We present a statistical method based on monotonic spline statistical models, penalized constrained least squares fitting (PCLS) and Monte Carlo simulation (MC) to predict concentrations and estimate prediction errors in ELISA microarray. PCLS restrains the flexible spline to a fit of assay intensity that is a monotone function of protein concentration. With MC, both modeling and measurement errors are combined to estimate prediction error. The spline/PCLS/MC method is compared to a common method using simulated and real ELISA microarray data sets. Results: In contrast to the rigid logistic model, the flexible spline model gave credible fits in almost all test cases including troublesome cases with left and/or right censoring, or other asymmetries. For the real data sets, 61% of the spline predictions were more accurate than their comparable logistic predictions; especially the spline predictions at the extremes of the prediction curve. The relative errors of 50% of comparable spline and logistic predictions differed by less than 20%. Monte Carlo simulation rendered acceptable asymmetric prediction intervals for both spline and logistic models while propagation of error produced symmetric intervals that diverged unrealistically as the standard curves approached horizontal asymptotes. Conclusions: The spline/PCLS/MC method is a flexible, robust alternative to a logistic/NLS/propagation-of-error method to reliably predict protein concentrations and estimate their errors. The spline method simplifies model selection and fitting
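
    Why Monte Carlo simulation yields asymmetric prediction intervals near the flat end of a standard curve can be seen in a small sketch. It is a simplification, not the authors' method: a monotone piecewise-linear curve stands in for the penalized monotonic spline, only Gaussian measurement noise is simulated (no modeling error), and the curve values, noise level and function names are all hypothetical.

```python
import random
from bisect import bisect_left
from typing import Sequence, Tuple


def invert_monotone_curve(intensity: float,
                          concentrations: Sequence[float],
                          intensities: Sequence[float]) -> float:
    """Map an intensity back to a concentration on an increasing piecewise-linear curve."""
    if intensity <= intensities[0]:
        return concentrations[0]   # clamp below the lowest standard
    if intensity >= intensities[-1]:
        return concentrations[-1]  # clamp above the highest standard
    j = bisect_left(intensities, intensity)
    x0, x1 = concentrations[j - 1], concentrations[j]
    y0, y1 = intensities[j - 1], intensities[j]
    return x0 + (intensity - y0) * (x1 - x0) / (y1 - y0)


def monte_carlo_interval(observed: float,
                         concentrations: Sequence[float],
                         intensities: Sequence[float],
                         noise_sd: float,
                         n_draws: int = 2000,
                         seed: int = 0) -> Tuple[float, float, float]:
    """Return (lower, point, upper): a 95% percentile interval around the point estimate."""
    rng = random.Random(seed)
    draws = sorted(
        invert_monotone_curve(rng.gauss(observed, noise_sd), concentrations, intensities)
        for _ in range(n_draws)
    )
    lower = draws[int(0.025 * (n_draws - 1))]
    upper = draws[int(0.975 * (n_draws - 1))]
    point = invert_monotone_curve(observed, concentrations, intensities)
    return lower, point, upper


# Hypothetical saturating standard curve (concentration vs. assay intensity).
std_conc = [0.0, 1.0, 2.0, 4.0, 8.0]
std_int = [0.0, 1.0, 1.8, 2.4, 2.7]
low, point, high = monte_carlo_interval(2.4, std_conc, std_int, noise_sd=0.1)
```

    Because the curve flattens toward high concentrations, the same intensity noise maps to a wider concentration range above the point estimate than below it, so the interval is wider on the upper side.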

  6. Plasma proteins predict conversion to dementia from prodromal disease

    PubMed Central

    Hye, Abdul; Riddoch-Contreras, Joanna; Baird, Alison L.; Ashton, Nicholas J.; Bazenet, Chantal; Leung, Rufina; Westman, Eric; Simmons, Andrew; Dobson, Richard; Sattlecker, Martina; Lupton, Michelle; Lunnon, Katie; Keohane, Aoife; Ward, Malcolm; Pike, Ian; Zucht, Hans Dieter; Pepin, Danielle; Zheng, Wei; Tunnicliffe, Alan; Richardson, Jill; Gauthier, Serge; Soininen, Hilkka; Kłoszewska, Iwona; Mecocci, Patrizia; Tsolaki, Magda; Vellas, Bruno; Lovestone, Simon

    2014-01-01

    Background The study aimed to validate previously discovered plasma biomarkers associated with AD, using a design based on imaging measures as surrogate for disease severity and assess their prognostic value in predicting conversion to dementia. Methods Three multicenter cohorts of cognitively healthy elderly, mild cognitive impairment (MCI), and AD participants with standardized clinical assessments and structural neuroimaging measures were used. Twenty-six candidate proteins were quantified in 1148 subjects using multiplex (xMAP) assays. Results Sixteen proteins correlated with disease severity and cognitive decline. Strongest associations were in the MCI group with a panel of 10 proteins predicting progression to AD (accuracy 87%, sensitivity 85%, and specificity 88%). Conclusions We have identified 10 plasma proteins strongly associated with disease severity and disease progression. Such markers may be useful for patient selection for clinical trials and assessment of patients with predisease subjective memory complaints. PMID:25012867
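
    Accuracy, sensitivity and specificity as reported above all derive from the same four confusion-matrix counts. The sketch below shows the arithmetic; the counts are invented for illustration and do not come from the study, and the function name is hypothetical.

```python
from typing import Dict


def classification_summary(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy from binary confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one actual positive and one actual negative")
    return {
        "sensitivity": tp / (tp + fn),               # true converters correctly flagged
        "specificity": tn / (tn + fp),               # non-converters correctly cleared
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Hypothetical cohort: 50 converters and 50 non-converters.
summary = classification_summary(tp=40, tn=45, fp=5, fn=10)
```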

  7. Accurate protein structure annotation through competitive diffusion of enzymatic functions over a network of local evolutionary similarities.

    PubMed

    Venner, Eric; Lisewski, Andreas Martin; Erdin, Serkan; Ward, R Matthew; Amin, Shivas R; Lichtarge, Olivier

    2010-01-01

    High-throughput Structural Genomics yields many new protein structures without known molecular function. This study aims to uncover these missing annotations by globally comparing select functional residues across the structural proteome. First, Evolutionary Trace Annotation, or ETA, identifies which proteins have local evolutionary and structural features in common; next, these proteins are linked together into a proteomic network of ETA similarities; then, starting from proteins with known functions, competing functional labels diffuse link-by-link over the entire network. Every node is thus assigned a likelihood z-score for every function, and the most significant one at each node wins and defines its annotation. In high-throughput controls, this competitive diffusion process recovered enzyme activity annotations with 99% and 97% accuracy at half-coverage for the third and fourth Enzyme Commission (EC) levels, respectively. This corresponds to false positive rates 4-fold lower than nearest-neighbor and 5-fold lower than sequence-based annotations. In practice, experimental validation of the predicted carboxylesterase activity in a protein from Staphylococcus aureus illustrated the effectiveness of this approach in the context of an increasingly drug-resistant microbe. This study further links molecular function to a small number of evolutionarily important residues recognizable by Evolutionary Tracing and it points to the specificity and sensitivity of functional annotation by competitive global network diffusion. A web server is at http://mammoth.bcm.tmc.edu/networks.
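
    The competitive diffusion step can be pictured as label propagation on a graph: annotated proteins hold their function fixed, every other node repeatedly averages its neighbours' scores, and the highest-scoring function at each node wins. The sketch below is a deliberately simplified stand-in: it uses plain neighbour averaging rather than the likelihood z-scores of the published method, and the network, protein names and function labels are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def diffuse_labels(edges: Iterable[Tuple[str, str]],
                   known: Dict[str, str],
                   n_iter: int = 50) -> Dict[str, str]:
    """Propagate known function labels over an undirected network; best score wins."""
    neighbors: Dict[str, Set[str]] = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    functions = sorted(set(known.values()))
    scores = {node: {f: 0.0 for f in functions} for node in set(neighbors) | set(known)}
    for node, function in known.items():
        scores[node][function] = 1.0
    for _ in range(n_iter):
        updated = {}
        for node, current in scores.items():
            nbrs = neighbors.get(node, set())
            if node in known or not nbrs:
                updated[node] = current  # annotated nodes stay clamped
                continue
            updated[node] = {
                f: sum(scores[m][f] for m in nbrs) / len(nbrs) for f in functions
            }
        scores = updated
    annotation = {}
    for node, node_scores in scores.items():
        best = max(functions, key=lambda f: node_scores[f])
        if node_scores[best] > 0:  # leave nodes that no label reached unannotated
            annotation[node] = best
    return annotation


# Toy chain p1 - p2 - p3 - p4 - p5 with the two end proteins annotated.
chain = [("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p4", "p5")]
labels = diffuse_labels(chain, {"p1": "esterase", "p5": "kinase"})
```

    In the toy chain, p2 inherits the label of its closer annotated neighbour p1 and p4 that of p5, which is the competitive behaviour the abstract describes in miniature.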

  8. ENTPRISE: An Algorithm for Predicting Human Disease-Associated Amino Acid Substitutions from Sequence Entropy and Predicted Protein Structures

    PubMed Central

    Zhou, Hongyi; Gao, Mu; Skolnick, Jeffrey

    2016-01-01

    The advance of next-generation sequencing technologies has made exome sequencing rapid and relatively inexpensive. A major application of exome sequencing is the identification of genetic variations likely to cause Mendelian diseases. This requires processing large amounts of sequence information and therefore computational approaches that can accurately and efficiently identify the subset of disease-associated variations are needed. The accuracy and high false positive rates of existing computational tools leave much room for improvement. Here, we develop a boosted tree regression machine-learning approach to predict human disease-associated amino acid variations by utilizing a comprehensive combination of protein sequence and structure features. On comparing our method, ENTPRISE, to the state-of-the-art methods SIFT, PolyPhen-2, MUTATIONASSESSOR, MUTATIONTASTER, FATHMM, ENTPRISE exhibits significant improvement. In particular, on a testing dataset consisting of only proteins with balanced disease-associated and neutral variations defined as having the ratio of neutral/disease-associated variations between 0.3 and 3, the Matthews Correlation Coefficient by ENTPRISE is 0.493 as compared to 0.432 by PPH2-HumVar, 0.406 by SIFT, 0.403 by MUTATIONASSESSOR, 0.402 by PPH2-HumDiv, 0.305 by MUTATIONTASTER, and 0.181 by FATHMM. ENTPRISE is then applied to nucleic acid binding proteins in the human proteome. Disease-associated predictions are shown to be highly correlated with the number of protein-protein interactions. Both these predictions and the ENTPRISE server are freely available for academic users as a web service at http://cssb.biology.gatech.edu/entprise/. PMID:26982818

  9. ENTPRISE: An Algorithm for Predicting Human Disease-Associated Amino Acid Substitutions from Sequence Entropy and Predicted Protein Structures.

    PubMed

    Zhou, Hongyi; Gao, Mu; Skolnick, Jeffrey

    2016-01-01

    The advance of next-generation sequencing technologies has made exome sequencing rapid and relatively inexpensive. A major application of exome sequencing is the identification of genetic variations likely to cause Mendelian diseases. This requires processing large amounts of sequence information and therefore computational approaches that can accurately and efficiently identify the subset of disease-associated variations are needed. The accuracy and high false positive rates of existing computational tools leave much room for improvement. Here, we develop a boosted tree regression machine-learning approach to predict human disease-associated amino acid variations by utilizing a comprehensive combination of protein sequence and structure features. On comparing our method, ENTPRISE, to the state-of-the-art methods SIFT, PolyPhen-2, MUTATIONASSESSOR, MUTATIONTASTER, FATHMM, ENTPRISE exhibits significant improvement. In particular, on a testing dataset consisting of only proteins with balanced disease-associated and neutral variations defined as having the ratio of neutral/disease-associated variations between 0.3 and 3, the Matthews Correlation Coefficient by ENTPRISE is 0.493 as compared to 0.432 by PPH2-HumVar, 0.406 by SIFT, 0.403 by MUTATIONASSESSOR, 0.402 by PPH2-HumDiv, 0.305 by MUTATIONTASTER, and 0.181 by FATHMM. ENTPRISE is then applied to nucleic acid binding proteins in the human proteome. Disease-associated predictions are shown to be highly correlated with the number of protein-protein interactions. Both these predictions and the ENTPRISE server are freely available for academic users as a web service at http://cssb.biology.gatech.edu/entprise/.

  10. Addressing the Role of Conformational Diversity in Protein Structure Prediction.

    PubMed

    Palopoli, Nicolas; Monzon, Alexander Miguel; Parisi, Gustavo; Fornasari, Maria Silvina

    2016-01-01

    Computational modeling of tertiary structures has become of standard use to study proteins that lack experimental characterization. Unfortunately, 3D structure prediction methods and model quality assessment programs often overlook that an ensemble of conformers in equilibrium populates the native state of proteins. In this work we collected sets of publicly available protein models and the corresponding target structures experimentally solved and studied how they describe the conformational diversity of the protein. For each protein, we assessed the quality of the models against known conformers by several standard measures and identified those models ranked best. We found that model rankings are defined by both the selected target conformer and the similarity measure used. 70% of the proteins in our datasets show that different models are structurally closest to different conformers of the same protein target. We observed that model building protocols such as template-based or ab initio approaches describe in similar ways the conformational diversity of the protein, although for template-based methods this description may depend on the sequence similarity between target and template sequences. Taken together, our results support the idea that protein structure modeling could help to identify members of the native ensemble, highlight the importance of considering conformational diversity in protein 3D quality evaluations and endorse the study of the variability of the native structure for a meaningful biological analysis. PMID:27159429

  11. Addressing the Role of Conformational Diversity in Protein Structure Prediction

    PubMed Central

    Parisi, Gustavo; Fornasari, Maria Silvina

    2016-01-01

    Computational modeling of tertiary structures has become of standard use to study proteins that lack experimental characterization. Unfortunately, 3D structure prediction methods and model quality assessment programs often overlook that an ensemble of conformers in equilibrium populates the native state of proteins. In this work we collected sets of publicly available protein models and the corresponding target structures experimentally solved and studied how they describe the conformational diversity of the protein. For each protein, we assessed the quality of the models against known conformers by several standard measures and identified those models ranked best. We found that model rankings are defined by both the selected target conformer and the similarity measure used. 70% of the proteins in our datasets show that different models are structurally closest to different conformers of the same protein target. We observed that model building protocols such as template-based or ab initio approaches describe in similar ways the conformational diversity of the protein, although for template-based methods this description may depend on the sequence similarity between target and template sequences. Taken together, our results support the idea that protein structure modeling could help to identify members of the native ensemble, highlight the importance of considering conformational diversity in protein 3D quality evaluations and endorse the study of the variability of the native structure for a meaningful biological analysis. PMID:27159429

  12. Stability Curve Prediction of Homologous Proteins Using Temperature-Dependent Statistical Potentials

    PubMed Central

    Pucci, Fabrizio; Rooman, Marianne

    2014-01-01

    The unraveling and control of protein stability at different temperatures is a fundamental problem in biophysics that is substantially far from being quantitatively and accurately solved, as it requires a precise knowledge of the temperature dependence of amino acid interactions. In this paper we attempt to gain insight into the thermal stability of proteins by designing a tool to predict the full stability curve as a function of the temperature for a set of 45 proteins belonging to 11 homologous families, given their sequence and structure, as well as the melting temperature (Tm) and the change in heat capacity (ΔCp) of proteins belonging to the same family. Stability curves constitute a fundamental instrument to analyze in detail the thermal stability and its relation to the thermodynamic stability, and to estimate the enthalpic and entropic contributions to the folding free energy. In summary, our approach for predicting the protein stability curves relies on temperature-dependent statistical potentials derived from three datasets of protein structures with targeted thermal stability properties. Using these potentials, the folding free energies (ΔG) at three different temperatures were computed for each protein. The Gibbs-Helmholtz equation was then used to predict the protein's stability curve as the curve that best fits these three points. The results are quite encouraging: the standard deviations between the experimental and predicted Tm's, ΔCp's and folding free energies at room temperature (ΔG25) are equal to 13 °C, 1.3 kcal/(mol °C) and 4.1 kcal/mol, respectively, in cross-validation. The main sources of error and some further improvements and perspectives are briefly discussed. PMID:25032839
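
    The step of fitting a Gibbs-Helmholtz curve through three free-energy points can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes a temperature-independent heat capacity change, temperatures in kelvin, and the sign convention in which the free energy is positive when the protein is stable (the abstract's folding free energy has the opposite sign). Under those assumptions the curve can be written as a + b*T + c*T*ln(T), which is linear in its coefficients, so three points determine it exactly and the heat capacity change is -c. All function names are hypothetical.

```python
import math


def fit_stability_curve(points):
    """Fit dG(T) = a + b*T + c*T*ln(T) exactly through three (T, dG) points.

    Returns the coefficients (a, b, c). With the sign convention assumed
    here, the heat capacity change is dCp = -c.
    """
    rows = [(1.0, T, T * math.log(T)) for T, _ in points]
    rhs = [dG for _, dG in points]

    def det3(m):
        (a, b, c), (d, e, f), (g, h, i) = m
        return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

    d = det3(rows)
    coeffs = []
    for k in range(3):
        # Cramer's rule: replace column k by the right-hand side.
        m = [list(r) for r in rows]
        for r in range(3):
            m[r][k] = rhs[r]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)


def stability(T, coeffs):
    """Evaluate the fitted stability curve at temperature T (kelvin)."""
    a, b, c = coeffs
    return a + b * T + c * T * math.log(T)


def melting_temperature(coeffs, lo, hi, tol=1e-9):
    """Find Tm with dG(Tm) = 0 by bisection; assumes a sign change on [lo, hi]."""
    f_lo = stability(lo, coeffs)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = stability(mid, coeffs)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

    With more than three points the same linear form could be fitted by least squares instead of solved exactly.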

  13. Mimicking the folding pathway to improve homology-free protein structure prediction

    NASA Astrophysics Data System (ADS)

    Freed, Karl; Debartolo, Joe; Colubri, Andres; Jha, Abhishek; Fitzgerald, James; Sosnick, Tobin

    2010-03-01

    Since it was demonstrated that a protein's sequence encodes its structure, the prediction of structure from sequence has remained an outstanding problem that impacts numerous scientific disciplines, including many genome projects. By iteratively fixing secondary structure assignments of residues during Monte Carlo simulations of folding, our coarse-grained model without information concerning homology or explicit side chains outperforms current homology-based secondary structure prediction methods for many proteins. The computationally rapid algorithm using only single residue (phi, psi) dihedral angle moves also generates tertiary structures of comparable accuracy to existing all-atom methods for many small proteins, particularly ones with low homology. Hence, given appropriate search strategies and scoring functions, reduced representations can be used for accurately predicting secondary structure as well as providing three-dimensional structures, thereby increasing the size of proteins approachable by homology-free methods and the accuracy of template methods whose accuracy depends on the quality of the input secondary structure. Inclusion of information from evolutionarily related sequences enhances the statistics and the accuracy of the predictions.

  14. Application of Gap-Constraints Given Sequential Frequent Pattern Mining for Protein Function Prediction

    PubMed Central

    Park, Hyeon Ah; Kim, Taewook; Li, Meijing; Shon, Ho Sun; Park, Jeong Seok; Ryu, Keun Ho

    2015-01-01

    Objectives: Predicting protein function from the protein–protein interaction network is challenging due to the complexity and huge scale of the protein interaction process, along with its inconsistent patterns. Previously proposed methods such as neighbor counting, network analysis, and graph pattern mining have predicted functions by calculating the rules and probability of patterns inside the network. Although these methods have shown good prediction, difficulty still exists in finding functions that are exceptions to simple rules and patterns, because the inconsistent aspect of the interaction network is not considered. Methods: In this article, we propose a novel approach using the sequential pattern mining method with gap-constraints. To overcome the inconsistency problem, we suggest frequent functional patterns to include every possible functional sequence, including patterns for which search is limited by the structure of connection or level of neighborhood layer. We also constructed a tree-graph with the most crucial interaction information of the target protein, and generated candidate sets to assign by sequential pattern mining allowing gaps. Results: The parameters of pattern length, maximum gaps, and minimum support were varied to find the best setting for the most accurate prediction. The highest accuracy rate was 0.972, which was better than the simple neighbor counting approach and the link-based approach. Conclusion: Comparison of the results with other approaches confirmed that the proposed approach could reach more function candidates that previous methods could not obtain. PMID:25938021
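
    The notions of maximum gap and minimum support used above can be illustrated with a small sketch. This is not the authors' implementation. It assumes that each sequence is a list of function labels along a path from the target protein, and that a gap counts the items skipped between two consecutive matched items. The function names and labels are hypothetical.

```python
def occurs_with_gaps(sequence, pattern, max_gap):
    """Return True if `pattern` occurs in `sequence` in order, with at most
    `max_gap` skipped items between consecutive matched items."""
    if not pattern:
        return True
    # Positions at which a match of the pattern prefix seen so far can end.
    ends = {i for i, item in enumerate(sequence) if item == pattern[0]}
    for symbol in pattern[1:]:
        ends = {
            j
            for i in ends
            for j in range(i + 1, min(i + max_gap + 2, len(sequence)))
            if sequence[j] == symbol
        }
        if not ends:
            return False
    return True


def support(sequences, pattern, max_gap):
    """Fraction of sequences that contain the pattern under the gap constraint."""
    hits = sum(occurs_with_gaps(s, pattern, max_gap) for s in sequences)
    return hits / len(sequences)
```

    Tracking every possible end position, rather than only the earliest match, matters here: with a gap limit, the earliest occurrence of one item may be too far from the next item even though a later occurrence is close enough. A pattern would then be kept as frequent when its support reaches the chosen minimum support.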

  15. [Study of decision tree in the application of predicting protein-protein interactions].

    PubMed

    Guo, Xiaolong; Jiang, Yan; Qui, Lu

    2013-10-01

    Proteins are the final executors of cell viability and function. Protein-protein interactions determine the complexity of the organism. Research on protein interactions can help us understand the function of proteins at the molecular level, study cell growth, development, differentiation, and apoptosis, and understand biological regulation mechanisms and other activities. Such interactions are essential for understanding the pathologies of diseases and are helpful in the prevention and treatment of diseases, as well as in the development of new drugs. In this paper, we employ a single decision-tree classification model to predict protein-protein interactions in yeast. The original data came from the existing literature. Using the Clementine software, this paper analyzes how the attributes affect the accuracy of the model by adjusting the predicted attributes. The result shows that a single decision tree is a good classification model and has higher accuracy than those reported in previous studies.

  16. CoinFold: a web server for protein contact prediction and contact-assisted protein folding

    PubMed Central

    Wang, Sheng; Li, Wei; Zhang, Renyu; Liu, Shiwang; Xu, Jinbo

    2016-01-01

    CoinFold (http://raptorx2.uchicago.edu/ContactMap/) is a web server for protein contact prediction and contact-assisted de novo structure prediction. CoinFold predicts contacts by integrating joint multi-family evolutionary coupling (EC) analysis and supervised machine learning. This joint EC analysis is unique in that it not only uses residue coevolution information in the target protein family, but also that in the related families which may have divergent sequences but similar folds. The supervised learning further improves contact prediction accuracy by making use of sequence profile, contact (distance) potential and other information. Finally, this server predicts tertiary structure of a sequence by feeding its predicted contacts and secondary structure to the CNS suite. Tested on the CASP and CAMEO targets, this server shows significant advantages over existing ones of similar category in both contact and tertiary structure prediction. PMID:27112569

  17. Proteins and Their Interacting Partners: An Introduction to Protein-Ligand Binding Site Prediction Methods.

    PubMed

    Roche, Daniel Barry; Brackenridge, Danielle Allison; McGuffin, Liam James

    2015-12-15

    Elucidating the biological and biochemical roles of proteins, and subsequently determining their interacting partners, can be difficult and time consuming using in vitro and/or in vivo methods, and consequently the majority of newly sequenced proteins will have unknown structures and functions. However, in silico methods for predicting protein-ligand binding sites and protein biochemical functions offer an alternative practical solution. The characterisation of protein-ligand binding sites is essential for investigating new functional roles, which can impact the major biological research spheres of health, food, and energy security. In this review we discuss the role in silico methods play in 3D modelling of protein-ligand binding sites, along with their role in predicting biochemical functionality. In addition, we describe in detail some of the key alternative in silico prediction approaches that are available, as well as discussing the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and the Continuous Automated Model EvaluatiOn (CAMEO) projects, and their impact on developments in the field. Furthermore, we discuss the importance of protein function prediction methods for tackling 21st century problems.

  18. Prediction of protein complexes using empirical free energy functions.

    PubMed Central

    Weng, Z.; Vajda, S.; Delisi, C.

    1996-01-01

    A long sought goal in the physical chemistry of macromolecular structure, and one directly relevant to understanding the molecular basis of biological recognition, is predicting the geometry of bimolecular complexes from the geometries of their free monomers. Even when the monomers remain relatively unchanged by complex formation, prediction has been difficult because the free energies of alternative conformations of the complex have been difficult to evaluate quickly and accurately. This has forced the use of incomplete target functions, which typically do no better than to provide tens of possible complexes with no way of choosing between them. Here we present a general framework for empirical free energy evaluation and report calculations, based on a relatively complete and easily executable free energy function, that indicate that the structures of complexes can be predicted accurately from the structures of monomers, including close sequence homologues. The calculations also suggest that the binding free energies themselves may be predicted with reasonable accuracy. The method is compared to an alternative formulation that has also been applied recently to the same data set. Both approaches promise to open new opportunities in macromolecular design and specificity modification. PMID:8845751

  19. Neurodegenerative diseases: quantitative predictions of protein-RNA interactions.

    PubMed

    Cirillo, Davide; Agostini, Federico; Klus, Petr; Marchese, Domenica; Rodriguez, Silvia; Bolognesi, Benedetta; Tartaglia, Gian Gaetano

    2013-02-01

    Increasing evidence indicates that RNA plays an active role in a number of neurodegenerative diseases. We recently introduced a theoretical framework, catRAPID, to predict the binding ability of protein and RNA molecules. Here, we use catRAPID to investigate ribonucleoprotein interactions linked to inherited intellectual disability, amyotrophic lateral sclerosis, Creutzfeldt-Jakob, Alzheimer's, and Parkinson's diseases. We specifically focus on (1) RNA interactions with fragile X mental retardation protein FMRP; (2) protein sequestration caused by CGG repeats; (3) noncoding transcripts regulated by TAR DNA-binding protein 43 TDP-43; (4) autogenous regulation of TDP-43 and FMRP; (5) iron-mediated expression of amyloid precursor protein APP and α-synuclein; (6) interactions between prions and RNA aptamers. Our results are in striking agreement with experimental evidence and provide new insights in processes associated with neuronal function and misfunction.

  20. Protein structure prediction using residue- and fragment-environment potentials in CASP11.

    PubMed

    Kim, Hyungrae; Kihara, Daisuke

    2016-09-01

    An accurate scoring function that can select near-native structure models from a pool of alternative models is key for successful protein structure prediction. For the critical assessment of techniques for protein structure prediction (CASP) 11, we have built a protocol of protein structure prediction that has novel coarse-grained scoring functions for selecting decoys as the heart of its pipeline. The score named PRESCO (Protein Residue Environment SCOre) developed recently by our group evaluates the native-likeness of local structural environment of residues in a structure decoy considering positions and the depth of side-chains of spatially neighboring residues. We also introduced a helix interaction potential as an additional scoring function for selecting decoys. The best models selected by PRESCO and the helix interaction potential underwent structure refinement, which includes side-chain modeling and relaxation with a short molecular dynamics simulation. Our protocol was successful, achieving the top rank in the free modeling category with a significant margin of the accumulated Z-score to the subsequent groups when the top 1 models were considered. Proteins 2016; 84(Suppl 1):105-117. © 2015 Wiley Periodicals, Inc.

  1. Improved prediction of subcellular location for apoptosis proteins by the dual-layer support vector machine.

    PubMed

    Zhou, X-B; Chen, C; Li, Z-C; Zou, X-Y

    2008-08-01

    Apoptosis proteins play an important role in the development and homeostasis of an organism. The accurate prediction of subcellular location for apoptosis proteins is very helpful for understanding the mechanism of apoptosis and their biological functions. However, most of the existing predictive methods are designed by utilizing a single classifier, which would limit the further improvement of their performances. In this paper, a novel predictive method, which is essentially a multi-classifier system, has been proposed by combining a dual-layer support vector machine (SVM) with multiple compositions including amino acid composition (AAC), dipeptide composition (DPC) and amphiphilic pseudo amino acid composition (Am-Pse-AAC). As a demonstration, the predictive performance of our method was evaluated on two datasets of apoptosis proteins, involving the standard dataset ZD98 generated by Zhou and Doolittle, and a larger dataset ZW225 generated by Zhang et al. With the jackknife test, the overall accuracies of our method on the two datasets reach 94.90% and 88.44%, respectively. The promising results indicate that our method can be a complementary tool for the prediction of subcellular location.
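
    Two of the sequence encodings named above, amino acid composition and dipeptide composition, are simple frequency vectors. The sketch below shows one common way to compute them, assuming sequences over the 20 standard one-letter residue codes; the function names are hypothetical, and the amphiphilic pseudo amino acid composition and the SVM layers are not covered.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """20-dimensional vector of residue frequencies."""
    n = len(seq)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """400-dimensional vector of frequencies of adjacent residue pairs."""
    counts = {}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        counts[pair] = counts.get(pair, 0) + 1
    total = len(seq) - 1
    return [counts.get(a + b, 0) / total for a, b in product(AMINO_ACIDS, repeat=2)]
```

    Each vector sums to 1 when the sequence contains only standard residues; non-standard symbols are silently left out of the vectors in this sketch.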

  2. Toward Relatively General and Accurate Quantum Chemical Predictions of Solid-State 17O NMR Chemical Shifts in Various Biologically Relevant Oxygen-containing Compounds

    PubMed Central

    Rorick, Amber; Michael, Matthew A.; Yang, Liu; Zhang, Yong

    2015-01-01

    Oxygen is an important element in most biologically significant molecules and experimental solid-state 17O NMR studies have provided numerous useful structural probes to study these systems. However, computational predictions of solid-state 17O NMR chemical shift tensor properties are still challenging in many cases and, in particular, each prior computational study is essentially limited to one type of oxygen-containing system. This work provides the first systematic study of the effects of geometry refinement, method and basis sets for metal and non-metal elements in both geometry optimization and NMR property calculations of some biologically relevant oxygen-containing compounds with a good variety of XO bonding groups, X = H, C, N, P, and metal. The experimental range studied is 1455 ppm, a major part of the reported 17O NMR chemical shifts in organic and organometallic compounds. A number of computational factors towards relatively general and accurate predictions of 17O NMR chemical shifts were studied to provide helpful and detailed suggestions for future work. For the various kinds of oxygen-containing compounds studied, the best computational approach results in a theory-versus-experiment correlation coefficient R2 of 0.9880 and mean absolute deviation of 13 ppm (1.9% of the experimental range) for isotropic NMR shifts and R2 of 0.9926 for all shift tensor properties. These results shall facilitate future computational studies of 17O NMR chemical shifts in many biologically relevant systems, and the high accuracy may also help refinement and determination of active-site structures of some oxygen-containing substrate bound proteins. PMID:26274812

  3. Toward Relatively General and Accurate Quantum Chemical Predictions of Solid-State (17)O NMR Chemical Shifts in Various Biologically Relevant Oxygen-Containing Compounds.

    PubMed

    Rorick, Amber; Michael, Matthew A; Yang, Liu; Zhang, Yong

    2015-09-01

    Oxygen is an important element in most biologically significant molecules, and experimental solid-state (17)O NMR studies have provided numerous useful structural probes to study these systems. However, computational predictions of solid-state (17)O NMR chemical shift tensor properties are still challenging in many cases, and in particular, each of the prior computational works is basically limited to one type of oxygen-containing system. This work provides the first systematic study of the effects of geometry refinement, method, and basis sets for metal and nonmetal elements in both geometry optimization and NMR property calculations of some biologically relevant oxygen-containing compounds with a good variety of XO bonding groups (X = H, C, N, P, and metal). The experimental range studied is of 1455 ppm, a major part of the reported (17)O NMR chemical shifts in organic and organometallic compounds. A number of computational factors toward relatively general and accurate predictions of (17)O NMR chemical shifts were studied to provide helpful and detailed suggestions for future work. For the studied kinds of oxygen-containing compounds, the best computational approach results in a theory-versus-experiment correlation coefficient (R(2)) value of 0.9880 and a mean absolute deviation of 13 ppm (1.9% of the experimental range) for isotropic NMR shifts and an R(2) value of 0.9926 for all shift-tensor properties. These results shall facilitate future computational studies of (17)O NMR chemical shifts in many biologically relevant systems, and the high accuracy may also help the refinement and determination of active-site structures of some oxygen-containing substrate-bound proteins.

  4. Protein structure prediction enhanced with evolutionary diversity : SPEED.

    SciTech Connect

    DeBartolo, J.; Hocky, G.; Wilde, M.; Xu, J.; Freed, K. F.; Sosnick, T. R.; Univ. of Chicago; Toyota Technological Inst. at Chicago

    2010-03-01

    For naturally occurring proteins, similar sequence implies similar structure. Consequently, multiple sequence alignments (MSAs) often are used in template-based modeling of protein structure and have been incorporated into fragment-based assembly methods. Our previous homology-free structure prediction study introduced an algorithm that mimics the folding pathway by coupling the formation of secondary and tertiary structure. Moves in the Monte Carlo procedure involve only a change in a single pair of (φ, ψ) backbone dihedral angles that are obtained from a Protein Data Bank-based distribution appropriate for each amino acid, conditional on the type and conformation of the flanking residues. We improve this method by using MSAs to enrich the sampling distribution, but in a manner that does not require structural knowledge of any protein sequence (i.e., not homologous fragment insertion). In combination with other tools, including clustering and refinement, the accuracies of the predicted secondary and tertiary structures are substantially improved and a global and position-resolved measure of confidence is introduced for the accuracy of the predictions. Performance of the method in the Critical Assessment of Structure Prediction (CASP8) is discussed.

  5. Stringent DDI-based Prediction of H. sapiens-M. tuberculosis H37Rv Protein-Protein Interactions

    PubMed Central

    2013-01-01

    discovered some important properties of domains involved in host-pathogen PPIs. We find that both host and pathogen proteins involved in host-pathogen PPIs tend to have more domains than proteins involved in intra-species PPIs, and these domains have more interaction partners than domains on proteins involved in intra-species PPI. Conclusions The stringent DDI-based prediction approach reported in this work provides a stringent strategy for predicting host-pathogen PPIs. It also performs better than a conventional DDI-based approach in predicting PPIs. We have predicted a small set of accurate H. sapiens-M. tuberculosis H37Rv PPIs which could be very useful for a variety of related studies. PMID:24564941

  6. A computational method to predict carbonylation sites in yeast proteins.

    PubMed

    Lv, H Q; Liu, J; Han, J Q; Zheng, J G; Liu, R L

    2016-01-01

    Several post-translational modifications (PTM) have been discussed in literature. Among a variety of oxidative stress-induced PTM, protein carbonylation is considered a biomarker of oxidative stress. Only certain proteins can be carbonylated because only four amino acid residues, namely lysine (K), arginine (R), threonine (T) and proline (P), are susceptible to carbonylation. The yeast proteome is an excellent model to explore oxidative stress, especially protein carbonylation. Current experimental approaches in identifying carbonylation sites are expensive, time-consuming and limited in their abilities to process proteins. Furthermore, there is no bioinformational method to predict carbonylation sites in yeast proteins. Therefore, we propose a computational method to predict yeast carbonylation sites. This method has total accuracies of 86.32, 85.89, 84.80, and 86.80% in predicting the carbonylation sites of K, R, T, and P, respectively. These results were confirmed by 10-fold cross-validation. The ability to identify carbonylation sites in different kinds of features was analyzed and the position-specific composition of the modification site-flanking residues was discussed. Additionally, a software tool has been developed to help with the calculations in this method. Datasets and the software are available at https://sourceforge.net/projects/hqlstudio/files/CarSpred.Y/. PMID:27420944

  7. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields.

    PubMed

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-01

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility. PMID:26752681
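
    The Q3 and Q8 figures above are per-residue accuracies over three-state and eight-state secondary structure labels. The sketch below shows how such a score could be computed. The reduction from eight DSSP states to three (H, G, I to helix; E, B to strand; everything else to coil) is one commonly used convention and is an assumption here, as are the function names.

```python
# One common 8-state to 3-state reduction (assumed here; other conventions exist).
Q8_TO_Q3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_q8_to_q3(labels):
    """Map a string of 8-state labels to helix (H), strand (E) or coil (C)."""
    return "".join(Q8_TO_Q3.get(x, "C") for x in labels)


def per_residue_accuracy(predicted, observed):
    """Fraction of residues whose predicted label equals the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    if not observed:
        raise ValueError("label strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)
```

    Calling per_residue_accuracy on eight-state strings gives a Q8-style score, and calling it on the reduced strings gives a Q3-style score.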

  8. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields

    NASA Astrophysics Data System (ADS)

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-01

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.

  9. Improved hybrid optimization algorithm for 3D protein structure prediction.

    PubMed

    Zhou, Changjun; Hou, Caixia; Wei, Xiaopeng; Zhang, Qiang

    2014-07-01

    A new improved hybrid optimization algorithm, the PGATS algorithm, which is based on the toy off-lattice model, is presented for dealing with three-dimensional protein structure prediction problems. The algorithm combines the particle swarm optimization (PSO), genetic algorithm (GA), and tabu search (TS) algorithms. In addition, we adopt several improvement strategies. A stochastic disturbance factor is added to the particle swarm optimization to improve its search ability; the crossover and mutation operations of the genetic algorithm are changed to a kind of random linear method; finally, the tabu search algorithm is improved by appending a mutation operator. Through the combination of a variety of strategies and algorithms, protein structure prediction (PSP) in a 3D off-lattice model is achieved. The PSP problem is an NP-hard problem, but it can be treated as a global optimization problem with multiple extrema and multiple parameters. This is the theoretical principle of the hybrid optimization algorithm that is proposed in this paper. The algorithm combines local search and global search, which overcomes the shortcomings of a single algorithm and gives full play to the advantages of each algorithm. The algorithm is validated on the currently standard benchmark sequences, namely Fibonacci sequences and real protein sequences. Experiments show that the proposed new method outperforms single algorithms in the accuracy of the calculated protein sequence energy value, and is thus an effective way to predict the structure of proteins. PMID:25069136

  10. Infectious titres of sheep scrapie and bovine spongiform encephalopathy agents cannot be accurately predicted from quantitative laboratory test results.

    PubMed

    González, Lorenzo; Thorne, Leigh; Jeffrey, Martin; Martin, Stuart; Spiropoulos, John; Beck, Katy E; Lockey, Richard W; Vickery, Christopher M; Holder, Thomas; Terry, Linda

    2012-11-01

    It is widely accepted that abnormal forms of the prion protein (PrP) are the best surrogate marker for the infectious agent of prion diseases and, in practice, the detection of such disease-associated (PrP(d)) and/or protease-resistant (PrP(res)) forms of PrP is the cornerstone of diagnosis and surveillance of the transmissible spongiform encephalopathies (TSEs). Nevertheless, some studies question the consistent association between infectivity and abnormal PrP detection. To address this discrepancy, 11 brain samples of sheep affected with natural scrapie or experimental bovine spongiform encephalopathy were selected on the basis of the magnitude and predominant types of PrP(d) accumulation, as shown by immunohistochemical (IHC) examination; contra-lateral hemi-brain samples were inoculated at three different dilutions into transgenic mice overexpressing ovine PrP and were also subjected to quantitative analysis by three biochemical tests (BCTs). Six samples gave 'low' infectious titres (10⁶·⁵ to 10⁶·⁷ LD₅₀ g⁻¹) and five gave 'high titres' (10⁸·¹ to ≥ 10⁸·⁷ LD₅₀ g⁻¹) and, with the exception of the Western blot analysis, those two groups tended to correspond with samples with lower PrP(d)/PrP(res) results by IHC/BCTs. However, no statistical association could be confirmed due to high individual sample variability. It is concluded that although detection of abnormal forms of PrP by laboratory methods remains useful to confirm TSE infection, infectivity titres cannot be predicted from quantitative test results, at least for the TSE sources and host PRNP genotypes used in this study. Furthermore, the near inverse correlation between infectious titres and Western blot results (high protease pre-treatment) argues for a dissociation between infectivity and PrP(res).

  11. Pattern recognition methods for protein functional site prediction.

    PubMed

    Yang, Zheng Rong; Wang, Lipo; Young, Natasha; Trudgian, Dave; Chou, Kuo-Chen

    2005-10-01

    Protein functional site prediction is closely related to drug design, hence to public health. In order to save the cost and the time spent on identifying the functional sites in sequenced proteins in biology laboratory, computer programs have been widely used for decades. Many of them are implemented using the state-of-the-art pattern recognition algorithms, including decision trees, neural networks and support vector machines. Although the success of this effort has been obvious, advanced and new algorithms are still under development for addressing some difficult issues. This review will go through the major stages in developing pattern recognition algorithms for protein functional site prediction and outline the future research directions in this important area. PMID:16248799

  12. Choosing negative examples for the prediction of protein-protein interactions

    PubMed Central

    Ben-Hur, Asa; Noble, William Stafford

    2006-01-01

    The protein-protein interaction networks of even well-studied model organisms are sketchy at best, highlighting the continued need for computational methods to help direct experimentalists in the search for novel interactions. This need has prompted the development of a number of methods for predicting protein-protein interactions based on various sources of data and methodologies. The common method for choosing negative examples for training a predictor of protein-protein interactions is based on annotations of cellular localization, and the observation that pairs of proteins that have different localization patterns are unlikely to interact. While this method leads to high quality sets of non-interacting proteins, we find that this choice can lead to biased estimates of prediction accuracy, because the constraints placed on the distribution of the negative examples makes the task easier. The effects of this bias are demonstrated in the context of both sequence-based and non-sequence based features used for predicting protein-protein interactions. PMID:16723005
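
The contrast described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the protein names, the localization annotations, and the helper names (`candidate_pairs`, `localization_negatives`, `random_negatives`) are all hypothetical. It builds negative examples two ways, once restricted to pairs that share no cellular compartment and once sampled uniformly from all non-interacting pairs.

```python
import itertools
import random


def candidate_pairs(proteins, positives):
    """All unordered protein pairs that are not known to interact."""
    known = {frozenset(pair) for pair in positives}
    return [pair for pair in itertools.combinations(sorted(proteins), 2)
            if frozenset(pair) not in known]


def localization_negatives(localization, positives):
    """Negatives restricted to pairs with no shared compartment."""
    return [(a, b) for a, b in candidate_pairs(localization, positives)
            if not (localization[a] & localization[b])]


def random_negatives(localization, positives, k, seed=0):
    """Negatives drawn uniformly from all non-interacting pairs."""
    rng = random.Random(seed)
    pool = candidate_pairs(localization, positives)
    return rng.sample(pool, min(k, len(pool)))


# Hypothetical toy annotations (assumed for illustration only).
localization = {
    "P1": {"nucleus"},
    "P2": {"nucleus"},
    "P3": {"cytoplasm"},
    "P4": {"membrane", "cytoplasm"},
}
positives = [("P1", "P2")]

print(localization_negatives(localization, positives))
print(random_negatives(localization, positives, k=3))
```

The localization-based set can never contain a same-compartment pair such as P3 and P4, so a classifier evaluated on it faces a narrower, easier negative distribution than one evaluated on uniformly sampled pairs.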

  13. Using support vector machine for improving protein-protein interaction prediction utilizing domain interactions

    SciTech Connect

    Singhal, Mudita; Shah, Anuj R.; Brown, Roslyn N.; Adkins, Joshua N.

    2010-10-02

    Understanding protein interactions is essential to gain insights into the biological processes at the whole cell level. The high-throughput experimental techniques for determining protein-protein interactions (PPI) are error prone and expensive with low overlap amongst them. Although several computational methods have been proposed for predicting protein interactions there is definite room for improvement. Here we present DomainSVM, a predictive method for PPI that uses computationally inferred domain-domain interaction values in a Support Vector Machine framework to predict protein interactions. DomainSVM method utilizes evidence of multiple interacting domains to predict a protein interaction. It outperforms existing methods of PPI prediction by achieving very high explanation ratios, precision, specificity, sensitivity and F-measure values in a 10 fold cross-validation study conducted on the positive and negative PPIs in yeast. A Functional comparison study using GO annotations on the positive and the negative test sets is presented in addition to discussing novel PPI predictions in Salmonella Typhimurium.
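
For readers unfamiliar with the evaluation measures named in this abstract, the sketch below computes precision, sensitivity, specificity and F-measure from confusion-matrix counts using their standard definitions. The counts are hypothetical and are not results from the DomainSVM study; the "explanation ratio" is not defined in the abstract and is omitted.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion counts."""
    precision = tp / (tp + fp)      # predicted positives that are correct
    sensitivity = tp / (tp + fn)    # true positives recovered (recall)
    specificity = tn / (tn + fp)    # true negatives recovered
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
    }


# Hypothetical counts from one cross-validation fold (assumed values).
metrics = classification_metrics(tp=90, fp=10, tn=90, fn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a 10-fold cross-validation these values would typically be computed per fold and then averaged; the function assumes every denominator is non-zero.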

  14. Nanoparticles-cell association predicted by protein corona fingerprints

    NASA Astrophysics Data System (ADS)

    Palchetti, S.; Digiacomo, L.; Pozzi, D.; Peruzzi, G.; Micarelli, E.; Mahmoudi, M.; Caracciolo, G.

    2016-06-01

    In a physiological environment (e.g., blood and interstitial fluids) nanoparticles (NPs) will bind proteins shaping a "protein corona" layer. The long-lived protein layer tightly bound to the NP surface is referred to as the hard corona (HC) and encodes information that controls NP bioactivity (e.g. cellular association, cellular signaling pathways, biodistribution, and toxicity). Decrypting this complex code has become a priority to predict the NP biological outcomes. Here, we use a library of 16 lipid NPs of varying size (Ø ~ 100-250 nm) and surface chemistry (unmodified and PEGylated) to investigate the relationships between NP physicochemical properties (nanoparticle size, aggregation state and surface charge), protein corona fingerprints (PCFs), and NP-cell association. We found that none of the NPs' physicochemical properties alone was able to account for association with the human cervical cancer cell line (HeLa). For the entire library of NPs, a total of 436 distinct serum proteins were detected. We developed a predictive-validation modeling approach that provides a means of assessing the relative significance of the identified corona proteins. Interestingly, a minor fraction of the HC, consisting of only 8 PCFs, was identified as the main promoter of NP association with HeLa cells. Remarkably, the identified PCFs have several receptors with a high level of expression on the plasma membrane of HeLa cells.

  15. ORION: a web server for protein fold recognition and structure prediction using evolutionary hybrid profiles.

    PubMed

    Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G; Gelly, Jean-Christophe

    2016-01-01

    Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation -with Protein Blocks-, which give an accurate description of the local protein structure. ORION has recently been improved, increasing by 5% the quality of its results. The ORION web server accepts a single protein sequence as input and searches homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the 'Hard' category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/. PMID:27319297

  16. ORION: a web server for protein fold recognition and structure prediction using evolutionary hybrid profiles

    PubMed Central

    Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G.; Gelly, Jean-Christophe

    2016-01-01

    Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation —with Protein Blocks—, which give an accurate description of the local protein structure. ORION has recently been improved, increasing by 5% the quality of its results. The ORION web server accepts a single protein sequence as input and searches homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the ‘Hard’ category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/. PMID:27319297

  17. ORION: a web server for protein fold recognition and structure prediction using evolutionary hybrid profiles.

    PubMed

    Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G; Gelly, Jean-Christophe

    2016-06-20

    Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation -with Protein Blocks-, which give an accurate description of the local protein structure. ORION has recently been improved, increasing by 5% the quality of its results. The ORION web server accepts a single protein sequence as input and searches homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the 'Hard' category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/.

  18. ANGLOR: A Composite Machine-Learning Algorithm for Protein Backbone Torsion Angle Prediction

    PubMed Central

    Wu, Sitao; Zhang, Yang

    2008-01-01

    We developed a composite machine-learning based algorithm, called ANGLOR, to predict real-value protein backbone torsion angles from amino acid sequences. The input features of ANGLOR include sequence profiles, predicted secondary structure and solvent accessibility. In a large-scale benchmarking test, the mean absolute error (MAE) of the phi/psi prediction is 28°/46°, which is ∼10% lower than that generated by software in the literature. The prediction is statistically different from a random predictor (or a purely secondary-structure-based predictor) with p-value <1.0×10⁻³⁰⁰ (or <1.0×10⁻¹⁴⁸) by Wilcoxon signed rank test. For some residues (ILE, LEU, PRO and VAL) and especially the residues in helix and buried regions, the MAE of phi angles is much smaller (10–20°) than that in other environments. Thus, although the average accuracy of the ANGLOR prediction is still low, the portion of the accurately predicted dihedral angles may be useful in assisting protein fold recognition and ab initio 3D structure modeling. PMID:18923703
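
Because torsion angles are periodic, a mean absolute error such as the one quoted above is usually computed on the shortest angular distance rather than the raw difference. The sketch below shows one common way to do this; the abstract does not state how ANGLOR handles the wrap-around, so the periodic treatment here is an assumption, and the angle values are made up.

```python
def angular_error(predicted, observed):
    """Shortest absolute difference between two angles, in degrees."""
    diff = abs(predicted - observed) % 360.0
    return min(diff, 360.0 - diff)


def mean_absolute_angle_error(predicted, observed):
    """MAE over paired angle lists, respecting 360-degree periodicity."""
    errors = [angular_error(p, o) for p, o in zip(predicted, observed)]
    return sum(errors) / len(errors)


# Hypothetical phi angles (degrees); 170 and -170 are only 20 degrees apart.
predicted_phi = [170.0, -60.0]
observed_phi = [-170.0, -45.0]
print(mean_absolute_angle_error(predicted_phi, observed_phi))  # 17.5
```

A naive `abs(p - o)` would score the first pair as a 340° error, which would inflate the MAE for residues whose angles lie near ±180°.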

  19. MULTIPROSPECTOR: an algorithm for the prediction of protein-protein interactions by multimeric threading.

    PubMed

    Lu, Long; Lu, Hui; Skolnick, Jeffrey

    2002-11-15

    In this postgenomic era, the ability to identify protein-protein interactions on a genomic scale is very important to assist in the assignment of physiological function. Because of the increasing number of solved structures involving protein complexes, the time is ripe to extend threading to the prediction of quaternary structure. In this spirit, a multimeric threading approach has been developed. The approach is comprised of two phases. In the first phase, traditional threading on a single chain is applied to generate a set of potential structures for the query sequences. In particular, we use our recently developed threading algorithm, PROSPECTOR. Then, for those proteins whose template structures are part of a known complex, we rethread on both partners in the complex and now include a protein-protein interfacial energy. To perform this analysis, a database of multimeric protein structures has been constructed, the necessary interfacial pairwise potentials have been derived, and a set of empirical indicators to identify true multimers based on the threading Z-score and the magnitude of the interfacial energy have been established. The algorithm has been tested on a benchmark set comprised of 40 homodimers, 15 heterodimers, and 69 monomers that were scanned against a protein library of 2478 structures that comprise a representative set of structures in the Protein Data Bank. Of these, the method correctly recognized and assigned 36 homodimers, 15 heterodimers, and 65 monomers. This protocol was applied to identify partners and assign quaternary structures of proteins found in the yeast database of interacting proteins. Our multimeric threading algorithm correctly predicts 144 interacting proteins, compared to the 56 (26) cases assigned by PSI-BLAST using a (less) permissive E-value of 1 (0.01). Next, all possible pairs of yeast proteins have been examined. Predictions (n = 2865) of protein-protein interactions are made; 1138 of these 2865 interactions have

  20. TSEMA: interactive prediction of protein pairings between interacting families.

    PubMed

    Izarzugaza, José M G; Juan, David; Pons, Carles; Ranea, Juan A G; Valencia, Alfonso; Pazos, Florencio

    2006-07-01

    An entire family of methodologies for predicting protein interactions is based on the observed fact that families of interacting proteins tend to have similar phylogenetic trees due to co-evolution. One application of this concept is the prediction of the mapping between the members of two interacting protein families (which protein within one family interacts with which protein within the other). The idea is that the real mapping would be the one maximizing the similarity between the trees. Since the exhaustive exploration of all possible mappings is not feasible for large families, current approaches use heuristic techniques which do not ensure the best solution to be found. This is why it is important to check the results proposed by heuristic techniques and to manually explore other solutions. Here we present TSEMA, the server for efficient mapping assessment. This system calculates an initial mapping between two families of proteins based on a Monte Carlo approach and allows the user to interactively modify it based on performance figures and/or specific biological knowledge. All the explored mappings are graphically shown over a representation of the phylogenetic trees. The system is freely available at http://pdg.cnb.uam.es/TSEMA. Standalone versions of the software behind the interface are available upon request from the authors.
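
The idea of searching for the mapping that maximizes tree similarity can be illustrated with a generic Metropolis-style Monte Carlo search over permutations. This is only a sketch under stated assumptions: the abstract does not give TSEMA's scoring function or acceptance rule, so the Pearson correlation between inter-protein distance matrices, the swap move, the `temperature` parameter and all function names are hypothetical choices.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def mapping_score(dist_a, dist_b, perm):
    """Similarity of family A distances to family B distances under perm."""
    n = len(perm)
    xs = [dist_a[i][j] for i in range(n) for j in range(i + 1, n)]
    ys = [dist_b[perm[i]][perm[j]] for i in range(n) for j in range(i + 1, n)]
    return pearson(xs, ys)


def monte_carlo_mapping(dist_a, dist_b, steps=2000, temperature=0.01, seed=0):
    """Heuristic search: swap two assignments, keep or reject the move."""
    rng = random.Random(seed)
    n = len(dist_a)
    perm = list(range(n))
    rng.shuffle(perm)
    score = mapping_score(dist_a, dist_b, perm)
    best_perm, best_score = perm[:], score
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        perm[i], perm[j] = perm[j], perm[i]
        new_score = mapping_score(dist_a, dist_b, perm)
        accept = new_score >= score or rng.random() < math.exp(
            (new_score - score) / temperature)
        if accept:
            score = new_score
            if score > best_score:
                best_perm, best_score = perm[:], score
        else:
            perm[i], perm[j] = perm[j], perm[i]  # undo the rejected swap
    return best_perm, best_score


# Toy data: family B is family A with its members relabelled by true_perm.
positions = [0.0, 1.0, 3.0, 7.0, 12.0]
n = len(positions)
dist_a = [[abs(p - q) for q in positions] for p in positions]
true_perm = [2, 0, 4, 1, 3]
dist_b = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(n):
        dist_b[true_perm[i]][true_perm[j]] = dist_a[i][j]

best_perm, best_score = monte_carlo_mapping(dist_a, dist_b)
print(best_perm, round(best_score, 3))
```

As the abstract notes, a heuristic of this kind does not guarantee the best solution, which is why a tool that lets the user inspect and adjust the proposed mapping is useful.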

  1. Predicting and improving the protein sequence alignment quality by support vector regression

    PubMed Central

    Lee, Minho; Jeong, Chan-seok; Kim, Dongsup

    2007-01-01

    Background: For successful protein structure prediction by comparative modeling, in addition to identifying a good template protein with known structure, obtaining an accurate sequence alignment between a query protein and a template protein is critical. It has been known that the alignment accuracy can vary significantly depending on our choice of various alignment parameters such as gap opening penalty and gap extension penalty. Because the accuracy of sequence alignment is typically measured by comparing it with its corresponding structure alignment, there is no good way of evaluating alignment accuracy without knowing the structure of a query protein, which is obviously not available at the time of structure prediction. Moreover, there is no universal alignment parameter option that would always yield the optimal alignment. Results: In this work, we develop a method to predict the quality of the alignment between a query and a template. We train the support vector regression (SVR) models to predict the MaxSub scores as a measure of alignment quality. The alignment between a query protein and a template of length n is transformed into a (n + 1)-dimensional feature vector, then it is used as an input to predict the alignment quality by the trained SVR model. Performance of our work is evaluated by various measures including Pearson correlation coefficient between the observed and predicted MaxSub scores. The results show a high correlation coefficient of 0.945. For a pair of query and template, 48 alignments are generated by changing alignment options. Trained SVR models are then applied to predict the MaxSub scores of those and to select the best alignment option which is chosen specifically to the query-template pair. This adaptive selection procedure results in 7.4% improvement of MaxSub scores, compared to those when the single best parameter option is used for all query-template pairs. Conclusion: The present work demonstrates that the alignment quality can be

  2. Aptamer-conjugated live human immune cell based biosensors for the accurate detection of C-reactive protein

    NASA Astrophysics Data System (ADS)

    Hwang, Jangsun; Seo, Youngmin; Jo, Yeonho; Son, Jaewoo; Choi, Jonghoon

    2016-10-01

    C-reactive protein (CRP) is a pentameric protein that is present in the bloodstream during inflammatory events, e.g., liver failure, leukemia, and/or bacterial infection. The level of CRP indicates the progress and prognosis of certain diseases; it is therefore necessary to measure CRP levels in the blood accurately. The normal concentration of CRP is reported to be 1–3 mg/L. Inflammatory events increase the level of CRP by up to 500 times; accordingly, CRP is a biomarker of acute inflammatory disease. In this study, we demonstrated the preparation of DNA aptamer-conjugated peripheral blood mononuclear cells (Apt-PBMCs) that specifically capture human CRP. Live PBMCs functionalized with aptamers could detect different levels of human CRP by producing immune complexes with reporter antibody. The binding behavior of Apt-PBMCs toward highly concentrated CRP sites was also investigated. The immune responses of Apt-PBMCs were evaluated by measuring TNF-alpha secretion after stimulating the PBMCs with lipopolysaccharides. In summary, engineered Apt-PBMCs have potential applications as live cell based biosensors and for in vitro tracing of CRP secretion sites.

  3. Aptamer-conjugated live human immune cell based biosensors for the accurate detection of C-reactive protein

    PubMed Central

    Hwang, Jangsun; Seo, Youngmin; Jo, Yeonho; Son, Jaewoo; Choi, Jonghoon

    2016-01-01

    C-reactive protein (CRP) is a pentameric protein that is present in the bloodstream during inflammatory events, e.g., liver failure, leukemia, and/or bacterial infection. The level of CRP indicates the progress and prognosis of certain diseases; it is therefore necessary to measure CRP levels in the blood accurately. The normal concentration of CRP is reported to be 1–3 mg/L. Inflammatory events increase the level of CRP by up to 500 times; accordingly, CRP is a biomarker of acute inflammatory disease. In this study, we demonstrated the preparation of DNA aptamer-conjugated peripheral blood mononuclear cells (Apt-PBMCs) that specifically capture human CRP. Live PBMCs functionalized with aptamers could detect different levels of human CRP by producing immune complexes with reporter antibody. The binding behavior of Apt-PBMCs toward highly concentrated CRP sites was also investigated. The immune responses of Apt-PBMCs were evaluated by measuring TNF-alpha secretion after stimulating the PBMCs with lipopolysaccharides. In summary, engineered Apt-PBMCs have potential applications as live cell based biosensors and for in vitro tracing of CRP secretion sites. PMID:27708384

  4. Protein design by fusion: implications for protein structure prediction and evolution

    SciTech Connect

    Skorupka, Katarzyna; Han, Seong Kyu; Nam, Hyun-Jun; Kim, Sanguk; Faham, Salem

    2013-11-19

    Domain fusion is a useful tool in protein design. Here, the structure of a fusion of the heterodimeric flagella-assembly proteins FliS and FliC is reported. Although the ability of the fusion protein to maintain the structure of the heterodimer may be apparent, threading-based structural predictions do not properly fuse the heterodimer. Additional examples of naturally occurring heterodimers that are homologous to full-length proteins were identified. These examples highlight that the designed protein was engineered by the same tools as used in the natural evolution of proteins and that heterodimeric structures contain a wealth of information, currently unused, that can improve structural predictions.

  5. Sequence-Based Prediction of Type III Secreted Proteins

    PubMed Central

    Arnold, Roland; Brandmaier, Stefan; Kleine, Frederick; Tischler, Patrick; Heinz, Eva; Behrens, Sebastian; Niinikoski, Antti; Mewes, Hans-Werner; Horn, Matthias; Rattei, Thomas

    2009-01-01

    The type III secretion system (TTSS) is a key mechanism for host cell interaction used by a variety of bacterial pathogens and symbionts of plants and animals including humans. The TTSS represents a molecular syringe with which the bacteria deliver effector proteins directly into the host cell cytosol. Despite the importance of the TTSS for bacterial pathogenesis, recognition and targeting of type III secreted proteins has up until now been poorly understood. Several hypotheses are discussed, including an mRNA-based signal, a chaperon-mediated process, or an N-terminal signal peptide. In this study, we systematically analyzed the amino acid composition and secondary structure of N-termini of 100 experimentally verified effector proteins. Based on this, we developed a machine-learning approach for the prediction of TTSS effector proteins, taking into account N-terminal sequence features such as frequencies of amino acids, short peptides, or residues with certain physico-chemical properties. The resulting computational model revealed a strong type III secretion signal in the N-terminus that can be used to detect effectors with sensitivity of ∼71% and selectivity of ∼85%. This signal seems to be taxonomically universal and conserved among animal pathogens and plant symbionts, since we could successfully detect effector proteins if the respective group was excluded from training. The application of our prediction approach to 739 complete bacterial and archaeal genome sequences resulted in the identification of between 0% and 12% putative TTSS effector proteins. Comparison of effector proteins with orthologs that are not secreted by the TTSS showed no clear pattern of signal acquisition by fusion, suggesting convergent evolutionary processes shaping the type III secretion signal. The newly developed program EffectiveT3 (http://www.chlamydiaedb.org) is the first universal in silico prediction program for the identification of novel TTSS effectors. Our findings will

  6. Topological Predictions for Integral Membrane Channel and Carrier Proteins

    PubMed Central

    Abhinay, Reddy; Jaehoon, Cho; Sam, Ling; Vamsee, Reddy; Maksim, Shlykov; Milton, Saier

    2014-01-01

    We evaluated topological predictions for nine different programs, HMMTOP, TMHMM, SVMTOP, DAS, SOSUI, TOPCONS, PHOBIUS, MEMSAT-SVM (hereinafter referred to as MEMSAT), and SPOCTOPUS. These programs were first evaluated using four large topologically well-defined families of secondary transporters, and the three best programs were further evaluated using topologically more diverse families of channels and carriers. In the initial studies, the order of accuracy was: SPOCTOPUS>MEMSAT>HMMTOP>TOPCONS>PHOBIUS>TMHMM>SVMTOP>DAS>SOSUI. Some families, such as the Sugar Porter family (2.A.1.1) of the Major Facilitator Superfamily (MFS; TC# 2.A.1) and the Amino acid/Polyamine/Organocation (APC) Family (TC# 2.A.3), were correctly predicted with high accuracy while others, such as the Mitochondrial Carrier (MC) (TC# 2.A.29) and the K+ transporter (Trk) families (TC# 2.A.38), were predicted with much lower accuracy. For small, topologically homogeneous families, SPOCTOPUS and MEMSAT were generally most reliable, while with large, more diverse superfamilies, HMMTOP often proved to have the greatest prediction accuracy. We next developed a novel program, TM-STATS, that tabulates HMMTOP, SPOCTOPUS or MEMSAT-based topological predictions for any subdivision (class, subclass, superfamily, family, subfamily, or any combination of these) of the Transporter Classification Database (TCDB; www.tcdb.org) and examined the following subclasses: α-type channel proteins (TC subclasses 1.A and 1.E), secreted pore-forming toxins (TC subclass 1.C) and secondary carriers (subclass 2.A). Histograms were generated for each of these subclasses, and the results were analyzed according to subclass, family and protein. The results provide an update of topological predictions for integral membrane transport proteins as well as guides for the development of more reliable topological prediction programs, taking family-specific characteristics into account. PMID:24992992

  7. Predicting and analyzing protein phosphorylation sites in plants using musite.

    PubMed

    Yao, Qiuming; Gao, Jianjiong; Bollinger, Curtis; Thelen, Jay J; Xu, Dong

    2012-01-01

    Although protein phosphorylation sites can be reliably identified with high-resolution mass spectrometry, the experimental approach is time-consuming and resource-dependent. Furthermore, it is unlikely that an experimental approach could catalog an entire phosphoproteome. Computational prediction of phosphorylation sites provides an efficient and flexible way to reveal potential phosphorylation sites and provide hypotheses in experimental design. Musite is a tool that we previously developed to predict phosphorylation sites based solely on protein sequence. However, it was not comprehensively applied to plants. In this study, the phosphorylation data from Arabidopsis thaliana, B. napus, G. max, M. truncatula, O. sativa, and Z. mays were collected for cross-species testing and the overall plant-specific prediction as well. The results show that the model for A. thaliana can be extended to other organisms, and the overall plant model from Musite outperforms the current plant-specific prediction tools, Plantphos, and PhosphAt, in prediction accuracy. Furthermore, a comparative study of predicted phosphorylation sites across orthologs among different plants was conducted to reveal potential evolutionary features. A bipolar distribution of isolated, non-conserved phosphorylation sites, and highly conserved ones in terms of the amino acid type was observed. It also shows that predicted phosphorylation sites conserved within orthologs do not necessarily share more sequence similarity in the flanking regions than the background, but they often inherit protein disorder, a property that does not necessitate high sequence conservation. Our analysis also suggests that the phosphorylation frequencies among serine, threonine, and tyrosine correlate with their relative proportion in disordered regions. Musite can be used as a web server (http://musite.net) or downloaded as an open-source standalone tool (http://musite.sourceforge.net/).

  8. Prediction of contact residue pairs based on co-substitution between sites in protein structures.

    PubMed

    Miyazawa, Sanzo

    2013-01-01

    Residue-residue interactions that fold a protein into a unique three-dimensional structure and make it play a specific function impose structural and functional constraints in varying degrees on each residue site. Selective constraints on residue sites are recorded in amino acid orders in homologous sequences and also in the evolutionary trace of amino acid substitutions. A challenge is to extract direct dependences between residue sites by removing phylogenetic correlations and indirect dependences through other residues within a protein or even through other molecules. Rapid growth of protein families with unknown folds requires an accurate de novo prediction method for protein structure. Recent attempts of disentangling direct from indirect dependences of amino acid types between residue positions in multiple sequence alignments have revealed that inferred residue-residue proximities can be sufficient information to predict a protein fold without the use of known three-dimensional structures. Here, we propose an alternative method of inferring coevolving site pairs from concurrent and compensatory substitutions between sites in each branch of a phylogenetic tree. Substitution probability and physico-chemical changes (volume, charge, hydrogen-bonding capability, and others) accompanied by substitutions at each site in each branch of a phylogenetic tree are estimated with the likelihood of each substitution, and their direct correlations between sites are used to detect concurrent and compensatory substitutions. In order to extract direct dependences between sites, partial correlation coefficients of the characteristic changes along branches between sites, in which linear multiple dependences on feature vectors at other sites are removed, are calculated and used to rank coevolving site pairs. Accuracy of contact prediction based on the present coevolution score is comparable to that achieved by a maximum entropy model of protein sequences for 15 protein families

  9. Unfolded protein ensembles, folding trajectories, and refolding rate prediction.

    PubMed

    Das, A; Sin, B K; Mohazab, A R; Plotkin, S S

    2013-09-28

    Computer simulations can provide critical information on the unfolded ensemble of proteins under physiological conditions, by explicitly characterizing the geometrical properties of the diverse conformations that are sampled in the unfolded state. A general computational analysis across many proteins has not been implemented, however. Here, we develop a method for generating a diverse conformational ensemble, to characterize properties of the unfolded states of intrinsically disordered or intrinsically folded proteins. The method allows unfolded proteins to retain disulfide bonds. We examined physical properties of the unfolded ensembles of several proteins, including chemical shifts, clustering properties, and scaling exponents for the radius of gyration with polymer length. A problem relating simulated and experimental residual dipolar couplings is discussed. We apply our generated ensembles to the problem of folding kinetics, by examining whether the ensembles of some proteins are closer geometrically to their folded structures than others. We find that for a randomly selected dataset of 15 non-homologous 2- and 3-state proteins, quantities such as the average root mean squared deviation between the folded structure and unfolded ensemble correlate with folding rates as strongly as absolute contact order. We introduce a new order parameter that measures the distance travelled per residue, which naturally partitions into a smooth "laminar" and subsequent "turbulent" part of the trajectory. This latter conceptually simple measure with no fitting parameters predicts folding rates in 0 M denaturant with remarkable accuracy (r = -0.95, p = 1 × 10⁻⁷). The high correlation between folding times and sterically modulated, reconfigurational motion supports the rapid collapse of proteins prior to the transition state as a generic feature in the folding of both two-state and multi-state proteins. This method for generating unfolded ensembles provides a powerful approach to

  10. Unfolded protein ensembles, folding trajectories, and refolding rate prediction

    NASA Astrophysics Data System (ADS)

    Das, A.; Sin, B. K.; Mohazab, A. R.; Plotkin, S. S.

    2013-09-01

    Computer simulations can provide critical information on the unfolded ensemble of proteins under physiological conditions, by explicitly characterizing the geometrical properties of the diverse conformations that are sampled in the unfolded state. A general computational analysis across many proteins has not been implemented however. Here, we develop a method for generating a diverse conformational ensemble, to characterize properties of the unfolded states of intrinsically disordered or intrinsically folded proteins. The method allows unfolded proteins to retain disulfide bonds. We examined physical properties of the unfolded ensembles of several proteins, including chemical shifts, clustering properties, and scaling exponents for the radius of gyration with polymer length. A problem relating simulated and experimental residual dipolar couplings is discussed. We apply our generated ensembles to the problem of folding kinetics, by examining whether the ensembles of some proteins are closer geometrically to their folded structures than others. We find that for a randomly selected dataset of 15 non-homologous 2- and 3-state proteins, quantities such as the average root mean squared deviation between the folded structure and unfolded ensemble correlate with folding rates as strongly as absolute contact order. We introduce a new order parameter that measures the distance travelled per residue, which naturally partitions into a smooth "laminar" and subsequent "turbulent" part of the trajectory. This latter conceptually simple measure with no fitting parameters predicts folding rates in 0 M denaturant with remarkable accuracy (r = -0.95, p = 1 × 10^-7). The high correlation between folding times and sterically modulated, reconfigurational motion supports the rapid collapse of proteins prior to the transition state as a generic feature in the folding of both two-state and multi-state proteins. This method for generating unfolded ensembles provides a powerful approach to

  11. Stealth surface modification of surface-enhanced Raman scattering substrates for sensitive and accurate detection in protein solutions.

    PubMed

    Sun, Fang; Ella-Menye, Jean-Rene; Galvan, Daniel David; Bai, Tao; Hung, Hsiang-Chieh; Chou, Ying-Nien; Zhang, Peng; Jiang, Shaoyi; Yu, Qiuming

    2015-03-24

    Reliable surface-enhanced Raman scattering (SERS) based biosensing in complex media is impeded by nonspecific protein adsorptions. Because of the near-field effect of SERS, it is challenging to modify SERS-active substrates using conventional nonfouling materials without introducing interference from their SERS signals. Herein, we report a stealth surface modification strategy for sensitive, specific and accurate detection of fructose in protein solutions using SERS by forming a mixed self-assembled monolayer (SAM). The SAM consists of a short zwitterionic thiol, N,N-dimethyl-cysteamine-carboxybetaine (CBT), and a fructose probe 4-mercaptophenylboronic acid (4-MPBA). The specifically designed and synthesized CBT not only resists protein fouling effectively, but also has very weak Raman activity compared to 4-MPBA. Thus, the CBT SAM provides a stealth surface modification to SERS-active substrates. The surface compositions of mixed SAMs were investigated using X-ray photoelectron spectroscopy (XPS) and SERS, and their nonfouling properties were studied with a surface plasmon resonance (SPR) biosensor. The mixed SAM with a surface composition of 94% CBT demonstrated a very low bovine serum albumin (BSA) adsorption (∼3 ng/cm^2), and moreover, only the 4-MPBA signal appeared in the SERS spectrum. With the use of this surface-modified SERS-active substrate, quantification of fructose over clinically relevant concentrations (0.01-1 mM) was achieved. Partial least-squares regression (PLS) analysis showed that the detection sensitivity and accuracy were maintained for the measurements in 1 mg/mL BSA solutions. This stealth surface modification strategy provides a novel route to introduce nonfouling property to SERS-active substrates for SERS biosensing in complex media.

  12. Tandem Repeats in Proteins: Prediction Algorithms and Biological Role

    PubMed Central

    Pellegrini, Marco

    2015-01-01

    Tandem repetition in protein sequence and structure is a fascinating subject of research that has been a focus of study since the late 1990s. In this survey, we give an overview of the multi-faceted aspects of research on protein tandem repeats (PTR for short), including prediction algorithms, databases, early classification efforts, mechanisms of PTR formation and evolution, and synthetic PTR design. We also touch on the rather open issue of the relationship between PTR and flexibility (or disorder) in proteins. Detection of PTR either from protein sequence or structure data is challenging due to inherent high (biological) signal-to-noise ratio that is a key feature of this problem. As early in silico analytic tools have been key enablers for starting this field of study, we expect that current and future algorithmic and statistical breakthroughs will have a high impact on the investigations of the biological role of PTR. PMID:26442257

  13. Prediction and Annotation of Plant Protein Interaction Networks

    SciTech Connect

    McDermott, Jason E.; Wang, Jun; Yu, Jun; Wong, Gane Ka-Shu; Samudrala, Ram

    2009-02-01

    Large-scale experimental studies of interactions between components of biological systems have been performed for a variety of eukaryotic organisms. However, there is a dearth of such data for plants. Computational methods for prediction of relationships between proteins, primarily based on comparative genomics, provide a useful systems-level view of cellular functioning and can be used to extend information about other eukaryotes to plants. We have predicted networks for Arabidopsis thaliana, Oryza sativa indica and japonica and several plant pathogens using the Bioverse (http://bioverse.compbio.washington.edu) and show that they are similar to experimentally-derived interaction networks. Predicted interaction networks for plants can be used to provide novel functional annotations and predictions about plant phenotypes and aid in rational engineering of biosynthesis pathways.

  14. Modelling proteins' hidden conformations to predict antibiotic resistance

    NASA Astrophysics Data System (ADS)

    Hart, Kathryn M.; Ho, Chris M. W.; Dutta, Supratik; Gross, Michael L.; Bowman, Gregory R.

    2016-10-01

    TEM β-lactamase confers bacteria with resistance to many antibiotics and rapidly evolves activity against new drugs. However, functional changes are not easily explained by differences in crystal structures. We employ Markov state models to identify hidden conformations and explore their role in determining TEM's specificity. We integrate these models with existing drug-design tools to create a new technique, called Boltzmann docking, which better predicts TEM specificity by accounting for conformational heterogeneity. Using our MSMs, we identify hidden states whose populations correlate with activity against cefotaxime. To experimentally detect our predicted hidden states, we use rapid mass spectrometric footprinting and confirm our models' prediction that increased cefotaxime activity correlates with reduced Ω-loop flexibility. Finally, we design novel variants to stabilize the hidden cefotaximase states, and find their populations predict activity against cefotaxime in vitro and in vivo. Therefore, we expect this framework to have numerous applications in drug and protein design.

  15. Protein secondary structure prediction using logic-based machine learning.

    PubMed

    Muggleton, S; King, R D; Sternberg, M J

    1992-10-01

    Many attempts have been made to solve the problem of predicting protein secondary structure from the primary sequence but the best performance results are still disappointing. In this paper, the use of a machine learning algorithm which allows relational descriptions is shown to lead to improved performance. The Inductive Logic Programming computer program, Golem, was applied to learning secondary structure prediction rules for alpha/alpha domain type proteins. The input to the program consisted of 12 non-homologous proteins (1612 residues) of known structure, together with a background knowledge describing the chemical and physical properties of the residues. Golem learned a small set of rules that predict which residues are part of the alpha-helices--based on their positional relationships and chemical and physical properties. The rules were tested on four independent non-homologous proteins (416 residues) giving an accuracy of 81% (+/- 2%). This is an improvement, on identical data, over the previously reported result of 73% by King and Sternberg (1990, J. Mol. Biol., 216, 441-457) using the machine learning program PROMIS, and of 72% using the standard Garnier-Osguthorpe-Robson method. The best previously reported result in the literature for the alpha/alpha domain type is 76%, achieved using a neural net approach. Machine learning also has the advantage over neural network and statistical methods in producing more understandable results. PMID:1480619

  16. (PS)2: protein structure prediction server version 3.0.

    PubMed

    Huang, Tsun-Tsao; Hwang, Jenn-Kang; Chen, Chu-Huang; Chu, Chih-Sheng; Lee, Chi-Wen; Chen, Chih-Chieh

    2015-07-01

    Protein complexes are involved in many biological processes. Examining coupling between subunits of a complex would be useful to understand the molecular basis of protein function. Here, our updated (PS)2 web server predicts the three-dimensional structures of protein complexes based on comparative modeling; furthermore, this server examines the coupling between subunits of the predicted complex by combining structural and evolutionary considerations. The predicted complex structure could be indicated and visualized by Java-based 3D graphics viewers and the structural and evolutionary profiles are shown and compared chain-by-chain. For each subunit, considerations with or without the packing contribution of other subunits cause the differences in similarities between structural and evolutionary profiles, and these differences imply which form, complex or monomeric, is preferred in the biological condition for the subunit. We believe that the (PS)2 server would be a useful tool for biologists who are interested not only in the structures of protein complexes but also in the coupling between subunits of the complexes. The (PS)2 is freely available at http://ps2v3.life.nctu.edu.tw/. PMID:25943546

  17. Prediction of Protein-DNA binding by Monte Carlo method

    NASA Astrophysics Data System (ADS)

    Deng, Yuefan; Eisenberg, Moises; Korobka, Alex

    1997-08-01

    We present an analysis and prediction of protein-DNA binding specificity based on the hydrogen bonding between DNA, protein, and auxiliary clusters of water molecules. Zif268, glucocorticoid receptor, λ-repressor mutant, HIN-recombinase, and tramtrack protein-DNA complexes are studied. Hydrogen bonds are approximated by the Lennard-Jones potential with a cutoff distance between the hydrogen and the acceptor atoms set to 3.2 Å and an angular component based on a dipole-dipole interaction. We use a three-stage docking algorithm: (1) geometric hashing that matches pairs of hydrogen bonding sites; (2) least-squares minimization of pairwise distances to filter out insignificant matches; and (3) Monte Carlo stochastic search to minimize the energy of the system. More information can be obtained from our first paper on this subject [Y. Deng et al., J. Computational Chemistry (1995)]. Results show that the biologically correct base pair is selected preferentially when there are two or more strong hydrogen bonds (with LJ potential lower than -0.20) that bind it to the protein. Predicted sequences are less stable in the case of weaker bonding sites. In general the inclusion of water bridges does increase the number of base pairs for which correct specificity is predicted.
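
    The energy model and the final search stage described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 12-6 Lennard-Jones form, the epsilon and sigma values, the rigid-translation move set, and the names lj_energy, total_energy and metropolis_search are all assumptions made for illustration, and the angular dipole-dipole term is omitted. Only the 3.2 Å cutoff is taken from the abstract.

```python
import math
import random

CUTOFF = 3.2  # Å, hydrogen-acceptor cutoff quoted in the abstract


def lj_energy(r, epsilon=1.0, sigma=1.8):
    """12-6 Lennard-Jones energy with a hard cutoff (epsilon, sigma are illustrative)."""
    if r >= CUTOFF:
        return 0.0
    r = max(r, 1e-3)  # guard against division by zero for overlapping atoms
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)


def total_energy(hydrogens, acceptors):
    """Sum pairwise energies between every hydrogen and every acceptor."""
    return sum(lj_energy(math.dist(h, a)) for h in hydrogens for a in acceptors)


def metropolis_search(hydrogens, acceptors, steps=2000, step_size=0.2, kT=0.1, seed=0):
    """Metropolis Monte Carlo over rigid translations of the hydrogen set."""
    rng = random.Random(seed)

    def moved(shift):
        return [tuple(c + d for c, d in zip(h, shift)) for h in hydrogens]

    shift = [0.0, 0.0, 0.0]
    energy = total_energy(moved(shift), acceptors)
    best_shift, best_energy = list(shift), energy
    for _ in range(steps):
        trial = [c + rng.uniform(-step_size, step_size) for c in shift]
        e_trial = total_energy(moved(trial), acceptors)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if e_trial <= energy or rng.random() < math.exp(-(e_trial - energy) / kT):
            shift, energy = trial, e_trial
            if energy < best_energy:
                best_shift, best_energy = list(shift), energy
    return best_shift, best_energy
```

    Calling metropolis_search([(0.0, 0.0, 2.9)], [(0.0, 0.0, 0.0)]) returns the best translation found and its energy; a real docking search would also sample rotations.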

  18. (PS)2: protein structure prediction server version 3.0.

    PubMed

    Huang, Tsun-Tsao; Hwang, Jenn-Kang; Chen, Chu-Huang; Chu, Chih-Sheng; Lee, Chi-Wen; Chen, Chih-Chieh

    2015-07-01

    Protein complexes are involved in many biological processes. Examining coupling between subunits of a complex would be useful to understand the molecular basis of protein function. Here, our updated (PS)2 web server predicts the three-dimensional structures of protein complexes based on comparative modeling; furthermore, this server examines the coupling between subunits of the predicted complex by combining structural and evolutionary considerations. The predicted complex structure could be indicated and visualized by Java-based 3D graphics viewers and the structural and evolutionary profiles are shown and compared chain-by-chain. For each subunit, considerations with or without the packing contribution of other subunits cause the differences in similarities between structural and evolutionary profiles, and these differences imply which form, complex or monomeric, is preferred in the biological condition for the subunit. We believe that the (PS)2 server would be a useful tool for biologists who are interested not only in the structures of protein complexes but also in the coupling between subunits of the complexes. The (PS)2 is freely available at http://ps2v3.life.nctu.edu.tw/.

  19. PECM: prediction of extracellular matrix proteins using the concept of Chou's pseudo amino acid composition.

    PubMed

    Zhang, Jian; Sun, Pingping; Zhao, Xiaowei; Ma, Zhiqiang

    2014-12-21

    The extracellular matrix proteins (ECMs) are widely found in the tissues of multicellular organisms. They consist of various secreted proteins, mainly polysaccharides and glycoproteins. ECMs are involved in the exchange of materials and information between resident cells and the external environment. Accurate identification of ECMs is a significant step in understanding the evolution of cancer and promises a wide range of potential applications in therapeutic targets or diagnostic markers. In this paper, an accurate computational method named PECM is proposed for identifying ECMs. Here, we explore various sequence-derived discriminative features including evolutionary information, predicted secondary structure, and physicochemical properties. Rather than simply combining the features, which may introduce redundant information and unwanted noise, we use the Fisher-Markov selector and an incremental feature selection approach to search for the optimal feature subsets. Then, we train our model with a support vector machine (SVM). PECM achieves good prediction performance, with ACC scores of about 86% and 90% on the testing and independent datasets, which are competitive with the state-of-the-art ECMs prediction tools. A web-server named PECM which implements the proposed approach is freely available at http://59.73.198.144:8088/PECM/.

  20. Predicting the Binding Patterns of Hub Proteins: A Study Using Yeast Protein Interaction Networks

    PubMed Central

    Andorf, Carson M.; Honavar, Vasant; Sen, Taner Z.

    2013-01-01

    Background Protein-protein interactions are critical to elucidating the role played by individual proteins in important biological pathways. Of particular interest are hub proteins that can interact with large numbers of partners and often play essential roles in cellular control. Depending on the number of binding sites, protein hubs can be classified at a structural level as singlish-interface hubs (SIH) with one or two binding sites, or multiple-interface hubs (MIH) with three or more binding sites. In terms of kinetics, hub proteins can be classified as date hubs (i.e., interact with different partners at different times or locations) or party hubs (i.e., simultaneously interact with multiple partners). Methodology Our approach works in 3 phases: Phase I classifies if a protein is likely to bind with another protein. Phase II determines if a protein-binding (PB) protein is a hub. Phase III classifies PB proteins as singlish-interface versus multiple-interface hubs and date versus party hubs. At each stage, we use sequence-based predictors trained using several standard machine learning techniques. Conclusions Our method is able to predict whether a protein is a protein-binding protein with an accuracy of 94% and a correlation coefficient of 0.87; identify hubs from non-hubs with 100% accuracy for 30% of the data; distinguish date hubs/party hubs with 69% accuracy and area under ROC curve of 0.68; and SIH/MIH with 89% accuracy and area under ROC curve of 0.84. Because our method is based on sequence information alone, it can be used even in settings where reliable protein-protein interaction data or structures of protein-protein complexes are unavailable to obtain useful insights into the functional and evolutionary characteristics of proteins and their interactions. Availability We provide a web server for our three-phase approach: http://hybsvm.gdcb.iastate.edu. PMID:23431393

  1. Statistical potential for assessment and prediction of protein structures

    PubMed Central

    Shen, Min-yi; Sali, Andrej

    2006-01-01

    Protein structures in the Protein Data Bank provide a wealth of data about the interactions that determine the native states of proteins. Using probability theory, we derive an atomic distance-dependent statistical potential from a sample of native structures that does not depend on any adjustable parameters (Discrete Optimized Protein Energy, or DOPE). DOPE is based on an improved reference state that corresponds to noninteracting atoms in a homogeneous sphere with the radius dependent on a sample native structure; it thus accounts for the finite and spherical shape of the native structures. The DOPE potential was extracted from a nonredundant set of 1472 crystallographic structures. We tested DOPE and five other scoring functions by the detection of the native state among six multiple target decoy sets, the correlation between the score and model error, and the identification of the most accurate non-native structure in the decoy set. For all decoy sets, DOPE is the best performing function in terms of all criteria, except for a tie in one criterion for one decoy set. To facilitate its use in various applications, such as model assessment, loop modeling, and fitting into cryo-electron microscopy mass density maps combined with comparative protein structure modeling, DOPE was incorporated into the modeling package MODELLER-8. PMID:17075131
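
    The general idea of a distance-dependent statistical potential can be sketched in a few lines of Python. This is a generic inverse-Boltzmann illustration and not DOPE itself: it assumes a simple reference state in which the expected pair count in each distance bin is proportional to the volume of the spherical shell, whereas DOPE uses the finite-sphere reference state described in the abstract. The function name, the binning and the pseudocount are hypothetical choices.

```python
import math


def statistical_potential(distances, r_max=15.0, bin_width=0.5, kT=1.0, pseudocount=1e-6):
    """Return u(r) = -kT * ln(P_obs(r) / P_ref(r)) for each distance bin.

    P_ref assumes non-interacting atoms at uniform density (shell volume),
    which is a simplification and NOT the DOPE reference state.
    """
    n_bins = int(round(r_max / bin_width))
    counts = [0] * n_bins
    for d in distances:
        if 0.0 <= d < r_max:
            counts[min(int(d / bin_width), n_bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no distances fall inside [0, r_max)")

    # Shell volume between bin edges is proportional to (i+1)^3 - i^3.
    shell = [(i + 1) ** 3 - i ** 3 for i in range(n_bins)]
    shell_total = sum(shell)

    potential = []
    for count, volume in zip(counts, shell):
        p_obs = count / total + pseudocount  # pseudocount avoids log(0) for empty bins
        p_ref = volume / shell_total
        potential.append(-kT * math.log(p_obs / p_ref))
    return potential
```

    Bins that are populated more often than the reference state predicts receive negative (favorable) energies, and depleted bins receive positive ones. In practice one such curve would be derived per atom-type pair from a set of native structures.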

  2. Prediction of HIV drug resistance from genotype with encoded three-dimensional protein structure

    PubMed Central

    2014-01-01

    Background Drug resistance has become a severe challenge for treatment of HIV infections. Mutations accumulate in the HIV genome and make certain drugs ineffective. Prediction of resistance from genotype data is a valuable guide in choice of drugs for effective therapy. Results In order to improve the computational prediction of resistance from genotype data we have developed a unified encoding of the protein sequence and three-dimensional protein structure of the drug target for classification and regression analysis. The method was tested on genotype-resistance data for mutants of HIV protease and reverse transcriptase. Our graph based sequence-structure approach gives high accuracy with a new sparse dictionary classification method, as well as support vector machine and artificial neural networks classifiers. Cross-validated regression analysis with the sparse dictionary gave excellent correlation between predicted and observed resistance. Conclusion The approach of encoding the protein structure and sequence as a 210-dimensional vector, based on Delaunay triangulation, has promise as an accurate method for predicting resistance from sequence for drugs inhibiting HIV protease and reverse transcriptase. PMID:25081370
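
    The abstract does not spell out what the 210 dimensions are. One reading, assumed here, is that each dimension counts one unordered pair of amino-acid types among residues that are neighbors in the Delaunay triangulation, since 20 types give 20 × 21 / 2 = 210 unordered pairs. The Python sketch below encodes a structure under that assumption. The function name encode_pairs is hypothetical, and the neighbor edges are taken as given input rather than computed from coordinates.

```python
from itertools import combinations_with_replacement

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, alphabetical

# Map each unordered residue-type pair to a vector position (210 entries).
PAIR_INDEX = {
    pair: i for i, pair in enumerate(combinations_with_replacement(AMINO_ACIDS, 2))
}


def encode_pairs(sequence, edges):
    """Count residue-type pairs over neighbor edges.

    sequence: one-letter amino-acid string.
    edges: iterable of (i, j) residue indices that are neighbors, assumed to
           come from a Delaunay triangulation computed elsewhere.
    """
    vector = [0] * len(PAIR_INDEX)
    for i, j in edges:
        a, b = sorted((sequence[i], sequence[j]))  # order-independent pair key
        vector[PAIR_INDEX[(a, b)]] += 1
    return vector
```

    Because the vector length is fixed regardless of protein size, mutants of the same target can be compared directly by a classifier or regression model.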

  3. Sequence Based Prediction of Antioxidant Proteins Using a Classifier Selection Strategy

    PubMed Central

    Zhang, Lina; Zhang, Chengjin; Gao, Rui; Yang, Runtao; Song, Qing

    2016-01-01

    Antioxidant proteins perform significant functions in maintaining oxidation/antioxidation balance and have therapeutic potential for some diseases. Accurate identification of antioxidant proteins could contribute to revealing physiological processes of oxidation/antioxidation balance and developing novel antioxidation-based drugs. In this study, an ensemble method is presented to predict antioxidant proteins with hybrid features, incorporating SSI (Secondary Structure Information), PSSM (Position Specific Scoring Matrix), RSA (Relative Solvent Accessibility), and CTD (Composition, Transition, Distribution). The prediction results of the ensemble predictor are determined by an average of prediction results of multiple base classifiers. Based on a classifier selection strategy, we obtain an optimal ensemble classifier composed of RF (Random Forest), SMO (Sequential Minimal Optimization), NNA (Nearest Neighbor Algorithm), and J48 with an accuracy of 0.925. A Relief combined with IFS (Incremental Feature Selection) method is adopted to obtain optimal features from hybrid features. With the optimal features, the ensemble method achieves improved performance with a sensitivity of 0.95, a specificity of 0.93, an accuracy of 0.94, and an MCC (Matthews Correlation Coefficient) of 0.880, far better than the existing method. To evaluate the prediction performance objectively, the proposed method is compared with existing methods on the same independent testing dataset. Encouragingly, our method performs better than previous studies. In addition, our method achieves more balanced performance with a sensitivity of 0.878 and a specificity of 0.860. These results suggest that the proposed ensemble method can be a potential candidate for antioxidant protein prediction. For public access, we develop a user-friendly web server for antioxidant protein identification that is freely accessible at http://antioxidant.weka.cc. PMID:27662651
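
    The evaluation measures quoted in this abstract (sensitivity, specificity, accuracy, MCC) and the averaging of base-classifier outputs follow standard definitions, sketched below in Python. The function names and the convention of returning 0.0 for an undefined ratio are illustrative choices, not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""

    def ratio(num, den):
        return num / den if den else 0.0  # convention: undefined ratios become 0.0

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


def ensemble_average(per_classifier_scores):
    """Average the per-sample scores of several base classifiers.

    per_classifier_scores: list of equal-length score lists, one per classifier.
    """
    n = len(per_classifier_scores)
    return [sum(column) / n for column in zip(*per_classifier_scores)]
```

    For example, binary_metrics(90, 10, 90, 10) gives a sensitivity, specificity and accuracy of 0.9 and an MCC of 0.8. MCC is the more conservative figure because it accounts for all four cells of the confusion matrix.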

  4. Graphlet kernels for prediction of functional residues in protein structures.

    PubMed

    Vacic, Vladimir; Iakoucheva, Lilia M; Lonardi, Stefano; Radivojac, Predrag

    2010-01-01

    We introduce a novel graph-based kernel method for annotating functional residues in protein structures. A structure is first modeled as a protein contact graph, where nodes correspond to residues and edges connect spatially neighboring residues. Each vertex in the graph is then represented as a vector of counts of labeled non-isomorphic subgraphs (graphlets), centered on the vertex of interest. A similarity measure between two vertices is expressed as the inner product of their respective count vectors and is used in a supervised learning framework to classify protein residues. We evaluated our method on two function prediction problems: identification of catalytic residues in proteins, which is a well-studied problem suitable for benchmarking, and a much less explored problem of predicting phosphorylation sites in protein structures. The performance of the graphlet kernel approach was then compared against two alternative methods, a sequence-based predictor and our implementation of the FEATURE framework. On both tasks, the graphlet kernel performed favorably; however, the margin of difference was considerably higher on the problem of phosphorylation site prediction. While there is data that phosphorylation sites are preferentially positioned in intrinsically disordered regions, we provide evidence that for the sites that are located in structured regions, neither the surface accessibility alone nor the averaged measures calculated from the residue microenvironments utilized by FEATURE were sufficient to achieve high accuracy. The key benefit of the graphlet representation is its ability to capture neighborhood similarities in protein structures via enumerating the patterns of local connectivity in the corresponding labeled graphs.
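
    The vertex representation and similarity measure described here can be sketched in Python as follows. This is a deliberately reduced illustration: it counts only unlabeled graphlets of two and three nodes rooted at the vertex, whereas the abstract describes labeled graphlets, and the paper's graphlet set may be larger. The names graphlet_counts and graphlet_kernel and the adjacency-dictionary input are assumptions.

```python
from itertools import combinations


def graphlet_counts(adj, v):
    """Count small induced subgraphs rooted at vertex v.

    adj: dict mapping each vertex to the set of its neighbors (contact graph).
    Returns [edges, paths centered on v, paths starting at v, triangles].
    """
    nbrs = adj[v]
    edges = len(nbrs)
    # Triangles: pairs of neighbors of v that are themselves connected.
    triangles = sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
    # Induced paths a-v-b: neighbor pairs that are NOT connected.
    centered_paths = edges * (edges - 1) // 2 - triangles
    # Induced paths v-a-b: b is two steps away and not adjacent to v.
    end_paths = sum(1 for a in nbrs for b in adj[a] if b != v and b not in nbrs)
    return [edges, centered_paths, end_paths, triangles]


def graphlet_kernel(adj, u, v):
    """Similarity of two vertices as the inner product of their count vectors."""
    return sum(x * y for x, y in zip(graphlet_counts(adj, u), graphlet_counts(adj, v)))
```

    With adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}, vertex 0 has counts [2, 0, 1, 1] and vertex 3 has [1, 0, 2, 0], so their kernel value is 4. A kernel of this form can be passed to any kernel-based supervised learner.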

  5. PSCL: predicting protein subcellular localization based on optimal functional domains.

    PubMed

    Wang, Kai; Hu, Le-Le; Shi, Xiao-He; Dong, Ying-Song; Li, Hai-Peng; Wen, Tie-Qiao

    2012-01-01

    It is well known that protein subcellular localizations are closely related to their functions. Although many computational methods and tools are available on the Internet, it is still necessary to develop new algorithms in this field to gain a better understanding of the complex mechanism of plant subcellular localization. Here, we provide a new web server named PSCL for plant protein subcellular localization prediction by employing optimized functional domains. After feature optimization, 848 optimal functional domains from InterPro were obtained to represent each protein. By calculating the distances to each of the seven categories, PSCL shows the possibilities of a protein being located in each of those categories in ascending order. On our dataset, PSCL achieved a first-order prediction accuracy of 75.7% by jackknife test. Gene Ontology enrichment analysis shows that catalytic activity, cellular process and metabolic process are strongly correlated with the localization of plant proteins. Finally, PSCL, a Linux operating system based web interface for the predictor, was designed and is accessible for public use at http://pscl.biosino.org/.

  6. Periscope: quantitative prediction of soluble protein expression in the periplasm of Escherichia coli.

    PubMed

    Chang, Catherine Ching Han; Li, Chen; Webb, Geoffrey I; Tey, BengTi; Song, Jiangning; Ramanan, Ramakrishnan Nagasundara

    2016-01-01

    Periplasmic expression of soluble proteins in Escherichia coli not only offers a much-simplified downstream purification process, but also enhances the probability of obtaining correctly folded and biologically active proteins. Different combinations of signal peptides and target proteins lead to different soluble protein expression levels, ranging from negligible to several grams per litre. Accurate algorithms for rational selection of promising candidates can serve as a powerful tool to complement current trial-and-error approaches. Accordingly, proteomics studies can be conducted with greater efficiency and cost-effectiveness. Here, we developed a predictor with a two-stage architecture, to predict the real-valued expression level of target protein in the periplasm. The output of the first-stage support vector machine (SVM) classifier determines which second-stage support vector regression (SVR) model is used. When tested on an independent test dataset, the predictor achieved an overall prediction accuracy of 78% and a Pearson's correlation coefficient (PCC) of 0.77. We further illustrate the relative importance of various features with respect to different models. The results indicate that the occurrence of dipeptide glutamine and aspartic acid is the most important feature for the classification model. Finally, we provide access to the implemented predictor through the Periscope webserver, freely accessible at http://lightning.med.monash.edu/periscope/. PMID:26931649
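
    The two-stage architecture described in this abstract, in which a classifier's output selects the regression model that produces the final value, can be sketched as a small dispatch structure in Python. The class name TwoStagePredictor, the "low"/"high" labels and the toy stand-in models are hypothetical. In the actual predictor the first stage is an SVM and the second-stage models are SVRs, which are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

Features = Sequence[float]


@dataclass
class TwoStagePredictor:
    """Stage 1 assigns a class label; stage 2 runs the regressor for that label."""

    classifier: Callable[[Features], str]
    regressors: Dict[str, Callable[[Features], float]]

    def predict(self, x: Features) -> Tuple[str, float]:
        label = self.classifier(x)
        return label, self.regressors[label](x)


# Toy stand-ins for trained models (illustrative only, not fitted to any data).
toy = TwoStagePredictor(
    classifier=lambda x: "high" if sum(x) > 1.0 else "low",
    regressors={
        "low": lambda x: 0.1 * sum(x),
        "high": lambda x: 2.0 + sum(x),
    },
)
```

    Routing each sample to a class-specific regressor lets every second-stage model be fitted on a narrower range of expression levels than a single global regressor would see.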

  7. Periscope: quantitative prediction of soluble protein expression in the periplasm of Escherichia coli

    PubMed Central

    Chang, Catherine Ching Han; Li, Chen; Webb, Geoffrey I.; Tey, BengTi; Song, Jiangning; Ramanan, Ramakrishnan Nagasundara

    2016-01-01

    Periplasmic expression of soluble proteins in Escherichia coli not only offers a much-simplified downstream purification process, but also enhances the probability of obtaining correctly folded and biologically active proteins. Different combinations of signal peptides and target proteins lead to different soluble protein expression levels, ranging from negligible to several grams per litre. Accurate algorithms for rational selection of promising candidates can serve as a powerful tool to complement current trial-and-error approaches. Accordingly, proteomics studies can be conducted with greater efficiency and cost-effectiveness. Here, we developed a predictor with a two-stage architecture, to predict the real-valued expression level of target protein in the periplasm. The output of the first-stage support vector machine (SVM) classifier determines which second-stage support vector regression (SVR) model is used. When tested on an independent test dataset, the predictor achieved an overall prediction accuracy of 78% and a Pearson’s correlation coefficient (PCC) of 0.77. We further illustrate the relative importance of various features with respect to different models. The results indicate that the occurrence of dipeptide glutamine and aspartic acid is the most important feature for the classification model. Finally, we provide access to the implemented predictor through the Periscope webserver, freely accessible at http://lightning.med.monash.edu/periscope/. PMID:26931649

  8. Structure-Based Prediction of Protein-Folding Transition Paths.

    PubMed

    Jacobs, William M; Shakhnovich, Eugene I

    2016-09-01

    We propose a general theory to describe the distribution of protein-folding transition paths. We show that transition paths follow a predictable sequence of high-free-energy transient states that are separated by free-energy barriers. Each transient state corresponds to the assembly of one or more discrete, cooperative units, which are determined directly from the native structure. We show that the transition state on a folding pathway is reached when a small number of critical contacts are formed between a specific set of substructures, after which folding proceeds downhill in free energy. This approach suggests a natural resolution for distinguishing parallel folding pathways and provides a simple means to predict the rate-limiting step in a folding reaction. Our theory identifies a common folding mechanism for proteins with diverse native structures and establishes general principles for the self-assembly of polymers with specific interactions. PMID:27602721

  9. Structure-Based Prediction of Protein-Folding Transition Paths

    NASA Astrophysics Data System (ADS)

    Jacobs, William M.; Shakhnovich, Eugene I.

    2016-09-01

    We propose a general theory to describe the distribution of protein-folding transition paths. We show that transition paths follow a predictable sequence of high-free-energy transient states that are separated by free-energy barriers. Each transient state corresponds to the assembly of one or more discrete, cooperative units, which are determined directly from the native structure. We show that the transition state on a folding pathway is reached when a small number of critical contacts are formed between a specific set of substructures, after which folding proceeds downhill in free energy. This approach suggests a natural resolution for distinguishing parallel folding pathways and provides a simple means to predict the rate-limiting step in a folding reaction. Our theory identifies a common folding mechanism for proteins with diverse native structures and establishes general principles for the self-assembly of polymers with specific interactions.

  10. Predicting protein structures with a multiplayer online game.

    PubMed

    Cooper, Seth; Khatib, Firas; Treuille, Adrien; Barbero, Janos; Lee, Jeehyung; Beenen, Michael; Leaver-Fay, Andrew; Baker, David; Popović, Zoran; Players, Foldit

    2010-08-01

    People exert large amounts of problem-solving effort playing computer games. Simple image- and text-recognition tasks have been successfully 'crowd-sourced' through games, but it is not clear if more complex scientific problems can be solved with human-directed computing. Protein structure prediction is one such problem: locating the biologically relevant native conformation of a protein is a formidable computational challenge given the very large size of the search space. Here we describe Foldit, a multiplayer online game that engages non-scientists in solving hard prediction problems. Foldit players interact with protein structures using direct manipulation tools and user-friendly versions of algorithms from the Rosetta structure prediction methodology, while they compete and collaborate to optimize the computed energy. We show that top-ranked Foldit players excel at solving challenging structure refinement problems in which substantial backbone rearrangements are necessary to achieve the burial of hydrophobic residues. Players working collaboratively develop a rich assortment of new strategies and algorithms; unlike computational approaches, they explore not only the conformational space but also the space of possible search strategies. The integration of human visual problem-solving and strategy development capabilities with traditional computational algorithms through interactive multiplayer games is a powerful new approach to solving computationally-limited scientific problems.

  11. Predicted Protein Subcellular Localization in Dominant Surface Ocean Bacterioplankton

    PubMed Central

    2012-01-01

    Bacteria consume dissolved organic matter (DOM) through hydrolysis, transport and intracellular metabolism, and these activities occur in distinct subcellular localizations. Bacterial protein subcellular localizations for several major marine bacterial groups were predicted using genomic, metagenomic and metatranscriptomic data sets following modification of MetaP software for use with partial gene sequences. The most distinct pattern of subcellular localization was found for Bacteroidetes, whose genomes were substantially enriched with outer membrane and extracellular proteins but depleted of inner membrane proteins compared with five other taxa (SAR11, Roseobacter, Synechococcus, Prochlorococcus, oligotrophic marine Gammaproteobacteria). When subcellular localization patterns were compared between genes and transcripts, three taxa had expression biased toward proteins localized to cell locations outside of the cytosol (SAR11, Roseobacter, and Synechococcus), as expected based on the importance of carbon and nutrient acquisition in an oligotrophic ocean, but two taxa did not (oligotrophic marine Gammaproteobacteria and Bacteroidetes). Diel variations in the fraction and putative gene functions of transcripts encoding inner membrane and periplasmic proteins compared to cytoplasmic proteins suggest a close coupling of photosynthetic extracellular release and bacterial consumption, providing insights into interactions between phytoplankton, bacteria, and DOM. PMID:22773648

  12. Ranking Gene Ontology terms for predicting non-classical secretory proteins in eukaryotes and prokaryotes.

    PubMed

    Huang, Wen-Lin

    2012-11-01

    Protein secretion is an important biological process for both eukaryotes and prokaryotes. Several sequence-based methods mainly rely on utilizing various types of complementary features to design accurate classifiers for predicting non-classical secretory proteins. Gene Ontology (GO) terms are increasingly informative in predicting protein functions. However, the number of GO terms used is often very large. For example, there are 60,020 GO terms used in the prediction method Euk-mPLoc 2.0 for subcellular localization. This study proposes a novel approach to identify a small set of m top-ranked GO terms serving as the only type of input feature for a support vector machine (SVM)-based method, Sec-GO, that predicts non-classical secretory proteins in both eukaryotes and prokaryotes. To evaluate the Sec-GO method, two existing methods and the datasets they used are adopted for performance comparisons. The Sec-GO method using m=436 GO terms yields an independent test accuracy of 96.7% on mammalian proteins, much better than the existing method SPRED (82.2%), which uses frequencies of tri-peptides and short peptides, secondary structure, and physicochemical properties as input features of a random forest classifier. Furthermore, when applied to Gram-positive bacterial proteins, Sec-GO with m=158 GO terms has a test accuracy of 94.5%, superior to NClassG+ (90.0%), which uses an SVM with several feature types, comprising amino acid composition, di-peptides, physicochemical properties and the position-specific weighting matrix. Analysis of the distribution of secretory proteins in a GO database indicates that the percentage of non-classical secretory proteins annotated by GO is larger than that of classical secretory proteins in both eukaryotes and prokaryotes. Of the m top-ranked GO features, the top four GO terms are all annotated by such subcellular locations as GO:0005576 (Extracellular region). Additionally, the method Sec-GO is easily implemented and its web tool of

  13. FunPred-1: protein function prediction from a protein interaction network using neighborhood analysis.

    PubMed

    Saha, Sovan; Chatterjee, Piyali; Basu, Subhadip; Kundu, Mahantapas; Nasipuri, Mita

    2014-12-01

    Proteins are responsible for all biological activities in living organisms. Thanks to genome sequencing projects, large amounts of DNA and protein sequence data are now available, but the biological functions of many proteins are still not annotated in most cases. The unknown function of such non-annotated proteins may be inferred or deduced from their neighbors in a protein interaction network. In this paper, we propose two new methods to predict protein functions based on network neighborhood properties. FunPred 1.1 uses a combination of three simple-yet-effective scoring techniques: the neighborhood ratio, the protein path connectivity and the relative functional similarity. FunPred 1.2 applies a heuristic approach using the edge clustering coefficient to reduce the search space by identifying densely connected neighborhood regions. The overall accuracy achieved in FunPred 1.2 over 8 functional groups involving hetero-interactions in 650 yeast proteins is around 87%, which is higher than the accuracy with FunPred 1.1. It is also higher than the accuracy of many of the state-of-the-art protein function prediction methods described in the literature. The test datasets and the complete source code of the developed software are now freely available at http://code.google.com/p/cmaterbioinfo/ . PMID:25424913
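
    The idea of inferring an unannotated protein's function from its neighbors in an interaction network can be illustrated with a minimal sketch. This is a plain neighbor-majority vote, not the FunPred 1.1 or 1.2 scoring (the neighborhood ratio, path connectivity and functional similarity scores are not defined in the abstract); the function name, the dictionary layout and the toy protein names are all assumptions made for illustration.

```python
from collections import Counter


def predict_function(protein, network, annotations):
    """Guess a function for `protein` by majority vote over its neighbors.

    network:     dict mapping a protein to an iterable of neighbor proteins
    annotations: dict mapping a protein to an iterable of known functions
    Returns the most frequent neighbor function, or None if no neighbor
    is annotated. Ties are broken alphabetically so the result is stable.
    """
    votes = Counter()
    for neighbor in network.get(protein, ()):
        for function in annotations.get(neighbor, ()):
            votes[function] += 1
    if not votes:
        return None
    return min(votes, key=lambda f: (-votes[f], f))


# Hypothetical toy network: P4 is unannotated and has three neighbors.
toy_network = {"P4": ["P1", "P2", "P3"], "P5": []}
toy_annotations = {
    "P1": ["transport"],
    "P2": ["transport", "signaling"],
    "P3": ["metabolism"],
}
```

    With the toy data above, predict_function("P4", toy_network, toy_annotations) selects "transport", because two of the three neighbors carry that label. A protein with no annotated neighbors, such as "P5", yields None.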

  15. Predicting oligonucleotide-directed mutagenesis failures in protein engineering.

    PubMed

    Wassman, Christopher D; Tam, Phillip Y; Lathrop, Richard H; Weiss, Gregory A

    2004-01-01

    Protein engineering uses oligonucleotide-directed mutagenesis to modify DNA sequences through a two-step process of hybridization and enzymatic synthesis. Inefficient reactions confound attempts to introduce mutations, especially for the construction of vast combinatorial protein libraries. This paper applied computational approaches to the problem of inefficient mutagenesis. Several results implicated oligonucleotide annealing to non-target sites, termed 'cross-hybridization', as a significant contributor to mutagenesis reaction failures. Test oligonucleotides demonstrated control over reaction outcomes. A novel cross-hybridization score, quickly computable for any plasmid and oligonucleotide mixture, directly correlated with yields of deleterious mutagenesis side products. Cross-hybridization was confirmed conclusively by partial incorporation of an oligonucleotide at a predicted cross-hybridization site, and by modification of putative template secondary structure to control cross-hybridization. Even in low concentrations, cross-hybridizing species in mixtures poisoned reactions. These results provide a basis for improved mutagenesis efficiencies and increased diversities of cognate protein libraries.

  16. Turn prediction in proteins using a pattern-matching approach.

    PubMed

    Cohen, F E; Abarbanel, R M; Kuntz, I D; Fletterick, R J

    1986-01-14

    We extend the use of amino acid sequence patterns [Cohen, F.E., Abarbanel, R. M., Kuntz, I. D., & Fletterick, R. J. (1983) Biochemistry 22, 4894-4904] to the identification of turns in globular proteins. The approach uses a conservative strategy, combined with a hierarchical search (strongest patterns first) and length-dependent masking, to achieve high accuracy (95%) on a test set of proteins of known structure. Applying the same procedure to homologous families gives a 90% success rate. Straightforward changes are suggested to improve the predictive power. The computer program, written in Lisp, provides a general pattern-recognition language well suited for a number of investigations of protein and nucleic acid sequences. PMID:3754149

  17. Predictive energy landscapes for folding membrane protein assemblies

    NASA Astrophysics Data System (ADS)

    Truong, Ha H.; Kim, Bobby L.; Schafer, Nicholas P.; Wolynes, Peter G.

    2015-12-01

    We study the energy landscapes for membrane protein oligomerization using the Associative memory, Water mediated, Structure and Energy Model with an implicit membrane potential (AWSEM-membrane), a coarse-grained molecular dynamics model previously optimized under the assumption that the energy landscapes for folding α-helical membrane protein monomers are funneled once their native topology within the membrane is established. In this study we show that the AWSEM-membrane force field is able to sample near-native binding interfaces of several oligomeric systems. By predicting candidate structures using simulated annealing, we further show that degeneracies in predicting structures of membrane protein monomers are generally resolved in the folding of the higher-order assemblies, as is the case in the assemblies of both nicotinic acetylcholine receptor and V-type Na+-ATPase dimers. The physics of the phenomenon resembles domain swapping, which is consistent with the landscape following the principle of minimal frustration. We also revisit the classic Khorana study of the reconstitution of bacteriorhodopsin from its fragments, which is the close analogue of the early Anfinsen experiment on globular proteins. Here, we show the retinal cofactor likely plays a major role in selecting the final functional assembly.

  18. A threading approach to protein structure prediction: Studies on TNF-like molecules, Rev proteins, and protein kinases

    NASA Astrophysics Data System (ADS)

    Ihm, Yungok

    The main focus of this dissertation is the application of the threading approach to specific biological problems. The threading scheme developed in our group aims to incorporate important structural features necessary for detecting structural similarity between the target sequence and the template structure. This enables us to use our threading method to solve problems for which sequence-based methods are not very useful. We applied our threading method to predict the three-dimensional structures of lentivirus (EIAV, HIV-1, FIV, SIV) Rev proteins. Predicted structures of Rev proteins suggest that they share a structural similarity among themselves (four-helix bundle). Also, the threading approach has been utilized for screening for potential TNF-like molecules in Arabidopsis. The threading approach identified 35 potential TNF-like proteins in Arabidopsis, six of which are particularly interesting candidates to be tested for receptor kinase ligand activity. The threading method has also been used to identify potentially new protein kinases that are not included in the protein kinase database of C. elegans and Arabidopsis. We identified eleven potentially new protein kinases and an additional protein worth investigating for protein kinase activity in C. elegans. Further, we identified ten potentially new protein kinases and an additional four proteins worth investigating for protein kinase activity in Arabidopsis.

  19. Protein-spanning water networks and implications for prediction of protein-protein interactions mediated through hydrophobic effects.

    PubMed

    Cui, Di; Ou, Shuching; Patel, Sandeep

    2014-12-01

    Hydrophobic effects, often conflated with hydrophobic forces, are implicated as major determinants in biological association and self-assembly processes. Protein-protein interactions involved in signaling pathways in living systems are a prime example where hydrophobic effects have profound implications. In the context of protein-protein interactions, a priori knowledge of relevant binding interfaces (i.e., clusters of residues involved directly with binding interactions) is difficult to obtain. In the case of hydrophobically mediated interactions, hydropathy-based methods relying on single-residue hydrophobicity properties are routinely and widely used to predict propensities for such residues to be present in hydrophobic interfaces. However, recent studies suggest that consideration of hydrophobicity for single residues on a protein surface requires accounting for the local environment dictated by neighboring residues and local water. In this study, we use a method derived from percolation theory to evaluate spanning water networks in the first hydration shells of a series of small proteins. We use residue-based water density and single-linkage clustering methods to predict hydrophobic regions of proteins; these regions are putatively involved in binding interactions. We find that this simple method is able to predict with sufficient accuracy and coverage the binding interface residues of a series of proteins. The approach is competitive with automated servers. The results of this study highlight the importance of accounting for the local environment in determining the hydrophobic nature of individual residues on protein surfaces.

  20. Protein structure prediction with local adjust tabu search algorithm

    PubMed Central

    2014-01-01

    Background: Protein folding structure prediction is one of the most challenging problems in the bioinformatics domain. Because of the complexity of realistic protein structures, simplified structure models and computational methods must be adopted in research. The AB off-lattice model is one such simplified model; it considers only two classes of amino acids, hydrophobic (A) residues and hydrophilic (B) residues. Results: The main work of this paper is to discuss how to find the lowest-energy configurations in the 2D and 3D off-lattice models using Fibonacci sequences and real protein sequences. In order to avoid falling into local minima and to converge faster to the global minimum, we introduce a novel method (SATS) to the protein structure problem, which combines the simulated annealing algorithm and the tabu search algorithm. Various strategies, such as a new encoding strategy, an adaptive neighborhood generation strategy and a local adjustment strategy, are adopted successfully for high-speed searching of the optimal conformation, which corresponds to the lowest energy of the protein sequence. Experimental results show that some of the results obtained by the improved SATS are better than those reported in the previous literature, and we can be sure that the lowest-energy folding states for short Fibonacci sequences have been found. Conclusions: Although the off-lattice models are not very realistic, they can reflect some important characteristics of real proteins. The 3D off-lattice model is found to be closer to the native folding structure of a real protein than the 2D off-lattice model. In addition, compared with some previous studies, the proposed hybrid algorithm can search the spatial folding structure of a protein chain more effectively and more quickly. PMID:25474708
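
    To make the objective in the AB off-lattice model concrete, the sketch below builds a Fibonacci test sequence and evaluates the energy of a 2D chain. The abstract does not state the energy function, so this assumes the commonly used Stillinger formulation (unit bond lengths, a bending term of (1/4)(1 - cos θ) per interior residue, and a Lennard-Jones-like term 4(r^-12 - C r^-6) with C = 1 for AA, 1/2 for BB and -1/2 for AB pairs). The function names are hypothetical, and the SATS search itself (simulated annealing plus tabu search) is not reproduced here.

```python
import math


def fibonacci_sequence(k):
    """Return the k-th Fibonacci AB string: S0 = 'A', S1 = 'B', S(i+1) = S(i-1) + S(i)."""
    prev, cur = "A", "B"
    if k == 0:
        return prev
    for _ in range(k - 1):
        prev, cur = cur, prev + cur
    return cur


def coords_2d(bend_angles):
    """Place residues with unit bond lengths; each angle turns the chain heading."""
    points = [(0.0, 0.0), (1.0, 0.0)]
    heading = 0.0
    for theta in bend_angles:
        heading += theta
        x, y = points[-1]
        points.append((x + math.cos(heading), y + math.sin(heading)))
    return points


def pair_coefficient(a, b):
    """Assumed Stillinger coefficients: AA attract strongly, BB weakly, AB repel."""
    if a == "A" and b == "A":
        return 1.0
    if a == "B" and b == "B":
        return 0.5
    return -0.5


def ab_energy_2d(seq, bend_angles):
    """Energy of a 2D AB chain with len(seq) residues and len(seq) - 2 bend angles."""
    if len(bend_angles) != len(seq) - 2:
        raise ValueError("need exactly len(seq) - 2 bend angles")
    points = coords_2d(bend_angles)
    bending = sum(0.25 * (1.0 - math.cos(t)) for t in bend_angles)
    nonbonded = 0.0
    for i in range(len(seq) - 2):
        for j in range(i + 2, len(seq)):  # skip directly bonded neighbors
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            r2 = dx * dx + dy * dy
            c = pair_coefficient(seq[i], seq[j])
            nonbonded += 4.0 * (r2 ** -6 - c * r2 ** -3)
    return bending + nonbonded
```

    A search method such as simulated annealing or tabu search would then vary bend_angles to minimize ab_energy_2d for a fixed sequence, for example fibonacci_sequence(6), which has 13 residues. Conformations in which two non-bonded residues coincide raise ZeroDivisionError in this sketch and would need to be rejected by the caller.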

  1. MEGADOCK: an all-to-all protein-protein interaction prediction system using tertiary structure data.

    PubMed

    Ohue, Masahito; Matsuzaki, Yuri; Uchikoga, Nobuyuki; Ishida, Takashi; Akiyama, Yutaka

    2014-01-01

    The elucidation of protein-protein interaction (PPI) networks is important for understanding cellular structure and function and structure-based drug design. However, the development of an effective method to conduct exhaustive PPI screening represents a computational challenge. We have been investigating a protein docking approach based on shape complementarity and physicochemical properties. We describe here the development of the protein-protein docking software package "MEGADOCK" that samples an extremely large number of protein dockings at high speed. MEGADOCK reduces the calculation time required for docking by using several techniques such as a novel scoring function called the real Pairwise Shape Complementarity (rPSC) score. We showed that MEGADOCK is capable of exhaustive PPI screening by completing docking calculations 7.5 times faster than the conventional docking software, ZDOCK, while maintaining an acceptable level of accuracy. When MEGADOCK was applied to a subset of a general benchmark dataset to predict 120 relevant interacting pairs from 120 x 120 = 14,400 combinations of proteins, an F-measure value of 0.231 was obtained. Further, we showed that MEGADOCK can be applied to a large-scale protein-protein interaction-screening problem with accuracy better than random. When our approach is combined with parallel high-performance computing systems, it is now feasible to search and analyze protein-protein interactions while taking into account three-dimensional structures at the interactome scale. MEGADOCK is freely available at http://www.bi.cs.titech.ac.jp/megadock. PMID:23855673
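
    The F-measure reported above combines precision and recall into one number, which matters in all-to-all screening because true pairs are rare (120 relevant pairs among 14,400 combinations, as stated in the abstract). The following is a minimal sketch of the standard F1 definition; the function name and the counts in the usage note are hypothetical and are not MEGADOCK results.

```python
def f_measure(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall (F1); returns 0.0 when undefined."""
    predicted = true_pos + false_pos
    relevant = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / relevant if relevant else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Fraction of all protein combinations that are relevant pairs in the
# benchmark described in the abstract: 120 out of 120 x 120.
positive_fraction = 120 / (120 * 120)
```

    For illustration only, a screen that predicts 100 pairs of which 40 are correct, while missing 80 of the 120 relevant pairs, gives f_measure(40, 60, 80), which equals 4/11 (about 0.36).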

  2. An evolutionary model-based algorithm for accurate phylogenetic breakpoint mapping and subtype prediction in HIV-1.

    PubMed

    Kosakovsky Pond, Sergei L; Posada, David; Stawiski, Eric; Chappey, Colombe; Poon, Art F Y; Hughes, Gareth; Fearnhill, Esther; Gravenor, Mike B; Leigh Brown, Andrew J; Frost, Simon D W

    2009-11-01

    Genetically diverse pathogens (such as Human Immunodeficiency virus type 1, HIV-1) are frequently stratified into phylogenetically or immunologically defined subtypes for classification purposes. Computational identification of such subtypes is helpful in surveillance, epidemiological analysis and detection of novel variants, e.g., circulating recombinant forms in HIV-1. A number of conceptually and technically different techniques have been proposed for determining the subtype of a query sequence, but there is not a universally optimal approach. We present a model-based phylogenetic method for automatically subtyping an HIV-1 (or other viral or bacterial) sequence, mapping the location of breakpoints and assigning parental sequences in recombinant strains as well as computing confidence levels for the inferred quantities. Our Subtype Classification Using Evolutionary ALgorithms (SCUEAL) procedure is shown to perform very well in a variety of simulation scenarios, runs in parallel when multiple sequences are being screened, and matches or exceeds the performance of existing approaches on typical empirical cases. We applied SCUEAL to all available polymerase (pol) sequences from two large databases, the Stanford Drug Resistance database and the UK HIV Drug Resistance Database. Comparing with subtypes which had previously been assigned revealed that a minor but substantial (approximately 5%) fraction of pure subtype sequences may in fact be within- or inter-subtype recombinants. A free implementation of SCUEAL is provided as a module for the HyPhy package and the Datamonkey web server. Our method is especially useful when an accurate automatic classification of an unknown strain is desired, and is positioned to complement and extend faster but less accurate methods. Given the increasingly frequent use of HIV subtype information in studies focusing on the effect of subtype on treatment, clinical outcome, pathogenicity and vaccine design, the importance of accurate

  3. PTMSearchPlus: Software Tool for Automated Protein Identification and Post-Translational Modification Characterization by Integrating Accurate Intact Protein Mass and Bottom-Up Mass Spectrometric Data Searches

    SciTech Connect

    Kertesz, Vilmos; Connelly, Heather M; Erickson, Brian K; Hettich, Robert {Bob} L

    2009-01-01

    PTMSearchPlus is a software tool for the automated integration of accurate intact protein mass (AIPM) and bottom-up (BU) mass spectra searches/data in order to both confidently identify the intact proteins and to characterize their post-translational modifications (PTMs). The development of PTMSearchPlus was motivated by the desire to effectively integrate high-resolution intact protein molecular masses with bottom-up peptide MS/MS data. PTMSearchPlus requires as input both intact protein and proteolytic peptide mass spectra collected from the same protein mixture, a FASTA protein database, and a selection of possible PTMs, the types and ranges of which can be specified. The output of PTMSearchPlus is a list of intact and modified proteins matching the AIPM data concomitant with their respective peptides found by the BU search. This list also contains protein and peptide sequence coverage information, scores, etc. that can be used for further evaluation or refiltering of the results. Corresponding and annotated AIPM and BU mass spectra are also displayed for visual inspection when a listed protein or a peptide is selected. These and other controls ensure that the user can manually evaluate, modify (e.g., remove obvious false positives, low quality spectra etc.), and save the results of the automated search if necessary. Driven by the exponential growth in the number of possible peptide candidates in a BU search when multiple PTMs are probed, the advantages on search speed by limiting the total number of possible PTMs on a peptide in the BU search or by performing an AIPM predicted BU search are also discussed in addition to the integration approach. The features of PTMSearchPlus are demonstrated using both a protein standard mixture and a complex protein mixture from Escherichia coli. Experimental data revealed a unique advantage of coupling AIPM and the BU data sets that is mutually beneficial for both approaches. 
    Namely, AIPM data can confirm that no PTM peptides

  4. PTMSearchPlus: A Software Tool for Automated Protein Identification and Post-Translational Modification Characterization by Integrating Accurate Intact Protein Mass and Bottom-Up Mass Spectrometric Data Searches

    SciTech Connect

    Kertesz, Vilmos; Connelly, Heather M; Erickson, Brian K; Hettich, Robert {Bob} L

    2009-01-01

    PTMSearchPlus is a software tool for the automated integration of accurate intact protein mass (AIPM) and bottom-up (BU) mass spectra searches/data in order to both confidently identify the intact proteins and to characterize their post-translational modifications (PTMs). The development of PTMSearchPlus was motivated by the desire to effectively integrate high resolution intact protein molecular masses with bottom-up peptide MS/MS data. PTMSearchPlus requires as input both intact protein and proteolytic peptide mass spectra collected from the same protein mixture, a FASTA protein database, and a selection of possible PTMs, the types and ranges of which can be specified. The output of PTMSearchPlus is a list of intact and modified proteins matching the AIPM data concomitant with their respective peptides found by the BU search. This list also contains protein and peptide sequence coverage information, scores, etc. that can be used for further evaluation or refiltering of the results. Corresponding and annotated AIPM and BU mass spectra are also displayed for visual inspection when a listed protein or a peptide is selected. These and other controls ensure that the user can manually evaluate, modify (e.g. remove obvious false positives, low quality spectra etc.), and save the results of the automated search if necessary. Driven by the exponential growth in the number of possible peptide candidates in a BU search when multiple PTMs are probed, the advantages on search speed by limiting the total number of possible PTMs on a peptide in the BU search or by performing an AIPM predicted BU search are also discussed in addition to the integration approach. The features of PTMSearchPlus are demonstrated using both a protein standard mixture and a complex protein mixture from Escherichia coli. Experimental data revealed a unique advantage of coupling AIPM and the BU datasets that is mutually beneficial for both approaches. 
    Namely, AIPM data can confirm that no PTM peptides

  5. Protein profiling reveals consequences of lifestyle choices on predicted biological aging

    PubMed Central

    Enroth, Stefan; Enroth, Sofia Bosdotter; Johansson, Åsa; Gyllensten, Ulf

    2015-01-01

    Ageing is linked to a number of changes in how the body and its organs function. On a molecular level, ageing is associated with a reduction of telomere length, changes in metabolic and gene-transcription profiles and an altered DNA-methylation pattern. Lifestyle factors such as smoking or stress can impact some of these molecular processes and thereby affect the ageing of an individual. Here we demonstrate by analysis of 77 plasma proteins in 976 individuals, that the abundance of circulating proteins accurately predicts chronological age, as well as anthropometrical measurements such as weight, height and hip circumference. The plasma protein profile can also be used to identify lifestyle factors that accelerate and decelerate ageing. We found smoking, high BMI and consumption of sugar-sweetened beverages to increase the predicted chronological age by 2–6 years, while consumption of fatty fish, drinking moderate amounts of coffee and exercising reduced the predicted age by approximately the same amount. This method can be applied to dried blood spots and may thus be useful in forensic medicine to provide basic anthropometrical measures for an individual based on a biological evidence sample. PMID:26619799

  6. Integration of genomic datasets to predict protein complexes in yeast.

    PubMed

    Jansen, Ronald; Lan, Ning; Qian, Jiang; Gerstein, Mark

    2002-01-01

    The ultimate goal of functional genomics is to define the function of all the genes in the genome of an organism. A large body of information on the biological roles of genes has been accumulated and aggregated in the past decades of research, both from traditional experiments detailing the role of individual genes and proteins, and from newer experimental strategies that aim to characterize gene function on a genomic scale. It is clear that the goal of functional genomics can only be achieved by integrating information and data sources from the variety of these different experiments. Integration of different data is thus an important challenge for bioinformatics. The integration of different data sources often helps to uncover non-obvious relationships between genes, but there are also two further benefits. First, it is likely that whenever information from multiple independent sources agrees, it should be more valid and reliable. Second, by looking at the union of multiple sources, one can cover larger parts of the genome. This is obvious for integrating results from multiple single gene or protein experiments, but also necessary for many of the results from genome-wide experiments since they are often confined to certain (although sizable) subsets of the genome. In this paper, we explore an example of such a data integration procedure. We focus on the prediction of membership in protein complexes for individual genes. For this, we recruit six different data sources that include expression profiles, interaction data, essentiality and localization information. Each of these data sources individually contains some weakly predictive information with respect to protein complexes, but we show how this prediction can be improved by combining all of them. Supplementary information is available at http://bioinfo.mbb.yale.edu/integrate/interactions/. PMID:12836664

  7. Conformations of 1,2-dimethoxypropane and 5-methoxy-1,3-dioxane: are ab initio quantum chemistry predictions accurate?

    NASA Astrophysics Data System (ADS)

    Smith, Grant D.; Jaffe, Richard L.; Yoon, Do. Y.

    1998-06-01

    High-level ab initio quantum chemistry calculations are shown to predict conformer populations of 1,2-dimethoxypropane and 5-methoxy-1,3-dioxane that are consistent with gas-phase NMR vicinal coupling constant measurements. The conformational energies of the cyclic ether 5-methoxy-1,3-dioxane are found to be consistent with those predicted by a rotational isomeric state (RIS) model based upon the acyclic analog 1,2-dimethoxypropane. The quantum chemistry and RIS calculations indicate the presence of strong attractive 1,5 C(H3)⋯O electrostatic interactions in these molecules, similar to those found in 1,2-dimethoxyethane.

  8. Large-scale de novo prediction of physical protein-protein association.

    PubMed

    Elefsinioti, Antigoni; Saraç, Ömer Sinan; Hegele, Anna; Plake, Conrad; Hubner, Nina C; Poser, Ina; Sarov, Mihail; Hyman, Anthony; Mann, Matthias; Schroeder, Michael; Stelzl, Ulrich; Beyer, Andreas

    2011-11-01

    Information about the physical association of proteins is extensively used for studying cellular processes and disease mechanisms. However, complete experimental mapping of the human interactome will remain prohibitively difficult in the near future. Here we present a map of predicted human protein interactions that distinguishes functional association from physical binding. Our network classifies more than 5 million protein pairs predicting 94,009 new interactions with high confidence. We experimentally tested a subset of these predictions using yeast two-hybrid analysis and affinity purification followed by quantitative mass spectrometry. Thus we identified 462 new protein-protein interactions and confirmed the predictive power of the network. These independent experiments address potential issues of circular reasoning and are a distinctive feature of this work. Analysis of the physical interactome unravels subnetworks mediating between different functional and physical subunits of the cell. Finally, we demonstrate the utility of the network for the analysis of molecular mechanisms of complex diseases by applying it to genome-wide association studies of neurodegenerative diseases. This analysis provides new evidence implying TOMM40 as a factor involved in Alzheimer's disease. The network provides a high-quality resource for the analysis of genomic data sets and genetic association studies in particular. Our interactome is available via the hPRINT web server at: www.print-db.org.

  9. Sequence-only evolutionary and predicted structural features for the prediction of stability changes in protein mutants

    PubMed Central

    2013-01-01

    Background Even a single amino acid substitution in a protein sequence may result in significant changes in protein stability, structure, and therefore in protein function as well. In the post-genomic era, computational methods for predicting stability changes from only the sequence of a protein are of importance. While evolutionary relationships of protein mutations can be extracted from large protein databases holding millions of protein sequences, relevant evolutionary features for the prediction of stability changes have not been proposed. Also, the use of predicted structural features in situations when a protein structure is not available has not been explored. Results We proposed a number of evolutionary and predicted structural features for the prediction of stability changes and analysed which of them capture the determinants of protein stability the best. We trained and evaluated our machine learning method on a non-redundant data set of experimentally measured stability changes. When only the direction of the stability change was predicted, we found that the best performance improvement can be achieved by the combination of the evolutionary features mutation likelihood and SIFTscore in conjunction with the predicted structural feature secondary structure. The same two evolutionary features in the combination with the predicted structural feature accessible surface area achieved the lowest error when the prediction of actual values of stability changes was assessed. Compared to similar studies, our method achieved improvements in prediction performance. Conclusion Although the strongest feature for the prediction of stability changes appears to be the vector of amino acid identities in the sequential neighbourhood of the mutation, the most relevant combination of evolutionary and predicted structural features further improves prediction performance. 
Even the predicted structural features, which did not perform well on their own, turn out to be beneficial

  10. A Maximal Graded Exercise Test to Accurately Predict VO2max in 18-65-Year-Old Adults

    ERIC Educational Resources Information Center

    George, James D.; Bradshaw, Danielle I.; Hyde, Annette; Vehrs, Pat R.; Hager, Ronald L.; Yanowitz, Frank G.

    2007-01-01

    The purpose of this study was to develop an age-generalized regression model to predict maximal oxygen uptake (VO2max) based on a maximal treadmill graded exercise test (GXT; George, 1996). Participants (N = 100), ages 18-65 years, reached a maximal level of exertion (mean ± standard deviation [SD]; maximal heart rate [HR sub…

  11. Survival outcomes scores (SOFT, BAR, and Pedi-SOFT) are accurate in predicting post-liver transplant survival in adolescents.

    PubMed

    Conjeevaram Selvakumar, Praveen Kumar; Maksimak, Brian; Hanouneh, Ibrahim; Youssef, Dalia H; Lopez, Rocio; Alkhouri, Naim

    2016-09-01

    SOFT and BAR scores utilize recipient, donor, and graft factors to predict the 3-month survival after LT in adults (≥18 years). Recently, Pedi-SOFT score was developed to predict 3-month survival after LT in young children (≤12 years). These scoring systems have not been studied in adolescent patients (13-17 years). We evaluated the accuracy of these scoring systems in predicting the 3-month post-LT survival in adolescents through a retrospective analysis of data from UNOS of patients aged 13-17 years who received LT between 03/01/2002 and 12/31/2012. Recipients of combined organ transplants, donation after cardiac death, or living donor graft were excluded. A total of 711 adolescent LT recipients were included with a mean age of 15.2±1.4 years. A total of 100 patients died post-LT including 33 within 3 months. SOFT, BAR, and Pedi-SOFT scores were all found to be good predictors of 3-month post-transplant survival outcome with areas under the ROC curve of 0.81, 0.80, and 0.81, respectively. All three scores provided good accuracy for predicting 3-month survival post-LT in adolescents and may help clinical decision making to optimize survival rate and organ utilization. PMID:27478012

  12. Is demography destiny? Application of machine learning techniques to accurately predict population health outcomes from a minimal demographic dataset.

    PubMed

    Luo, Wei; Nguyen, Thin; Nichols, Melanie; Tran, Truyen; Rana, Santu; Gupta, Sunil; Phung, Dinh; Venkatesh, Svetha; Allender, Steve

    2015-01-01

    For years, we have relied on population surveys to keep track of regional public health statistics, including the prevalence of non-communicable diseases. Because of the cost and limitations of such surveys, we often do not have up-to-date data on health outcomes of a region. In this paper, we examined the feasibility of inferring regional health outcomes from socio-demographic data that are widely available and regularly updated through national censuses and community surveys. Using data for 50 American states (excluding Washington DC) from 2007 to 2012, we constructed a machine-learning model to predict the prevalence of six non-communicable disease (NCD) outcomes (four NCDs and two major clinical risk factors), based on population socio-demographic characteristics from the American Community Survey. We found that regional prevalence estimates for non-communicable diseases can be reasonably predicted. The predictions were highly correlated with the observed data, in both the states included in the derivation model (median correlation 0.88) and those excluded from the development for use as a completely separated validation sample (median correlation 0.85), demonstrating that the model had sufficient external validity to make good predictions, based on demographics alone, for areas not included in the model development. This highlights both the utility of this sophisticated approach to model development, and the vital importance of simple socio-demographic characteristics as both indicators and determinants of chronic disease.
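    The "median correlation" summary quoted in this abstract can be illustrated with a short calculation: compute a Pearson correlation between predicted and observed prevalence for each outcome, then take the median across outcomes. The sketch below is only an illustration under stated assumptions; the outcome names and all numbers are hypothetical toy values, not data from the study, and the study's own correlation procedure may differ.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical observed prevalence for four regions (toy values).
observed = [1.0, 2.0, 3.0, 4.0]

# Hypothetical model predictions for three outcomes (toy values).
predicted = {
    "outcome_a": [2.0, 4.0, 6.0, 8.0],
    "outcome_b": [1.0, 3.0, 2.0, 4.0],
    "outcome_c": [2.0, 1.0, 4.0, 3.0],
}

# One correlation per outcome, then the median across outcomes.
correlations = {name: pearson(observed, p) for name, p in predicted.items()}
median_correlation = median(correlations.values())
```

    A median is less sensitive than a mean to a single poorly predicted outcome, which is one reason such a summary may be preferred when only a handful of outcomes are modelled.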

  13. Genomic Models of Short-Term Exposure Accurately Predict Long-Term Chemical Carcinogenicity and Identify Putative Mechanisms of Action

    PubMed Central

    Gusenleitner, Daniel; Auerbach, Scott S.; Melia, Tisha; Gómez, Harold F.; Sherr, David H.; Monti, Stefano

    2014-01-01

    Background Despite an overall decrease in incidence of and mortality from cancer, about 40% of Americans will be diagnosed with the disease in their lifetime, and around 20% will die of it. Current approaches to test carcinogenic chemicals adopt the 2-year rodent bioassay, which is costly and time-consuming. As a result, fewer than 2% of the chemicals on the market have actually been tested. However, evidence accumulated to date suggests that gene expression profiles from model organisms exposed to chemical compounds reflect underlying mechanisms of action, and that these toxicogenomic models could be used in the prediction of chemical carcinogenicity. Results In this study, we used a rat-based microarray dataset from the NTP DrugMatrix Database to test the ability of toxicogenomics to model carcinogenicity. We analyzed 1,221 gene-expression profiles obtained from rats treated with 127 well-characterized compounds, including genotoxic and non-genotoxic carcinogens. We built a classifier that predicts a chemical's carcinogenic potential with an AUC of 0.78, and validated it on an independent dataset from the Japanese Toxicogenomics Project consisting of 2,065 profiles from 72 compounds. Finally, we identified differentially expressed genes associated with chemical carcinogenesis, and developed novel data-driven approaches for the molecular characterization of the response to chemical stressors. Conclusion Here, we validate a toxicogenomic approach to predict carcinogenicity and provide strong evidence that, with a larger set of compounds, we should be able to improve the sensitivity and specificity of the predictions. We found that the prediction of carcinogenicity is tissue-dependent and that the results also confirm and expand upon previous studies implicating DNA damage, the peroxisome proliferator-activated receptor, the aryl hydrocarbon receptor, and regenerative pathology in the response to carcinogen exposure. PMID:25058030

  14. Impact of predicted protein-truncating genetic variants on the human transcriptome

    PubMed Central

    Rivas, Manuel A.; Pirinen, Matti; Conrad, Donald F.; Lek, Monkol; Tsang, Emily K.; Karczewski, Konrad J.; Maller, Julian B.; Kukurba, Kimberly R.; DeLuca, David; Fromer, Menachem; Ferreira, Pedro G.; Smith, Kevin S.; Zhang, Rui; Zhao, Fengmei; Banks, Eric; Poplin, Ryan; Ruderfer, Douglas; Purcell, Shaun M.; Tukiainen, Taru; Minikel, Eric V.; Stenson, Peter D.; Cooper, David N.; Huang, Katharine H.; Sullivan, Timothy J.; Nedzel, Jared; Bustamante, Carlos D.; Li, Jin Billy; Daly, Mark J.; Guigo, Roderic; Donnelly, Peter; Ardlie, Kristin; Sammeth, Michael; Dermitzakis, Emmanouil; McCarthy, Mark I.; Montgomery, Stephen B.; Lappalainen, Tuuli; MacArthur, Daniel G.

    2015-01-01

    Accurate prediction of the functional impact of genetic variation is critical for clinical genome interpretation. We systematically characterized the transcriptome effects of protein-truncating variants (PTVs), a class of variants expected to have profound impacts on gene function, using data from the Genotype-Tissue Expression (GTEx) and Geuvadis projects. We quantitate tissue-specific and positional effects on nonsense-mediated transcript decay, and present an improved predictive model for this decay. We directly measure the impact of variants both proximal and distal to splice junctions. Furthermore, we find that robustness to heterozygous gene inactivation is not due to dosage compensation. Our results illustrate the value of transcriptome data in the functional interpretation of genetic variants. PMID:25954003

  15. Human genomics. Effect of predicted protein-truncating genetic variants on the human transcriptome.

    PubMed

    Rivas, Manuel A; Pirinen, Matti; Conrad, Donald F; Lek, Monkol; Tsang, Emily K; Karczewski, Konrad J; Maller, Julian B; Kukurba, Kimberly R; DeLuca, David S; Fromer, Menachem; Ferreira, Pedro G; Smith, Kevin S; Zhang, Rui; Zhao, Fengmei; Banks, Eric; Poplin, Ryan; Ruderfer, Douglas M; Purcell, Shaun M; Tukiainen, Taru; Minikel, Eric V; Stenson, Peter D; Cooper, David N; Huang, Katharine H; Sullivan, Timothy J; Nedzel, Jared; Bustamante, Carlos D; Li, Jin Billy; Daly, Mark J; Guigo, Roderic; Donnelly, Peter; Ardlie, Kristin; Sammeth, Michael; Dermitzakis, Emmanouil T; McCarthy, Mark I; Montgomery, Stephen B; Lappalainen, Tuuli; MacArthur, Daniel G

    2015-05-01

    Accurate prediction of the functional effect of genetic variation is critical for clinical genome interpretation. We systematically characterized the transcriptome effects of protein-truncating variants, a class of variants expected to have profound effects on gene function, using data from the Genotype-Tissue Expression (GTEx) and Geuvadis projects. We quantitated tissue-specific and positional effects on nonsense-mediated transcript decay and present an improved predictive model for this decay. We directly measured the effect of variants both proximal and distal to splice junctions. Furthermore, we found that robustness to heterozygous gene inactivation is not due to dosage compensation. Our results illustrate the value of transcriptome data in the functional interpretation of genetic variants. PMID:25954003

  16. HHomp--prediction and classification of outer membrane proteins.

    PubMed

    Remmert, Michael; Linke, Dirk; Lupas, Andrei N; Söding, Johannes

    2009-07-01

    Outer membrane proteins (OMPs) are the transmembrane proteins found in the outer membranes of Gram-negative bacteria, mitochondria and plastids. Most prediction methods have focused on analogous features, such as alternating hydrophobicity patterns. Here, we start from the observation that almost all beta-barrel OMPs are related by common ancestry. We identify proteins as OMPs by detecting their homologous relationships to known OMPs using sequence similarity. Given an input sequence, HHomp builds a profile hidden Markov model (HMM) and compares it with an OMP database by pairwise HMM comparison, integrating OMP predictions by PROFtmb. A crucial ingredient is the OMP database, which contains profile HMMs for over 20,000 putative OMP sequences. These were collected with the exhaustive, transitive homology detection method HHsenser, starting from 23 representative OMPs in the PDB database. In a benchmark on TransportDB, HHomp detects 63.5% of the true positives before including the first false positive. This is 70% more than PROFtmb, four times more than BOMP and 10 times more than TMB-Hunt. In Escherichia coli, HHomp identifies 57 out of 59 known OMPs and correctly assigns them to their functional subgroups. HHomp can be accessed at http://toolkit.tuebingen.mpg.de/hhomp.
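    The benchmark figure quoted above (share of true positives detected before the first false positive) can be computed from any ranked hit list. The sketch below is a minimal illustration of that metric, not code from HHomp; the function name and the toy ranking are hypothetical.

```python
def tp_fraction_before_first_fp(ranked_labels):
    """Fraction of all true positives ranked above the first false positive.

    ranked_labels: booleans ordered from best to worst score,
    True for a genuine OMP and False for a non-OMP.
    """
    total_true = sum(ranked_labels)
    if total_true == 0:
        raise ValueError("ranking contains no true positives")
    found = 0
    for is_true in ranked_labels:
        if not is_true:
            break  # stop at the first false positive
        found += 1
    return found / total_true


# Hypothetical ranked output of a detector (best-scoring hit first).
ranking = [True, True, True, False, True, False, True]
fraction = tp_fraction_before_first_fp(ranking)
```

    In this toy ranking, three of the five true positives appear before the first false positive, so the metric is 0.6.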

  17. Brainstorming: weighted voting prediction of inhibitors for protein targets.

    PubMed

    Plewczynski, Dariusz

    2011-09-01

    The "Brainstorming" approach presented in this paper is a weighted voting method that can improve the quality of predictions generated by several machine learning (ML) methods. First, an ensemble of heterogeneous ML algorithms is trained on available experimental data, then all solutions are gathered and a consensus is built between them. The final prediction is performed using a voting procedure, whereby the vote of each method is weighted according to a quality coefficient calculated using multivariable linear regression (MLR). The MLR optimization procedure is very fast, therefore no additional computational cost is introduced by using this jury approach. Here, brainstorming is applied to selecting actives from large collections of compounds relating to five diverse biological targets of medicinal interest, namely HIV-reverse transcriptase, cyclooxygenase-2, dihydrofolate reductase, estrogen receptor, and thrombin. The MDL Drug Data Report (MDDR) database was used for selecting known inhibitors for these protein targets, and experimental data was then used to train a set of machine learning methods. The benchmark dataset (available at http://bio.icm.edu.pl/~darman/chemoinfo/benchmark.tar.gz ) can be used for further testing of various clustering and machine learning methods when predicting the biological activity of compounds. Depending on the protein target, the overall recall value is raised by at least 20% in comparison to any single machine learning method (including ensemble methods like random forest) and unweighted simple majority voting procedures.
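    The weighted-voting idea in this abstract can be sketched in a few lines. The example below is one plausible reading, not the author's implementation: it assumes the per-method weights are the least-squares coefficients obtained by regressing known activity labels on the methods' binary votes, and that a compound is called active when the weighted vote reaches 0.5. All function names, the threshold, and the toy votes are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations update both sides together.
    m = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: votes are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_vote_weights(votes, labels):
    """Least-squares weights (no intercept) via the normal equations."""
    k = len(votes[0])
    ata = [[sum(v[i] * v[j] for v in votes) for j in range(k)] for i in range(k)]
    aty = [sum(v[i] * y for v, y in zip(votes, labels)) for i in range(k)]
    return solve_linear_system(ata, aty)


def weighted_vote(weights, vote, threshold=0.5):
    """Return 1 (active) when the weighted sum of votes reaches the threshold."""
    score = sum(w * v for w, v in zip(weights, vote))
    return 1 if score >= threshold else 0


# Hypothetical votes of three ML methods on six compounds (1 = predicted active).
votes = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1], [1, 1, 1], [1, 1, 0]]
labels = [1, 1, 0, 0, 1, 0]  # hypothetical known activities

weights = fit_vote_weights(votes, labels)
consensus = [weighted_vote(weights, v) for v in votes]
majority = [1 if 2 * sum(v) > len(v) else 0 for v in votes]
```

    On this toy set the weighted consensus matches five of the six labels, while unweighted majority voting matches three. The comparison is made on the same compounds used to fit the weights, so it only illustrates the mechanics and says nothing about performance on held-out data.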

  18. Length of sick leave – Why not ask the sick-listed? Sick-listed individuals predict their length of sick leave more accurately than professionals

    PubMed Central

    Fleten, Nils; Johnsen, Roar; Førde, Olav Helge

    2004-01-01

    Background Knowledge of factors that accurately predict long-lasting sick leave is sparse, but information on medical condition is believed to be necessary to identify persons at risk. Based on the current practice of identifying sick-listed individuals at risk of long-lasting sick leave, the objectives of this study were to assess the diagnostic accuracy of the length of sick leave predicted in the Norwegian National Insurance Offices, and to compare these predictions with the self-predictions of the sick-listed. Methods Based on medical certificates, two National Insurance medical consultants and two National Insurance officers predicted, at day 14, the length of sick leave in 993 consecutive cases of sick leave, resulting from musculoskeletal or mental disorders, in this 1-year follow-up study. Two months later they reassessed 322 cases based on extended medical certificates. Self-predictions were obtained in 152 sick-listed subjects when their sick leave passed 14 days. Diagnostic accuracy of the predictions was analysed by ROC area, sensitivity, specificity, and likelihood ratio; positive predictive value was included in the analyses of predictive validity. Results The sick-listed identified sick leave lasting 12 weeks or longer with an ROC area of 80.9% (95% CI 73.7–86.8), while the corresponding estimates for medical consultants and officers had ROC areas of 55.6% (95% CI 45.6–65.6%) and 56.0% (95% CI 46.6–65.4%), respectively. The predictions of sick-listed males were significantly better than those of female subjects, and older subjects predicted somewhat better than younger subjects. Neither formal medical competence, nor additional medical information, noticeably improved the diagnostic accuracy based on medical certificates. Conclusion This study demonstrates that the accuracy of a prognosis based on medical documentation in sickness absence forms is lower than that of one based on direct communication with the sick-listed themselves.
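    The ROC areas reported in this abstract have a simple interpretation: the ROC area equals the probability that a randomly chosen long-leave case received a higher predicted value than a randomly chosen short-leave case, with ties counted as one half. The sketch below computes it that way; the function name and the predicted durations are hypothetical toy values, not data from the study.

```python
def roc_area(scores_positive, scores_negative):
    """ROC area as the share of (positive, negative) pairs ranked correctly.

    Ties contribute one half, which matches the trapezoidal ROC area.
    """
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical predicted lengths of sick leave, in weeks.
predicted_long_leave = [16, 10, 20, 12]   # cases that lasted 12 weeks or longer
predicted_short_leave = [4, 12, 6]        # cases that ended earlier

auc = roc_area(predicted_long_leave, predicted_short_leave)
```

    Here 10.5 of the 12 possible pairs are ordered correctly, giving an ROC area of 0.875. A value near 0.5, like the 55.6% and 56.0% reported for the professionals, means the predictions rank cases little better than chance.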

  19. Accurate and efficient prediction of fine-resolution hydrologic and carbon dynamic simulations from coarse-resolution models

    NASA Astrophysics Data System (ADS)

    Pau, George Shu Heng; Shen, Chaopeng; Riley, William J.; Liu, Yaning

    2016-02-01

    The topography, and the biotic and abiotic parameters are typically upscaled to make watershed-scale hydrologic-biogeochemical models computationally tractable. However, the upscaling procedure can produce biases when nonlinear interactions between different processes are not fully captured at coarse resolutions. Here we applied the Proper Orthogonal Decomposition Mapping Method (PODMM) to downscale the field solutions from a coarse (7 km) resolution grid to a fine (220 m) resolution grid. PODMM trains a reduced-order model (ROM) with coarse-resolution and fine-resolution solutions, here obtained using PAWS+CLM, a quasi-3-D watershed processes model that has been validated for many temperate watersheds. Subsequent fine-resolution solutions were approximated based only on coarse-resolution solutions and the ROM. The approximation errors were efficiently quantified using an error estimator. By jointly estimating correlated variables and temporally varying the ROM parameters, we further reduced the approximation errors by up to 20%. We also improved the method's robustness by constructing multiple ROMs using different sets of variables, and selecting the best approximation based on the error estimator. The ROMs produced accurate downscaling of soil moisture, latent heat flux, and net primary production with O(1000) reduction in computational cost. The subgrid distributions were also nearly indistinguishable from the ones obtained using the fine-resolution model. Compared to coarse-resolution solutions, biases in upscaled ROM solutions were reduced by up to 80%. This method has the potential to help address the long-standing spatial scaling problem in hydrology and enable long-time integration, parameter estimation, and stochastic uncertainty analysis while accurately representing the heterogeneities.

  20. Evolutionary-guided de novo structure prediction of self-associated transmembrane helical proteins with near-atomic accuracy

    NASA Astrophysics Data System (ADS)

    Wang, Y.; Barth, P.

    2015-05-01

    How specific protein associations regulate the function of membrane receptors remains poorly understood. Conformational flexibility currently hinders the structure determination of several classes of membrane receptors and associated oligomers. Here we develop EFDOCK-TM, a general method to predict self-associated transmembrane protein helical (TMH) structures from sequence guided by co-evolutionary information. We show that accurate intermolecular contacts can be identified using a combination of protein sequence covariation and TMH binding surfaces predicted from sequence. When applied to diverse TMH oligomers, including receptors characterized in multiple conformational and functional states, the method reaches unprecedented near-atomic accuracy for most targets. Blind predictions of structurally uncharacterized receptor tyrosine kinase TMH oligomers provide a plausible hypothesis on the molecular mechanisms of disease-associated point mutations and binding surfaces for the rational design of selective inhibitors. The method sets the stage for uncovering novel determinants of molecular recognition and signalling in single-spanning eukaryotic membrane receptors.

  1. Evolutionary-guided de novo structure prediction of self-associated transmembrane helical proteins with near-atomic accuracy

    PubMed Central

    Wang, Y.; Barth, P.

    2016-01-01

    How specific protein associations regulate the function of membrane receptors remains poorly understood. Conformational flexibility currently hinders the structure determination of several classes of membrane receptors and associated oligomers. Here we develop EFDOCK-TM, a general method to predict self-associated transmembrane protein helical (TMH) structures from sequence guided by co-evolutionary information. We show that accurate intermolecular contacts can be identified using a combination of protein sequence covariation and TMH binding surfaces predicted from sequence. When applied to diverse TMH oligomers, including receptors characterized in multiple conformational and functional states, the method reaches unprecedented near-atomic accuracy for most targets. Blind predictions of structurally uncharacterized receptor tyrosine kinase TMH oligomers provide a plausible hypothesis on the molecular mechanisms of disease-associated point mutations and binding surfaces for the rational design of selective inhibitors. The method sets the stage for uncovering novel determinants of molecular recognition and signalling in single-spanning eukaryotic membrane receptors. PMID:25995083

  2. Computational Prediction of Protein–Protein Interaction Networks: Algorithms and Resources

    PubMed Central

    Zahiri, Javad; Bozorgmehr, Joseph Hannon; Masoudi-Nejad, Ali

    2013-01-01

    Protein interactions play an important role in the discovery of protein functions and pathways in biological processes. This is especially true in the case of diseases caused by the loss of specific protein-protein interactions in the organism. The accuracy of experimental results in finding protein-protein interactions, however, is rather dubious, and high-throughput experiments have shown high rates of both false positives and false negatives for protein interactions. Computational methods have attracted tremendous attention among biologists because of their ability to predict protein-protein interactions and validate the obtained experimental results. In this study, we review several computational methods for protein-protein interaction prediction and describe the major databases, which store both predicted and detected protein-protein interactions, and the tools used for analyzing protein interaction networks and improving protein-protein interaction reliability. PMID:24396273

  3. Exploration of the dynamic properties of protein complexes predicted from spatially constrained protein-protein interaction networks.

    PubMed

    Yen, Eric A; Tsay, Aaron; Waldispuhl, Jerome; Vogel, Jackie

    2014-05-01

    Protein complexes are not static, but rather highly dynamic with subunits that undergo 1-dimensional diffusion with respect to each other. Interactions within protein complexes are modulated through regulatory inputs that alter interactions and introduce new components and deplete existing components through exchange. While it is clear that the structure and function of any given protein complex is coupled to its dynamical properties, it remains a challenge to predict the possible conformations that complexes can adopt. Protein-fragment Complementation Assays detect physical interactions between protein pairs constrained to ≤8 nm from each other in living cells. This method has been used to build networks composed of 1000s of pair-wise interactions. Significantly, these networks contain a wealth of dynamic information, as the assay is fully reversible and the proteins are expressed in their natural context. In this study, we describe a method that extracts this valuable information in the form of predicted conformations, allowing the user to explore the conformational landscape, to search for structures that correlate with an activity state, and estimate the abundance of conformations in the living cell. The generator is based on a Markov Chain Monte Carlo simulation that uses the interaction dataset as input and is constrained by the physical resolution of the assay. We applied this method to an 18-member protein complex composed of the seven core proteins of the budding yeast Arp2/3 complex and 11 associated regulators and effector proteins. We generated 20,480 output structures and identified conformational states using principal component analysis. We interrogated the conformation landscape and found evidence of symmetry breaking, a mixture of likely active and inactive conformational states and dynamic exchange of the core protein Arc15 between core and regulatory components. Our method provides a novel tool for prediction and visualization of the hidden

  4. Prognostic models and risk scores: can we accurately predict postoperative nausea and vomiting in children after craniotomy?

    PubMed

    Neufeld, Susan M; Newburn-Cook, Christine V; Drummond, Jane E

    2008-10-01

    Postoperative nausea and vomiting (PONV) is a problem for many children after craniotomy. Prognostic models and risk scores help identify who is at risk for an adverse event such as PONV to help guide clinical care. The purpose of this article is to assess whether an existing prognostic model or risk score can predict PONV in children after craniotomy. The concepts of transportability, calibration, and discrimination are presented to identify what is required to have a valid tool for clinical use. Although previous work may inform clinical practice and guide future research, existing prognostic models and risk scores do not appear to be options for predicting PONV in children undergoing craniotomy. However, until risk factors are further delineated, followed by the development and validation of prognostic models and risk scores that include children after craniotomy, clinical judgment in the context of current research may serve as a guide for clinical care in this population. PMID:18939320

  5. How accurately can subject-specific finite element models predict strains and strength of human femora? Investigation using full-field measurements.

    PubMed

    Grassi, Lorenzo; Väänänen, Sami P; Ristinmaa, Matti; Jurvelin, Jukka S; Isaksson, Hanna

    2016-03-21

    Subject-specific finite element models have been proposed as a tool to improve fracture risk assessment in individuals. A thorough laboratory validation against experimental data is required before introducing such models in clinical practice. Results from digital image correlation can provide full-field strain distribution over the specimen surface during in vitro test, instead of at a few pre-defined locations as with strain gauges. The aim of this study was to validate finite element models of human femora against experimental data from three cadaver femora, both in terms of femoral strength and of the full-field strain distribution collected with digital image correlation. The results showed a high accuracy between predicted and measured principal strains (R²=0.93, RMSE=10%, 1600 validated data points per specimen). Femoral strength was predicted using a rate dependent material model with specific strain limit values for yield and failure. This provided an accurate prediction (<2% error) for two out of three specimens. In the third specimen, an accidental change in the boundary conditions occurred during the experiment, which compromised the femoral strength validation. The achieved strain accuracy was comparable to that obtained in state-of-the-art studies which validated their prediction accuracy against 10-16 strain gauge measurements. Fracture force was accurately predicted, with the predicted failure location being very close to the experimental fracture rim. Despite the low sample size and the single loading condition tested, the present combined numerical-experimental method showed that finite element models can predict femoral strength by providing a thorough description of the local bone mechanical response. PMID:26944687

  7. Coronary Computed Tomographic Angiography Does Not Accurately Predict the Need of Coronary Revascularization in Patients with Stable Angina

    PubMed Central

    Hong, Sung-Jin; Her, Ae-Young; Suh, Yongsung; Won, Hoyoun; Cho, Deok-Kyu; Cho, Yun-Hyeong; Yoon, Young-Won; Lee, Kyounghoon; Kang, Woong Chol; Kim, Yong Hoon; Kim, Sang-Wook; Shin, Dong-Ho; Kim, Jung-Sun; Kim, Byeong-Keuk; Ko, Young-Guk; Choi, Byoung-Wook; Choi, Donghoon; Jang, Yangsoo

    2016-01-01

    Purpose To evaluate the ability of coronary computed tomographic angiography (CCTA) to predict the need of coronary revascularization in symptomatic patients with stable angina who were referred to a cardiac catheterization laboratory for coronary revascularization. Materials and Methods Pre-angiography CCTA findings were analyzed in 1846 consecutive symptomatic patients with stable angina, who were referred to a cardiac catheterization laboratory at six hospitals and were potential candidates for coronary revascularization between July 2011 and December 2013. The number of patients requiring revascularization was determined based on the severity of coronary stenosis as assessed by CCTA. This was compared to the actual number of revascularization procedures performed in the cardiac catheterization laboratory. Results Based on CCTA findings, coronary revascularization was indicated in 877 (48%) and not indicated in 969 (52%) patients. Of the 877 patients indicated for revascularization by CCTA, only 600 (68%) underwent the procedure, whereas 285 (29%) of the 969 patients not indicated for revascularization, as assessed by CCTA, underwent the procedure. When the coronary arteries were divided into 15 segments using the American Heart Association coronary tree model, the sensitivity, specificity, positive predictive value, and negative predictive value of CCTA for therapeutic decision making on a per-segment analysis were 42%, 96%, 40%, and 96%, respectively. Conclusion CCTA-based assessment of coronary stenosis severity does not sufficiently differentiate between coronary segments requiring revascularization versus those not requiring revascularization. Conventional coronary angiography should be considered to determine the need of revascularization in symptomatic patients with stable angina. PMID:27401637
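    The per-patient counts given in the Results are enough to build a 2x2 table. The sketch below does so under one explicit assumption: that revascularization actually performed in the catheterization laboratory is treated as the reference standard. The resulting per-patient values are a worked illustration only and are distinct from the per-segment figures (42%, 96%, 40%, 96%) that the abstract reports; variable names are hypothetical.

```python
# Counts taken from the abstract (per patient).
indicated_by_ccta = 877          # CCTA suggested revascularization
indicated_and_treated = 600      # of those, actually revascularized
not_indicated_by_ccta = 969      # CCTA did not suggest revascularization
not_indicated_but_treated = 285  # of those, revascularized anyway

# 2x2 table, assuming performed revascularization is the reference standard.
true_pos = indicated_and_treated
false_pos = indicated_by_ccta - indicated_and_treated
false_neg = not_indicated_but_treated
true_neg = not_indicated_by_ccta - not_indicated_but_treated

sensitivity = true_pos / (true_pos + false_neg)   # treated patients flagged by CCTA
specificity = true_neg / (true_neg + false_pos)   # untreated patients cleared by CCTA
ppv = true_pos / (true_pos + false_pos)           # flagged patients who were treated
npv = true_neg / (true_neg + false_neg)           # cleared patients who were not treated

for name, value in [("sensitivity", sensitivity), ("specificity", specificity),
                    ("PPV", ppv), ("NPV", npv)]:
    print(f"{name}: {value:.1%}")
```

    Under that assumption the per-patient sensitivity, specificity, PPV, and NPV come out at roughly 68%, 71%, 68%, and 71%. The PPV matches the 68% stated in the abstract, and one minus the NPV matches the stated 29%.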

  8. An Optimized Method for Accurate Fetal Sex Prediction and Sex Chromosome Aneuploidy Detection in Non-Invasive Prenatal Testing.

    PubMed

    Wang, Ting; He, Quanze; Li, Haibo; Ding, Jie; Wen, Ping; Zhang, Qin; Xiang, Jingjing; Li, Qiong; Xuan, Liming; Kong, Lingyin; Mao, Yan; Zhu, Yijun; Shen, Jingjing; Liang, Bo; Li, Hong

    2016-01-01

    Massively parallel sequencing (MPS) combined with bioinformatic analysis has been widely applied to detect fetal chromosomal aneuploidies such as trisomy 21, 18, 13 and sex chromosome aneuploidies (SCAs) by sequencing cell-free fetal DNA (cffDNA) from maternal plasma, so-called non-invasive prenatal testing (NIPT). However, many technical challenges, such as dependency on correct fetal sex prediction, large variations of chromosome Y measurement and high sensitivity to random reads mapping, may result in higher false negative rate (FNR) and false positive rate (FPR) in fetal sex prediction as well as in SCAs detection. Here, we developed an optimized method to improve the accuracy of the current method by filtering out randomly mapped reads in six specific regions of the Y chromosome. The method reduces the FNR and FPR of fetal sex prediction from nearly 1% to 0.01% and 0.06%, respectively and works robustly under conditions of low fetal DNA concentration (1%) in testing and simulation of 92 samples. The optimized method was further confirmed by large scale testing (1590 samples), suggesting that it is reliable and robust enough for clinical testing. PMID:27441628

  9. An Optimized Method for Accurate Fetal Sex Prediction and Sex Chromosome Aneuploidy Detection in Non-Invasive Prenatal Testing

    PubMed Central

    Li, Haibo; Ding, Jie; Wen, Ping; Zhang, Qin; Xiang, Jingjing; Li, Qiong; Xuan, Liming; Kong, Lingyin; Mao, Yan; Zhu, Yijun; Shen, Jingjing; Liang, Bo; Li, Hong

    2016-01-01

    Massively parallel sequencing (MPS) combined with bioinformatic analysis has been widely applied to detect fetal chromosomal aneuploidies, such as trisomies 21, 18, and 13 and sex chromosome aneuploidies (SCAs), by sequencing cell-free fetal DNA (cffDNA) from maternal plasma, an approach known as non-invasive prenatal testing (NIPT). However, many technical challenges, such as dependency on correct fetal sex prediction, large variations in chromosome Y measurement, and high sensitivity to random read mapping, may result in a higher false negative rate (FNR) and false positive rate (FPR) in fetal sex prediction as well as in SCA detection. Here, we developed an optimized method to improve the accuracy of the current method by filtering out randomly mapped reads in six specific regions of the Y chromosome. The method reduces the FNR and FPR of fetal sex prediction from nearly 1% to 0.01% and 0.06%, respectively, and works robustly under conditions of low fetal DNA concentration (1%) in testing and simulation of 92 samples. The optimized method was further confirmed by large-scale testing (1590 samples), suggesting that it is reliable and robust enough for clinical testing. PMID:27441628

  10. Protein subcellular localization prediction based on compartment-specific features and structure conservation

    PubMed Central

    Su, Emily Chia-Yu; Chiu, Hua-Sheng; Lo, Allan; Hwang, Jenn-Kang; Sung, Ting-Yi; Hsu, Wen-Lian

    2007-01-01

    Background: Protein subcellular localization is crucial for genome annotation, protein function prediction, and drug discovery. Determination of subcellular localization using experimental approaches is time-consuming; thus, computational approaches become highly desirable. Extensive studies of localization prediction have led to the development of several methods, including composition-based and homology-based methods. However, their performance might be significantly degraded if homologous sequences are not detected. Moreover, methods that integrate various features could suffer from the problem of low coverage in high-throughput proteomic analyses due to the lack of information to characterize unknown proteins. Results: We propose a hybrid prediction method for Gram-negative bacteria that combines a one-versus-one support vector machine (SVM) model and a structural homology approach. The SVM model comprises a number of binary classifiers, in which biological features derived from Gram-negative bacteria translocation pathways are incorporated. In the structural homology approach, we employ secondary structure alignment for structural similarity comparison and assign the known localization of the top-ranked protein as the predicted localization of a query protein. The hybrid method achieves overall accuracies of 93.7% and 93.2% using ten-fold cross-validation on the benchmark data sets. In the assessment of the evaluation data sets, our method also attains a prediction accuracy of 84.0%, especially when testing on sequences with a low level of homology to the training data. A three-way data split procedure is also incorporated to prevent overestimation of the predictive performance. In addition, we show that the prediction accuracy should be approximately 85% for non-redundant data sets of sequence identity less than 30%. Conclusion: Our results demonstrate that biological features derived from Gram-negative bacteria translocation pathways yield a significant
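
    The one-versus-one scheme mentioned in this abstract trains one binary classifier per pair of classes and lets them vote on each query. The sketch below shows only that voting step, with toy nearest-center rules standing in for trained SVMs; the class centers, the single numeric feature, and the function names are hypothetical, and the abstract does not describe how the SVM output is combined with the structural homology approach, so that part is not modeled.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple

Features = Sequence[float]
BinaryClassifier = Callable[[Features], str]  # returns one of its two class labels


def make_toy_classifiers(
    centers: Dict[str, float]
) -> Dict[Tuple[str, str], BinaryClassifier]:
    """Build one stand-in binary classifier per unordered pair of classes.

    Each toy classifier picks whichever of its two classes has the closer
    center on feature 0. A real system would train an SVM per pair instead.
    """
    def build(a: str, b: str) -> BinaryClassifier:
        return lambda x: a if abs(x[0] - centers[a]) <= abs(x[0] - centers[b]) else b

    return {(a, b): build(a, b) for a, b in combinations(centers, 2)}


def one_vs_one_predict(
    x: Features, classifiers: Dict[Tuple[str, str], BinaryClassifier]
) -> str:
    """Majority vote over all pairwise classifiers.

    Ties go to the tied label that received a vote first.
    """
    votes = Counter(clf(x) for clf in classifiers.values())
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # HYPOTHETICAL one-dimensional class centers, for illustration only.
    centers = {"cytoplasm": 0.0, "periplasm": 1.0, "outer membrane": 2.0}
    classifiers = make_toy_classifiers(centers)
    print(len(classifiers))                       # k * (k - 1) / 2 pairwise models
    print(one_vs_one_predict([0.9], classifiers))
```

    With k classes the scheme needs k(k-1)/2 binary models, so the number of classifiers grows quadratically with the number of localization sites.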

  11. Prediction of coordination number and relative solvent accessibility in proteins.

    PubMed

    Pollastri, Gianluca; Baldi, Pierre; Fariselli, Pietro; Casadio, Rita

    2002-05-01

    Knowing the coordination number