Science.gov

Sample records for accurately predict protein

  1. An Overview of Practical Applications of Protein Disorder Prediction and Drive for Faster, More Accurate Predictions

    PubMed Central

    Deng, Xin; Gumm, Jordan; Karki, Suman; Eickholt, Jesse; Cheng, Jianlin

    2015-01-01

    Protein disordered regions are segments of a protein chain that do not adopt a stable structure. Thus far, a variety of protein disorder prediction methods have been developed and have been widely used, not only in traditional bioinformatics domains, including protein structure prediction, protein structure determination and function annotation, but also in many other biomedical fields. The relationship between intrinsically-disordered proteins and some human diseases has played a significant role in disorder prediction in disease identification and epidemiological investigations. Disordered proteins can also serve as potential targets for drug discovery with an emphasis on the disordered-to-ordered transition in the disordered binding regions, and this has led to substantial research in drug discovery or design based on protein disordered region prediction. Furthermore, protein disorder prediction has also been applied to healthcare by predicting the disease risk of mutations in patients and studying the mechanistic basis of diseases. As the applications of disorder prediction increase, so too does the need to make quick and accurate predictions. To fill this need, we also present a new approach to predict protein residue disorder using wide sequence windows that is applicable on the genomic scale. PMID:26198229
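
    The abstract above mentions per-residue disorder prediction from wide sequence windows but does not describe the encoding. As a minimal sketch of the windowing step only, the following extracts one fixed-width window per residue. The padding symbol, the function name and the window width are illustrative assumptions, not the authors' implementation.

```python
from typing import List

PAD = "X"  # hypothetical padding symbol for positions beyond the chain ends


def residue_windows(sequence: str, half_width: int) -> List[str]:
    """Return one window of length 2*half_width + 1 centred on each residue."""
    if half_width < 0:
        raise ValueError("half_width must be non-negative")
    padded = PAD * half_width + sequence + PAD * half_width
    size = 2 * half_width + 1
    # Window i starts at padded index i, so its centre is sequence[i].
    return [padded[i:i + size] for i in range(len(sequence))]


# Toy sequence; a "wide" window would use a much larger half_width.
windows = residue_windows("MKTAYIAK", half_width=2)
```

    Each window would then be encoded numerically and passed to a per-residue classifier; that stage is outside this sketch.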

  2. An Overview of Practical Applications of Protein Disorder Prediction and Drive for Faster, More Accurate Predictions.

    PubMed

    Deng, Xin; Gumm, Jordan; Karki, Suman; Eickholt, Jesse; Cheng, Jianlin

    2015-07-07

    Protein disordered regions are segments of a protein chain that do not adopt a stable structure. Thus far, a variety of protein disorder prediction methods have been developed and have been widely used, not only in traditional bioinformatics domains, including protein structure prediction, protein structure determination and function annotation, but also in many other biomedical fields. The relationship between intrinsically-disordered proteins and some human diseases has played a significant role in disorder prediction in disease identification and epidemiological investigations. Disordered proteins can also serve as potential targets for drug discovery with an emphasis on the disordered-to-ordered transition in the disordered binding regions, and this has led to substantial research in drug discovery or design based on protein disordered region prediction. Furthermore, protein disorder prediction has also been applied to healthcare by predicting the disease risk of mutations in patients and studying the mechanistic basis of diseases. As the applications of disorder prediction increase, so too does the need to make quick and accurate predictions. To fill this need, we also present a new approach to predict protein residue disorder using wide sequence windows that is applicable on the genomic scale.

  3. Highly Accurate Prediction of Protein-Protein Interactions via Incorporating Evolutionary Information and Physicochemical Characteristics

    PubMed Central

    Li, Zheng-Wei; You, Zhu-Hong; Chen, Xing; Gui, Jie; Nie, Ru

    2016-01-01

    Protein-protein interactions (PPIs) occur at almost all levels of cell functions and play crucial roles in various cellular processes. Thus, identification of PPIs is critical for deciphering the molecular mechanisms and further providing insight into biological processes. Although a variety of high-throughput experimental techniques have been developed to identify PPIs, the PPI pairs identified by experimental approaches only cover a small fraction of the whole PPI networks, and further, those approaches hold inherent disadvantages, such as being time-consuming, expensive, and having a high false positive rate. Therefore, it is urgent and imperative to develop automatic in silico approaches to predict PPIs efficiently and accurately. In this article, we propose a novel mixture of physicochemical and evolutionary-based feature extraction method for predicting PPIs using our newly developed discriminative vector machine (DVM) classifier. The improvements of the proposed method mainly consist in introducing an effective feature extraction method that can capture discriminative features from the evolutionary-based information and physicochemical characteristics, and then a powerful and robust DVM classifier is employed. To the best of our knowledge, it is the first time that a DVM model is applied to the field of bioinformatics. When applying the proposed method to the Yeast and Helicobacter pylori (H. pylori) datasets, we obtain excellent prediction accuracies of 94.35% and 90.61%, respectively. The computational results indicate that our method is effective and robust for predicting PPIs, and can be taken as a useful supplementary tool to the traditional experimental methods for future proteomics research. PMID:27571061

  4. Accurate Prediction of One-Dimensional Protein Structure Features Using SPINE-X.

    PubMed

    Faraggi, Eshel; Kloczkowski, Andrzej

    2017-01-01

    Accurate prediction of protein secondary structure and other one-dimensional structure features is essential for accurate sequence alignment, three-dimensional structure modeling, and function prediction. SPINE-X is a software package to predict secondary structure as well as accessible surface area and dihedral angles ϕ and ψ. For secondary structure SPINE-X achieves an accuracy of between 81 and 84% depending on the dataset and choice of tests. The Pearson correlation coefficient for accessible surface area prediction is 0.75 and the mean absolute errors for the ϕ and ψ dihedral angles are 20° and 33°, respectively. The source code and Linux executables for SPINE-X are available from Research and Information Systems at http://mamiris.com.
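
    The two metrics quoted above, a Pearson correlation coefficient and a mean absolute error on dihedral angles, can be sketched as follows. Treating angle differences as periodic (wrapping at 360°) is an assumption made here for illustration; the abstract does not say how SPINE-X computes the error, and the function names are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def angular_mae(pred_deg: Sequence[float], true_deg: Sequence[float]) -> float:
    """Mean absolute error in degrees, assuming angles wrap at 360."""
    def wrapped_diff(a: float, b: float) -> float:
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)  # shortest way around the circle

    return sum(wrapped_diff(p, t) for p, t in zip(pred_deg, true_deg)) / len(pred_deg)


# Toy values only: 170 and -170 degrees are 20 degrees apart, not 340.
example_error = angular_mae([170.0, -60.0], [-170.0, -90.0])
example_r = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```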

  5. Hash: a Program to Accurately Predict Protein Hα Shifts from Neighboring Backbone Shifts

    PubMed Central

    Zeng, Jianyang; Zhou, Pei; Donald, Bruce Randall

    2012-01-01

    Chemical shifts provide not only peak identities for analyzing NMR data, but also an important source of conformational information for studying protein structures. Current structural studies requiring Hα chemical shifts suffer from the following limitations. (1) For large proteins, the Hα chemical shifts can be difficult to assign using conventional NMR triple-resonance experiments, mainly due to the fast transverse relaxation rate of Cα that restricts the signal sensitivity. (2) Previous chemical shift prediction approaches either require homologous models with high sequence similarity or rely heavily on accurate backbone and side-chain structural coordinates. When neither sequence homologues nor structural coordinates are available, we must resort to other information to predict Hα chemical shifts. Predicting accurate Hα chemical shifts using other obtainable information, such as the chemical shifts of nearby backbone atoms (i.e., adjacent atoms in the sequence), can remedy the above dilemmas, and hence advance NMR-based structural studies of proteins. By specifically exploiting the dependencies on chemical shifts of nearby backbone atoms, we propose a novel machine learning algorithm, called Hash, to predict Hα chemical shifts. Hash combines a new fragment-based chemical shift search approach with a non-parametric regression model, called the generalized additive model, to effectively solve the prediction problem. We demonstrate that the chemical shifts of nearby backbone atoms provide a reliable source of information for predicting accurate Hα chemical shifts. Our testing results on different possible combinations of input data indicate that Hash has a wide range of potential NMR applications in structural and biological studies of proteins. PMID:23242797

  6. SIFTER search: a web server for accurate phylogeny-based protein function prediction

    PubMed Central

    Sahraeian, Sayed M.; Luo, Kevin R.; Brenner, Steven E.

    2015-01-01

    We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access to precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. The SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded. PMID:25979264

  7. SIFTER search: a web server for accurate phylogeny-based protein function prediction

    SciTech Connect

    Sahraeian, Sayed M.; Luo, Kevin R.; Brenner, Steven E.

    2015-05-15

    We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access to precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. Lastly, the SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded.

  8. SIFTER search: a web server for accurate phylogeny-based protein function prediction

    DOE PAGES

    Sahraeian, Sayed M.; Luo, Kevin R.; Brenner, Steven E.

    2015-05-15

    We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access to precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. Lastly, the SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded.

  9. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model

    PubMed Central

    Li, Zhen; Zhang, Renyu

    2017-01-01

    Motivation: Protein contacts contain key information for the understanding of protein structure and function and thus, contact prediction from sequence is an important problem. Recently exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction. Method: This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformations of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformations of pairwise information including output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationships and thus, obtain higher-quality contact prediction regardless of how many sequence homologs are available for proteins in question. Results: Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively. 
Our contact
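
    The "top L" and "top L/10" long-range accuracies quoted in this record can be computed roughly as sketched below. The abstract does not define "long-range"; the default minimum sequence separation of 24 residues is a commonly used convention and is an assumption here, as are the function and parameter names.

```python
from typing import List


def top_long_range_accuracy(
    prob: List[List[float]],
    contact: List[List[bool]],
    k: int = 1,
    min_sep: int = 24,  # assumed definition of "long-range"
) -> float:
    """Fraction of true contacts among the top L/k long-range predictions."""
    length = len(prob)
    # Only the upper triangle is read, at separation >= min_sep.
    pairs = [
        (prob[i][j], i, j)
        for i in range(length)
        for j in range(i + min_sep, length)
    ]
    if not pairs:
        raise ValueError("no residue pairs at the requested separation")
    pairs.sort(key=lambda item: item[0], reverse=True)
    top = pairs[: max(1, length // k)]
    return sum(1 for _, i, j in top if contact[i][j]) / len(top)


# Toy 4-residue example with a small separation so that pairs exist.
toy_prob = [
    [0.0, 0.0, 0.9, 0.1],
    [0.0, 0.0, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
toy_contact = [
    [False, False, True, True],
    [False, False, False, False],
    [False, False, False, False],
    [False, False, False, False],
]
toy_accuracy = top_long_range_accuracy(toy_prob, toy_contact, k=2, min_sep=2)
```

    With `k=1` the function scores the top L predictions and with `k=10` the top L/10, matching the two settings reported in the abstract.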

  10. FastRNABindR: Fast and Accurate Prediction of Protein-RNA Interface Residues.

    PubMed

    El-Manzalawy, Yasser; Abbas, Mostafa; Malluhi, Qutaibah; Honavar, Vasant

    2016-01-01

    A wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses are mediated by RNA-protein interactions. However, experimental determination of the structures of protein-RNA complexes is expensive and technically challenging. Hence, a number of computational tools have been developed for predicting protein-RNA interfaces. Some of the state-of-the-art protein-RNA interface predictors rely on position-specific scoring matrix (PSSM)-based encoding of the protein sequences. The computational effort needed for generating PSSMs severely limits the practical utility of protein-RNA interface prediction servers. In this work, we experiment with two approaches, random sampling and sequence similarity reduction, for extracting a representative reference database of protein sequences from more than 50 million protein sequences in UniRef100. Our results suggest that randomly sampled databases produce better PSSM profiles (in terms of the number of hits used to generate the profile and the distance of the generated profile to the corresponding profile generated using the entire UniRef100 data as well as the accuracy of the machine learning classifier trained using these profiles). Based on our results, we developed FastRNABindR, an improved version of RNABindR for predicting protein-RNA interface residues using PSSM profiles generated using 1% of the UniRef100 sequences sampled uniformly at random. To the best of our knowledge, FastRNABindR is the only protein-RNA interface residue prediction online server that requires generation of PSSM profiles for query sequences and accepts hundreds of protein sequences per submission. 
Our approach for determining the optimal BLAST database for a protein-RNA interface residue classification task has the potential of substantially speeding up, and hence increasing the practical utility of, other amino acid sequence based predictors of protein-protein and protein

  11. Accurate prediction of cellular co-translational folding indicates proteins can switch from post- to co-translational folding

    NASA Astrophysics Data System (ADS)

    Nissley, Daniel A.; Sharma, Ajeet K.; Ahmed, Nabeel; Friedrich, Ulrike A.; Kramer, Günter; Bukau, Bernd; O'Brien, Edward P.

    2016-02-01

    The rates at which domains fold and codons are translated are important factors in determining whether a nascent protein will co-translationally fold and function or misfold and malfunction. Here we develop a chemical kinetic model that calculates a protein domain's co-translational folding curve during synthesis using only the domain's bulk folding and unfolding rates and codon translation rates. We show that this model accurately predicts the course of co-translational folding measured in vivo for four different protein molecules. We then make predictions for a number of different proteins in yeast and find that synonymous codon substitutions, which change translation-elongation rates, can switch some protein domains from folding post-translationally to folding co-translationally--a result consistent with previous experimental studies. Our approach explains essential features of co-translational folding curves and predicts how varying the translation rate at different codon positions along a transcript's coding sequence affects this self-assembly process.
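
    As a rough illustration of how folding, unfolding and codon translation rates can combine into a co-translational folding curve, the sketch below uses a simplified two-state model. It assumes exponentially distributed ribosome dwell times at each codon and a domain that can fold only once a given codon has been reached. These assumptions, and all names and rate values, are illustrative and are not taken from the paper's model.

```python
from typing import List, Sequence


def cotranslational_folding_curve(
    k_fold: float,
    k_unfold: float,
    codon_rates: Sequence[float],
    first_foldable_codon: int,
) -> List[float]:
    """Folded probability at the moment the ribosome leaves each codon.

    During a dwell of duration t the folded probability relaxes towards
    p_eq = k_fold / (k_fold + k_unfold) as exp(-(k_fold + k_unfold) * t).
    Averaging over an exponential dwell time with rate k_a gives the
    factor k_a / (k_a + k_fold + k_unfold) used below.
    """
    relax = k_fold + k_unfold
    p_eq = k_fold / relax
    p_folded = 0.0
    curve = []
    for index, k_a in enumerate(codon_rates):
        if index >= first_foldable_codon:
            p_folded = p_eq + (p_folded - p_eq) * k_a / (k_a + relax)
        curve.append(p_folded)
    return curve


# Hypothetical rates: slower codons leave more time for folding.
fast_codons = cotranslational_folding_curve(1.0, 1.0, [1.0, 1.0, 1.0], 1)
slow_codons = cotranslational_folding_curve(1.0, 1.0, [1.0, 0.1, 0.1], 1)
```

    In this toy setting the slowly translated transcript ends with a higher folded probability, which is the qualitative effect the abstract attributes to synonymous codon substitutions that change translation-elongation rates.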

  12. Combining Structural Modeling with Ensemble Machine Learning to Accurately Predict Protein Fold Stability and Binding Affinity Effects upon Mutation

    PubMed Central

    Garcia Lopez, Sebastian; Kim, Philip M.

    2014-01-01

    Advances in sequencing have led to a rapid accumulation of mutations, some of which are associated with diseases. However, to draw mechanistic conclusions, a biochemical understanding of these mutations is necessary. For coding mutations, accurate prediction of significant changes in either the stability of proteins or their affinity to their binding partners is required. Traditional methods have used semi-empirical force fields, while newer methods employ machine learning of sequence and structural features. Here, we show how combining both of these approaches leads to a marked boost in accuracy. We introduce ELASPIC, a novel ensemble machine learning approach that is able to predict stability effects upon mutation in both domain cores and domain-domain interfaces. We combine semi-empirical energy terms, sequence conservation, and a wide variety of molecular details with a Stochastic Gradient Boosting of Decision Trees (SGB-DT) algorithm. The accuracy of our predictions surpasses existing methods by a considerable margin, achieving correlation coefficients of 0.77 for stability, and 0.75 for affinity predictions. Notably, we integrated homology modeling to enable proteome-wide prediction and show that accurate prediction on modeled structures is possible. Lastly, ELASPIC showed significant differences between various types of disease-associated mutations, as well as between disease and common neutral mutations. Unlike pure sequence-based prediction methods that try to predict phenotypic effects of mutations, our predictions unravel the molecular details governing the protein instability, and help us better understand the molecular causes of diseases. PMID:25243403

  13. Neural network and SVM classifiers accurately predict lipid binding proteins, irrespective of sequence homology.

    PubMed

    Bakhtiarizadeh, Mohammad Reza; Moradi-Shahrbabak, Mohammad; Ebrahimi, Mansour; Ebrahimie, Esmaeil

    2014-09-07

    Due to the central roles of lipid binding proteins (LBPs) in many biological processes, sequence based identification of LBPs is of great interest. The major challenge is that LBPs are diverse in sequence, structure, and function which results in low accuracy of sequence homology based methods. Therefore, there is a need for developing alternative functional prediction methods irrespective of sequence similarity. To distinguish LBPs from non-LBPs, the performances of support vector machine (SVM) and neural network were compared in this study. Comprehensive protein features and various techniques were employed to create datasets. Five-fold cross-validation (CV) and independent evaluation (IE) tests were used to assess the validity of the two methods. The results indicated that SVM outperforms neural network. SVM achieved 89.28% (CV) and 89.55% (IE) overall accuracy in distinguishing LBPs from non-LBPs and 92.06% (CV) and 92.90% (IE) (on average) for classification of different LBP classes. Increasing the number and the range of extracted protein features as well as optimization of the SVM parameters significantly increased the efficiency of LBP class prediction in comparison to the only previous report in this field. Altogether, the results showed that the SVM algorithm can be run on broad, computationally calculated protein features and offers a promising tool in detection of LBP classes. The proposed approach has the potential to integrate and improve the common sequence alignment based methods.
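
    The five-fold cross-validation protocol mentioned above can be sketched in a few lines. The deterministic interleaved split below is an assumption chosen for simplicity; the abstract does not say how the folds were drawn, and real studies typically shuffle or stratify the data first.

```python
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index splits; each sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Fold i holds samples i, i + k, i + 2k, ... (no shuffling).
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# Toy example: 10 samples, 5 folds of 2 test samples each.
splits = k_fold_indices(10, k=5)
```

    A classifier would be trained on each `train` list and scored on the matching `test` list, with the reported CV accuracy being the average over the five folds.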

  14. Accurate ab initio prediction of NMR chemical shifts of nucleic acids and nucleic acids/protein complexes

    PubMed Central

    Victora, Andrea; Möller, Heiko M.; Exner, Thomas E.

    2014-01-01

    NMR chemical shift predictions based on empirical methods are nowadays indispensable tools during resonance assignment and 3D structure calculation of proteins. However, owing to the very limited statistical data basis, such methods are still in their infancy in the field of nucleic acids, especially when non-canonical structures and nucleic acid complexes are considered. Here, we present an ab initio approach for predicting proton chemical shifts of arbitrary nucleic acid structures based on state-of-the-art fragment-based quantum chemical calculations. We tested our prediction method on a diverse set of nucleic acid structures including double-stranded DNA, hairpins, DNA/protein complexes and chemically-modified DNA. Overall, our quantum chemical calculations yield highly/very accurate predictions with mean absolute deviations of 0.3–0.6 ppm and correlation coefficients (r²) usually above 0.9. This will allow for identifying misassignments and validating 3D structures. Furthermore, our calculations reveal that chemical shifts of protons involved in hydrogen bonding are predicted significantly less accurately. This is in part caused by insufficient inclusion of solvation effects. However, it also points toward shortcomings of current force fields used for structure determination of nucleic acids. Our quantum chemical calculations could therefore provide input for force field optimization. PMID:25404135

  15. DisoMCS: Accurately Predicting Protein Intrinsically Disordered Regions Using a Multi-Class Conservative Score Approach

    PubMed Central

    Wang, Zhiheng; Yang, Qianqian; Li, Tonghua; Cong, Peisheng

    2015-01-01

    The precise prediction of protein intrinsically disordered regions, which play a crucial role in biological procedures, is a necessary prerequisite to further the understanding of the principles and mechanisms of protein function. Here, we propose a novel predictor, DisoMCS, which is a more accurate predictor of protein intrinsically disordered regions. DisoMCS is based on an original multi-class conservative score (MCS) obtained by sequence-order/disorder alignment. Initially, near-disorder regions are defined on fragments located at both termini of an ordered region connecting a disordered region. Then the multi-class conservative score is generated by sequence alignment against a known structure database and represented as order, near-disorder and disorder conservative scores. The MCS of each amino acid has three elements: order, near-disorder and disorder profiles. Finally, the MCS is exploited as features to identify disordered regions in sequences. DisoMCS utilizes a non-redundant data set as the training set, MCS and predicted secondary structure as features, and a conditional random field as the classification algorithm. In predicted near-disorder regions a residue is classified as ordered or disordered according to the optimized decision threshold. DisoMCS was evaluated by cross-validation, large-scale prediction, independent tests and CASP (Critical Assessment of Techniques for Protein Structure Prediction) tests. All results confirmed that DisoMCS was very competitive in terms of accuracy of prediction when compared with well-established publicly available disordered region predictors. It also indicated our approach was more accurate when a query has higher homology with the knowledge database. Availability: DisoMCS is available at http://cal.tongji.edu.cn/disorder/. PMID:26090958

  16. Fast and accurate multivariate Gaussian modeling of protein families: predicting residue contacts and protein-interaction partners.

    PubMed

    Baldassi, Carlo; Zamparo, Marco; Feinauer, Christoph; Procaccini, Andrea; Zecchina, Riccardo; Weigt, Martin; Pagnani, Andrea

    2014-01-01

    In the course of evolution, proteins show a remarkable conservation of their three-dimensional structure and their biological function, leading to strong evolutionary constraints on the sequence variability between homologous proteins. Our method aims at extracting such constraints from rapidly accumulating sequence data, and thereby at inferring protein structure and function from sequence information alone. Recently, global statistical inference methods (e.g. direct-coupling analysis, sparse inverse covariance estimation) have achieved a breakthrough towards this aim, and their predictions have been successfully implemented into tertiary and quaternary protein structure prediction methods. However, due to the discrete nature of the underlying variable (amino-acids), exact inference requires exponential time in the protein length, and efficient approximations are needed for practical applicability. Here we propose a very efficient multivariate Gaussian modeling approach as a variant of direct-coupling analysis: the discrete amino-acid variables are replaced by continuous Gaussian random variables. The resulting statistical inference problem is efficiently and exactly solvable. We show that the quality of inference is comparable or superior to the one achieved by mean-field approximations to inference with discrete variables, as done by direct-coupling analysis. This is true for (i) the prediction of residue-residue contacts in proteins, and (ii) the identification of protein-protein interaction partner in bacterial signal transduction. An implementation of our multivariate Gaussian approach is available at the website http://areeweb.polito.it/ricerca/cmp/code.

  17. Accurate prediction of cellular co-translational folding indicates proteins can switch from post- to co-translational folding

    PubMed Central

    Nissley, Daniel A.; Sharma, Ajeet K.; Ahmed, Nabeel; Friedrich, Ulrike A.; Kramer, Günter; Bukau, Bernd; O'Brien, Edward P.

    2016-01-01

    The rates at which domains fold and codons are translated are important factors in determining whether a nascent protein will co-translationally fold and function or misfold and malfunction. Here we develop a chemical kinetic model that calculates a protein domain's co-translational folding curve during synthesis using only the domain's bulk folding and unfolding rates and codon translation rates. We show that this model accurately predicts the course of co-translational folding measured in vivo for four different protein molecules. We then make predictions for a number of different proteins in yeast and find that synonymous codon substitutions, which change translation-elongation rates, can switch some protein domains from folding post-translationally to folding co-translationally—a result consistent with previous experimental studies. Our approach explains essential features of co-translational folding curves and predicts how varying the translation rate at different codon positions along a transcript's coding sequence affects this self-assembly process. PMID:26887592

  18. Protein corona composition does not accurately predict hematocompatibility of colloidal gold nanoparticles.

    PubMed

    Dobrovolskaia, Marina A; Neun, Barry W; Man, Sonny; Ye, Xiaoying; Hansen, Matthew; Patri, Anil K; Crist, Rachael M; McNeil, Scott E

    2014-10-01

    Proteins bound to nanoparticle surfaces are known to affect particle clearance by influencing immune cell uptake and distribution to the organs of the mononuclear phagocytic system. The composition of the protein corona has been described for several types of nanomaterials, but the role of the corona in nanoparticle biocompatibility is not well established. In this study we investigate the role of nanoparticle surface properties (PEGylation) and incubation times on the protein coronas of colloidal gold nanoparticles. While neither incubation time nor PEG molecular weight affected the specific proteins in the protein corona, the total amount of protein binding was governed by the molecular weight of PEG coating. Furthermore, the composition of the protein corona did not correlate with nanoparticle hematocompatibility. Specialized hematological tests should be used to deduce nanoparticle hematotoxicity. From the clinical editor: It is overall unclear how the protein corona associated with colloidal gold nanoparticles may influence hematotoxicity. This study warns that PEGylation itself may be insufficient, because composition of the protein corona does not directly correlate with nanoparticle hematocompatibility. The authors suggest that specialized hematological tests must be used to deduce nanoparticle hematotoxicity.

  19. Network Biomarkers Constructed from Gene Expression and Protein-Protein Interaction Data for Accurate Prediction of Leukemia

    PubMed Central

    Yuan, Xuye; Chen, Jiajia; Lin, Yuxin; Li, Yin; Xu, Lihua; Chen, Luonan; Hua, Haiying; Shen, Bairong

    2017-01-01

    Leukemia is a leading cause of cancer deaths in developed countries. Great efforts have been undertaken in search of diagnostic biomarkers of leukemia. However, leukemia is highly complex and heterogeneous, involving interaction among multiple molecular components. Individual molecules are not necessarily sensitive diagnostic indicators. Network biomarkers are considered to outperform individual molecules in disease characterization. We applied an integrative approach that identifies active network modules as putative biomarkers for leukemia diagnosis. We first reconstructed the leukemia-specific PPI network using protein-protein interactions from the Protein Interaction Network Analysis (PINA) and protein annotations from GeneGo. The network was further integrated with gene expression profiles to identify active modules with leukemia relevance. Finally, the candidate network-based biomarker was evaluated for its diagnostic performance. A network of 97 genes and 400 interactions was identified for accurate diagnosis of leukemia. Functional enrichment analysis revealed that the network biomarkers were enriched in pathways in cancer. The network biomarkers could discriminate leukemia samples from the normal controls more effectively than the known biomarkers. The network biomarkers provide a useful tool to diagnose leukemia and also aid in further understanding the molecular basis of leukemia. PMID:28243332

  20. Molecular Dynamics in Mixed Solvents Reveals Protein-Ligand Interactions, Improves Docking, and Allows Accurate Binding Free Energy Predictions.

    PubMed

    Arcon, Juan Pablo; Defelipe, Lucas A; Modenutti, Carlos P; López, Elias D; Alvarez-Garcia, Daniel; Barril, Xavier; Turjanski, Adrián G; Martí, Marcelo A

    2017-03-31

    One of the most important biological processes at the molecular level is the formation of protein-ligand complexes. Therefore, determining their structure and underlying key interactions is of paramount relevance and has direct applications in drug development. Because of their low cost relative to experimental approaches, molecular dynamics (MD) simulations in the presence of different solvent probes mimicking specific types of interactions have been increasingly used to analyze protein binding sites and reveal protein-ligand interaction hot spots. However, a systematic comparison of different probes and their real predictive power from a quantitative and thermodynamic point of view is still missing. In the present work, we have performed MD simulations of 18 different proteins in pure water as well as water mixtures of ethanol, acetamide, acetonitrile and methylammonium acetate, leading to a total of 5.4 μs simulation time. For each system, we determined the corresponding solvent sites, defined as space regions adjacent to the protein surface where the probability of finding a probe atom is higher than that in the bulk solvent. Finally, we compared the identified solvent sites with 121 different protein-ligand complexes and used them to perform molecular docking and ligand binding free energy estimates. Our results show that combining solely water and ethanol sites allows sampling over 70% of all possible protein-ligand interactions, especially those that coincide with ligand-based pharmacophoric points. Most importantly, we also show how the solvent sites can be used to significantly improve ligand docking in terms of both accuracy and precision, and that accurate predictions of ligand binding free energies, along with relative ranking of ligand affinity, can be performed.

  1. Does a more precise chemical description of protein-ligand complexes lead to more accurate prediction of binding affinity?

    PubMed

    Ballester, Pedro J; Schreyer, Adrian; Blundell, Tom L

    2014-03-24

    Predicting the binding affinities of large sets of diverse molecules against a range of macromolecular targets is an extremely challenging task. The scoring functions that attempt such computational prediction are essential for exploiting and analyzing the outputs of docking, which is in turn an important tool in problems such as structure-based drug design. Classical scoring functions assume a predetermined theory-inspired functional form for the relationship between the variables that describe an experimentally determined or modeled structure of a protein-ligand complex and its binding affinity. The inherent problem of this approach is in the difficulty of explicitly modeling the various contributions of intermolecular interactions to binding affinity. New scoring functions based on machine-learning regression models, which are able to exploit effectively much larger amounts of experimental data and circumvent the need for a predetermined functional form, have already been shown to outperform a broad range of state-of-the-art scoring functions in a widely used benchmark. Here, we investigate the impact of the chemical description of the complex on the predictive power of the resulting scoring function using a systematic battery of numerical experiments. The latter resulted in the most accurate scoring function to date on the benchmark. Strikingly, we also found that a more precise chemical description of the protein-ligand complex does not generally lead to a more accurate prediction of binding affinity. We discuss four factors that may contribute to this result: modeling assumptions, codependence of representation and regression, data restricted to the bound state, and conformational heterogeneity in data.

  2. PSSP-RFE: accurate prediction of protein structural class by recursive feature extraction from PSI-BLAST profile, physical-chemical property and functional annotations.

    PubMed

    Li, Liqi; Cui, Xiang; Yu, Sanjiu; Zhang, Yuan; Luo, Zhong; Yang, Hua; Zhou, Yue; Zheng, Xiaoqi

    2014-01-01

    Protein structure prediction is critical to functional annotation of the massively accumulated biological sequences, which prompts an imperative need for the development of high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction becomes an increasingly challenging task. Among most homology-based approaches, the accuracies of protein structural class prediction are sufficiently high for high similarity datasets, but still far from being satisfactory for low similarity datasets, i.e., below 40% in pairwise sequence similarity. Therefore, we present a novel method for accurate and reliable protein structural class prediction for both high and low similarity datasets. This method is based on Support Vector Machine (SVM) in conjunction with integrated features from position-specific score matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is also used to rank the integrated feature vectors through recursively removing the feature with the lowest ranking score. The definitive top features selected by SVM-RFE are input into the SVM engines to predict the structural class of a query protein. To validate our method, jackknife tests were applied to seven widely used benchmark datasets, reaching overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods in protein structural classification, especially for low similarity datasets.

  3. Toward accurate prediction of pKa values for internal protein residues: the importance of conformational relaxation and desolvation energy.

    PubMed

    Wallace, Jason A; Wang, Yuhang; Shi, Chuanyin; Pastoor, Kevin J; Nguyen, Bao-Linh; Xia, Kai; Shen, Jana K

    2011-12-01

    Proton uptake or release controls many important biological processes, such as energy transduction, virus replication, and catalysis. Accurate pK(a) prediction informs about proton pathways, thereby revealing detailed acid-base mechanisms. Physics-based methods in the framework of molecular dynamics simulations not only offer pK(a) predictions but also inform about the physical origins of pK(a) shifts and provide details of ionization-induced conformational relaxation and large-scale transitions. One such method is the recently developed continuous constant pH molecular dynamics (CPHMD) method, which has been shown to be an accurate and robust pK(a) prediction tool for naturally occurring titratable residues. To further examine the accuracy and limitations of CPHMD, we blindly predicted the pK(a) values for 87 titratable residues introduced in various hydrophobic regions of staphylococcal nuclease and variants. The predictions gave a root-mean-square deviation of 1.69 pK units from experiment, and there were only two pK(a)'s with errors greater than 3.5 pK units. Analysis of the conformational fluctuation of titrating side-chains in the context of the errors of calculated pK(a) values indicates that explicit treatment of conformational flexibility and the associated dielectric relaxation gives CPHMD a distinct advantage. Analysis of the sources of errors suggests that more accurate pK(a) predictions can be obtained for the most deeply buried residues by improving the accuracy in calculating desolvation energies. Furthermore, it is found that the generalized Born implicit-solvent model underlying the current CPHMD implementation slightly distorts the local conformational environment such that the inclusion of an explicit-solvent representation may offer improvement of accuracy.
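
    The two error statistics quoted in this abstract (a root-mean-square deviation and a count of predictions beyond an error threshold) can be reproduced for any set of predicted and measured pK(a) values. The sketch below is only an illustration: the function names and the four sample values are hypothetical and are not taken from the study.

```python
import math

def rmsd(predicted, measured):
    """Root-mean-square deviation between paired predictions and measurements."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def count_outliers(predicted, measured, threshold):
    """Number of predictions whose absolute error exceeds the threshold."""
    return sum(1 for p, m in zip(predicted, measured) if abs(p - m) > threshold)

# Hypothetical pK(a) values, for illustration only (not data from the study).
predicted = [7.1, 5.0, 9.4, 6.2]
measured = [6.0, 5.5, 10.4, 10.0]

print(round(rmsd(predicted, measured), 2))
print(count_outliers(predicted, measured, threshold=3.5))
```

    Because the errors are squared before averaging, a single large miss raises the RMSD sharply, which is why the abstract reports the outlier count alongside it.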

  4. Accurate Prediction of Protein Functional Class From Sequence in the Mycobacterium Tuberculosis and Escherichia Coli Genomes Using Data Mining

    PubMed Central

    Karwath, Andreas; Clare, Amanda; Dehaspe, Luc

    2000-01-01

    The analysis of genomics data needs to become as automated as its generation. Here we present a novel data-mining approach to predicting protein functional class from sequence. This method is based on a combination of inductive logic programming clustering and rule learning. We demonstrate the effectiveness of this approach on the M. tuberculosis and E. coli genomes, and identify biologically interpretable rules which predict protein functional class from information only available from the sequence. These rules predict 65% of the ORFs with no assigned function in M. tuberculosis and 24% of those in E. coli, with an estimated accuracy of 60–80% (depending on the level of functional assignment). The rules are founded on a combination of detection of remote homology, convergent evolution and horizontal gene transfer. We identify rules that predict protein functional class even in the absence of detectable sequence or structural homology. These rules give insight into the evolutionary history of M. tuberculosis and E. coli. PMID:11119305

  5. A highly accurate protein structural class prediction approach using auto cross covariance transformation and recursive feature elimination.

    PubMed

    Li, Xiaowei; Liu, Taigang; Tao, Peiying; Wang, Chunhua; Chen, Lanming

    2015-12-01

    Structural class characterizes the overall folding type of a protein or its domain. Many methods have been proposed to improve the prediction accuracy of protein structural class in recent years, but it is still a challenge for the low-similarity sequences. In this study, we introduce a feature extraction technique based on auto cross covariance (ACC) transformation of position-specific score matrix (PSSM) to represent a protein sequence. Then support vector machine-recursive feature elimination (SVM-RFE) is adopted to select top K features according to their importance and these features are input to a support vector machine (SVM) to conduct the prediction. Performance evaluation of the proposed method is performed using the jackknife test on three low-similarity datasets, i.e., D640, 1189 and 25PDB. By means of this method, the overall accuracies of 97.2%, 96.2%, and 93.3% are achieved on these three datasets, which are higher than those of most existing methods. This suggests that the proposed method could serve as a very cost-effective tool for predicting protein structural class especially for low-similarity datasets.
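
    The abstract does not give the formula for the ACC transformation. The sketch below assumes the commonly used definition, in which each feature is the lagged covariance between two mean-centred PSSM columns, averaged over the sequence positions; terms with the same column are the auto covariance terms and the rest are the cross covariance terms. The function name and the toy matrix are hypothetical, and a real PSSM would have 20 columns rather than 2.

```python
def acc_features(pssm, max_lag):
    """Auto cross covariance features of a PSSM (rows = residues, columns = amino acid types).

    Assumed definition: for each lag g in 1..max_lag and each column pair (c1, c2),
    average (S[j][c1] - mean[c1]) * (S[j + g][c2] - mean[c2]) over j = 0..L-g-1.
    """
    n_rows = len(pssm)
    n_cols = len(pssm[0])
    if not 0 < max_lag < n_rows:
        raise ValueError("max_lag must be positive and smaller than the sequence length")

    # Centre every column on its mean over the whole sequence.
    means = [sum(row[c] for row in pssm) / n_rows for c in range(n_cols)]
    centered = [[row[c] - means[c] for c in range(n_cols)] for row in pssm]

    features = []
    for lag in range(1, max_lag + 1):
        span = n_rows - lag
        for c1 in range(n_cols):
            for c2 in range(n_cols):
                total = sum(centered[j][c1] * centered[j + lag][c2] for j in range(span))
                features.append(total / span)
    return features

# Toy 4-residue, 2-column matrix, for illustration only.
toy_pssm = [[1, 0], [2, 1], [3, 0], [4, 1]]
print(acc_features(toy_pssm, max_lag=1))
```

    The output length is max_lag times the square of the column count, so the vector has a fixed size regardless of sequence length. This is what lets proteins of different lengths be passed to a feature selector such as SVM-RFE and then to an SVM.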

  6. Microdosing of a Carbon-14 Labeled Protein in Healthy Volunteers Accurately Predicts Its Pharmacokinetics at Therapeutic Dosages.

    PubMed

    Vlaming, M L H; van Duijn, E; Dillingh, M R; Brands, R; Windhorst, A D; Hendrikse, N H; Bosgra, S; Burggraaf, J; de Koning, M C; Fidder, A; Mocking, J A J; Sandman, H; de Ligt, R A F; Fabriek, B O; Pasman, W J; Seinen, W; Alves, T; Carrondo, M; Peixoto, C; Peeters, P A M; Vaes, W H J

    2015-08-01

    Preclinical development of new biological entities (NBEs), such as human protein therapeutics, requires considerable expenditure of time and costs. Poor prediction of pharmacokinetics in humans further reduces net efficiency. In this study, we show for the first time that pharmacokinetic data of NBEs in humans can be successfully obtained early in the drug development process by the use of microdosing in a small group of healthy subjects combined with ultrasensitive accelerator mass spectrometry (AMS). After only minimal preclinical testing, we performed a first-in-human phase 0/phase 1 trial with a human recombinant therapeutic protein (RESCuing Alkaline Phosphatase, human recombinant placental alkaline phosphatase [hRESCAP]) to assess its safety and kinetics. Pharmacokinetic analysis showed dose linearity from microdose (53 μg) [(14) C]-hRESCAP to therapeutic doses (up to 5.3 mg) of the protein in healthy volunteers. This study demonstrates the value of a microdosing approach in a very small cohort for accelerating the clinical development of NBEs.

  7. New model accurately predicts reformate composition

    SciTech Connect

    Ancheyta-Juarez, J.; Aguilar-Rodriguez, E.

    1994-01-31

    Although naphtha reforming is a well-known process, the evolution of catalyst formulation, as well as new trends in gasoline specifications, have led to rapid evolution of the process, including: reactor design, regeneration mode, and operating conditions. Mathematical modeling of the reforming process is an increasingly important tool. It is fundamental to the proper design of new reactors and revamp of existing ones. Modeling can be used to optimize operating conditions, analyze the effects of process variables, and enhance unit performance. Instituto Mexicano del Petroleo has developed a model of the catalytic reforming process that accurately predicts reformate composition at the higher-severity conditions at which new reformers are being designed. The new AA model is more accurate than previous proposals because it takes into account the effects of temperature and pressure on the rate constants of each chemical reaction.

  8. A gene expression biomarker accurately predicts estrogen ...

    EPA Pesticide Factsheets

    The EPA’s vision for the Endocrine Disruptor Screening Program (EDSP) in the 21st Century (EDSP21) includes utilization of high-throughput screening (HTS) assays coupled with computational modeling to prioritize chemicals with the goal of eventually replacing current Tier 1 screening tests. The ToxCast program currently includes 18 HTS in vitro assays that evaluate the ability of chemicals to modulate estrogen receptor α (ERα), an important endocrine target. We propose microarray-based gene expression profiling as a complementary approach to predict ERα modulation and have developed computational methods to identify ERα modulators in an existing database of whole-genome microarray data. The ERα biomarker consisted of 46 ERα-regulated genes with consistent expression patterns across 7 known ER agonists and 3 known ER antagonists. The biomarker was evaluated as a predictive tool using the fold-change rank-based Running Fisher algorithm by comparison to annotated gene expression data sets from experiments in MCF-7 cells. Using 141 comparisons from chemical- and hormone-treated cells, the biomarker gave a balanced accuracy for prediction of ERα activation or suppression of 94% or 93%, respectively. The biomarker was able to correctly classify 18 out of 21 (86%) OECD ER reference chemicals including “very weak” agonists and replicated predictions based on 18 in vitro ER-associated HTS assays. For 114 chemicals present in both the HTS data and the MCF-7 c

  9. You Can Accurately Predict Land Acquisition Costs.

    ERIC Educational Resources Information Center

    Garrigan, Richard

    1967-01-01

    Land acquisition costs were tested for predictability based upon the 1962 assessed valuations of privately held land acquired for campus expansion by the University of Wisconsin from 1963-1965. By correlating the land acquisition costs of 108 properties acquired during the 3 year period with--(1) the assessed value of the land, (2) the assessed…

  10. Towards more accurate vegetation mortality predictions

    DOE PAGES

    Sevanto, Sanna Annika; Xu, Chonggang

    2016-09-26

    Predicting the fate of vegetation under changing climate is one of the major challenges of the climate modeling community. Here, terrestrial vegetation dominates the carbon and water cycles over land areas, and dramatic changes in vegetation cover resulting from stressful environmental conditions such as drought feed directly back to local and regional climate, potentially leading to a vicious cycle where vegetation recovery after a disturbance is delayed or impossible.

  11. A predictable and accurate technique with elastomeric impression materials.

    PubMed

    Barghi, N; Ontiveros, J C

    1999-08-01

    A method for obtaining more predictable and accurate final impressions with polyvinylsiloxane impression materials in conjunction with stock trays is proposed and tested. Heavy impression material is used in advance for construction of a modified custom tray, while extra-light material is used for obtaining a more accurate final impression.

  12. Accurate torque-speed performance prediction for brushless dc motors

    NASA Astrophysics Data System (ADS)

    Gipper, Patrick D.

    Desirable characteristics of the brushless dc motor (BLDCM) have resulted in their application for electrohydrostatic (EH) and electromechanical (EM) actuation systems. But to effectively apply the BLDCM requires accurate prediction of performance. The minimum necessary performance characteristics are motor torque versus speed, peak and average supply current and efficiency. BLDCM nonlinear simulation software specifically adapted for torque-speed prediction is presented. The capability of the software to quickly and accurately predict performance has been verified on fractional to integral HP motor sizes, and is presented. Additionally, the capability of torque-speed prediction with commutation angle advance is demonstrated.

  13. Mouse models of human AML accurately predict chemotherapy response

    PubMed Central

    Zuber, Johannes; Radtke, Ina; Pardee, Timothy S.; Zhao, Zhen; Rappaport, Amy R.; Luo, Weijun; McCurrach, Mila E.; Yang, Miao-Miao; Dolan, M. Eileen; Kogan, Scott C.; Downing, James R.; Lowe, Scott W.

    2009-01-01

    The genetic heterogeneity of cancer influences the trajectory of tumor progression and may underlie clinical variation in therapy response. To model such heterogeneity, we produced genetically and pathologically accurate mouse models of common forms of human acute myeloid leukemia (AML) and developed methods to mimic standard induction chemotherapy and efficiently monitor therapy response. We see that murine AMLs harboring two common human AML genotypes show remarkably diverse responses to conventional therapy that mirror clinical experience. Specifically, murine leukemias expressing the AML1/ETO fusion oncoprotein, associated with a favorable prognosis in patients, show a dramatic response to induction chemotherapy owing to robust activation of the p53 tumor suppressor network. Conversely, murine leukemias expressing MLL fusion proteins, associated with a dismal prognosis in patients, are drug-resistant due to an attenuated p53 response. Our studies highlight the importance of genetic information in guiding the treatment of human AML, functionally establish the p53 network as a central determinant of chemotherapy response in AML, and demonstrate that genetically engineered mouse models of human cancer can accurately predict therapy response in patients. PMID:19339691

  14. Mouse models of human AML accurately predict chemotherapy response.

    PubMed

    Zuber, Johannes; Radtke, Ina; Pardee, Timothy S; Zhao, Zhen; Rappaport, Amy R; Luo, Weijun; McCurrach, Mila E; Yang, Miao-Miao; Dolan, M Eileen; Kogan, Scott C; Downing, James R; Lowe, Scott W

    2009-04-01

    The genetic heterogeneity of cancer influences the trajectory of tumor progression and may underlie clinical variation in therapy response. To model such heterogeneity, we produced genetically and pathologically accurate mouse models of common forms of human acute myeloid leukemia (AML) and developed methods to mimic standard induction chemotherapy and efficiently monitor therapy response. We see that murine AMLs harboring two common human AML genotypes show remarkably diverse responses to conventional therapy that mirror clinical experience. Specifically, murine leukemias expressing the AML1/ETO fusion oncoprotein, associated with a favorable prognosis in patients, show a dramatic response to induction chemotherapy owing to robust activation of the p53 tumor suppressor network. Conversely, murine leukemias expressing MLL fusion proteins, associated with a dismal prognosis in patients, are drug-resistant due to an attenuated p53 response. Our studies highlight the importance of genetic information in guiding the treatment of human AML, functionally establish the p53 network as a central determinant of chemotherapy response in AML, and demonstrate that genetically engineered mouse models of human cancer can accurately predict therapy response in patients.

  15. On the Accurate Prediction of CME Arrival At the Earth

    NASA Astrophysics Data System (ADS)

    Zhang, Jie; Hess, Phillip

    2016-07-01

    We will discuss relevant issues regarding the accurate prediction of CME arrival at the Earth, from both observational and theoretical points of view. In particular, we clarify the importance of separating the study of CME ejecta from the ejecta-driven shock in interplanetary CMEs (ICMEs). For a number of CME-ICME events well observed by SOHO/LASCO, STEREO-A and STEREO-B, we carry out the 3-D measurements by superimposing geometries onto both the ejecta and sheath separately. These measurements are then used to constrain a Drag-Based Model, which is improved through a modification of including height dependence of the drag coefficient into the model. Combining all these factors allows us to create predictions for both fronts at 1 AU and compare with actual in-situ observations. We show an ability to predict the sheath arrival with an average error of under 4 hours, with an RMS error of about 1.5 hours. For the CME ejecta, the error is less than two hours with an RMS error within an hour. Through using the best observations of CMEs, we show the power of our method in accurately predicting CME arrival times. The limitation and implications of our accurate prediction method will be discussed.
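
    A drag-based model is usually written as dv/dt = -γ (v - w) |v - w|, where v is the CME speed, w the ambient solar wind speed and γ the drag parameter. The sketch below integrates that equation with a simple Euler step out to 1 AU. The abstract does not state how the authors make the drag coefficient depend on height, so the power-law form γ(r) = γ0 (r0/r)^k used here is purely an assumption for illustration, as are the function name and all parameter values.

```python
AU_KM = 1.496e8      # 1 astronomical unit in km
RSUN_KM = 6.957e5    # solar radius in km

def drag_based_transit(v0, w, r0, gamma0, k=0.0, dt=60.0):
    """Integrate dv/dt = -gamma(r) * (v - w) * |v - w| from r0 out to 1 AU.

    v0, w  : initial CME speed and solar wind speed (km/s)
    r0     : starting heliocentric distance (km)
    gamma0 : drag parameter at r0 (1/km)
    k      : exponent of the ASSUMED height dependence gamma0 * (r0 / r) ** k
             (k = 0 gives a constant drag parameter)
    Returns (transit time in seconds, speed on arrival in km/s).
    """
    r, v, t = r0, v0, 0.0
    while r < AU_KM:
        gamma = gamma0 * (r0 / r) ** k
        dv = v - w
        v -= gamma * dv * abs(dv) * dt   # drag pulls v toward w
        r += v * dt
        t += dt
    return t, v

# Hypothetical fast CME launched from 20 solar radii.
t, v_arrival = drag_based_transit(v0=1000.0, w=400.0, r0=20 * RSUN_KM, gamma0=2e-8)
print(f"transit time: {t / 3600:.1f} h, arrival speed: {v_arrival:.0f} km/s")
```

    Running the same function once for the ejecta front and once for the sheath front, each with its own initial speed and drag parameter, would mirror the separate treatment of the two fronts described in the abstract.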

  16. Fast and accurate automatic structure prediction with HHpred.

    PubMed

    Hildebrand, Andrea; Remmert, Michael; Biegert, Andreas; Söding, Johannes

    2009-01-01

    Automated protein structure prediction is becoming a mainstream tool for biological research. This has been fueled by steady improvements of publicly available automated servers over the last decade, in particular their ability to build good homology models for an increasing number of targets by reliably detecting and aligning more and more remotely homologous templates. Here, we describe the three fully automated versions of the HHpred server that participated in the community-wide blind protein structure prediction competition CASP8. What makes HHpred unique is the combination of usability, short response times (typically under 15 min) and a model accuracy that is competitive with those of the best servers in CASP8.

  17. Practical lessons from protein structure prediction

    PubMed Central

    Ginalski, Krzysztof; Grishin, Nick V.; Godzik, Adam; Rychlewski, Leszek

    2005-01-01

    Despite recent efforts to develop automated protein structure determination protocols, structural genomics projects are slow in generating fold assignments for complete proteomes, and spatial structures remain unknown for many protein families. Alternative cheap and fast methods to assign folds using prediction algorithms continue to provide valuable structural information for many proteins. The development of high-quality prediction methods has been boosted in the last years by objective community-wide assessment experiments. This paper gives an overview of the currently available practical approaches to protein structure prediction capable of generating accurate fold assignment. Recent advances in assessment of the prediction quality are also discussed. PMID:15805122

  18. Combining heterogeneous data sources for accurate functional annotation of proteins

    PubMed Central

    2013-01-01

    Combining heterogeneous sources of data is essential for accurate prediction of protein function. The task is complicated by the fact that while sequence-based features can be readily compared across species, most other data are species-specific. In this paper, we present a multi-view extension to GOstruct, a structured-output framework for function annotation of proteins. The extended framework can learn from disparate data sources, with each data source provided to the framework in the form of a kernel. Our empirical results demonstrate that the multi-view framework is able to utilize all available information, yielding better performance than sequence-based models trained across species and models trained from collections of data within a given species. This version of GOstruct participated in the recent Critical Assessment of Functional Annotations (CAFA) challenge; since then we have significantly improved the natural language processing component of the method, which now provides performance that is on par with that provided by sequence information. The GOstruct framework is available for download at http://strut.sourceforge.net. PMID:23514123

  19. Passive samplers accurately predict PAH levels in resident crayfish.

    PubMed

    Paulik, L Blair; Smith, Brian W; Bergmann, Alan J; Sower, Greg J; Forsberg, Norman D; Teeguarden, Justin G; Anderson, Kim A

    2016-02-15

    Contamination of resident aquatic organisms is a major concern for environmental risk assessors. However, collecting organisms to estimate risk is often prohibitively time and resource-intensive. Passive sampling accurately estimates resident organism contamination, and it saves time and resources. This study used low density polyethylene (LDPE) passive water samplers to predict polycyclic aromatic hydrocarbon (PAH) levels in signal crayfish, Pacifastacus leniusculus. Resident crayfish were collected at 5 sites within and outside of the Portland Harbor Superfund Megasite (PHSM) in the Willamette River in Portland, Oregon. LDPE deployment was spatially and temporally paired with crayfish collection. Crayfish visceral and tail tissue, as well as water-deployed LDPE, were extracted and analyzed for 62 PAHs using GC-MS/MS. Freely-dissolved concentrations (Cfree) of PAHs in water were calculated from concentrations in LDPE. Carcinogenic risks were estimated for all crayfish tissues, using benzo[a]pyrene equivalent concentrations (BaPeq). ∑PAH were 5-20 times higher in viscera than in tails, and ∑BaPeq were 6-70 times higher in viscera than in tails. Eating only tail tissue of crayfish would therefore significantly reduce carcinogenic risk compared to also eating viscera. Additionally, PAH levels in crayfish were compared to levels in crayfish collected 10 years earlier. PAH levels in crayfish were higher upriver of the PHSM and unchanged within the PHSM after the 10-year period. Finally, a linear regression model predicted levels of 34 PAHs in crayfish viscera with an associated R-squared value of 0.52 (and a correlation coefficient of 0.72), using only the Cfree PAHs in water. On average, the model predicted PAH concentrations in crayfish tissue within a factor of 2.4 ± 1.8 of measured concentrations. This affirms that passive water sampling accurately estimates PAH contamination in crayfish. Furthermore, the strong predictive ability of this simple model suggests

  20. Inverter Modeling For Accurate Energy Predictions Of Tracking HCPV Installations

    NASA Astrophysics Data System (ADS)

    Bowman, J.; Jensen, S.; McDonald, Mark

    2010-10-01

    High efficiency high concentration photovoltaic (HCPV) solar plants of megawatt scale are now operational, and opportunities for expanded adoption are plentiful. However, effective bidding for sites requires reliable prediction of energy production. HCPV module nameplate power is rated for specific test conditions; however, instantaneous HCPV power varies due to site specific irradiance and operating temperature, and is degraded by soiling, protective stowing, shading, and electrical connectivity. These factors interact with the selection of equipment typically supplied by third parties, e.g., wire gauge and inverters. We describe a time sequence model accurately accounting for these effects that predicts annual energy production, with specific reference to the impact of the inverter on energy output and interactions between system-level design decisions and the inverter. We will also show two examples, based on an actual field design, of inverter efficiency calculations and the interaction between string arrangements and inverter selection.

  1. Basophile: Accurate Fragment Charge State Prediction Improves Peptide Identification Rates

    SciTech Connect

    Wang, Dong; Dasari, Surendra; Chambers, Matthew C.; Holman, Jerry D.; Chen, Kan; Liebler, Daniel; Orton, Daniel J.; Purvine, Samuel O.; Monroe, Matthew E.; Chung, Chang Y.; Rose, Kristie L.; Tabb, David L.

    2013-03-07

    In shotgun proteomics, database search algorithms rely on fragmentation models to predict fragment ions that should be observed for a given peptide sequence. The most widely used strategy (Naive model) is oversimplified, cleaving all peptide bonds with equal probability to produce fragments of all charges below that of the precursor ion. More accurate models, based on fragmentation simulation, are too computationally intensive for on-the-fly use in database search algorithms. We have created an ordinal-regression-based model called Basophile that takes fragment size and basic residue distribution into account when determining the charge retention during CID/higher-energy collision induced dissociation (HCD) of charged peptides. This model improves the accuracy of predictions by reducing the number of unnecessary fragments that are routinely predicted for highly-charged precursors. Basophile increased the identification rates by 26% (on average) over the Naive model, when analyzing triply-charged precursors from ion trap data. Basophile achieves simplicity and speed by solving the prediction problem with an ordinal regression equation, which can be incorporated into any database search software for shotgun proteomic identification.
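
    The Naive model described above can be sketched as a plain enumeration: every peptide bond is cleaved, and each resulting fragment is predicted at every charge below the precursor charge. The sketch assumes that each cleavage yields one N-terminal (b-type) and one C-terminal (y-type) fragment; the function name is hypothetical and this is not Basophile's code.

```python
def naive_fragments(peptide, precursor_charge):
    """Enumerate fragments the way the Naive model does.

    Every peptide bond is cleaved, and each fragment is listed at every charge
    from 1 up to (precursor_charge - 1). Returns (ion_type, sequence, charge) tuples.
    """
    fragments = []
    for bond in range(1, len(peptide)):
        prefix, suffix = peptide[:bond], peptide[bond:]
        for charge in range(1, precursor_charge):
            fragments.append(("b", prefix, charge))   # N-terminal fragment
            fragments.append(("y", suffix, charge))   # C-terminal fragment
    return fragments

# A 7-residue peptide with a triply-charged precursor.
frags = naive_fragments("PEPTIDE", 3)
print(len(frags))
```

    The count grows as 2 × (length − 1) × (charge − 1), so highly-charged precursors produce many predicted fragments. A model that predicts which charge each fragment is likely to retain would prune this list rather than enumerate it in full.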

  2. Basophile: Accurate Fragment Charge State Prediction Improves Peptide Identification Rates

    DOE PAGES

    Wang, Dong; Dasari, Surendra; Chambers, Matthew C.; ...

    2013-03-07

    In shotgun proteomics, database search algorithms rely on fragmentation models to predict fragment ions that should be observed for a given peptide sequence. The most widely used strategy (Naive model) is oversimplified, cleaving all peptide bonds with equal probability to produce fragments of all charges below that of the precursor ion. More accurate models, based on fragmentation simulation, are too computationally intensive for on-the-fly use in database search algorithms. We have created an ordinal-regression-based model called Basophile that takes fragment size and basic residue distribution into account when determining the charge retention during CID/higher-energy collision induced dissociation (HCD) of charged peptides. This model improves the accuracy of predictions by reducing the number of unnecessary fragments that are routinely predicted for highly-charged precursors. Basophile increased the identification rates by 26% (on average) over the Naive model, when analyzing triply-charged precursors from ion trap data. Basophile achieves simplicity and speed by solving the prediction problem with an ordinal regression equation, which can be incorporated into any database search software for shotgun proteomic identification.

  3. Basophile: Accurate Fragment Charge State Prediction Improves Peptide Identification Rates

    PubMed Central

    Wang, Dong; Dasari, Surendra; Chambers, Matthew C.; Holman, Jerry D.; Chen, Kan; Liebler, Daniel C.; Orton, Daniel J.; Purvine, Samuel O.; Monroe, Matthew E.; Chung, Chang Y.; Rose, Kristie L.; Tabb, David L.

    2013-01-01

    In shotgun proteomics, database search algorithms rely on fragmentation models to predict fragment ions that should be observed for a given peptide sequence. The most widely used strategy (Naive model) is oversimplified, cleaving all peptide bonds with equal probability to produce fragments of all charges below that of the precursor ion. More accurate models, based on fragmentation simulation, are too computationally intensive for on-the-fly use in database search algorithms. We have created an ordinal-regression-based model called Basophile that takes fragment size and basic residue distribution into account when determining the charge retention during CID/higher-energy collision induced dissociation (HCD) of charged peptides. This model improves the accuracy of predictions by reducing the number of unnecessary fragments that are routinely predicted for highly-charged precursors. Basophile increased the identification rates by 26% (on average) over the Naive model, when analyzing triply-charged precursors from ion trap data. Basophile achieves simplicity and speed by solving the prediction problem with an ordinal regression equation, which can be incorporated into any database search software for shotgun proteomic identification. PMID:23499924

  4. FragBag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately.

    PubMed

    Budowski-Tal, Inbal; Nov, Yuval; Kolodny, Rachel

    2010-02-23

    Fast identification of protein structures that are similar to a specified query structure in the entire Protein Data Bank (PDB) is fundamental in structure and function prediction. We present FragBag: an ultrafast and accurate method for comparing protein structures. We describe a protein structure by the collection of its overlapping short contiguous backbone segments, and discretize this set using a library of fragments. Then, we succinctly represent the protein as a "bag-of-fragments" (a vector that counts the number of occurrences of each fragment) and measure the similarity between two structures by the similarity between their vectors. Our representation has two additional benefits: (i) it can be used to construct an inverted index, for implementing a fast structural search engine of the entire PDB, and (ii) one can specify a structure as a collection of substructures, without combining them into a single structure; this is valuable for structure prediction, when there are reliable predictions only of parts of the protein. We use receiver operating characteristic curve analysis to quantify the success of FragBag in identifying neighbor candidate sets in a dataset of over 2,900 structures. The gold standard is the set of neighbors found by six state-of-the-art structural aligners. Our best FragBag library finds more accurate candidate sets than the three other filter methods: the SGM, PRIDE, and a method by Zotenko et al. More interestingly, FragBag performs on a par with the computationally expensive, yet highly trusted structural aligners STRUCTAL and CE.
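
    The bag-of-fragments idea can be sketched in a few lines. The sketch assumes each backbone segment has already been mapped to the index of its closest library fragment (that matching step is not shown), and it uses cosine similarity as one plausible vector comparison; the abstract does not say which similarity measure FragBag uses. The names `bag_of_fragments` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def bag_of_fragments(fragment_ids: Sequence[int], library_size: int) -> List[int]:
    """Count how often each library fragment occurs in one structure."""
    counts = Counter(fragment_ids)
    return [counts.get(i, 0) for i in range(library_size)]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Hypothetical fragment assignments for two structures, library of 4 fragments.
    query = bag_of_fragments([0, 1, 1, 3], library_size=4)
    candidate = bag_of_fragments([1, 1, 3, 3], library_size=4)
    print(query, candidate, cosine_similarity(query, candidate))
```

    Because each structure is reduced to a fixed-length count vector, comparing two structures costs time proportional to the library size rather than requiring a structural alignment.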

  5. Turbulence Models for Accurate Aerothermal Prediction in Hypersonic Flows

    NASA Astrophysics Data System (ADS)

    Zhang, Xiang-Hong; Wu, Yi-Zao; Wang, Jiang-Feng

    Accurate description of the aerodynamic and aerothermal environment is crucial to the integrated design and optimization of high-performance hypersonic vehicles. In the simulation of the aerothermal environment, the effect of viscosity is crucial. Turbulence modeling remains a major source of uncertainty in the computational prediction of aerodynamic forces and heating. In this paper, three turbulence models were studied: the one-equation eddy viscosity transport model of Spalart-Allmaras, the Wilcox k-ω model and the Menter SST model. For the k-ω model and SST model, the compressibility correction, pressure dilatation and low Reynolds number correction were considered. The influence of these corrections on flow properties was discussed by comparing with the results without corrections. The emphasis is on the assessment and evaluation of the turbulence models in the prediction of heat transfer as applied to a range of hypersonic flows, with comparison to experimental data. This will enable establishing a factor of safety for the design of thermal protection systems of hypersonic vehicles.

  6. EuLoc: a web-server for accurately predict protein subcellular localization in eukaryotes by incorporating various features of sequence segments into the general form of Chou's PseAAC

    NASA Astrophysics Data System (ADS)

    Chang, Tzu-Hao; Wu, Li-Ching; Lee, Tzong-Yi; Chen, Shu-Pin; Huang, Hsien-Da; Horng, Jorng-Tzong

    2013-01-01

    The function of a protein is generally related to its subcellular localization. Therefore, knowing its subcellular localization is helpful in understanding its potential functions and roles in biological processes. This work develops a hybrid method for computationally predicting the subcellular localization of eukaryotic protein. The method is called EuLoc and incorporates the Hidden Markov Model (HMM) method, homology search approach and the support vector machines (SVM) method by fusing several new features into Chou's pseudo-amino acid composition. The proposed SVM module overcomes the shortcoming of the homology search approach in predicting the subcellular localization of a protein which only finds low-homologous or non-homologous sequences in a protein subcellular localization annotated database. The proposed HMM modules overcome the shortcoming of SVM in predicting subcellular localizations using few data on protein sequences. Several features of a protein sequence are considered, including the sequence-based features, the biological features derived from PROSITE, NLSdb and Pfam, the post-transcriptional modification features and others. The overall accuracy and location accuracy of EuLoc are 90.5 and 91.2 %, respectively, revealing a better predictive performance than obtained elsewhere. Although the amounts of data of the various subcellular location groups in benchmark dataset differ markedly, the accuracies of 12 subcellular localizations of EuLoc range from 82.5 to 100 %, indicating that this tool is much more balanced than other tools. EuLoc offers a high, balanced predictive power for each subcellular localization. EuLoc is now available on the web at http://euloc.mbc.nctu.edu.tw/.

  7. EuLoc: a web-server for accurately predict protein subcellular localization in eukaryotes by incorporating various features of sequence segments into the general form of Chou's PseAAC.

    PubMed

    Chang, Tzu-Hao; Wu, Li-Ching; Lee, Tzong-Yi; Chen, Shu-Pin; Huang, Hsien-Da; Horng, Jorng-Tzong

    2013-01-01

    The function of a protein is generally related to its subcellular localization. Therefore, knowing its subcellular localization is helpful in understanding its potential functions and roles in biological processes. This work develops a hybrid method for computationally predicting the subcellular localization of eukaryotic protein. The method is called EuLoc and incorporates the Hidden Markov Model (HMM) method, homology search approach and the support vector machines (SVM) method by fusing several new features into Chou's pseudo-amino acid composition. The proposed SVM module overcomes the shortcoming of the homology search approach in predicting the subcellular localization of a protein which only finds low-homologous or non-homologous sequences in a protein subcellular localization annotated database. The proposed HMM modules overcome the shortcoming of SVM in predicting subcellular localizations using few data on protein sequences. Several features of a protein sequence are considered, including the sequence-based features, the biological features derived from PROSITE, NLSdb and Pfam, the post-transcriptional modification features and others. The overall accuracy and location accuracy of EuLoc are 90.5 and 91.2 %, respectively, revealing a better predictive performance than obtained elsewhere. Although the amounts of data of the various subcellular location groups in benchmark dataset differ markedly, the accuracies of 12 subcellular localizations of EuLoc range from 82.5 to 100 %, indicating that this tool is much more balanced than other tools. EuLoc offers a high, balanced predictive power for each subcellular localization. EuLoc is now available on the web at http://euloc.mbc.nctu.edu.tw/.

  8. Simple Mathematical Models Do Not Accurately Predict Early SIV Dynamics

    PubMed Central

    Noecker, Cecilia; Schaefer, Krista; Zaccheo, Kelly; Yang, Yiding; Day, Judy; Ganusov, Vitaly V.

    2015-01-01

    Upon infection of a new host, human immunodeficiency virus (HIV) replicates in the mucosal tissues and is generally undetectable in circulation for 1–2 weeks post-infection. Several interventions against HIV including vaccines and antiretroviral prophylaxis target virus replication at this earliest stage of infection. Mathematical models have been used to understand how HIV spreads from mucosal tissues systemically and what impact vaccination and/or antiretroviral prophylaxis has on viral eradication. Because predictions of such models have been rarely compared to experimental data, it remains unclear which processes included in these models are critical for predicting early HIV dynamics. Here we modified the “standard” mathematical model of HIV infection to include two populations of infected cells: cells that are actively producing the virus and cells that are transitioning into virus production mode. We evaluated the effects of several poorly known parameters on infection outcomes in this model and compared model predictions to experimental data on infection of non-human primates with variable doses of simian immunodeficiency virus (SIV). First, we found that the mode of virus production by infected cells (budding vs. bursting) has a minimal impact on the early virus dynamics for a wide range of model parameters, as long as the parameters are constrained to provide the observed rate of SIV load increase in the blood of infected animals. Interestingly and in contrast with previous results, we found that the bursting mode of virus production generally results in a higher probability of viral extinction than the budding mode of virus production. Second, this mathematical model was not able to accurately describe the change in experimentally determined probability of host infection with increasing viral doses. Third and finally, the model was also unable to accurately explain the decline in the time to virus detection with increasing viral dose. These results

  9. Carbene footprinting accurately maps binding sites in protein-ligand and protein-protein interactions.

    PubMed

    Manzi, Lucio; Barrow, Andrew S; Scott, Daniel; Layfield, Robert; Wright, Timothy G; Moses, John E; Oldham, Neil J

    2016-11-16

    Specific interactions between proteins and their binding partners are fundamental to life processes. The ability to detect protein complexes, and map their sites of binding, is crucial to understanding basic biology at the molecular level. Methods that employ sensitive analytical techniques such as mass spectrometry have the potential to provide valuable insights with very little material and on short time scales. Here we present a differential protein footprinting technique employing an efficient photo-activated probe for use with mass spectrometry. Using this methodology the location of a carbohydrate substrate was accurately mapped to the binding cleft of lysozyme, and in a more complex example, the interactions between a 100 kDa, multi-domain deubiquitinating enzyme, USP5 and a diubiquitin substrate were located to different functional domains. The much improved properties of this probe make carbene footprinting a viable method for rapid and accurate identification of protein binding sites utilizing benign, near-UV photoactivation.

  10. Carbene footprinting accurately maps binding sites in protein-ligand and protein-protein interactions

    NASA Astrophysics Data System (ADS)

    Manzi, Lucio; Barrow, Andrew S.; Scott, Daniel; Layfield, Robert; Wright, Timothy G.; Moses, John E.; Oldham, Neil J.

    2016-11-01

    Specific interactions between proteins and their binding partners are fundamental to life processes. The ability to detect protein complexes, and map their sites of binding, is crucial to understanding basic biology at the molecular level. Methods that employ sensitive analytical techniques such as mass spectrometry have the potential to provide valuable insights with very little material and on short time scales. Here we present a differential protein footprinting technique employing an efficient photo-activated probe for use with mass spectrometry. Using this methodology the location of a carbohydrate substrate was accurately mapped to the binding cleft of lysozyme, and in a more complex example, the interactions between a 100 kDa, multi-domain deubiquitinating enzyme, USP5 and a diubiquitin substrate were located to different functional domains. The much improved properties of this probe make carbene footprinting a viable method for rapid and accurate identification of protein binding sites utilizing benign, near-UV photoactivation.

  11. Fast and accurate predictions of covalent bonds in chemical space

    NASA Astrophysics Data System (ADS)

    Chang, K. Y. Samuel; Fias, Stijn; Ramakrishnan, Raghunathan; von Lilienfeld, O. Anatole

    2016-05-01

    We assess the predictive accuracy of perturbation theory based estimates of changes in covalent bonding due to linear alchemical interpolations among molecules. We have investigated σ bonding to hydrogen, as well as σ and π bonding between main-group elements, occurring in small sets of iso-valence-electronic molecules with elements drawn from second to fourth rows in the p-block of the periodic table. Numerical evidence suggests that first order Taylor expansions of covalent bonding potentials can achieve high accuracy if (i) the alchemical interpolation is vertical (fixed geometry), (ii) it involves elements from the third and fourth rows of the periodic table, and (iii) an optimal reference geometry is used. This leads to near linear changes in the bonding potential, resulting in analytical predictions with chemical accuracy (∼1 kcal/mol). Second order estimates deteriorate the prediction. If initial and final molecules differ not only in composition but also in geometry, all estimates become substantially worse, with second order being slightly more accurate than first order. The independent particle approximation based second order perturbation theory performs poorly when compared to the coupled perturbed or finite difference approach. Taylor series expansions up to fourth order of the potential energy curve of highly symmetric systems indicate a finite radius of convergence, as illustrated for the alchemical stretching of H2+. Results are presented for (i) covalent bonds to hydrogen in 12 molecules with 8 valence electrons (CH4, NH3, H2O, HF, SiH4, PH3, H2S, HCl, GeH4, AsH3, H2Se, HBr); (ii) main-group single bonds in 9 molecules with 14 valence electrons (CH3F, CH3Cl, CH3Br, SiH3F, SiH3Cl, SiH3Br, GeH3F, GeH3Cl, GeH3Br); (iii) main-group double bonds in 9 molecules with 12 valence electrons (CH2O, CH2S, CH2Se, SiH2O, SiH2S, SiH2Se, GeH2O, GeH2S, GeH2Se); (iv) main-group triple bonds in 9 molecules with 10 valence electrons (HCN, HCP, HCAs, HSiN, HSi

  12. Fast and accurate predictions of covalent bonds in chemical space.

    PubMed

    Chang, K Y Samuel; Fias, Stijn; Ramakrishnan, Raghunathan; von Lilienfeld, O Anatole

    2016-05-07

    We assess the predictive accuracy of perturbation theory based estimates of changes in covalent bonding due to linear alchemical interpolations among molecules. We have investigated σ bonding to hydrogen, as well as σ and π bonding between main-group elements, occurring in small sets of iso-valence-electronic molecules with elements drawn from second to fourth rows in the p-block of the periodic table. Numerical evidence suggests that first order Taylor expansions of covalent bonding potentials can achieve high accuracy if (i) the alchemical interpolation is vertical (fixed geometry), (ii) it involves elements from the third and fourth rows of the periodic table, and (iii) an optimal reference geometry is used. This leads to near linear changes in the bonding potential, resulting in analytical predictions with chemical accuracy (∼1 kcal/mol). Second order estimates deteriorate the prediction. If initial and final molecules differ not only in composition but also in geometry, all estimates become substantially worse, with second order being slightly more accurate than first order. The independent particle approximation based second order perturbation theory performs poorly when compared to the coupled perturbed or finite difference approach. Taylor series expansions up to fourth order of the potential energy curve of highly symmetric systems indicate a finite radius of convergence, as illustrated for the alchemical stretching of H2+. Results are presented for (i) covalent bonds to hydrogen in 12 molecules with 8 valence electrons (CH4, NH3, H2O, HF, SiH4, PH3, H2S, HCl, GeH4, AsH3, H2Se, HBr); (ii) main-group single bonds in 9 molecules with 14 valence electrons (CH3F, CH3Cl, CH3Br, SiH3F, SiH3Cl, SiH3Br, GeH3F, GeH3Cl, GeH3Br); (iii) main-group double bonds in 9 molecules with 12 valence electrons (CH2O, CH2S, CH2Se, SiH2O, SiH2S, SiH2Se, GeH2O, GeH2S, GeH2Se); (iv) main-group triple bonds in 9 molecules with 10 valence electrons (HCN, HCP, HCAs, HSiN, HSi
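
    The expansion discussed in this abstract can be written out as a short sketch. The notation is assumed here rather than taken from the paper: λ is the coupling parameter of the linear alchemical interpolation, the Hamiltonians of the initial and final molecules are written with subscripts i and f, and ψ_i is the ground state of the initial molecule. The last line is the standard Hellmann-Feynman expression for the first derivative, which holds for exact or variationally optimized wavefunctions.

```latex
\begin{align}
  \hat{H}(\lambda) &= (1-\lambda)\,\hat{H}_i + \lambda\,\hat{H}_f,
      \qquad 0 \le \lambda \le 1, \\
  E(\lambda = 1) &\approx E(0)
      + \left.\frac{\partial E}{\partial \lambda}\right|_{\lambda=0}
      + \frac{1}{2}\left.\frac{\partial^2 E}{\partial \lambda^2}\right|_{\lambda=0}, \\
  \left.\frac{\partial E}{\partial \lambda}\right|_{\lambda=0}
      &= \langle \psi_i \,|\, \hat{H}_f - \hat{H}_i \,|\, \psi_i \rangle .
\end{align}
```

    Truncating after the first-derivative term gives the first order estimate; the first derivative needs only the wavefunction of the initial molecule, which is what makes such estimates fast.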

  13. A new protein structure representation for efficient protein function prediction.

    PubMed

    Maghawry, Huda A; Mostafa, Mostafa G M; Gharib, Tarek F

    2014-12-01

    One of the challenging problems in bioinformatics is the prediction of protein function. Protein function is the main key that can be used to classify different proteins. Protein function can be inferred experimentally with very small throughput or computationally with very high throughput. Computational methods are sequence based or structure based. Structure-based methods produce more accurate protein function prediction. In this article, we propose a new protein structure representation for efficient protein function prediction. The representation is based on three-dimensional patterns of protein residues. In the analysis, we used protein function based on enzyme activity through six mechanistically diverse enzyme superfamilies: amidohydrolase, crotonase, haloacid dehalogenase, isoprenoid synthase type I, and vicinal oxygen chelate. We applied three different classification methods, naïve Bayes, k-nearest neighbors, and random forest, to predict the enzyme superfamily of a given protein. The prediction accuracy using the proposed representation outperforms a recently introduced representation method that is based only on the distance patterns. The results show that the proposed representation achieved prediction accuracy up to 98%, with improvement of about 10% on average.

  14. IRIS: Towards an Accurate and Fast Stage Weight Prediction Method

    NASA Astrophysics Data System (ADS)

    Taponier, V.; Balu, A.

    2002-01-01

    The knowledge of the structural mass fraction (or the mass ratio) of a given stage, which affects the performance of a rocket, is essential for the analysis of new or upgraded launchers or stages, the need for which is increased by the quick evolution of space programs and by the necessity of adapting them to market needs. The availability of this highly scattered variable, ranging between 0.05 and 0.15, is of primary importance at the early steps of preliminary design studies. At the start of the staging and performance studies, the lack of frozen weight data (to be obtained later on from propulsion, trajectory and sizing studies) forces reliance on rough estimates, generally derived from printed sources and adapted. When needed, a consolidation can be acquired through a specific analysis activity involving several techniques and implying additional effort and time. The present empirical approach thus yields only approximate values (i.e. not necessarily accurate or consistent), inducing some result inaccuracy as well as, consequently, difficulties of performance ranking for a multiple-option analysis, and an increase in the processing duration. This is a classical harsh fact of preliminary design system studies, insufficiently discussed to date. It therefore appears highly desirable to have, for all the evaluation activities, a reliable, fast and easy-to-use weight or mass fraction prediction method. Additionally, the latter should allow for a preselection of the alternative preliminary configurations, making possible a global system approach. For that purpose, an attempt at modeling has been undertaken, whose objective was the determination of a parametric formulation of the mass fraction, to be expressed from a limited number of parameters available at the early steps of the project. It is based on the innovative use of a statistical method applicable to a variable as a function of several independent parameters. A specific polynomial generator

  15. Generating highly accurate prediction hypotheses through collaborative ensemble learning

    PubMed Central

    Arsov, Nino; Pavlovski, Martin; Basnarkov, Lasko; Kocarev, Ljupco

    2017-01-01

    Ensemble generation is a natural and convenient way of achieving better generalization performance of learning algorithms by gathering their predictive capabilities. Here, we nurture the idea of ensemble-based learning by combining bagging and boosting for the purpose of binary classification. Since the former improves stability through variance reduction, while the latter ameliorates overfitting, the outcome of a multi-model that combines both strives toward a comprehensive net-balancing of the bias-variance trade-off. To further improve this, we alter the bagged-boosting scheme by introducing collaboration between the multi-model’s constituent learners at various levels. This novel stability-guided classification scheme is delivered in two flavours: during or after the boosting process. Applied among a crowd of Gentle Boost ensembles, the ability of the two suggested algorithms to generalize is inspected by comparing them against Subbagging and Gentle Boost on various real-world datasets. In both cases, our models obtained a 40% generalization error decrease. But their true ability to capture details in data was revealed through their application for protein detection in texture analysis of gel electrophoresis images. They achieve improved performance of approximately 0.9773 AUROC when compared to the AUROC of 0.9574 obtained by an SVM based on recursive feature elimination. PMID:28304378

  16. Generating highly accurate prediction hypotheses through collaborative ensemble learning

    NASA Astrophysics Data System (ADS)

    Arsov, Nino; Pavlovski, Martin; Basnarkov, Lasko; Kocarev, Ljupco

    2017-03-01

    Ensemble generation is a natural and convenient way of achieving better generalization performance of learning algorithms by gathering their predictive capabilities. Here, we nurture the idea of ensemble-based learning by combining bagging and boosting for the purpose of binary classification. Since the former improves stability through variance reduction, while the latter ameliorates overfitting, the outcome of a multi-model that combines both strives toward a comprehensive net-balancing of the bias-variance trade-off. To further improve this, we alter the bagged-boosting scheme by introducing collaboration between the multi-model’s constituent learners at various levels. This novel stability-guided classification scheme is delivered in two flavours: during or after the boosting process. Applied among a crowd of Gentle Boost ensembles, the ability of the two suggested algorithms to generalize is inspected by comparing them against Subbagging and Gentle Boost on various real-world datasets. In both cases, our models obtained a 40% generalization error decrease. But their true ability to capture details in data was revealed through their application for protein detection in texture analysis of gel electrophoresis images. They achieve improved performance of approximately 0.9773 AUROC when compared to the AUROC of 0.9574 obtained by an SVM based on recursive feature elimination.

  17. Predicting conformational switches in proteins.

    PubMed Central

    Young, M.; Kirshenbaum, K.; Dill, K. A.; Highsmith, S.

    1999-01-01

    We describe a new computational technique to predict conformationally switching elements in proteins from their amino acid sequences. The method, called ASP (Ambivalent Structure Predictor), analyzes results from a secondary structure prediction algorithm to identify regions of conformational ambivalence. ASP identifies ambivalent regions in 16 test protein sequences for which function involves substantial backbone rearrangements. In the test set, all sites previously described as conformational switches are correctly predicted to be structurally ambivalent regions. No such regions are predicted in three negative control protein sequences. ASP may be useful as a guide for experimental studies on protein function and motion in the absence of detailed three-dimensional structural data. PMID:10493576

  18. Blind protein structure prediction using accelerated free-energy simulations

    PubMed Central

    Perez, Alberto; Morrone, Joseph A.; Brini, Emiliano; MacCallum, Justin L.; Dill, Ken A.

    2016-01-01

    We report a key proof of principle of a new acceleration method [Modeling Employing Limited Data (MELD)] for predicting protein structures by molecular dynamics simulation. It shows that such Boltzmann-satisfying techniques are now sufficiently fast and accurate to predict native protein structures in a limited test within the Critical Assessment of Structure Prediction (CASP) community-wide blind competition. PMID:27847872

  19. Protein Function Prediction: Problems and Pitfalls.

    PubMed

    Pearson, William R

    2015-09-03

    The characterization of new genomes based on their protein sets has been revolutionized by new sequencing technologies, but biologists seeking to exploit new sequence information are often frustrated by the challenges associated with accurately assigning biological functions to newly identified proteins. Here, we highlight some of the challenges in functional inference from sequence similarity. Investigators can improve the accuracy of function prediction by (1) being conservative about the evolutionary distance to a protein of known function; (2) considering the ambiguous meaning of "functional similarity," and (3) being aware of the limitations of annotations in functional databases. Protein function prediction does not offer "one-size-fits-all" solutions. Prediction strategies work better when the idiosyncrasies of function and functional annotation are better understood.

  20. BPROMPT: A consensus server for membrane protein prediction.

    PubMed

    Taylor, Paul D; Attwood, Teresa K; Flower, Darren R

    2003-07-01

    Protein structure prediction is a cornerstone of bioinformatics research. Membrane proteins require their own prediction methods due to their intrinsically different composition. A variety of tools exist for topology prediction of membrane proteins, many of them available on the Internet. The server described in this paper, BPROMPT (Bayesian PRediction Of Membrane Protein Topology), uses a Bayesian Belief Network to combine the results of other prediction methods, providing a more accurate consensus prediction. Topology predictions with accuracies of 70% for prokaryotes and 53% for eukaryotes were achieved. BPROMPT can be accessed at http://www.jenner.ac.uk/BPROMPT.

  1. Prediction of Preoperative Anxiety in Children: Who is Most Accurate?

    PubMed Central

    MacLaren, Jill E.; Thompson, Caitlin; Weinberg, Megan; Fortier, Michelle A.; Morrison, Debra E.; Perret, Danielle; Kain, Zeev N.

    2009-01-01

    Background: In this investigation, we sought to assess the ability of pediatric attending anesthesiologists, resident anesthesiologists and mothers to predict anxiety during induction of anesthesia in 2 to 16-year-old children (n=125). Methods: Anesthesiologists and mothers provided predictions using a visual analog scale, and children's anxiety was assessed using a valid behavior observation tool, the Modified Yale Preoperative Anxiety Scale (mYPAS). All mothers were present during anesthetic induction and no child received sedative premedication. Correlational analyses were conducted. Results: A total of 125 children aged 2 to 16 years, their mothers, and their attending pediatric anesthesiologists and resident anesthesiologists were studied. Correlational analyses revealed significant associations between attending predictions and child anxiety at induction (rs = 0.38, p<0.001). Resident anesthesiologist and mother predictions were not significantly related to children's anxiety during induction (rs = 0.01 and 0.001, respectively). In terms of accuracy of prediction, 47.2% of predictions made by attending anesthesiologists were within one standard deviation of the observed anxiety exhibited by the child, and 70.4% of predictions were within 2 standard deviations. Conclusions: We conclude that attending anesthesiologists who practice in pediatric settings are better than mothers in predicting the anxiety of children during induction of anesthesia. While this finding has significant clinical implications, it is unclear if it can be extended to attending anesthesiologists whose practice is not mostly pediatric anesthesia. PMID:19448201

  2. Is Three-Dimensional Soft Tissue Prediction by Software Accurate?

    PubMed

    Nam, Ki-Uk; Hong, Jongrak

    2015-11-01

    The authors assessed whether virtual surgery, performed with a soft tissue prediction program, could correctly simulate the actual surgical outcome, focusing on soft tissue movement. Preoperative and postoperative computed tomography (CT) data for 29 patients who had undergone orthognathic surgery were obtained and analyzed using the Simplant Pro software. The program made a predicted soft tissue image (A) based on presurgical CT data. After the operation, we obtained actual postoperative CT data and an actual soft tissue image (B) was generated. Finally, the 2 images (A and B) were superimposed and the differences between them were analyzed. Results were grouped in 2 classes: absolute values and vector values. In the absolute values, the left mouth corner was the most significant error point (2.36 mm). The right mouth corner (2.28 mm), labrale inferius (2.08 mm), and the pogonion (2.03 mm) also had significant errors. In vector values, prediction of the right-left side had a left-sided tendency, the superior-inferior had a superior tendency, and the anterior-posterior showed an anterior tendency. As a result, with this program, the position of points tended to be located more left, anterior, and superior than the "real" situation. There is a need to improve the prediction accuracy for soft tissue images. Such software is particularly valuable in predicting craniofacial soft tissue landmarks, such as the pronasale. With this software, landmark positions were most inaccurate in terms of anterior-posterior predictions.

  3. Accurate perception of negative emotions predicts functional capacity in schizophrenia.

    PubMed

    Abram, Samantha V; Karpouzian, Tatiana M; Reilly, James L; Derntl, Birgit; Habel, Ute; Smith, Matthew J

    2014-04-30

    Several studies suggest facial affect perception (FAP) deficits in schizophrenia are linked to poorer social functioning. However, whether reduced functioning is associated with inaccurate perception of specific emotional valence or a global FAP impairment remains unclear. The present study examined whether impairment in the perception of specific emotional valences (positive, negative) and neutrality were uniquely associated with social functioning, using a multimodal social functioning battery. A sample of 59 individuals with schizophrenia and 41 controls completed a computerized FAP task, and measures of functional capacity, social competence, and social attainment. Participants also underwent neuropsychological testing and symptom assessment. Regression analyses revealed that only accurately perceiving negative emotions explained significant variance (7.9%) in functional capacity after accounting for neurocognitive function and symptoms. Partial correlations indicated that accurately perceiving anger, in particular, was positively correlated with functional capacity. FAP for positive, negative, or neutral emotions was not related to social competence or social attainment. Our findings were consistent with prior literature suggesting negative emotions are related to functional capacity in schizophrenia. Furthermore, the observed relationship between perceiving anger and performance of everyday living skills is novel and warrants further exploration.

  4. Protein molecular function prediction by Bayesian phylogenomics.

    PubMed

    Engelhardt, Barbara E; Jordan, Michael I; Muratore, Kathryn E; Brenner, Steven E

    2005-10-01

    We present a statistical graphical model to infer specific molecular function for unannotated protein sequences using homology. Based on phylogenomic principles, SIFTER (Statistical Inference of Function Through Evolutionary Relationships) accurately predicts molecular function for members of a protein family given a reconciled phylogeny and available function annotations, even when the data are sparse or noisy. Our method produced specific and consistent molecular function predictions across 100 Pfam families in comparison to the Gene Ontology annotation database, BLAST, GOtcha, and Orthostrapper. We performed a more detailed exploration of functional predictions on the adenosine-5'-monophosphate/adenosine deaminase family and the lactate/malate dehydrogenase family, in the former case comparing the predictions against a gold standard set of published functional characterizations. Given function annotations for 3% of the proteins in the deaminase family, SIFTER achieves 96% accuracy in predicting molecular function for experimentally characterized proteins as reported in the literature. The accuracy of SIFTER on this dataset is a significant improvement over other currently available methods such as BLAST (75%), GeneQuiz (64%), GOtcha (89%), and Orthostrapper (11%). We also experimentally characterized the adenosine deaminase from Plasmodium falciparum, confirming SIFTER's prediction. The results illustrate the predictive power of exploiting a statistical model of function evolution in phylogenomic problems. A software implementation of SIFTER is available from the authors.

  5. Towards Accurate Ab Initio Predictions of the Spectrum of Methane

    NASA Technical Reports Server (NTRS)

    Schwenke, David W.; Kwak, Dochan (Technical Monitor)

    2001-01-01

    We have carried out extensive ab initio calculations of the electronic structure of methane, and these results are used to compute vibrational energy levels. We include basis set extrapolations, core-valence correlation, relativistic effects, and Born-Oppenheimer breakdown terms in our calculations. Our ab initio predictions of the lowest lying levels are superb.

  6. Accurate Theoretical Prediction of the Properties of Energetic Materials

    DTIC Science & Technology

    2007-11-02

    calculations (e.g. Cheetah ). 8. Sensitivity. The structure prediction and lattice potential work will serve as a platform to examine impact/shock...nitromethane molecules. (In an extension of the present work, we will freeze the internal coordinates of the molecules and assess the extent to which the

  7. Learning regulatory programs that accurately predict differential expression with MEDUSA.

    PubMed

    Kundaje, Anshul; Lianoglou, Steve; Li, Xuejing; Quigley, David; Arias, Marta; Wiggins, Chris H; Zhang, Li; Leslie, Christina

    2007-12-01

    Inferring gene regulatory networks from high-throughput genomic data is one of the central problems in computational biology. In this paper, we describe a predictive modeling approach for studying regulatory networks, based on a machine learning algorithm called MEDUSA. MEDUSA integrates promoter sequence, mRNA expression, and transcription factor occupancy data to learn gene regulatory programs that predict the differential expression of target genes. Instead of using clustering or correlation of expression profiles to infer regulatory relationships, MEDUSA determines condition-specific regulators and discovers regulatory motifs that mediate the regulation of target genes. In this way, MEDUSA meaningfully models biological mechanisms of transcriptional regulation. MEDUSA solves the problem of predicting the differential (up/down) expression of target genes by using boosting, a technique from statistical learning, which helps to avoid overfitting as the algorithm searches through the high-dimensional space of potential regulators and sequence motifs. Experimental results demonstrate that MEDUSA achieves high prediction accuracy on held-out experiments (test data), that is, data not seen in training. We also present context-specific analysis of MEDUSA regulatory programs for DNA damage and hypoxia, demonstrating that MEDUSA identifies key regulators and motifs in these processes. A central challenge in the field is the difficulty of validating reverse-engineered networks in the absence of a gold standard. Our approach of learning regulatory programs provides at least a partial solution for the problem: MEDUSA's prediction accuracy on held-out data gives a concrete and statistically sound way to validate how well the algorithm performs. With MEDUSA, statistical validation becomes a prerequisite for hypothesis generation and network building rather than a secondary consideration.

  8. Standardized EEG interpretation accurately predicts prognosis after cardiac arrest

    PubMed Central

    Rossetti, Andrea O.; van Rootselaar, Anne-Fleur; Wesenberg Kjaer, Troels; Horn, Janneke; Ullén, Susann; Friberg, Hans; Nielsen, Niklas; Rosén, Ingmar; Åneman, Anders; Erlinge, David; Gasche, Yvan; Hassager, Christian; Hovdenes, Jan; Kjaergaard, Jesper; Kuiper, Michael; Pellis, Tommaso; Stammet, Pascal; Wanscher, Michael; Wetterslev, Jørn; Wise, Matt P.; Cronberg, Tobias

    2016-01-01

    Objective: To identify reliable predictors of outcome in comatose patients after cardiac arrest using a single routine EEG and standardized interpretation according to the terminology proposed by the American Clinical Neurophysiology Society. Methods: In this cohort study, 4 EEG specialists, blinded to outcome, evaluated prospectively recorded EEGs in the Target Temperature Management trial (TTM trial) that randomized patients to 33°C vs 36°C. Routine EEG was performed in patients still comatose after rewarming. EEGs were classified into highly malignant (suppression, suppression with periodic discharges, burst-suppression), malignant (periodic or rhythmic patterns, pathological or nonreactive background), and benign EEG (absence of malignant features). Poor outcome was defined as best Cerebral Performance Category score 3–5 until 180 days. Results: Eight TTM sites randomized 202 patients. EEGs were recorded in 103 patients at a median 77 hours after cardiac arrest; 37% had a highly malignant EEG and all had a poor outcome (specificity 100%, sensitivity 50%). Any malignant EEG feature had a low specificity to predict poor prognosis (48%) but if 2 malignant EEG features were present specificity increased to 96% (p < 0.001). Specificity and sensitivity were not significantly affected by targeted temperature or sedation. A benign EEG was found in 1% of the patients with a poor outcome. Conclusions: Highly malignant EEG after rewarming reliably predicted poor outcome in half of patients without false predictions. An isolated finding of a single malignant feature did not predict poor outcome whereas a benign EEG was highly predictive of a good outcome. PMID:26865516
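
    The sensitivity and specificity figures quoted above follow from a standard 2x2 confusion table. A minimal sketch of that arithmetic is given below; the function name and the patient counts are hypothetical illustrations, not the study's data.

```python
def sensitivity_specificity(tp, fp, tn, fn):
    """Return (sensitivity, specificity) for a binary predictor of poor outcome.

    tp: poor-outcome patients flagged by the test
    fp: good-outcome patients flagged by the test
    tn: good-outcome patients not flagged
    fn: poor-outcome patients not flagged
    """
    sensitivity = tp / (tp + fn)  # share of poor outcomes the test catches
    specificity = tn / (tn + fp)  # share of good outcomes the test leaves alone
    return sensitivity, specificity


# Hypothetical counts chosen only to illustrate the calculation.
sens, spec = sensitivity_specificity(tp=40, fp=0, tn=20, fn=40)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

    With zero false positives the specificity is 100% regardless of how many poor-outcome patients the test misses, which is why a pattern can be fully specific while still catching only half of the poor outcomes.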

  9. PREFACE: Protein-protein interactions: principles and predictions

    NASA Astrophysics Data System (ADS)

    Nussinov, Ruth; Tsai, Chung-Jung

    2005-06-01

    Proteins are the 'workhorses' of the cell. Their roles span functions as diverse as being molecular machines and signalling. They carry out catalytic reactions, transport, form viral capsids, traverse membranes and form regulated channels, transmit information from DNA to RNA, making possible the synthesis of new proteins, and they are responsible for the degradation of unnecessary proteins and nucleic acids. They are the vehicles of the immune response and are responsible for viral entry into the cell. Given their importance, considerable effort has been centered on the prediction of protein function. A prime way to do this is through identification of binding partners. If the function of at least one of the components with which the protein interacts is known, that should let us assign its function(s) and the pathway(s) in which it plays a role. This holds since the vast majority of their chores in the living cell involve protein-protein interactions. Hence, through the intricate network of these interactions we can map cellular pathways, their interconnectivities and their dynamic regulation. Their identification is at the heart of functional genomics; their prediction is crucial for drug discovery. Knowledge of the pathway, its topology, length, and dynamics may provide useful information for forecasting side effects. The goal of predicting protein-protein interactions is daunting. Some associations are obligatory, others are continuously forming and dissociating. In principle, from the physical standpoint, any two proteins can interact, but under what conditions and at which strength? The principles of protein-protein interactions are general: the non-covalent interactions of two proteins are largely the outcome of the hydrophobic effect, which drives the interactions. In addition, hydrogen bonds and electrostatic interactions play important roles. Thus, many of the interactions observed in vitro are the outcome of experimental overexpression.
Protein disorder

  10. How Accurately Can We Predict Eclipses for Algol? (Poster abstract)

    NASA Astrophysics Data System (ADS)

    Turner, D.

    2016-06-01

    (Abstract only) beta Persei, or Algol, is a very well known eclipsing binary system consisting of a late B-type dwarf that is regularly eclipsed by a GK subgiant every 2.867 days. Eclipses, which last about 8 hours, are regular enough that predictions for times of minima are published in various places, Sky & Telescope magazine and The Observer's Handbook, for example. But eclipse minimum lasts for less than a half hour, whereas subtle mistakes in the current ephemeris for the star can result in predictions that are off by a few hours or more. The Algol system is fairly complex, with the Algol A and Algol B eclipsing system also orbited by Algol C with an orbital period of nearly 2 years. Added to that are complex long-term O-C variations with a periodicity of almost two centuries that, although suggested by Hoffmeister to be spurious, fit the type of light travel time variations expected for a fourth star also belonging to the system. The AB sub-system also undergoes mass transfer events that add complexities to its O-C behavior. Is it actually possible to predict precise times of eclipse minima for Algol months in advance given such complications, or is it better to encourage ongoing observations of the star so that O-C variations can be tracked in real time?

  11. Predictive rendering for accurate material perception: modeling and rendering fabrics

    NASA Astrophysics Data System (ADS)

    Bala, Kavita

    2012-03-01

    In computer graphics, rendering algorithms are used to simulate the appearance of objects and materials in a wide range of applications. Designers and manufacturers rely entirely on these rendered images to previsualize scenes and products before manufacturing them. They need to differentiate between different types of fabrics, paint finishes, plastics, and metals, often with subtle differences, for example, between silk and nylon, Formica and wood. Thus, these applications need predictive algorithms that can produce high-fidelity images that enable such subtle material discrimination.

  12. Can numerical simulations accurately predict hydrodynamic instabilities in liquid films?

    NASA Astrophysics Data System (ADS)

    Denner, Fabian; Charogiannis, Alexandros; Pradas, Marc; van Wachem, Berend G. M.; Markides, Christos N.; Kalliadasis, Serafim

    2014-11-01

    Understanding the dynamics of hydrodynamic instabilities in liquid film flows is an active field of research in fluid dynamics and non-linear science in general. Numerical simulations offer a powerful tool to study hydrodynamic instabilities in film flows and can provide deep insights into the underlying physical phenomena. However, the direct comparison of numerical results and experimental results is often hampered by several reasons. For instance, in numerical simulations the interface representation is problematic and the governing equations and boundary conditions may be oversimplified, whereas in experiments it is often difficult to extract accurate information on the fluid and its behavior, e.g. determine the fluid properties when the liquid contains particles for PIV measurements. In this contribution we present the latest results of our on-going, extensive study on hydrodynamic instabilities in liquid film flows, which includes direct numerical simulations, low-dimensional modelling as well as experiments. The major focus is on wave regimes, wave height and wave celerity as a function of Reynolds number and forcing frequency of a falling liquid film. Specific attention is paid to the differences in numerical and experimental results and the reasons for these differences. The authors are grateful to the EPSRC for their financial support (Grant EP/K008595/1).

  13. A Bayesian Framework for Combining Protein and Network Topology Information for Predicting Protein-Protein Interactions.

    PubMed

    Birlutiu, Adriana; d'Alché-Buc, Florence; Heskes, Tom

    2015-01-01

    Computational methods for predicting protein-protein interactions are important tools that can complement high-throughput technologies and guide biologists in designing new laboratory experiments. The proteins and the interactions between them can be described by a network which is characterized by several topological properties. Information about proteins and interactions between them, in combination with knowledge about topological properties of the network, can be used for developing computational methods that can accurately predict unknown protein-protein interactions. This paper presents a supervised learning framework based on Bayesian inference for combining two types of information: i) network topology information, and ii) information related to proteins and the interactions between them. The motivation of our model is that by combining these two types of information one can achieve a better accuracy in predicting protein-protein interactions, than by using models constructed from these two types of information independently.

  14. Objective criteria accurately predict amputation following lower extremity trauma.

    PubMed

    Johansen, K; Daines, M; Howey, T; Helfet, D; Hansen, S T

    1990-05-01

    MESS (Mangled Extremity Severity Score) is a simple rating scale for lower extremity trauma, based on skeletal/soft-tissue damage, limb ischemia, shock, and age. Retrospective analysis of severe lower extremity injuries in 25 trauma victims demonstrated a significant difference between MESS values for 17 limbs ultimately salvaged (mean, 4.88 ± 0.27) and nine requiring amputation (mean, 9.11 ± 0.51) (p < 0.01). A prospective trial of MESS in lower extremity injuries managed at two trauma centers again demonstrated a significant difference between MESS values of 14 salvaged (mean, 4.00 ± 0.28) and 12 doomed (mean, 8.83 ± 0.53) limbs (p < 0.01). In both the retrospective survey and the prospective trial, a MESS value ≥ 7 predicted amputation with 100% accuracy. MESS may be useful in selecting trauma victims whose irretrievably injured lower extremities warrant primary amputation.

  15. Improved Ecosystem Predictions of the California Current System via Accurate Light Calculations

    DTIC Science & Technology

    2011-09-30

    System via Accurate Light Calculations Curtis D. Mobley Sequoia Scientific, Inc. 2700 Richards Road, Suite 107 Bellevue, WA 98005 phone: 425...incorporate extremely fast but accurate light calculations into coupled physical-biological-optical ocean ecosystem models as used for operational three...dimensional ecosystem predictions. Improvements in light calculations lead to improvements in predictions of chlorophyll concentrations and other

  16. Accurate predictions for the production of vaporized water

    SciTech Connect

    Morin, E.; Montel, F.

    1995-12-31

    The production of water vaporized in the gas phase is controlled by the local conditions around the wellbore. The pressure gradient applied to the formation creates a sharp increase of the molar water content in the hydrocarbon phase approaching the well; this leads to a drop in the pore water saturation around the wellbore. The extent of the dehydrated zone which is formed is the key controlling the bottom-hole content of vaporized water. The maximum water content in the hydrocarbon phase at a given pressure, temperature and salinity is corrected by capillarity or adsorption phenomena depending on the actual water saturation. Describing the mass transfer of the water between the hydrocarbon phases and the aqueous phase into the tubing gives a clear idea of vaporization effects on the formation of scales. Field examples are presented for gas fields with temperatures ranging between 140°C and 180°C, where water vaporization effects are significant. Conditions for salt plugging in the tubing are predicted.

  17. Change in BMI Accurately Predicted by Social Exposure to Acquaintances

    PubMed Central

    Oloritun, Rahman O.; Ouarda, Taha B. M. J.; Moturu, Sai; Madan, Anmol; Pentland, Alex (Sandy); Khayal, Inas

    2013-01-01

    Research has mostly focused on obesity and not on processes of BMI change more generally, although these may be key factors that lead to obesity. Studies have suggested that obesity is affected by social ties. However, these studies used survey-based data collection techniques that may be biased toward selecting only close friends and relatives. In this study, mobile phone sensing techniques were used to routinely capture social interaction data in an undergraduate dorm. By automating the capture of social interaction data, the limitations of self-reported social exposure data are avoided. This study attempts to understand and develop a model that best describes the change in BMI using social interaction data. We evaluated a cohort of 42 college students in a co-located university dorm, using data captured automatically via mobile phones and survey-based health-related information. We determined the most predictive variables for change in BMI using the least absolute shrinkage and selection operator (LASSO) method. The selected variables, with gender, healthy diet category, and ability to manage stress, were used to build multiple linear regression models that estimate the effect of exposure and individual factors on change in BMI. We identified the best model using Akaike Information Criterion (AIC) and R². This study found a model that explains 68% (p<0.0001) of the variation in change in BMI. The model combined social interaction data, especially from acquaintances, and personal health-related information to explain change in BMI. This is the first study taking into account both interactions with different levels of social interaction and personal health-related information. Social interactions with acquaintances accounted for more than half the variation in change in BMI. This suggests the importance of not only individual health information but also the significance of social interactions with people we are exposed to, even people we may not consider as close friends.

  18. IDSite: An accurate approach to predict P450-mediated drug metabolism

    PubMed Central

    Li, Jianing; Schneebeli, Severin T.; Bylund, Joseph; Farid, Ramy; Friesner, Richard A.

    2011-01-01

    Accurate prediction of drug metabolism is crucial for drug design. Since the large majority of drug metabolism involves P450 enzymes, we herein describe a computational approach, IDSite, to predict P450-mediated drug metabolism. To model induced-fit effects, IDSite samples the conformational space with flexible docking in Glide followed by two refinement stages using the Protein Local Optimization Program (PLOP). Sites of metabolism (SOMs) are predicted according to a physical-based score that evaluates the potential of atoms to react with the catalytic iron center. As a preliminary test, we present in this paper the prediction of hydroxylation and O-dealkylation sites mediated by CYP2D6 using two different models: a physical-based simulation model, and a modification of this model in which a small number of parameters are fit to a training set. Without fitting any parameters to experimental data, the Physical IDSite scoring recovers 83% of the experimental observations for 56 compounds with a very low false positive rate. With only 4 fitted parameters, the Fitted IDSite was trained with the subset of 36 compounds and successfully applied to the other 20 compounds, recovering 94% of the experimental observations with high sensitivity and specificity for both sets. PMID:22247702

  19. Accurate refinement of docked protein complexes using evolutionary information and deep learning.

    PubMed

    Akbal-Delibas, Bahar; Farhoodi, Roshanak; Pomplun, Marc; Haspel, Nurit

    2016-06-01

    One of the major challenges for protein docking methods is to accurately discriminate native-like structures from false positives. Docking methods are often inaccurate and the results have to be refined and re-ranked to obtain native-like complexes and remove outliers. In a previous work, we introduced AccuRefiner, a machine learning based tool for refining protein-protein complexes. Given a docked complex, the refinement tool produces a small set of refined versions of the input complex, with lower root-mean-square-deviation (RMSD) of atomic positions with respect to the native structure. The method employs a unique ranking tool that accurately predicts the RMSD of docked complexes with respect to the native structure. In this work, we use a deep learning network with a similar set of features and five layers. We show that a properly trained deep learning network can accurately predict the RMSD of a docked complex with 1.40 Å error margin on average, by approximating the complex relationship between a wide set of scoring function terms and the RMSD of a docked structure. The network was trained on 35000 unbound docking complexes generated by RosettaDock. We tested our method on 25 different putative docked complexes produced also by RosettaDock for five proteins that were not included in the training data. The results demonstrate that the high accuracy of the ranking tool enables AccuRefiner to consistently choose the refinement candidates with lower RMSD values compared to the coarsely docked input structures.
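
    The quantity the ranking tool predicts, the root-mean-square deviation (RMSD) of atomic positions, can be sketched as follows. This is a minimal illustration, not the AccuRefiner implementation: the function name is hypothetical, and it assumes both structures list the same atoms in the same order and have already been superimposed.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) atom positions.

    Assumes the structures are already superimposed; no alignment is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between matching atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy two-atom example (coordinates in angstroms, invented for illustration).
docked = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
native = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
print(round(rmsd(docked, native), 3))
```

    In practice the native structure is unknown at prediction time, which is why a learned estimate of this value from scoring-function terms is useful for re-ranking.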

  20. Toxicology of protein allergenicity: prediction and characterization.

    PubMed

    Kimber, I; Kerkvliet, N I; Taylor, S L; Astwood, J D; Sarlo, K; Dearman, R J

    1999-04-01

    The ability of exogenous proteins to cause respiratory and gastrointestinal allergy, and sometimes systemic anaphylactic reactions, is well known. What is not clear however, are the properties that confer on proteins the ability to induce allergic sensitization. With an expansion in the use of enzymes for industrial applications and consumer products, and a substantial and growing investment in the development of transgenic crop plants that express novel proteins introduced from other sources, the issue of protein allergenicity has assumed considerable toxicological significance. There is a need now for methods that will allow the accurate identification and characterization of potential protein allergens and for estimation of relative potency as a first step towards risk assessment. To address some of these issues, and to review progress that has been made in the toxicological investigation of respiratory and gastrointestinal allergy induced by proteins, a workshop, entitled the Toxicology of Protein Allergenicity: Prediction and Characterization, was convened at the 37th Annual Conference of the Society of Toxicology in Seattle, Washington (1998). The subject of protein allergenicity is considered here in the context of presentations made at that workshop.

  1. Multitask learning for protein subcellular location prediction.

    PubMed

    Xu, Qian; Pan, Sinno Jialin; Xue, Hannah Hong; Yang, Qiang

    2011-01-01

    Protein subcellular localization is concerned with predicting the location of a protein within a cell using computational methods. The location information can indicate key functionalities of proteins. Thus, accurate prediction of subcellular localizations of proteins can help the prediction of protein functions and genome annotations, as well as the identification of drug targets. Machine learning methods such as Support Vector Machines (SVMs) have been used in the past for the problem of protein subcellular localization, but have been shown to suffer from a lack of annotated training data in each species under study. To overcome this data sparsity problem, we observe that because some of the organisms may be related to each other, there may be some commonalities across different organisms that can be discovered and used to help boost the data in each localization task. In this paper, we formulate the protein subcellular localization problem as one of multitask learning across different organisms. We adapt and compare two specializations of the multitask learning algorithms on 20 different organisms. Our experimental results show that multitask learning performs much better than the traditional single-task methods. Among the different multitask learning methods, we found that the multitask kernels and supertype kernels under multitask learning that share parameters perform slightly better than multitask learning by sharing latent features. The most significant improvement in terms of localization accuracy is about 25 percent. We find that if the organisms are very different or are remotely related from a biological point of view, then jointly training the multiple models cannot lead to significant improvement. However, if they are closely related biologically, the multitask learning can do much better than individual learning.

  2. ChIP-seq Accurately Predicts Tissue-Specific Activity of Enhancers

    SciTech Connect

    Visel, Axel; Blow, Matthew J.; Li, Zirong; Zhang, Tao; Akiyama, Jennifer A.; Holt, Amy; Plajzer-Frick, Ingrid; Shoukry, Malak; Wright, Crystal; Chen, Feng; Afzal, Veena; Ren, Bing; Rubin, Edward M.; Pennacchio, Len A.

    2009-02-01

    A major yet unresolved quest in decoding the human genome is the identification of the regulatory sequences that control the spatial and temporal expression of genes. Distant-acting transcriptional enhancers are particularly challenging to uncover since they are scattered amongst the vast non-coding portion of the genome. Evolutionary sequence constraint can facilitate the discovery of enhancers, but fails to predict when and where they are active in vivo. Here, we performed chromatin immunoprecipitation with the enhancer-associated protein p300, followed by massively-parallel sequencing, to map several thousand in vivo binding sites of p300 in mouse embryonic forebrain, midbrain, and limb tissue. We tested 86 of these sequences in a transgenic mouse assay, which in nearly all cases revealed reproducible enhancer activity in those tissues predicted by p300 binding. Our results indicate that in vivo mapping of p300 binding is a highly accurate means for identifying enhancers and their associated activities and suggest that such datasets will be useful to study the role of tissue-specific enhancers in human biology and disease on a genome-wide scale.

  3. PI2PE: protein interface/interior prediction engine.

    PubMed

    Tjong, Harianto; Qin, Sanbo; Zhou, Huan-Xiang

    2007-07-01

    The side chains of the 20 types of amino acids, owing to a large extent to their different physical properties, have characteristic distributions in interior/surface regions of individual proteins and in interface/non-interface portions of protein surfaces that bind proteins or nucleic acids. These distributions have important structural and functional implications. We have developed accurate methods for predicting the solvent accessibility of amino acids from a protein sequence and for predicting interface residues from the structure of a protein-binding or DNA-binding protein. The methods are called WESA, cons-PPISP and DISPLAR, respectively. The web servers of these methods are now available at http://pipe.scs.fsu.edu. To illustrate the utility of these web servers, cons-PPISP and DISPLAR predictions are used to construct a structural model for a multicomponent protein-DNA complex.

  4. An iterative approach of protein function prediction

    PubMed Central

    2011-01-01

    Background: Current approaches to predicting protein functions from a protein-protein interaction (PPI) dataset are based on an assumption that the available functions of the proteins (a.k.a. annotated proteins) will determine the functions of the proteins whose functions are not yet known (a.k.a. un-annotated proteins). Therefore, the protein function prediction is a mono-directed and one-off procedure, i.e. from annotated proteins to un-annotated proteins. However, the interactions between proteins are mutual rather than static and mono-directed, although functions of some proteins are unknown for some reasons at present. That means when we use the similarity-based approach to predict functions of un-annotated proteins, the un-annotated proteins, once their functions are predicted, will affect the similarities between proteins, which in turn will affect the prediction results. In other words, the function prediction is a dynamic and mutual procedure. This dynamic feature of protein interactions, however, was not considered in the existing prediction algorithms. Results: In this paper, we propose a new prediction approach that predicts protein functions iteratively. This iterative approach incorporates the dynamic and mutual features of PPI interactions, as well as the local and global semantic influence of protein functions, into the prediction. To guarantee predicting functions iteratively, we propose a new protein similarity from protein functions. We adapt new evaluation metrics to evaluate the prediction quality of our algorithm and other similar algorithms. Experiments on real PPI datasets were conducted to evaluate the effectiveness of the proposed approach in predicting unknown protein functions. Conclusions: The iterative approach is more likely to reflect the real biological nature between proteins when predicting functions. A proper definition of protein similarity from protein functions is the key to predicting functions iteratively. The

  5. Predictions of Protein-Protein Interfaces within Membrane Protein Complexes

    PubMed Central

    Asadabadi, Ebrahim Barzegari; Abdolmaleki, Parviz

    2013-01-01

    Background: Prediction of interaction sites within membrane protein complexes using sequence data is of great importance, because it would find applications in the modification of molecule transport through membranes, signaling pathways and drug targets of many diseases. Nevertheless, it has gained little attention from the protein structural bioinformatics community. Methods: In this study, a wide variety of prediction and classification tools were applied to distinguish the residues at the interfaces of membrane proteins from those not in the interfaces. Results: The tuned SVM model achieved a high accuracy of 86.95% and an AUC of 0.812, which outperforms the results of the only previous similar study. Nevertheless, the prediction performance obtained with most of the models employed is not yet sufficient for applied fields and needs further improvement. Conclusion: Considering the variety of the applied tools in this study, the present investigation could be a good starting point to develop more efficient tools to predict the membrane protein interaction site residues. PMID:23919118

  6. An efficient sliding window strategy for accurate location of eukaryotic protein coding regions.

    PubMed

    Rao, Nini; Lei, Xu; Guo, Jianxiu; Huang, Hao; Ren, Zhenglong

    2009-04-01

    The sliding window is one of the important factors that seriously affect the accuracy of coding region prediction and location for methods based on the power spectrum technique. It is very difficult to select the appropriate sliding step and window length for different organisms. In this study, a novel sliding window strategy is proposed on the basis of power spectrum analysis for the accurate location of eukaryotic protein coding regions. The proposed sliding window strategy is very simple and the sliding step of the window is changeable. Our tests show that the average location error for the novel method is 12 bases. Compared with the previous location error of 54 bases using the fixed sliding step, the novel sliding window strategy greatly increased the location accuracy. Further, the CPU time consumed by the novel strategy is much shorter than that of the fixed-length sliding step strategy, so the computational complexity of the novel method is greatly decreased.
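    Power-spectrum methods of this kind typically rely on the period-3 signal that coding DNA tends to show. The following is a minimal sketch of a sliding-window period-3 power profile. It uses a fixed step, since the abstract does not say how the changeable step is chosen, and the normalisation by window length and the names `period3_power` and `sliding_power` are assumptions made for illustration.

```python
import cmath


def period3_power(window):
    """Spectral power at frequency 1/3, summed over the four bases.

    Each base is treated as a 0/1 indicator sequence and its Fourier
    coefficient at frequency 1/3 is computed directly. Dividing by the
    window length is an illustrative normalisation.
    """
    total = 0.0
    for base in "ACGT":
        coeff = sum(
            cmath.exp(-2j * cmath.pi * k / 3)
            for k, ch in enumerate(window)
            if ch == base
        )
        total += abs(coeff) ** 2
    return total / len(window)


def sliding_power(sequence, window_len, step):
    """Return (start, power) pairs for windows taken every `step` bases."""
    return [
        (start, period3_power(sequence[start:start + window_len]))
        for start in range(0, len(sequence) - window_len + 1, step)
    ]


# Toy sequence: a homopolymer followed by a perfectly 3-periodic stretch.
sequence = "A" * 30 + "ATG" * 10
profile = sliding_power(sequence, window_len=30, step=15)
```

    The power is near zero over the homopolymer, intermediate where the window straddles the boundary, and highest over the periodic stretch; a smaller step would locate the boundary more precisely at a higher computing cost.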

  7. An unexpected way forward: towards a more accurate and rigorous protein-protein binding affinity scoring function by eliminating terms from an already simple scoring function.

    PubMed

    Swanson, Jon; Audie, Joseph

    2017-01-16

    A fundamental and unsolved problem in biophysical chemistry is the development of a computationally simple, physically intuitive, and generally applicable method for accurately predicting and physically explaining protein-protein binding affinities from protein-protein interaction (PPI) complex coordinates. Here, we propose that the simplification of a previously described six-term PPI scoring function to a four term function results in a simple expression of all physically and statistically meaningful terms that can be used to accurately predict and explain binding affinities for a well-defined subset of PPIs that are characterized by (1) crystallographic coordinates, (2) rigid-body association, (3) normal interface size, and hydrophobicity and hydrophilicity, and (4) high quality experimental binding affinity measurements. We further propose that the four-term scoring function could be regarded as a core expression for future development into a more general PPI scoring function. Our work has clear implications for PPI modeling and structure-based drug design.

  8. Serum Protein Profile at Remission Can Accurately Assess Therapeutic Outcomes and Survival for Serous Ovarian Cancer

    PubMed Central

    Ghamande, Sharad A.; Bush, Stephen; Ferris, Daron; Zhi, Wenbo; He, Mingfang; Wang, Meiyao; Wang, Xiaoxiao; Miller, Eric; Hopkins, Diane; Macfee, Michael; Guan, Ruili; Tang, Jinhai; She, Jin-Xiong

    2013-01-01

    Background Biomarkers play critical roles in early detection, diagnosis and monitoring of therapeutic outcome and recurrence of cancer. Previous biomarker research on ovarian cancer (OC) has mostly focused on the discovery and validation of diagnostic biomarkers. The primary purpose of this study is to identify serum biomarkers for prognosis and therapeutic outcomes of ovarian cancer. Experimental Design Forty serum proteins were analyzed in 70 serum samples from healthy controls (HC) and 101 serum samples from serous OC patients at three different disease phases: post diagnosis (PD), remission (RM) and recurrence (RC). The utility of serum proteins as OC biomarkers was evaluated using a variety of statistical methods including survival analysis. Results Ten serum proteins (PDGF-AB/BB, PDGF-AA, CRP, sFas, CA125, SAA, sTNFRII, sIL-6R, IGFBP6 and MDC) have individually good area-under-the-curve (AUC) values (AUC = 0.69–0.86) and more than 10 three-marker combinations have excellent AUC values (0.91–0.93) in distinguishing active cancer samples (PD & RC) from HC. The mean serum protein levels for RM samples are usually intermediate between HC and OC patients with active cancer (PD & RC). Most importantly, five proteins (sICAM1, RANTES, sgp130, sTNFR-II and sVCAM1) measured at remission can classify, individually and in combination, serous OC patients into two subsets with significantly different overall survival (best HR = 17, p<10−3). Conclusion We identified five serum proteins which, when measured at remission, can accurately predict the overall survival of serous OC patients, suggesting that they may be useful for monitoring the therapeutic outcomes for ovarian cancer. PMID:24244307

  9. Accurate prediction of adsorption energies on graphene, using a dispersion-corrected semiempirical method including solvation.

    PubMed

    Vincent, Mark A; Hillier, Ian H

    2014-08-25

    The accurate prediction of the adsorption energies of unsaturated molecules on graphene in the presence of water is essential for the design of molecules that can modify its properties and that can aid its processability. We here show that a semiempirical MO method corrected for dispersive interactions (PM6-DH2) can predict the adsorption energies of unsaturated hydrocarbons and the effect of substitution on these values to an accuracy comparable to DFT values and in good agreement with the experiment. The adsorption energies of TCNE, TCNQ, and a number of sulfonated pyrenes are also predicted, along with the effect of hydration using the COSMO model.

  10. Accurately predicting copper interconnect topographies in foundry design for manufacturability flows

    NASA Astrophysics Data System (ADS)

    Lu, Daniel; Fan, Zhong; Tak, Ki Duk; Chang, Li-Fu; Zou, Elain; Jiang, Jenny; Yang, Josh; Zhuang, Linda; Chen, Kuang Han; Hurat, Philippe; Ding, Hua

    2011-04-01

    This paper presents a model-based Chemical Mechanical Polishing (CMP) Design for Manufacturability (DFM) methodology that includes an accurate prediction of post-CMP copper interconnect topographies at the advanced process technology nodes. Using procedures of extensive model calibration and validation, the CMP process model accurately predicts post-CMP dimensions, such as erosion, dishing, and copper thickness with excellent correlation to silicon measurements. This methodology provides an efficient DFM flow to detect and fix physical manufacturing hotspots related to copper pooling and Depth of Focus (DOF) failures at both block- and full chip level designs. Moreover, the predicted thickness output is used in the CMP-aware RC extraction and Timing analysis flows for better understanding of performance yield and timing impact. In addition, the CMP model can be applied to the verification of model-based dummy fill flows.

  11. Cas9-chromatin binding information enables more accurate CRISPR off-target prediction

    PubMed Central

    Singh, Ritambhara; Kuscu, Cem; Quinlan, Aaron; Qi, Yanjun; Adli, Mazhar

    2015-01-01

    The CRISPR system has become a powerful biological tool with a wide range of applications. However, improving targeting specificity and accurately predicting potential off-targets remains a significant goal. Here, we introduce a web-based CRISPR/Cas9 Off-target Prediction and Identification Tool (CROP-IT) that performs improved off-target binding and cleavage site predictions. Unlike existing prediction programs that solely use DNA sequence information, CROP-IT integrates whole genome level biological information from existing Cas9 binding and cleavage data sets. Utilizing whole-genome chromatin state information from 125 human cell types further enhances its computational prediction power. Comparative analyses on experimentally validated datasets show that CROP-IT outperforms existing computational algorithms in predicting both Cas9 binding and cleavage sites. With a user-friendly web interface, CROP-IT outputs a scored and ranked list of potential off-targets that enables improved guide RNA design and more accurate prediction of Cas9 binding or cleavage sites. PMID:26032770

  12. Modeling methodology for the accurate and prompt prediction of symptomatic events in chronic diseases.

    PubMed

    Pagán, Josué; Risco-Martín, José L; Moya, José M; Ayala, José L

    2016-08-01

    Prediction of symptomatic crises in chronic diseases allows decisions to be taken before the symptoms occur, such as the intake of drugs to avoid the symptoms or the activation of medical alarms. The prediction horizon is in this case an important parameter, since it must accommodate the pharmacokinetics of medications or the response time of medical services. This paper presents a study of the prediction limits of a chronic disease with symptomatic crises: migraine. For that purpose, this work develops a methodology to build predictive migraine models and to improve these predictions beyond the limits of the initial models. The maximum prediction horizon is analyzed, and its dependency on the selected features is studied. A strategy for model selection is proposed to tackle the trade-off between conservative but robust predictive models and less accurate predictions with longer horizons. The obtained results show a prediction horizon close to 40 min, which is in the time range of the drug pharmacokinetics. Experiments have been performed in a realistic scenario where input data have been acquired in an ambulatory clinical study by the deployment of a non-intrusive Wireless Body Sensor Network. Our results provide an effective methodology for the selection of the future horizon in the development of prediction algorithms for diseases experiencing symptomatic crises.

  13. DSP: a protein shape string and its profile prediction server

    PubMed Central

    Sun, Jiangming; Tang, Shengnan; Xiong, Wenwei; Cong, Peisheng; Li, Tonghua

    2012-01-01

    Many studies have demonstrated that shape string is an extremely important structure representation, since it is more complete than the classical secondary structure. The shape string provides detailed information also in the regions denoted random coil. But few services are provided for systematic analysis of protein shape string. To fill this gap, we have developed an accurate shape string predictor based on two innovative technologies: a knowledge-driven sequence alignment and a sequence shape string profile method. The performance on blind test data demonstrates that the proposed method can be used for accurate prediction of protein shape string. The DSP server provides both predicted shape string and sequence shape string profile for each query sequence. Using this information, the users can compare protein structure or display protein evolution in shape string space. The DSP server is available at both http://cheminfo.tongji.edu.cn/dsp/ and its main mirror http://chemcenter.tongji.edu.cn/dsp/. PMID:22553364

  14. DSP: a protein shape string and its profile prediction server.

    PubMed

    Sun, Jiangming; Tang, Shengnan; Xiong, Wenwei; Cong, Peisheng; Li, Tonghua

    2012-07-01

    Many studies have demonstrated that shape string is an extremely important structure representation, since it is more complete than the classical secondary structure. The shape string provides detailed information also in the regions denoted random coil. But few services are provided for systematic analysis of protein shape string. To fill this gap, we have developed an accurate shape string predictor based on two innovative technologies: a knowledge-driven sequence alignment and a sequence shape string profile method. The performance on blind test data demonstrates that the proposed method can be used for accurate prediction of protein shape string. The DSP server provides both predicted shape string and sequence shape string profile for each query sequence. Using this information, the users can compare protein structure or display protein evolution in shape string space. The DSP server is available at both http://cheminfo.tongji.edu.cn/dsp/ and its main mirror http://chemcenter.tongji.edu.cn/dsp/.

  15. An effective method for accurate prediction of the first hyperpolarizability of alkalides.

    PubMed

    Wang, Jia-Nan; Xu, Hong-Liang; Sun, Shi-Ling; Gao, Ting; Li, Hong-Zhi; Li, Hui; Su, Zhong-Min

    2012-01-15

    A proper theoretical calculation method for nonlinear optical (NLO) properties is a key factor in designing excellent NLO materials. Yet it is a difficult task to obtain accurate NLO properties for large-scale molecules. In the present work, an effective intelligent computing method, called extreme learning machine-neural network (ELM-NN), is proposed to accurately predict the first hyperpolarizability (β(0)) of alkalides from a low-accuracy first hyperpolarizability. Compared with neural network (NN) and genetic algorithm neural network (GANN), the root-mean-square deviations of the predicted values obtained by ELM-NN, GANN, and NN with their MP2 counterpart are 0.02, 0.08, and 0.17 a.u., respectively. This suggests that the predicted values obtained by ELM-NN are more accurate than those calculated by the NN and GANN methods. Another advantage of ELM-NN is its ability to obtain values at a high accuracy level with less computing cost. Experimental results show that the computing time of MP2 is 2.4-4 times the computing time of ELM-NN. Thus, the proposed method is a potentially powerful tool in computational chemistry, and it may predict β(0) of large-scale molecules, which is difficult to obtain by high-accuracy theoretical methods due to the dramatically increasing computational cost.

  16. NESmapper: accurate prediction of leucine-rich nuclear export signals using activity-based profiles.

    PubMed

    Kosugi, Shunichi; Yanagawa, Hiroshi; Terauchi, Ryohei; Tabata, Satoshi

    2014-09-01

    The nuclear export of proteins is regulated largely through the exportin/CRM1 pathway, which involves the specific recognition of leucine-rich nuclear export signals (NESs) in the cargo proteins, and modulates nuclear-cytoplasmic protein shuttling by antagonizing the nuclear import activity mediated by importins and the nuclear import signal (NLS). Although the prediction of NESs can help to define proteins that undergo regulated nuclear export, current methods of predicting NESs, including computational tools and consensus-sequence-based searches, have limited accuracy, especially in terms of their specificity. We found that each residue within an NES largely contributes independently and additively to the entire nuclear export activity. We created activity-based profiles of all classes of NESs with a comprehensive mutational analysis in mammalian cells. The profiles highlight a number of specific activity-affecting residues not only at the conserved hydrophobic positions but also in the linker and flanking regions. We then developed a computational tool, NESmapper, to predict NESs by using profiles that had been further optimized by training and combining the amino acid properties of the NES-flanking regions. This tool successfully reduced the considerable number of false positives, and the overall prediction accuracy was higher than that of other methods, including NESsential and Wregex. This profile-based prediction strategy is a reliable way to identify functional protein motifs. NESmapper is available at http://sourceforge.net/projects/nesmapper.
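    The observation that each residue contributes independently and additively suggests a position-specific scoring scheme. The sketch below shows additive profile scoring in its simplest form. The three-column profile, its score values, the threshold, and the function names are invented for illustration; they are not NESmapper's optimized profiles or its interface.

```python
def score_window(window, profile, default=0.0):
    """Sum the per-position scores of the residues in a window.

    `profile` is a list with one dict per position, mapping a residue
    letter to its contribution; residues not listed score `default`.
    """
    return sum(profile[i].get(res, default) for i, res in enumerate(window))


def scan(sequence, profile, threshold):
    """Slide the profile along the sequence and keep windows >= threshold."""
    width = len(profile)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = score_window(sequence[start:start + width], profile)
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical profile: hydrophobic positions flank a linker position.
toy_profile = [
    {"L": 2.0, "I": 1.5},
    {"E": 0.5},
    {"L": 2.0, "V": 1.0},
]
hits = scan("MALELKK", toy_profile, threshold=4.0)  # only "LEL" at index 2 passes
```

    Raising the threshold trades sensitivity for specificity, which is the kind of lever a profile-based tool can use to reduce false positives.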

  17. Statistical analysis and prediction of protein-protein interfaces.

    PubMed

    Bordner, Andrew J; Abagyan, Ruben

    2005-08-15

    Predicting protein-protein interfaces from a three-dimensional structure is a key task of computational structural proteomics. In contrast to geometrically distinct small molecule binding sites, protein-protein interfaces are notoriously difficult to predict. We generated a large nonredundant data set of 1494 true protein-protein interfaces using biological symmetry annotation where necessary. The data set was carefully analyzed and a Support Vector Machine was trained on a combination of a new robust evolutionary conservation signal with the local surface properties to predict protein-protein interfaces. Fivefold cross validation verifies the high sensitivity and selectivity of the model. As many as 97% of the predicted patches had an overlap with the true interface patch while only 22% of the surface residues were included in an average predicted patch. The model allowed the identification of potential new interfaces and the correction of mislabeled oligomeric states.

  18. Accurate Prediction of Ligand Affinities for a Proton-Dependent Oligopeptide Transporter

    PubMed Central

    Samsudin, Firdaus; Parker, Joanne L.; Sansom, Mark S.P.; Newstead, Simon; Fowler, Philip W.

    2016-01-01

    Summary Membrane transporters are critical modulators of drug pharmacokinetics, efficacy, and safety. One example is the proton-dependent oligopeptide transporter PepT1, also known as SLC15A1, which is responsible for the uptake of the β-lactam antibiotics and various peptide-based prodrugs. In this study, we modeled the binding of various peptides to a bacterial homolog, PepTSt, and evaluated a range of computational methods for predicting the free energy of binding. Our results show that a hybrid approach (endpoint methods to classify peptides into good and poor binders and a theoretically exact method for refinement) is able to accurately predict affinities, which we validated using proteoliposome transport assays. Applying the method to a homology model of PepT1 suggests that the approach requires a high-quality structure to be accurate. Our study provides a blueprint for extending these computational methodologies to other pharmaceutically important transporter families. PMID:27028887

  19. Fast and Accurate Prediction of Stratified Steel Temperature During Holding Period of Ladle

    NASA Astrophysics Data System (ADS)

    Deodhar, Anirudh; Singh, Umesh; Shukla, Rishabh; Gautham, B. P.; Singh, Amarendra K.

    2017-04-01

    Thermal stratification of liquid steel in a ladle during the holding period and the teeming operation has a direct bearing on the superheat available at the caster and hence on the caster set points such as casting speed and cooling rates. The changes in the caster set points are typically carried out based on temperature measurements at the end of tundish outlet. Thermal prediction models provide advance knowledge of the influence of process and design parameters on the steel temperature at various stages. Therefore, they can be used in making accurate decisions about the caster set points in real time. However, this requires both fast and accurate thermal prediction models. In this work, we develop a surrogate model for the prediction of thermal stratification using data extracted from a set of computational fluid dynamics (CFD) simulations, pre-determined using design of experiments technique. Regression method is used for training the predictor. The model predicts the stratified temperature profile instantaneously, for a given set of process parameters such as initial steel temperature, refractory heat content, slag thickness, and holding time. More than 96 pct of the predicted values are within an error range of ±5 K (±5 °C), when compared against corresponding CFD results. Considering its accuracy and computational efficiency, the model can be extended for thermal control of casting operations. This work also sets a benchmark for developing similar thermal models for downstream processes such as tundish and caster.
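    The surrogate-model workflow in this abstract (fit a regression to simulation results, then check how many predictions fall within a tolerance) can be sketched as follows. This is a simplified illustration with one input variable and ordinary least squares; the study's model uses several process parameters, and the temperature values here are invented stand-ins for CFD output.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fraction_within(predicted, reference, tol):
    """Share of predictions whose absolute error is at most `tol`."""
    hits = sum(1 for p, r in zip(predicted, reference) if abs(p - r) <= tol)
    return hits / len(reference)


# Hypothetical training data: holding time (min) vs. steel temperature (K).
holding_min = [0, 10, 20, 30, 40]
cfd_temp_k = [1873.0, 1868.0, 1864.0, 1858.0, 1853.0]

slope, intercept = fit_line(holding_min, cfd_temp_k)
surrogate = [slope * t + intercept for t in holding_min]
share_5k = fraction_within(surrogate, cfd_temp_k, tol=5.0)
```

    Once fitted, evaluating the surrogate is a single multiply-add, which is why such models can respond instantaneously where a CFD run cannot.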

  20. Fast and Accurate Prediction of Stratified Steel Temperature During Holding Period of Ladle

    NASA Astrophysics Data System (ADS)

    Deodhar, Anirudh; Singh, Umesh; Shukla, Rishabh; Gautham, B. P.; Singh, Amarendra K.

    2016-12-01

    Thermal stratification of liquid steel in a ladle during the holding period and the teeming operation has a direct bearing on the superheat available at the caster and hence on the caster set points such as casting speed and cooling rates. The changes in the caster set points are typically carried out based on temperature measurements at the end of tundish outlet. Thermal prediction models provide advance knowledge of the influence of process and design parameters on the steel temperature at various stages. Therefore, they can be used in making accurate decisions about the caster set points in real time. However, this requires both fast and accurate thermal prediction models. In this work, we develop a surrogate model for the prediction of thermal stratification using data extracted from a set of computational fluid dynamics (CFD) simulations, pre-determined using design of experiments technique. Regression method is used for training the predictor. The model predicts the stratified temperature profile instantaneously, for a given set of process parameters such as initial steel temperature, refractory heat content, slag thickness, and holding time. More than 96 pct of the predicted values are within an error range of ±5 K (±5 °C), when compared against corresponding CFD results. Considering its accuracy and computational efficiency, the model can be extended for thermal control of casting operations. This work also sets a benchmark for developing similar thermal models for downstream processes such as tundish and caster.

  1. Can phenological models predict tree phenology accurately under climate change conditions?

    NASA Astrophysics Data System (ADS)

    Chuine, Isabelle; Bonhomme, Marc; Legave, Jean Michel; García de Cortázar-Atauri, Inaki; Charrier, Guillaume; Lacointe, André; Améglio, Thierry

    2014-05-01

    The onset of the growing season of trees has advanced globally by 2.3 days/decade during the last 50 years because of global warming, and this trend is predicted to continue according to climate forecasts. The effect of temperature on plant phenology is however not linear because temperature has a dual effect on bud development. On one hand, low temperatures are necessary to break bud dormancy, and on the other hand higher temperatures are necessary to promote bud cell growth afterwards. Increasing phenological changes in temperate woody species have strong impacts on forest tree distribution and productivity, as well as on crop cultivation areas. Accurate predictions of tree phenology are therefore a prerequisite to understand and foresee the impacts of climate change on forests and agrosystems. Different process-based models have been developed in the last two decades to predict the date of budburst or flowering of woody species. There are two main families: (1) one-phase models which consider only the ecodormancy phase and make the assumption that endodormancy is always broken before adequate climatic conditions for cell growth occur; and (2) two-phase models which consider both the endodormancy and ecodormancy phases and predict a date of dormancy break which varies from year to year. So far, one-phase models have been able to predict accurately tree bud break and flowering under historical climate. However, because they do not consider what happens prior to ecodormancy, and especially the possible negative effect of winter temperature warming on dormancy break, it seems unlikely that they can provide accurate predictions in future climate conditions. It is indeed well known that a lack of low temperature results in an abnormal pattern of bud break and development in temperate fruit trees. An accurate modelling of the dormancy break date has thus become a major issue in phenology modelling.
Two-phases phenological models predict that global warming should delay
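    The difference between the one-phase and two-phase model families can be made concrete with a toy implementation. The sketch assumes a chilling-day count for endodormancy and degree-day forcing for ecodormancy; all thresholds, requirements, and temperature series are invented and do not come from the models evaluated in the study.

```python
def budburst_day(daily_temps, chill_req, forcing_req,
                 chill_below=5.0, forcing_base=5.0):
    """Return the day index of predicted budburst, or None if never reached.

    Two-phase model: count chilling days (temp < chill_below) until
    `chill_req` is met, then accumulate forcing (degrees above
    `forcing_base`) until `forcing_req` is met. Setting chill_req=0
    gives a one-phase model that accumulates forcing from day 0.
    """
    chill = 0.0
    forcing = 0.0
    for day, temp in enumerate(daily_temps):
        if chill < chill_req:
            # Endodormancy: only chilling is counted.
            if temp < chill_below:
                chill += 1.0
        else:
            # Ecodormancy: warm days push the bud toward budburst.
            forcing += max(0.0, temp - forcing_base)
            if forcing >= forcing_req:
                return day
    return None


# Hypothetical 30-day winters followed by the same 12 degree spring.
cold_winter = [2.0] * 30 + [12.0] * 40
warm_winter = [2.0 if d % 2 == 0 else 8.0 for d in range(30)] + [12.0] * 40

one_phase_cold = budburst_day(cold_winter, chill_req=0, forcing_req=70)
one_phase_warm = budburst_day(warm_winter, chill_req=0, forcing_req=70)
two_phase_cold = budburst_day(cold_winter, chill_req=15, forcing_req=70)
two_phase_warm = budburst_day(warm_winter, chill_req=15, forcing_req=70)
```

    In this toy setting the one-phase model advances budburst from day 39 to day 33 under the warmer winter, while the two-phase model keeps it at day 39 because warm winter days slow the accumulation of chilling and forcing only counts once dormancy is broken.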

  2. Rapid and Highly Accurate Prediction of Poor Loop Diuretic Natriuretic Response in Patients With Heart Failure

    PubMed Central

    Testani, Jeffrey M.; Hanberg, Jennifer S.; Cheng, Susan; Rao, Veena; Onyebeke, Chukwuma; Laur, Olga; Kula, Alexander; Chen, Michael; Wilson, F. Perry; Darlington, Andrew; Bellumkonda, Lavanya; Jacoby, Daniel; Tang, W. H. Wilson; Parikh, Chirag R.

    2015-01-01

    Background Removal of excess sodium and fluid is a primary therapeutic objective in acute decompensated heart failure (ADHF) and commonly monitored with fluid balance and weight loss. However, these parameters are frequently inaccurate or not collected and require a delay of several hours after diuretic administration before they are available. Accessible tools for rapid and accurate prediction of diuretic response are needed. Methods and Results Based on well-established renal physiologic principles an equation was derived to predict net sodium output using a spot urine sample obtained one or two hours following loop diuretic administration. This equation was then prospectively validated in 50 ADHF patients using meticulously obtained timed 6-hour urine collections to quantitate loop diuretic induced cumulative sodium output. Poor natriuretic response was defined as a cumulative sodium output of <50 mmol, a threshold that would result in a positive sodium balance with twice-daily diuretic dosing. Following a median dose of 3 mg (2–4 mg) of intravenous bumetanide, 40% of the population had a poor natriuretic response. The correlation between measured and predicted sodium output was excellent (r=0.91, p<0.0001). Poor natriuretic response could be accurately predicted with the sodium prediction equation (AUC=0.95, 95% CI 0.89–1.0, p<0.0001). Clinically recorded net fluid output had a weaker correlation (r=0.66, p<0.001) and lesser ability to predict poor natriuretic response (AUC=0.76, 95% CI 0.63–0.89, p=0.002). Conclusions In patients being treated for ADHF, poor natriuretic response can be predicted soon after diuretic administration with excellent accuracy using a spot urine sample. PMID:26721915

  3. Bicluster Sampled Coherence Metric (BSCM) provides an accurate environmental context for phenotype predictions

    PubMed Central

    2015-01-01

    Background Biclustering is a popular method for identifying under which experimental conditions biological signatures are co-expressed. However, the general biclustering problem is NP-hard, offering room to focus algorithms on specific biological tasks. We hypothesize that conditional co-regulation of genes is a key factor in determining cell phenotype and that accurately segregating conditions in biclusters will improve such predictions. Thus, we developed a bicluster sampled coherence metric (BSCM) for determining which conditions and signals should be included in a bicluster. Results Our BSCM calculates condition and cluster size specific p-values, and we incorporated these into the popular integrated biclustering algorithm cMonkey. We demonstrate that incorporation of our new algorithm significantly improves bicluster co-regulation scores (p-value = 0.009) and GO annotation scores (p-value = 0.004). Additionally, we used a bicluster based signal to predict whether a given experimental condition will result in yeast peroxisome induction. Using the new algorithm, the classifier accuracy improves from 41.9% to 76.1% correct. Conclusions We demonstrate that the proposed BSCM helps determine which signals ought to be co-clustered, resulting in more accurately assigned bicluster membership. Furthermore, we show that BSCM can be extended to more accurately detect under which experimental conditions the genes are co-clustered. Features derived from this more accurate analysis of conditional regulation result in a dramatic improvement in the ability to predict a cellular phenotype in yeast. The latest cMonkey is available for download at https://github.com/baliga-lab/cmonkey2. The experimental data and source code featured in this paper are available at http://AitchisonLab.com/BSCM. BSCM has been incorporated in the official cMonkey release. PMID:25881257

  4. Highly Accurate Structure-Based Prediction of HIV-1 Coreceptor Usage Suggests Intermolecular Interactions Driving Tropism.

    PubMed

    Kieslich, Chris A; Tamamis, Phanourios; Guzman, Yannis A; Onel, Melis; Floudas, Christodoulos A

    2016-01-01

    HIV-1 entry into host cells is mediated by interactions between the V3-loop of viral glycoprotein gp120 and chemokine receptor CCR5 or CXCR4, collectively known as HIV-1 coreceptors. Accurate genotypic prediction of coreceptor usage is of significant clinical interest and determination of the factors driving tropism has been the focus of extensive study. We have developed a method based on nonlinear support vector machines to elucidate the interacting residue pairs driving coreceptor usage and provide highly accurate coreceptor usage predictions. Our models utilize centroid-centroid interaction energies from computationally derived structures of the V3-loop:coreceptor complexes as primary features, while additional features based on established rules regarding V3-loop sequences are also investigated. We tested our method on 2455 V3-loop sequences of various lengths and subtypes, and produce a median area under the receiver operator curve of 0.977 based on 500 runs of 10-fold cross validation. Our study is the first to elucidate a small set of specific interacting residue pairs between the V3-loop and coreceptors capable of predicting coreceptor usage with high accuracy across major HIV-1 subtypes. The developed method has been implemented as a web tool named CRUSH, CoReceptor USage prediction for HIV-1, which is available at http://ares.tamu.edu/CRUSH/.

  5. Accurate similarity index based on activity and connectivity of node for link prediction

    NASA Astrophysics Data System (ADS)

    Li, Longjie; Qian, Lvjian; Wang, Xiaoping; Luo, Shishun; Chen, Xiaoyun

    2015-05-01

    Recent years have witnessed an increase in available network data; however, much of that data is incomplete. Link prediction, which can find the missing links of a network, plays an important role in the research and analysis of complex networks. Based on the assumption that two unconnected nodes which are highly similar are very likely to have an interaction, most of the existing algorithms solve the link prediction problem by computing nodes' similarities. The fundamental requirement of those algorithms is accurate and effective similarity indices. In this paper, we propose a new similarity index, namely similarity based on activity and connectivity (SAC), which performs link prediction more accurately. To compute the similarity between two nodes, this index employs the average activity of these two nodes in their common neighborhood and the connectivities between them and their common neighbors. The higher the average activity is and the stronger the connectivities are, the more similar the two nodes are. The proposed index not only distinguishes the contributions of paths well but also incorporates the influence of endpoints. Therefore, it can achieve a better prediction result. To verify the performance of SAC, we conduct experiments on 10 real-world networks. Experimental results demonstrate that SAC outperforms the compared baselines.
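    Similarity-based link prediction follows a common pattern: score every unconnected node pair, then rank the pairs. The sketch below illustrates that pattern with the resource allocation index, a well-known common-neighbour measure, used here as a stand-in because the abstract does not give the SAC formula. The toy graph and function names are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def resource_allocation(adj, u, v):
    """Resource allocation index: sum of 1/degree over common neighbours."""
    return sum(1.0 / len(adj[z]) for z in adj[u] & adj[v])


def rank_missing_links(edges):
    """Score all unconnected pairs and sort from most to least similar."""
    adj = build_adjacency(edges)
    scored = [
        (resource_allocation(adj, u, v), u, v)
        for u, v in combinations(sorted(adj), 2)
        if v not in adj[u]
    ]
    # Highest score first; ties broken by node names for determinism.
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return scored


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
ranking = rank_missing_links(edges)  # ("a", "d") ranks first
```

    Swapping in a different index only requires replacing `resource_allocation`, which is how alternative similarity measures are usually compared on the same networks.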

  6. MUFOLD: A new solution for protein 3D structure prediction.

    PubMed

    Zhang, Jingfen; Wang, Qingguo; Barz, Bogdan; He, Zhiquan; Kosztin, Ioan; Shang, Yi; Xu, Dong

    2010-04-01

    There have been steady improvements in protein structure prediction during the past 2 decades. However, current methods are still far from consistently predicting structural models accurately with computing power accessible to common users. Toward achieving more accurate and efficient structure prediction, we developed a number of novel methods and integrated them into a software package, MUFOLD. First, a systematic protocol was developed to identify useful templates and fragments from Protein Data Bank for a given target protein. Then, an efficient process was applied for iterative coarse-grain model generation and evaluation at the Cα or backbone level. In this process, we construct models using interresidue spatial restraints derived from alignments by multidimensional scaling, evaluate and select models through clustering and static scoring functions, and iteratively improve the selected models by integrating spatial restraints and previous models. Finally, the full-atom models were evaluated using molecular dynamics simulations based on structural changes under simulated heating. We have continuously improved the performance of MUFOLD by using a benchmark of 200 proteins from the Astral database, where no template with >25% sequence identity to any target protein is included. The average root-mean-square deviation of the best models from the native structures is 4.28 Å, which shows significant and systematic improvement over our previous methods. The computing time of MUFOLD is much shorter than many other tools, such as Rosetta. MUFOLD demonstrated some success in the 2008 community-wide experiment for protein structure prediction CASP8.
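    The root-mean-square deviation used to compare models with native structures can be computed as in the sketch below. It assumes the two coordinate sets are already optimally superimposed and matched atom by atom; a full evaluation would first fit one structure onto the other, which is omitted here. The coordinates are invented.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long coordinate lists.

    Assumes the structures are already superimposed; no fitting is done.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy backbone of three atoms; the model is shifted by 1 unit along z.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
deviation = rmsd(native, model)  # every atom is off by exactly 1 unit
```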

  7. Accurate prediction of the linear viscoelastic properties of highly entangled mono and bidisperse polymer melts.

    PubMed

    Stephanou, Pavlos S; Mavrantzas, Vlasis G

    2014-06-07

    We present a hierarchical computational methodology which permits the accurate prediction of the linear viscoelastic properties of entangled polymer melts directly from the chemical structure, chemical composition, and molecular architecture of the constituent chains. The method entails three steps: execution of long molecular dynamics simulations with moderately entangled polymer melts, self-consistent mapping of the accumulated trajectories onto a tube model and parameterization or fine-tuning of the model on the basis of detailed simulation data, and use of the modified tube model to predict the linear viscoelastic properties of significantly higher molecular weight (MW) melts of the same polymer. Predictions are reported for the zero-shear-rate viscosity η0 and the spectra of storage G'(ω) and loss G″(ω) moduli for several mono and bidisperse cis- and trans-1,4 polybutadiene melts as well as for their MW dependence, and are found to be in remarkable agreement with experimentally measured rheological data.

  8. Predicting Protein Structure Using Parallel Genetic Algorithms.

    DTIC Science & Technology

    1994-12-01

    Predicting Protein Structure Using Parallel Genetic Algorithms. THESIS. George H... Approved for public release; distribution unlimited. AFIT/GCS/ENG/94D-03 ... 1.2 Genetic Algorithms ... 1.3 The Protein Folding Problem

  9. Protein function prediction based on data fusion and functional interrelationship.

    PubMed

    Meng, Jun; Wekesa, Jael-Sanyanda; Shi, Guan-Li; Luan, Yu-Shi

    2016-04-01

    One of the challenging tasks of bioinformatics is to predict protein functions more accurately and confidently from genomics and proteomics datasets. Computational approaches use a variety of high throughput experimental data, such as protein-protein interaction (PPI), protein sequences and phylogenetic profiles, to predict protein functions. This paper presents a method that uses a transductive multi-label learning algorithm by integrating multiple data sources for classification. Multiple proteomics datasets are integrated to make inferences about functions of unknown proteins and use a directed bi-relational graph to assign labels to unannotated proteins. Our method, bi-relational graph based transductive multi-label function annotation (Bi-TMF), uses functional correlation and topological PPI network properties on both the training and testing datasets to predict protein functions through data fusion of the individual kernel result. The main purpose of our proposed method is to enhance the performance of classifier integration for protein function prediction algorithms. Experimental results demonstrate the effectiveness and efficiency of Bi-TMF on multi-source datasets in yeast, human and mouse benchmarks. Bi-TMF outperforms other recently proposed methods.

  10. Prediction of Accurate Thermochemistry of Medium and Large Sized Radicals Using Connectivity-Based Hierarchy (CBH).

    PubMed

    Sengupta, Arkajyoti; Raghavachari, Krishnan

    2014-10-14

    Accurate modeling of the chemical reactions in many diverse areas such as combustion, photochemistry, or atmospheric chemistry strongly depends on the availability of thermochemical information of the radicals involved. However, accurate thermochemical investigations of radical systems using state-of-the-art composite methods have mostly been restricted to the study of hydrocarbon radicals of modest size. In an alternative approach, a systematic error-canceling thermochemical hierarchy of reaction schemes can be applied to yield accurate results for such systems. In this work, we have extended our connectivity-based hierarchy (CBH) method to the investigation of radical systems. We have calibrated our method using a test set of 30 medium sized radicals to evaluate their heats of formation. The CBH-rad30 test set contains radicals containing diverse functional groups as well as cyclic systems. We demonstrate that the sophisticated error-canceling isoatomic scheme (CBH-2) with modest levels of theory is adequate to provide heats of formation accurate to ∼1.5 kcal/mol. Finally, we predict heats of formation of 19 other large and medium sized radicals for which the accuracy of available heats of formation is less well known.

  11. ESG: extended similarity group method for automated protein function prediction

    PubMed Central

    Chitale, Meghana; Hawkins, Troy; Park, Changsoon; Kihara, Daisuke

    2009-01-01

    Motivation: The importance of accurate automatic protein function prediction is ever increasing in the face of a large number of newly sequenced genomes and proteomics data that are awaiting biological interpretation. Conventional methods have focused on high sequence similarity-based annotation transfer, which relies on the concept of homology. However, many cases have been reported in which simple transfer of function from top hits of a homology search causes erroneous annotation. New methods are required to handle sequence similarity in a more robust way, combining signals from strongly and weakly similar proteins to predict function for unknown proteins with high reliability. Results: We present the extended similarity group (ESG) method, which performs iterative sequence database searches and annotates a query sequence with Gene Ontology terms. Each annotation is assigned a probability based on its relative similarity score with the multiple-level neighbors in the protein similarity graph. We will depict how the statistical framework of ESG improves the prediction accuracy by iteratively taking into account the neighborhood of the query protein in the sequence similarity space. ESG outperforms conventional PSI-BLAST and the protein function prediction (PFP) algorithm. It is found that the iterative search is effective in capturing multiple domains in a query protein, enabling accurate prediction of several functions which originate from different domains. Availability: ESG web server is available for automated protein function prediction at http://dragon.bio.purdue.edu/ESG/ Contact: cspark@cau.ac.kr; dkihara@purdue.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19435743

  12. Planar Near-Field Phase Retrieval Using GPUs for Accurate THz Far-Field Prediction

    NASA Astrophysics Data System (ADS)

    Junkin, Gary

    2013-04-01

    With a view to using Phase Retrieval to accurately predict Terahertz antenna far-field from near-field intensity measurements, this paper reports on three fundamental advances that achieve very low algorithmic error penalties. The first is a new Gaussian beam analysis that provides accurate initial complex aperture estimates including defocus and astigmatic phase errors, based only on first and second moment calculations. The second is a powerful noise tolerant near-field Phase Retrieval algorithm that combines Anderson's Plane-to-Plane (PTP) with Fienup's Hybrid-Input-Output (HIO) and Successive Over-Relaxation (SOR) to achieve increased accuracy at reduced scan separations. The third advance employs teraflop Graphical Processing Units (GPUs) to achieve practically real time near-field phase retrieval and to obtain the optimum aperture constraint without any a priori information.

  13. Protein Structure Prediction with Visuospatial Analogy

    NASA Astrophysics Data System (ADS)

    Davies, Jim; Glasgow, Janice; Kuo, Tony

    We show that visuospatial representations and reasoning techniques can be used as a similarity metric for analogical protein structure prediction. Our system retrieves pairs of α-helices based on contact map similarity, then transfers and adapts the structure information to an unknown helix pair, showing that similar protein contact maps predict similar 3D protein structure. The success of this method provides support for the notion that changing representations can enable similarity metrics in analogy.

  14. Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and Nonlocality in Chemical Space.

    PubMed

    Hansen, Katja; Biegler, Franziska; Ramakrishnan, Raghunathan; Pronobis, Wiktor; von Lilienfeld, O Anatole; Müller, Klaus-Robert; Tkatchenko, Alexandre

    2015-06-18

    Simultaneously accurate and efficient prediction of molecular properties throughout chemical compound space is a critical ingredient toward rational compound design in chemical and pharmaceutical industries. Aiming toward this goal, we develop and apply a systematic hierarchy of efficient empirical methods to estimate atomization and total energies of molecules. These methods range from a simple sum over atoms, to addition of bond energies, to pairwise interatomic force fields, reaching to the more sophisticated machine learning approaches that are capable of describing collective interactions between many atoms or bonds. In the case of equilibrium molecular geometries, even simple pairwise force fields demonstrate prediction accuracy comparable to benchmark energies calculated using density functional theory with hybrid exchange-correlation functionals; however, accounting for the collective many-body interactions proves to be essential for approaching the “holy grail” of chemical accuracy of 1 kcal/mol for both equilibrium and out-of-equilibrium geometries. This remarkable accuracy is achieved by a vectorized representation of molecules (so-called Bag of Bonds model) that exhibits strong nonlocality in chemical space. In addition, the same representation allows us to predict accurate electronic properties of molecules, such as their polarizability and molecular frontier orbital energies.
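
    To make the "Bag of Bonds" representation mentioned above more concrete, here is a rough sketch under stated assumptions: each atom pair contributes a Coulomb-like term Z_i Z_j / r_ij, the terms are grouped ("bagged") by element pair, and each bag is sorted by magnitude. The function name, the input layout and the water-like geometry are hypothetical, and the zero-padding and concatenation of bags into one fixed-length vector are omitted.

```python
import math
from collections import defaultdict


def bag_of_bonds(atoms):
    """Group pairwise Coulomb-like terms by element pair.

    `atoms` is a list of (symbol, nuclear_charge, (x, y, z)) tuples.
    Returns a dict mapping a sorted element pair to a descending list
    of Z_i * Z_j / r_ij values. Padding bags to a fixed length across
    molecules is left out of this sketch.
    """
    bags = defaultdict(list)
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            sym_i, z_i, r_i = atoms[i]
            sym_j, z_j, r_j = atoms[j]
            distance = math.dist(r_i, r_j)
            key = tuple(sorted((sym_i, sym_j)))
            bags[key].append(z_i * z_j / distance)
    return {key: sorted(vals, reverse=True) for key, vals in bags.items()}


# Hypothetical water-like geometry (arbitrary length units).
water = [
    ("O", 8, (0.0, 0.0, 0.0)),
    ("H", 1, (0.96, 0.0, 0.0)),
    ("H", 1, (-0.24, 0.93, 0.0)),
]
bags = bag_of_bonds(water)
print(bags)
```

    A regression model would then be trained on the concatenated, padded bags; that learning step is outside the scope of this sketch.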

  15. Machine learning predictions of molecular properties: Accurate many-body potentials and nonlocality in chemical space

    DOE PAGES

    Hansen, Katja; Biegler, Franziska; Ramakrishnan, Raghunathan; ...

    2015-06-04

    Simultaneously accurate and efficient prediction of molecular properties throughout chemical compound space is a critical ingredient toward rational compound design in chemical and pharmaceutical industries. Aiming toward this goal, we develop and apply a systematic hierarchy of efficient empirical methods to estimate atomization and total energies of molecules. These methods range from a simple sum over atoms, to addition of bond energies, to pairwise interatomic force fields, reaching to the more sophisticated machine learning approaches that are capable of describing collective interactions between many atoms or bonds. In the case of equilibrium molecular geometries, even simple pairwise force fields demonstrate prediction accuracy comparable to benchmark energies calculated using density functional theory with hybrid exchange-correlation functionals; however, accounting for the collective many-body interactions proves to be essential for approaching the “holy grail” of chemical accuracy of 1 kcal/mol for both equilibrium and out-of-equilibrium geometries. This remarkable accuracy is achieved by a vectorized representation of molecules (so-called Bag of Bonds model) that exhibits strong nonlocality in chemical space. The same representation allows us to predict accurate electronic properties of molecules, such as their polarizability and molecular frontier orbital energies.

  16. Machine learning predictions of molecular properties: Accurate many-body potentials and nonlocality in chemical space

    SciTech Connect

    Hansen, Katja; Biegler, Franziska; Ramakrishnan, Raghunathan; Pronobis, Wiktor; von Lilienfeld, O. Anatole; Müller, Klaus -Robert; Tkatchenko, Alexandre

    2015-06-04

    Simultaneously accurate and efficient prediction of molecular properties throughout chemical compound space is a critical ingredient toward rational compound design in chemical and pharmaceutical industries. Aiming toward this goal, we develop and apply a systematic hierarchy of efficient empirical methods to estimate atomization and total energies of molecules. These methods range from a simple sum over atoms, to addition of bond energies, to pairwise interatomic force fields, reaching to the more sophisticated machine learning approaches that are capable of describing collective interactions between many atoms or bonds. In the case of equilibrium molecular geometries, even simple pairwise force fields demonstrate prediction accuracy comparable to benchmark energies calculated using density functional theory with hybrid exchange-correlation functionals; however, accounting for the collective many-body interactions proves to be essential for approaching the “holy grail” of chemical accuracy of 1 kcal/mol for both equilibrium and out-of-equilibrium geometries. This remarkable accuracy is achieved by a vectorized representation of molecules (so-called Bag of Bonds model) that exhibits strong nonlocality in chemical space. The same representation allows us to predict accurate electronic properties of molecules, such as their polarizability and molecular frontier orbital energies.

  17. A Novel Method for Accurate Operon Predictions in All Sequenced Prokaryotes

    SciTech Connect

    Price, Morgan N.; Huang, Katherine H.; Alm, Eric J.; Arkin, Adam P.

    2004-12-01

    We combine comparative genomic measures and the distance separating adjacent genes to predict operons in 124 completely sequenced prokaryotic genomes. Our method automatically tailors itself to each genome using sequence information alone, and thus can be applied to any prokaryote. For Escherichia coli K12 and Bacillus subtilis, our method is 85 and 83% accurate, respectively, which is similar to the accuracy of methods that use the same features but are trained on experimentally characterized transcripts. In Halobacterium NRC-1 and in Helicobacter pylori, our method correctly infers that genes in operons are separated by shorter distances than they are in E. coli, and its predictions using distance alone are more accurate than distance-only predictions trained on a database of E. coli transcripts. We use microarray data from six phylogenetically diverse prokaryotes to show that combining intergenic distance with comparative genomic measures further improves accuracy and that our method is broadly effective. Finally, we survey operon structure across 124 genomes, and find several surprises: H. pylori has many operons, contrary to previous reports; Bacillus anthracis has an unusual number of pseudogenes within conserved operons; and Synechocystis PCC6803 has many operons even though it has unusually wide spacings between conserved adjacent genes.
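
    The distance-only part of the idea above can be illustrated with a toy sketch: adjacent genes on the same strand are grouped into one operon when the gap between them is small. This is not the authors' method, which tailors its distance model to each genome and adds comparative genomic measures; the fixed `max_gap` threshold, the `Gene` type and the example coordinates are all hypothetical.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def predict_operons(genes, max_gap=50):
    """Group adjacent same-strand genes separated by at most `max_gap` bases.

    `max_gap` is an arbitrary illustrative threshold, not a value taken
    from the abstract.
    """
    operons = []
    for gene in sorted(genes, key=lambda g: g.start):
        if operons:
            previous = operons[-1][-1]
            same_strand = gene.strand == previous.strand
            if same_strand and gene.start - previous.end <= max_gap:
                operons[-1].append(gene)
                continue
        operons.append([gene])  # start a new candidate operon
    return [[g.name for g in operon] for operon in operons]


# Hypothetical gene coordinates.
genes = [
    Gene("A", 1, 900, "+"),
    Gene("B", 920, 1800, "+"),
    Gene("C", 2100, 3000, "+"),
    Gene("D", 3010, 3500, "-"),
]
operons = predict_operons(genes)
print(operons)
```

    Here A and B are grouped because they share a strand and sit 20 bases apart, while C (300 bases away) and D (opposite strand) each start a new group.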

  18. Toward More Accurate Ancestral Protein Genotype–Phenotype Reconstructions with the Use of Species Tree-Aware Gene Trees

    PubMed Central

    Groussin, Mathieu; Hobbs, Joanne K.; Szöllősi, Gergely J.; Gribaldo, Simonetta; Arcus, Vickery L.; Gouy, Manolo

    2015-01-01

    The resurrection of ancestral proteins provides direct insight into how natural selection has shaped proteins found in nature. By tracing substitutions along a gene phylogeny, ancestral proteins can be reconstructed in silico and subsequently synthesized in vitro. This elegant strategy reveals the complex mechanisms responsible for the evolution of protein functions and structures. However, to date, all protein resurrection studies have used simplistic approaches for ancestral sequence reconstruction (ASR), including the assumption that a single sequence alignment alone is sufficient to accurately reconstruct the history of the gene family. The impact of such shortcuts on conclusions about ancestral functions has not been investigated. Here, we show with simulations that utilizing information on species history using a model that accounts for the duplication, horizontal transfer, and loss (DTL) of genes statistically increases ASR accuracy. This underscores the importance of the tree topology in the inference of putative ancestors. We validate our in silico predictions using in vitro resurrection of the LeuB enzyme for the ancestor of the Firmicutes, a major and ancient bacterial phylum. With this particular protein, our experimental results demonstrate that information on the species phylogeny results in a biochemically more realistic and kinetically more stable ancestral protein. Additional resurrection experiments with different proteins are necessary to statistically quantify the impact of using species tree-aware gene trees on ancestral protein phenotypes. Nonetheless, our results suggest the need for incorporating both sequence and DTL information in future studies of protein resurrections to accurately define the genotype–phenotype space in which proteins diversify. PMID:25371435

  19. Development and Validation of a Multidisciplinary Tool for Accurate and Efficient Rotorcraft Noise Prediction (MUTE)

    NASA Technical Reports Server (NTRS)

    Liu, Yi; Anusonti-Inthra, Phuriwat; Diskin, Boris

    2011-01-01

    A physics-based, systematically coupled, multidisciplinary prediction tool (MUTE) for rotorcraft noise was developed and validated with a wide range of flight configurations and conditions. MUTE is an aggregation of multidisciplinary computational tools that accurately and efficiently model the physics of the source of rotorcraft noise, and predict the noise at far-field observer locations. It uses systematic coupling approaches among multiple disciplines including Computational Fluid Dynamics (CFD), Computational Structural Dynamics (CSD), and high fidelity acoustics. Within MUTE, advanced high-order CFD tools are used around the rotor blade to predict the transonic flow (shock wave) effects, which generate the high-speed impulsive noise. Predictions of the blade-vortex interaction noise in low speed flight are also improved by using the Particle Vortex Transport Method (PVTM), which preserves the wake flow details required for blade/wake and fuselage/wake interactions. The accuracy of the source noise prediction is further improved by utilizing a coupling approach between CFD and CSD, so that the effects of key structural dynamics, elastic blade deformations, and trim solutions are correctly represented in the analysis. The blade loading information and/or the flow field parameters around the rotor blade predicted by the CFD/CSD coupling approach are used to predict the acoustic signatures at far-field observer locations with a high-fidelity noise propagation code (WOPWOP3). The predicted results from the MUTE tool for rotor blade aerodynamic loading and far-field acoustic signatures are compared and validated with a variety of experimental data sets, such as UH60-A data, DNW test data and HART II test data.

  20. BLANNOTATOR: enhanced homology-based function prediction of bacterial proteins

    PubMed Central

    2012-01-01

    Background Automated function prediction has played a central role in determining the biological functions of bacterial proteins. Typically, protein function annotation relies on homology, and function is inferred from other proteins with similar sequences. This approach has become popular in bacterial genomics because it is one of the few methods that is practical for large datasets and because it does not require additional functional genomics experiments. However, the existing solutions produce erroneous predictions in many cases, especially when query sequences have low levels of identity with the annotated source protein. This problem has created a pressing need for improvements in homology-based annotation. Results We present an automated method for the functional annotation of bacterial protein sequences. Based on sequence similarity searches, BLANNOTATOR accurately annotates query sequences with one-line summary descriptions of protein function. It groups sequences identified by BLAST into subsets according to their annotation and bases its prediction on a set of sequences with consistent functional information. We show the results of BLANNOTATOR's performance in sets of bacterial proteins with known functions. We simulated the annotation process for 3090 SWISS-PROT proteins using a database in its state preceding the functional characterisation of the query protein. For this dataset, our method outperformed the five others that we tested, and the improved performance was maintained even in the absence of highly related sequence hits. We further demonstrate the value of our tool by analysing the putative proteome of Lactobacillus crispatus strain ST1. Conclusions BLANNOTATOR is an accurate method for bacterial protein function prediction. It is practical for genome-scale data and does not require pre-existing sequence clustering; thus, this method suits the needs of bacterial genome and metagenome researchers. The method and a web-server are available at

  1. Accurate prediction of severe allergic reactions by a small set of environmental parameters (NDVI, temperature).

    PubMed

    Notas, George; Bariotakis, Michail; Kalogrias, Vaios; Andrianaki, Maria; Azariadis, Kalliopi; Kampouri, Errika; Theodoropoulou, Katerina; Lavrentaki, Katerina; Kastrinakis, Stelios; Kampa, Marilena; Agouridakis, Panagiotis; Pirintsos, Stergios; Castanas, Elias

    2015-01-01

    Severe allergic reactions of unknown etiology, necessitating a hospital visit, have an important impact on the lives of affected individuals and impose a major economic burden on societies. The prediction of clinically severe allergic reactions would be of great importance, but current attempts have been limited by the lack of a well-founded applicable methodology and the wide spatiotemporal distribution of allergic reactions. The valid prediction of severe allergies (and especially those needing hospital treatment) in a region could alert health authorities and implicated individuals to take appropriate preemptive measures. In the present report we have collected visits for serious allergic reactions of unknown etiology from two major hospitals on the island of Crete, for two distinct time periods (validation and test sets). We have used the Normalized Difference Vegetation Index (NDVI), a satellite-based, freely available measurement, which is an indicator of live green vegetation at a given geographic area, and a set of meteorological data to develop a model capable of describing and predicting severe allergic reaction frequency. Our analysis has retained NDVI and temperature as accurate identifiers and predictors of increased hospital visits for severe allergic reactions. Our approach may contribute towards the development of satellite-based modules for the prediction of severe allergic reactions in specific, well-defined geographical areas. It could also probably be used for the prediction of other environment related diseases and conditions.
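
    For readers unfamiliar with NDVI, the index is conventionally computed from near-infrared and red reflectance as (NIR - Red) / (NIR + Red). The short sketch below shows only that standard definition; it says nothing about how the authors' predictive model combines NDVI with temperature, and the function name and reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from two reflectance values.

    Returns a value between -1 and 1 for non-negative reflectances;
    dense green vegetation tends toward the high end.
    """
    total = nir + red
    if total == 0:
        raise ValueError("NIR and red reflectance must not both be zero")
    return (nir - red) / total


# Hypothetical reflectance values for a vegetated pixel and a bare pixel.
vegetated = ndvi(0.5, 0.1)
bare = ndvi(0.2, 0.2)
print(vegetated, bare)
```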

  2. Microstructure-Dependent Gas Adsorption: Accurate Predictions of Methane Uptake in Nanoporous Carbons

    SciTech Connect

    Ihm, Yungok; Cooper, Valentino R; Gallego, Nidia C; Contescu, Cristian I; Morris, James R

    2014-01-01

    We demonstrate a successful, efficient framework for predicting gas adsorption properties in real materials based on first-principles calculations, with a specific comparison of experiment and theory for methane adsorption in activated carbons. These carbon materials have different pore size distributions, leading to a variety of uptake characteristics. Utilizing these distributions, we accurately predict experimental uptakes and heats of adsorption without empirical potentials or lengthy simulations. We demonstrate that materials with smaller pores have higher heats of adsorption, leading to a higher gas density in these pores. This pore-size dependence must be accounted for, in order to predict and understand the adsorption behavior. The theoretical approach combines: (1) ab initio calculations with a van der Waals density functional to determine adsorbent-adsorbate interactions, and (2) a thermodynamic method that predicts equilibrium adsorption densities by directly incorporating the calculated potential energy surface in a slit pore model. The predicted uptake at P=20 bar and T=298 K is in excellent agreement for all five activated carbon materials used. This approach uses only the pore-size distribution as an input, with no fitting parameters or empirical adsorbent-adsorbate interactions, and thus can be easily applied to other adsorbent-adsorbate combinations.

  3. Accurate Prediction of Severe Allergic Reactions by a Small Set of Environmental Parameters (NDVI, Temperature)

    PubMed Central

    Andrianaki, Maria; Azariadis, Kalliopi; Kampouri, Errika; Theodoropoulou, Katerina; Lavrentaki, Katerina; Kastrinakis, Stelios; Kampa, Marilena; Agouridakis, Panagiotis; Pirintsos, Stergios; Castanas, Elias

    2015-01-01

    Severe allergic reactions of unknown etiology, necessitating a hospital visit, have an important impact on the lives of affected individuals and impose a major economic burden on societies. The prediction of clinically severe allergic reactions would be of great importance, but current attempts have been limited by the lack of a well-founded applicable methodology and the wide spatiotemporal distribution of allergic reactions. The valid prediction of severe allergies (and especially those needing hospital treatment) in a region could alert health authorities and implicated individuals to take appropriate preemptive measures. In the present report we have collected visits for serious allergic reactions of unknown etiology from two major hospitals on the island of Crete, for two distinct time periods (validation and test sets). We have used the Normalized Difference Vegetation Index (NDVI), a satellite-based, freely available measurement, which is an indicator of live green vegetation at a given geographic area, and a set of meteorological data to develop a model capable of describing and predicting severe allergic reaction frequency. Our analysis has retained NDVI and temperature as accurate identifiers and predictors of increased hospital visits for severe allergic reactions. Our approach may contribute towards the development of satellite-based modules for the prediction of severe allergic reactions in specific, well-defined geographical areas. It could also probably be used for the prediction of other environment related diseases and conditions. PMID:25794106

  4. Accurate bearing remaining useful life prediction based on Weibull distribution and artificial neural network

    NASA Astrophysics Data System (ADS)

    Ben Ali, Jaouher; Chebel-Morello, Brigitte; Saidi, Lotfi; Malinowski, Simon; Fnaiech, Farhat

    2015-05-01

    Accurate remaining useful life (RUL) prediction of critical assets is an important challenge in condition-based maintenance, where it is used to improve reliability and to reduce machine breakdowns and maintenance costs. Bearings are among the most important industrial components that need to be monitored, and the user should be able to predict their RUL. The challenge of this study is to propose an original feature able to evaluate the health state of bearings and to estimate their RUL by Prognostics and Health Management (PHM) techniques. In this paper, the proposed method is based on the data-driven prognostic approach. The combination of Simplified Fuzzy Adaptive Resonance Theory Map (SFAM) neural network and Weibull distribution (WD) is explored. WD is used only in the training phase to fit measurements and to avoid areas of fluctuation in the time domain. The SFAM training process takes fitted measurements at the present and previous inspection time points as input, whereas the SFAM testing process is based on real measurements at the present and previous inspections. Thanks to the fuzzy learning process, SFAM performs well at learning nonlinear time series. As output, seven classes are defined: healthy bearing and six states of bearing degradation. In order to find the optimal RUL prediction, a smoothing phase is proposed in this paper. Experimental results show that the proposed method can reliably predict the RUL of rolling element bearings (REBs) based on vibration signals. The proposed prediction approach can be applied to the prognostics of various other mechanical assets.

  5. Accurate verification of the conserved-vector-current and standard-model predictions

    SciTech Connect

    Sirlin, A.; Zucchini, R.

    1986-10-20

    An approximate analytic calculation of O(Zα²) corrections to Fermi decays is presented. When the analysis of Koslowsky et al. is modified to take into account the new results, it is found that each of the eight accurately studied ℱt values differs from the average by ≲1σ, thus significantly improving the comparison of experiments with conserved-vector-current predictions. The new ℱt values are lower than before, which also brings experiments into very good agreement with the three-generation standard model, at the level of its quantum corrections.

  6. Special purpose hybrid transfinite elements and unified computational methodology for accurately predicting thermoelastic stress waves

    NASA Technical Reports Server (NTRS)

    Tamma, Kumar K.; Railkar, Sudhir B.

    1988-01-01

    This paper represents an attempt to apply extensions of a hybrid transfinite element computational approach for accurately predicting thermoelastic stress waves. The applicability of the present formulations for capturing the thermal stress waves induced by boundary heating for the well known Danilovskaya problems is demonstrated. A unique feature of the proposed formulations for applicability to the Danilovskaya problem of thermal stress waves in elastic solids lies in the hybrid nature of the unified formulations and the development of special purpose transfinite elements in conjunction with the classical Galerkin techniques and transformation concepts. Numerical test cases validate the applicability and superior capability to capture the thermal stress waves induced due to boundary heating.

  7. The MIDAS touch for Accurately Predicting the Stress-Strain Behavior of Tantalum

    SciTech Connect

    Jorgensen, S.

    2016-03-02

    Testing the behavior of metals in extreme environments is not always feasible, so material scientists use models to try and predict the behavior. To achieve accurate results it is necessary to use the appropriate model and material-specific parameters. This research evaluated the performance of six material models available in the MIDAS database [1] to determine at which temperatures and strain-rates they perform best, and to determine to which experimental data their parameters were optimized. Additionally, parameters were optimized for the Johnson-Cook model using experimental data from Lassila et al [2].

  8. Multi-omics integration accurately predicts cellular state in unexplored conditions for Escherichia coli

    PubMed Central

    Kim, Minseung; Rai, Navneet; Zorraquino, Violeta; Tagkopoulos, Ilias

    2016-01-01

    A significant obstacle in training predictive cell models is the lack of integrated data sources. We develop semi-supervised normalization pipelines and perform experimental characterization (growth, transcriptional, proteome) to create Ecomics, a consistent, quality-controlled multi-omics compendium for Escherichia coli with cohesive meta-data information. We then use this resource to train a multi-scale model that integrates four omics layers to predict genome-wide concentrations and growth dynamics. The genetic and environmental ontology reconstructed from the omics data is substantially different and complementary to the genetic and chemical ontologies. The integration of different layers confers an incremental increase in the prediction performance, as does the information about the known gene regulatory and protein-protein interactions. The predictive performance of the model ranges from 0.54 to 0.87 for the various omics layers, which far exceeds various baselines. This work provides an integrative framework of omics-driven predictive modelling that is broadly applicable to guide biological discovery. PMID:27713404

  9. Fitmunk: improving protein structures by accurate, automatic modeling of side-chain conformations.

    PubMed

    Porebski, Przemyslaw Jerzy; Cymborowski, Marcin; Pasenkiewicz-Gierula, Marta; Minor, Wladek

    2016-02-01

    Improvements in crystallographic hardware and software have allowed automated structure-solution pipelines to approach a near-‘one-click’ experience for the initial determination of macromolecular structures. However, in many cases the resulting initial model requires a laborious, iterative process of refinement and validation. A new method has been developed for the automatic modeling of side-chain conformations that takes advantage of rotamer-prediction methods in a crystallographic context. The algorithm, which is based on deterministic dead-end elimination (DEE) theory, uses new dense conformer libraries and a hybrid energy function derived from experimental data and prior information about rotamer frequencies to find the optimal conformation of each side chain. In contrast to existing methods, which incorporate the electron-density term into protein-modeling frameworks, the proposed algorithm is designed to take advantage of the highly discriminatory nature of electron-density maps. This method has been implemented in the program Fitmunk, which uses extensive conformational sampling. This improves the accuracy of the modeling and makes it a versatile tool for crystallographic model building, refinement and validation. Fitmunk was extensively tested on over 115 new structures, as well as a subset of 1100 structures from the PDB. It is demonstrated that the ability of Fitmunk to model more than 95% of side chains accurately is beneficial for improving the quality of crystallographic protein models, especially at medium and low resolutions. Fitmunk can be used for model validation of existing structures and as a tool to assess whether side chains are modeled optimally or could be better fitted into electron density. Fitmunk is available as a web service at http://kniahini.med.virginia.edu/fitmunk/server/ or at http://fitmunk.bitbucket.org/.

  10. Fitmunk: improving protein structures by accurate, automatic modeling of side-chain conformations

    PubMed Central

    Porebski, Przemyslaw Jerzy; Cymborowski, Marcin; Pasenkiewicz-Gierula, Marta; Minor, Wladek

    2016-01-01

    Improvements in crystallographic hardware and software have allowed automated structure-solution pipelines to approach a near-‘one-click’ experience for the initial determination of macromolecular structures. However, in many cases the resulting initial model requires a laborious, iterative process of refinement and validation. A new method has been developed for the automatic modeling of side-chain conformations that takes advantage of rotamer-prediction methods in a crystallographic context. The algorithm, which is based on deterministic dead-end elimination (DEE) theory, uses new dense conformer libraries and a hybrid energy function derived from experimental data and prior information about rotamer frequencies to find the optimal conformation of each side chain. In contrast to existing methods, which incorporate the electron-density term into protein-modeling frameworks, the proposed algorithm is designed to take advantage of the highly discriminatory nature of electron-density maps. This method has been implemented in the program Fitmunk, which uses extensive conformational sampling. This improves the accuracy of the modeling and makes it a versatile tool for crystallographic model building, refinement and validation. Fitmunk was extensively tested on over 115 new structures, as well as a subset of 1100 structures from the PDB. It is demonstrated that the ability of Fitmunk to model more than 95% of side chains accurately is beneficial for improving the quality of crystallographic protein models, especially at medium and low resolutions. Fitmunk can be used for model validation of existing structures and as a tool to assess whether side chains are modeled optimally or could be better fitted into electron density. Fitmunk is available as a web service at http://kniahini.med.virginia.edu/fitmunk/server/ or at http://fitmunk.bitbucket.org/. PMID:26894674

  11. Accurate prediction of human drug toxicity: a major challenge in drug development.

    PubMed

    Li, Albert P

    2004-11-01

    Over the past decades, a number of drugs have been withdrawn or have required special labeling due to adverse effects observed post-marketing. Species differences in drug toxicity in preclinical safety tests, the lack of sensitive biomarkers, and nonrepresentative patient populations in clinical trials are probable reasons for the failures in predicting human drug toxicity. It is proposed that toxicology should evolve from an empirical practice to an investigative discipline. Accurate prediction of human drug toxicity requires resources and time to be spent in clearly defining key toxic pathways and corresponding risk factors, which, hopefully, will be compensated by the benefits of a lower percentage of clinical failure due to toxicity and a decreased frequency of market withdrawal due to unacceptable adverse drug effects.

  12. Carbene footprinting accurately maps binding sites in protein–ligand and protein–protein interactions

    PubMed Central

    Manzi, Lucio; Barrow, Andrew S.; Scott, Daniel; Layfield, Robert; Wright, Timothy G.; Moses, John E.; Oldham, Neil J.

    2016-01-01

    Specific interactions between proteins and their binding partners are fundamental to life processes. The ability to detect protein complexes, and map their sites of binding, is crucial to understanding basic biology at the molecular level. Methods that employ sensitive analytical techniques such as mass spectrometry have the potential to provide valuable insights with very little material and on short time scales. Here we present a differential protein footprinting technique employing an efficient photo-activated probe for use with mass spectrometry. Using this methodology the location of a carbohydrate substrate was accurately mapped to the binding cleft of lysozyme, and in a more complex example, the interactions between a 100 kDa, multi-domain deubiquitinating enzyme, USP5 and a diubiquitin substrate were located to different functional domains. The much improved properties of this probe make carbene footprinting a viable method for rapid and accurate identification of protein binding sites utilizing benign, near-UV photoactivation. PMID:27848959

  13. Delineation of modular proteins: domain boundary prediction from sequence information.

    PubMed

    Kong, Lesheng; Ranganathan, Shoba

    2004-06-01

    The delineation of domain boundaries of a given sequence in the absence of known 3D structures or detectable sequence homology to known domains benefits many areas in protein science, such as protein engineering, protein 3D structure determination and protein structure prediction. With the exponential growth of newly determined sequences, our ability to predict domain boundaries rapidly and accurately from sequence information alone is both essential and critical from the viewpoint of gene function annotation. Anyone attempting to predict domain boundaries for a single protein sequence is invariably confronted with a plethora of databases that contain boundary information available from the internet and a variety of methods for domain boundary prediction. How are these derived and how well do they work? What definition of 'domain' do they use? We will first clarify the different definitions of protein domains, and then describe the available public databases with domain boundary information. Finally, we will review existing domain boundary prediction methods and discuss their strengths and weaknesses.

  14. Accurate prediction of the response of freshwater fish to a mixture of estrogenic chemicals.

    PubMed

    Brian, Jayne V; Harris, Catherine A; Scholze, Martin; Backhaus, Thomas; Booy, Petra; Lamoree, Marja; Pojana, Giulio; Jonkers, Niels; Runnalls, Tamsin; Bonfà, Angela; Marcomini, Antonio; Sumpter, John P

    2005-06-01

    Existing environmental risk assessment procedures are limited in their ability to evaluate the combined effects of chemical mixtures. We investigated the implications of this by analyzing the combined effects of a multicomponent mixture of five estrogenic chemicals using vitellogenin induction in male fathead minnows as an end point. The mixture consisted of estradiol, ethynylestradiol, nonylphenol, octylphenol, and bisphenol A. We determined concentration-response curves for each of the chemicals individually. The chemicals were then combined at equipotent concentrations and the mixture tested using fixed-ratio design. The effects of the mixture were compared with those predicted by the model of concentration addition using biomathematical methods, which revealed that there was no deviation between the observed and predicted effects of the mixture. These findings demonstrate that estrogenic chemicals have the capacity to act together in an additive manner and that their combined effects can be accurately predicted by concentration addition. We also explored the potential for mixture effects at low concentrations by exposing the fish to each chemical at one-fifth of its median effective concentration (EC50). Individually, the chemicals did not induce a significant response, although their combined effects were consistent with the predictions of concentration addition. This demonstrates the potential for estrogenic chemicals to act additively at environmentally relevant concentrations. These findings highlight the potential for existing environmental risk assessment procedures to underestimate the hazard posed by mixtures of chemicals that act via a similar mode of action, thereby leading to erroneous conclusions of absence of risk.
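
    The concentration addition model referred to above is usually written in terms of toxic units: each component contributes c_i / EC50_i, and the mixture is predicted to reach the median effect when these contributions sum to 1. The sketch below, a hedged illustration rather than the authors' biomathematical procedure, shows why five chemicals each dosed at one-fifth of their EC50 are predicted to act like a full EC50 dose of one chemical. The EC50 values and function names are hypothetical placeholders, not data from the study.

```python
import math


def toxic_unit_sum(concentrations, ec50s):
    """Sum of toxic units, c_i / EC50_i, under concentration addition."""
    return sum(c / ec for c, ec in zip(concentrations, ec50s))


def mixture_ec50(fractions, ec50s):
    """EC50 of a fixed-ratio mixture: 1 / sum(p_i / EC50_i).

    `fractions` are the shares p_i of each component in the total
    mixture concentration and must sum to 1.
    """
    if not math.isclose(sum(fractions), 1.0, rel_tol=1e-9):
        raise ValueError("fractions must sum to 1")
    return 1.0 / sum(p / ec for p, ec in zip(fractions, ec50s))


# Hypothetical EC50 values for five chemicals (arbitrary units).
ec50s = [1.0, 2.0, 4.0, 8.0, 16.0]

# Each chemical at one-fifth of its own EC50, as in the low-dose experiment.
doses = [ec / 5 for ec in ec50s]
total = sum(doses)
fractions = [d / total for d in doses]

print("toxic units:", toxic_unit_sum(doses, ec50s))
print("predicted mixture EC50:", mixture_ec50(fractions, ec50s))
print("total mixture concentration:", total)
```

    Under these assumptions the toxic units sum to 1 and the predicted mixture EC50 equals the total concentration applied, so a response is expected even though every component is individually below its effective level.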

  15. Accurate Prediction of the Response of Freshwater Fish to a Mixture of Estrogenic Chemicals

    PubMed Central

    Brian, Jayne V.; Harris, Catherine A.; Scholze, Martin; Backhaus, Thomas; Booy, Petra; Lamoree, Marja; Pojana, Giulio; Jonkers, Niels; Runnalls, Tamsin; Bonfà, Angela; Marcomini, Antonio; Sumpter, John P.

    2005-01-01

    Existing environmental risk assessment procedures are limited in their ability to evaluate the combined effects of chemical mixtures. We investigated the implications of this by analyzing the combined effects of a multicomponent mixture of five estrogenic chemicals using vitellogenin induction in male fathead minnows as an end point. The mixture consisted of estradiol, ethynylestradiol, nonylphenol, octylphenol, and bisphenol A. We determined concentration–response curves for each of the chemicals individually. The chemicals were then combined at equipotent concentrations and the mixture tested using fixed-ratio design. The effects of the mixture were compared with those predicted by the model of concentration addition using biomathematical methods, which revealed that there was no deviation between the observed and predicted effects of the mixture. These findings demonstrate that estrogenic chemicals have the capacity to act together in an additive manner and that their combined effects can be accurately predicted by concentration addition. We also explored the potential for mixture effects at low concentrations by exposing the fish to each chemical at one-fifth of its median effective concentration (EC50). Individually, the chemicals did not induce a significant response, although their combined effects were consistent with the predictions of concentration addition. This demonstrates the potential for estrogenic chemicals to act additively at environmentally relevant concentrations. These findings highlight the potential for existing environmental risk assessment procedures to underestimate the hazard posed by mixtures of chemicals that act via a similar mode of action, thereby leading to erroneous conclusions of absence of risk. PMID:15929895

  16. Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction.

    PubMed

    Schmidt, Florian; Gasparoni, Nina; Gasparoni, Gilles; Gianmoena, Kathrin; Cadenas, Cristina; Polansky, Julia K; Ebert, Peter; Nordström, Karl; Barann, Matthias; Sinha, Anupam; Fröhler, Sebastian; Xiong, Jieyi; Dehghani Amirabad, Azim; Behjati Ardakani, Fatemeh; Hutter, Barbara; Zipprich, Gideon; Felder, Bärbel; Eils, Jürgen; Brors, Benedikt; Chen, Wei; Hengstler, Jan G; Hamann, Alf; Lengauer, Thomas; Rosenstiel, Philip; Walter, Jörn; Schulz, Marcel H

    2017-01-09

    The binding and contribution of transcription factors (TF) to cell specific gene expression is often deduced from open-chromatin measurements to avoid costly TF ChIP-seq assays. Thus, it is important to develop computational methods for accurate TF binding prediction in open-chromatin regions (OCRs). Here, we report a novel segmentation-based method, TEPIC, to predict TF binding by combining sets of OCRs with position weight matrices. TEPIC can be applied to various open-chromatin data, e.g. DNaseI-seq and NOMe-seq. Additionally, Histone-Marks (HMs) can be used to identify candidate TF binding sites. TEPIC computes TF affinities and uses open-chromatin/HM signal intensity as quantitative measures of TF binding strength. Using machine learning, we find low affinity binding sites to improve our ability to explain gene expression variability compared to the standard presence/absence classification of binding sites. Further, we show that both footprints and peaks capture essential TF binding events and lead to a good prediction performance. In our application, gene-based scores computed by TEPIC with one open-chromatin assay nearly reach the quality of several TF ChIP-seq data sets. Finally, these scores correctly predict known transcriptional regulators as illustrated by the application to novel DNaseI-seq and NOMe-seq data for primary human hepatocytes and CD4+ T-cells, respectively.

  17. Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction

    PubMed Central

    Schmidt, Florian; Gasparoni, Nina; Gasparoni, Gilles; Gianmoena, Kathrin; Cadenas, Cristina; Polansky, Julia K.; Ebert, Peter; Nordström, Karl; Barann, Matthias; Sinha, Anupam; Fröhler, Sebastian; Xiong, Jieyi; Dehghani Amirabad, Azim; Behjati Ardakani, Fatemeh; Hutter, Barbara; Zipprich, Gideon; Felder, Bärbel; Eils, Jürgen; Brors, Benedikt; Chen, Wei; Hengstler, Jan G.; Hamann, Alf; Lengauer, Thomas; Rosenstiel, Philip; Walter, Jörn; Schulz, Marcel H.

    2017-01-01

    The binding and contribution of transcription factors (TF) to cell specific gene expression is often deduced from open-chromatin measurements to avoid costly TF ChIP-seq assays. Thus, it is important to develop computational methods for accurate TF binding prediction in open-chromatin regions (OCRs). Here, we report a novel segmentation-based method, TEPIC, to predict TF binding by combining sets of OCRs with position weight matrices. TEPIC can be applied to various open-chromatin data, e.g. DNaseI-seq and NOMe-seq. Additionally, Histone-Marks (HMs) can be used to identify candidate TF binding sites. TEPIC computes TF affinities and uses open-chromatin/HM signal intensity as quantitative measures of TF binding strength. Using machine learning, we find low affinity binding sites to improve our ability to explain gene expression variability compared to the standard presence/absence classification of binding sites. Further, we show that both footprints and peaks capture essential TF binding events and lead to a good prediction performance. In our application, gene-based scores computed by TEPIC with one open-chromatin assay nearly reach the quality of several TF ChIP-seq data sets. Finally, these scores correctly predict known transcriptional regulators as illustrated by the application to novel DNaseI-seq and NOMe-seq data for primary human hepatocytes and CD4+ T-cells, respectively. PMID:27899623

  18. Integrative subcellular proteomic analysis allows accurate prediction of human disease-causing genes

    PubMed Central

    Zhao, Li; Chen, Yiyun; Bajaj, Amol Onkar; Eblimit, Aiden; Xu, Mingchu; Soens, Zachry T.; Wang, Feng; Ge, Zhongqi; Jung, Sung Yun; He, Feng; Li, Yumei; Wensel, Theodore G.; Qin, Jun; Chen, Rui

    2016-01-01

    Proteomic profiling on subcellular fractions provides invaluable information regarding both protein abundance and subcellular localization. When integrated with other data sets, it can greatly enhance our ability to predict gene function genome-wide. In this study, we performed a comprehensive proteomic analysis on the light-sensing compartment of photoreceptors called the outer segment (OS). By comparing with the protein profile obtained from the retina tissue depleted of OS, an enrichment score for each protein is calculated to quantify protein subcellular localization, and 84% accuracy is achieved compared with experimental data. By integrating the protein OS enrichment score, the protein abundance, and the retina transcriptome, the probability of a gene playing an essential function in photoreceptor cells is derived with high specificity and sensitivity. As a result, a list of genes that will likely result in human retinal disease when mutated was identified and validated by previous literature and/or animal model studies. Therefore, this new methodology demonstrates the synergy of combining subcellular fractionation proteomics with other omics data sets and is generally applicable to other tissues and diseases. PMID:26912414

  19. ILT based defect simulation of inspection images accurately predicts mask defect printability on wafer

    NASA Astrophysics Data System (ADS)

    Deep, Prakash; Paninjath, Sankaranarayanan; Pereira, Mark; Buck, Peter

    2016-05-01

    At advanced technology nodes, mask complexity has increased because of the large-scale use of resolution enhancement technologies (RET), which include Optical Proximity Correction (OPC), Inverse Lithography Technology (ILT) and Source Mask Optimization (SMO). The number of defects detected during inspection of such masks has increased drastically, and differentiating critical from non-critical defects is more challenging, complex and time consuming. Because of the significant defectivity of EUVL masks and the non-availability of actinic inspection, it is important and also challenging to predict the criticality of defects for printability on wafer. This is one of the significant barriers to the adoption of EUVL for semiconductor manufacturing. Techniques to decide the criticality of defects from images captured using non-actinic inspection are desired until actinic inspection becomes available. High resolution inspection of photomask images detects many defects, which are used for process and mask qualification. Repairing all defects is not practical and probably not required; however, it is imperative to know which defects are severe enough to impact the wafer before repair. Additionally, a wafer printability check is always desired after repairing a defect. AIMS™ review is the industry standard for this; however, performing AIMS™ review for all defects is expensive and very time consuming. A fast, accurate and economical mechanism is desired that can predict defect printability on wafer from images captured using a high resolution inspection machine. Predicting defect printability from such images is challenging because the high resolution images do not correlate with actual mask contours. The challenge is increased by the use of optical conditions during inspection that differ from the actual scanner conditions, and defects found in such images do not correlate with the actual impact on wafer. Our automated defect simulation tool predicts

  20. Predicting Turns in Proteins with a Unified Model

    PubMed Central

    Song, Qi; Li, Tonghua; Cong, Peisheng; Sun, Jiangming; Li, Dapeng; Tang, Shengnan

    2012-01-01

    Motivation: Turns are a critical element of the structure of a protein; turns play a crucial role in loops, folds, and interactions. Current prediction methods are well developed for the prediction of individual turn types, including α-turn, β-turn, and γ-turn. However, for further protein structure and function prediction it is necessary to develop a uniform model that can accurately predict all types of turns simultaneously. Results: In this study, we present a novel approach, TurnP, which offers the ability to investigate all the turns in a protein based on a unified model. The main characteristics of TurnP are: (i) using newly exploited features of structural evolution information (secondary structure and shape string of protein) based on structure homologies, (ii) considering all types of turns in a unified model, and (iii) practical capability of accurate prediction of all turns simultaneously for a query. TurnP utilizes predicted secondary structures and predicted shape strings, both of which have greater accuracy, based on innovative technologies which were both developed by our group. Then, sequence and structural evolution features, namely profiles of sequence, secondary structure and shape strings, are generated by sequence and structure alignment. When TurnP was validated on a non-redundant dataset (4,107 entries) by five-fold cross-validation, we achieved an accuracy of 88.8% and a sensitivity of 71.8%, which exceeded the most advanced predictors of individual turn types. Newly determined sequences and the EVA and CASP9 datasets were used as independent tests, and the results we achieved were outstanding for turn predictions and confirmed the good performance of TurnP for practical applications. PMID:23144872

  1. Computational Prediction of Protein-Protein Interactions of Human Tyrosinase

    PubMed Central

    Wang, Su-Fang; Oh, Sangho; Si, Yue-Xiu; Wang, Zhi-Jiang; Han, Hong-Yan; Lee, Jinhyuk; Qian, Guo-Ying

    2012-01-01

    Studies of tyrosinase have recently attracted the attention of researchers because of the enzyme's potential applications and biological functions. In this study, we predicted the 3D structure of human tyrosinase and simulated the protein-protein interactions between tyrosinase and three binding partners: four and a half LIM domains 2 (FHL2), cytochrome b-245 alpha polypeptide (CYBA), and RNA-binding motif protein 9 (RBM9). Our interaction simulations showed significant binding energy scores of −595.3 kcal/mol for FHL2, −859.1 kcal/mol for CYBA, and −821.3 kcal/mol for RBM9. We also investigated the residues of each protein facing toward the predicted site of interaction with tyrosinase. Our computational predictions will be useful for elucidating the protein-protein interactions of tyrosinase and studying its binding mechanisms. PMID:22577521

  2. Predicting the fission yeast protein interaction network.

    PubMed

    Pancaldi, Vera; Saraç, Omer S; Rallis, Charalampos; McLean, Janel R; Převorovský, Martin; Gould, Kathleen; Beyer, Andreas; Bähler, Jürg

    2012-04-01

    A systems-level understanding of biological processes and information flow requires the mapping of cellular component interactions, among which protein-protein interactions are particularly important. Fission yeast (Schizosaccharomyces pombe) is a valuable model organism for which no systematic protein-interaction data are available. We exploited gene and protein properties, global genome regulation datasets, and conservation of interactions between budding and fission yeast to predict fission yeast protein interactions in silico. We have extensively tested our method in three ways: first, by predicting with 70-80% accuracy a selected high-confidence test set; second, by recapitulating interactions between members of the well-characterized SAGA co-activator complex; and third, by verifying predicted interactions of the Cbf11 transcription factor using mass spectrometry of TAP-purified protein complexes. Given the importance of the pathway in cell physiology and human disease, we explore the predicted sub-networks centered on the Tor1/2 kinases. Moreover, we predict the histidine kinases Mak1/2/3 to be vital hubs in the fission yeast stress response network, and we suggest interactors of argonaute 1, the principal component of the siRNA-mediated gene silencing pathway, lost in budding yeast but preserved in S. pombe. Of the new high-quality interactions that were discovered after we started this work, 73% were found in our predictions. Even though any predicted interactome is imperfect, the protein network presented here can provide a valuable basis to explore biological processes and to guide wet-lab experiments in fission yeast and beyond. Our predicted protein interactions are freely available through PInt, an online resource on our website (www.bahlerlab.info/PInt).

  3. Accurate protein structure modeling using sparse NMR data and homologous structure information.

    PubMed

    Thompson, James M; Sgourakis, Nikolaos G; Liu, Gaohua; Rossi, Paolo; Tang, Yuefeng; Mills, Jeffrey L; Szyperski, Thomas; Montelione, Gaetano T; Baker, David

    2012-06-19

    While information from homologous structures plays a central role in X-ray structure determination by molecular replacement, such information is rarely used in NMR structure determination because it can be incorrect, both locally and globally, when evolutionary relationships are inferred incorrectly or there has been considerable evolutionary structural divergence. Here we describe a method that allows robust modeling of protein structures of up to 225 residues by combining ¹Hᴺ, ¹³C, and ¹⁵N backbone and ¹³Cβ chemical shift data, distance restraints derived from homologous structures, and a physically realistic all-atom energy function. Accurate models are distinguished from inaccurate models generated using incorrect sequence alignments by requiring that (i) the all-atom energies of models generated using the restraints are lower than those of models generated in unrestrained calculations and (ii) the low-energy structures converge to within 2.0 Å backbone rmsd over 75% of the protein. Benchmark calculations on known structures and blind targets show that the method can accurately model protein structures, even with very remote homology information, to a backbone rmsd of 1.2-1.9 Å relative to the conventionally determined NMR ensembles and of 0.9-1.6 Å relative to X-ray structures for well-defined regions of the protein structures. This approach facilitates the accurate modeling of protein structures using backbone chemical shift data without the need for side-chain resonance assignments and extensive analysis of NOESY cross-peak assignments.

  4. A hierarchical approach to accurate predictions of macroscopic thermodynamic behavior from quantum mechanics and molecular simulations

    NASA Astrophysics Data System (ADS)

    Garrison, Stephen L.

    2005-07-01

    The combination of molecular simulations and potentials obtained from quantum chemistry is shown to be able to provide reasonably accurate thermodynamic property predictions. Gibbs ensemble Monte Carlo simulations are used to understand the effects of small perturbations to various regions of the model Lennard-Jones 12-6 potential. However, when the phase behavior and second virial coefficient are scaled by the critical properties calculated for each potential, the results obey a corresponding states relation, suggesting a non-uniqueness problem for interaction potentials fit to experimental phase behavior. Several variations of a procedure collectively referred to as quantum mechanical Hybrid Methods for Interaction Energies (HM-IE) are developed and used to accurately estimate interaction energies from CCSD(T) calculations with a large basis set in a computationally efficient manner for the neon-neon, acetylene-acetylene, and nitrogen-benzene systems. Using these results and methods, an ab initio, pairwise-additive, site-site potential for acetylene is determined and then improved using results from molecular simulations using this initial potential. The initial simulation results also indicate that a limited range of energies is important for accurate phase behavior predictions. Second virial coefficients calculated from the improved potential indicate that one set of experimental data in the literature is likely erroneous. This prescription is then applied to methanethiol. Difficulties in modeling the effects of the lone pair electrons suggest that charges on the lone pair sites negatively impact the ability of the intermolecular potential to describe certain orientations, but that the lone pair sites may be necessary to reasonably duplicate the interaction energies for several orientations. Two possible methods for incorporating the effects of three-body interactions into simulations within the pairwise-additivity formulation are also developed. A low density

  5. Design of accurate predictors for DNA-binding sites in proteins using hybrid SVM-PSSM method.

    PubMed

    Ho, Shinn-Ying; Yu, Fu-Chieh; Chang, Chia-Yun; Huang, Hui-Ling

    2007-01-01

    In this paper, we investigate the design of accurate predictors for DNA-binding sites in proteins from amino acid sequences. As a result, we propose a hybrid method using a support vector machine (SVM) in conjunction with evolutionary information of amino acid sequences in terms of their position-specific scoring matrices (PSSMs) for prediction of DNA-binding sites. Considering that the numbers of binding and non-binding residues in proteins are significantly unequal, two additional weights as well as SVM parameters are analyzed and adopted to maximize net prediction (NP, an average of sensitivity and specificity) accuracy. To evaluate the generalization ability of the proposed method SVM-PSSM, a DNA-binding dataset PDC-59 consisting of 59 protein chains with low sequence identity to each other is additionally established. The SVM-based method using the same six-fold cross-validation procedure and PSSM features has NP=80.15% for the training dataset PDNA-62 and NP=69.54% for the test dataset PDC-59, which are much better than those of the existing neural network-based method, with the NP values for training and test accuracies increased by up to 13.45% and 16.53%, respectively. Simulation results reveal that SVM-PSSM performs well in predicting DNA-binding sites of novel proteins from amino acid sequences.
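
    The net prediction (NP) measure defined above, the average of sensitivity and specificity, can be computed directly from confusion-matrix counts. The short sketch below is only an illustration of why such a measure suits unbalanced classes; the counts are made-up numbers, not results from PDNA-62 or PDC-59, and the function names are hypothetical.

```python
def net_prediction(tp, fn, tn, fp):
    """NP = (sensitivity + specificity) / 2 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of binding residues recovered
    specificity = tn / (tn + fp)   # fraction of non-binding residues recovered
    return (sensitivity + specificity) / 2


def accuracy(tp, fn, tn, fp):
    """Plain accuracy, shown for comparison on unbalanced data."""
    return (tp + tn) / (tp + fn + tn + fp)


# Made-up example: 100 binding residues versus 1000 non-binding residues.
# A trivial classifier that labels every residue as non-binding:
trivial = dict(tp=0, fn=100, tn=1000, fp=0)
# A classifier that finds most binding residues at the cost of false positives:
useful = dict(tp=80, fn=20, tn=600, fp=400)

for name, counts in (("trivial", trivial), ("useful", useful)):
    print(name, "accuracy:", accuracy(**counts), "NP:", net_prediction(**counts))
```

    In this made-up case the trivial classifier has the higher plain accuracy but an NP of only 0.5, while the second classifier scores lower on accuracy and higher on NP, which is the behavior that motivates optimizing NP when binding residues are rare.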

  6. Enhancing interacting residue prediction with integrated contact matrix prediction in protein-protein interaction.

    PubMed

    Du, Tianchuan; Liao, Li; Wu, Cathy H

    2016-12-01

    Identifying the residues in a protein that are involved in protein-protein interaction and identifying the contact matrix for a pair of interacting proteins are two computational tasks at different levels of an in-depth analysis of protein-protein interaction. Various methods for solving these two problems have been reported in the literature. However, the interacting residue prediction and contact matrix prediction were handled by and large independently in those existing methods, though intuitively good prediction of interacting residues will help with predicting the contact matrix. In this work, we developed a novel protein interacting residue prediction system, contact matrix-interaction profile hidden Markov model (CM-ipHMM), with the integration of contact matrix prediction and the ipHMM interaction residue prediction. We propose to leverage what is learned from the contact matrix prediction and utilize the predicted contact matrix as "feedback" to enhance the interaction residue prediction. The CM-ipHMM model showed significant improvement over the previous method that uses the ipHMM for predicting interaction residues only. It indicates that the downstream contact matrix prediction could help the interaction site prediction.

  7. Intermolecular potentials and the accurate prediction of the thermodynamic properties of water

    NASA Astrophysics Data System (ADS)

    Shvab, I.; Sadus, Richard J.

    2013-11-01

    The ability of intermolecular potentials to correctly predict the thermodynamic properties of liquid water at a density of 0.998 g/cm3 for a wide range of temperatures (298-650 K) and pressures (0.1-700 MPa) is investigated. Molecular dynamics simulations are reported for the pressure, thermal pressure coefficient, thermal expansion coefficient, isothermal and adiabatic compressibilities, isobaric and isochoric heat capacities, and Joule-Thomson coefficient of liquid water using the non-polarizable SPC/E and TIP4P/2005 potentials. The results are compared with both experimental data and results obtained from the ab initio-based Matsuoka-Clementi-Yoshimine non-additive (MCYna) [J. Li, Z. Zhou, and R. J. Sadus, J. Chem. Phys. 127, 154509 (2007)] potential, which includes polarization contributions. The data clearly indicate that both the SPC/E and TIP4P/2005 potentials are only in qualitative agreement with experiment, whereas the polarizable MCYna potential predicts some properties within experimental uncertainty. This highlights the importance of polarizability for the accurate prediction of the thermodynamic properties of water, particularly at temperatures beyond 298 K.

  8. Intermolecular potentials and the accurate prediction of the thermodynamic properties of water.

    PubMed

    Shvab, I; Sadus, Richard J

    2013-11-21

    The ability of intermolecular potentials to correctly predict the thermodynamic properties of liquid water at a density of 0.998 g/cm³ for a wide range of temperatures (298-650 K) and pressures (0.1-700 MPa) is investigated. Molecular dynamics simulations are reported for the pressure, thermal pressure coefficient, thermal expansion coefficient, isothermal and adiabatic compressibilities, isobaric and isochoric heat capacities, and Joule-Thomson coefficient of liquid water using the non-polarizable SPC/E and TIP4P/2005 potentials. The results are compared with both experimental data and results obtained from the ab initio-based Matsuoka-Clementi-Yoshimine non-additive (MCYna) [J. Li, Z. Zhou, and R. J. Sadus, J. Chem. Phys. 127, 154509 (2007)] potential, which includes polarization contributions. The data clearly indicate that both the SPC/E and TIP4P/2005 potentials are only in qualitative agreement with experiment, whereas the polarizable MCYna potential predicts some properties within experimental uncertainty. This highlights the importance of polarizability for the accurate prediction of the thermodynamic properties of water, particularly at temperatures beyond 298 K.

  9. Intermolecular potentials and the accurate prediction of the thermodynamic properties of water

    SciTech Connect

    Shvab, I.; Sadus, Richard J.

    2013-11-21

    The ability of intermolecular potentials to correctly predict the thermodynamic properties of liquid water at a density of 0.998 g/cm³ for a wide range of temperatures (298–650 K) and pressures (0.1–700 MPa) is investigated. Molecular dynamics simulations are reported for the pressure, thermal pressure coefficient, thermal expansion coefficient, isothermal and adiabatic compressibilities, isobaric and isochoric heat capacities, and Joule-Thomson coefficient of liquid water using the non-polarizable SPC/E and TIP4P/2005 potentials. The results are compared with both experimental data and results obtained from the ab initio-based Matsuoka-Clementi-Yoshimine non-additive (MCYna) [J. Li, Z. Zhou, and R. J. Sadus, J. Chem. Phys. 127, 154509 (2007)] potential, which includes polarization contributions. The data clearly indicate that both the SPC/E and TIP4P/2005 potentials are only in qualitative agreement with experiment, whereas the polarizable MCYna potential predicts some properties within experimental uncertainty. This highlights the importance of polarizability for the accurate prediction of the thermodynamic properties of water, particularly at temperatures beyond 298 K.

  10. Predicting membrane protein types with bagging learner.

    PubMed

    Niu, Bing; Jin, Yu-Huan; Feng, Kai-Yan; Liu, Liang; Lu, Wen-Cong; Cai, Yu-Dong; Li, Guo-Zheng

    2008-01-01

    The membrane protein type is an important feature in characterizing the overall topological folding type of a protein or its domains therein. Many investigators have devoted their efforts to the prediction of membrane protein type. Here, we propose a new approach, the bootstrap aggregating method or bagging learner, to address this problem based on the protein amino acid composition. As a demonstration, the benchmark dataset constructed by K.C. Chou and D.W. Elrod was used to test the new method. The overall success rate thus obtained by jackknife cross-validation was over 84%, indicating that the bagging learner as presented in this paper holds considerable potential in predicting the attributes of proteins, or at least can play a complementary role to many existing algorithms in this area. It is anticipated that the prediction quality can be further enhanced if the pseudo amino acid composition can be effectively incorporated into the current predictor. An online membrane protein type prediction web server developed in our lab is available at http://chemdata.shu.edu.cn/protein/protein.jsp.
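
    The general idea of bootstrap aggregating on amino acid composition can be sketched as follows. This is only an illustration under stated assumptions: the nearest-centroid base learner, the toy sequences and the labels "type-A" and "type-B" are hypothetical, and the abstract does not say which base classifier the authors used.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """20-dimensional amino acid composition (fractions summing to 1)."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def fit_centroids(X, y):
    """Hypothetical base learner: one mean feature vector per class."""
    sums, sizes = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
    return {label: [v / sizes[label] for v in acc] for label, acc in sums.items()}


def predict_centroid(model, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))


def bagging_predict(X, y, x_new, n_models=25, seed=0):
    """Train each model on a bootstrap resample and take a majority vote."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in X]  # sample with replacement
        model = fit_centroids([X[i] for i in idx], [y[i] for i in idx])
        votes[predict_centroid(model, x_new)] += 1
    return votes.most_common(1)[0][0]


# Toy training set (made-up sequences and labels, for illustration only).
train = [("LLLLAVVIL", "type-A"), ("AVLLIIFLV", "type-A"), ("LIVVALLFA", "type-A"),
         ("DEKKRDEEK", "type-B"), ("KRDEEDKKR", "type-B"), ("EEDKRKDDE", "type-B")]
X = [composition(s) for s, _ in train]
y = [label for _, label in train]
prediction = bagging_predict(X, y, composition("VVLLAIILF"))
print(prediction)
```

    A jackknife test, as mentioned in the abstract, would repeat this prediction once per sequence, each time holding that sequence out of the training set.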

  11. Predicting the Fission Yeast Protein Interaction Network

    PubMed Central

    Pancaldi, Vera; Saraç, Ömer S.; Rallis, Charalampos; McLean, Janel R.; Převorovský, Martin; Gould, Kathleen; Beyer, Andreas; Bähler, Jürg

    2012-01-01

    A systems-level understanding of biological processes and information flow requires the mapping of cellular component interactions, among which protein–protein interactions are particularly important. Fission yeast (Schizosaccharomyces pombe) is a valuable model organism for which no systematic protein-interaction data are available. We exploited gene and protein properties, global genome regulation datasets, and conservation of interactions between budding and fission yeast to predict fission yeast protein interactions in silico. We have extensively tested our method in three ways: first, by predicting with 70–80% accuracy a selected high-confidence test set; second, by recapitulating interactions between members of the well-characterized SAGA co-activator complex; and third, by verifying predicted interactions of the Cbf11 transcription factor using mass spectrometry of TAP-purified protein complexes. Given the importance of the pathway in cell physiology and human disease, we explore the predicted sub-networks centered on the Tor1/2 kinases. Moreover, we predict the histidine kinases Mak1/2/3 to be vital hubs in the fission yeast stress response network, and we suggest interactors of argonaute 1, the principal component of the siRNA-mediated gene silencing pathway, lost in budding yeast but preserved in S. pombe. Of the new high-quality interactions that were discovered after we started this work, 73% were found in our predictions. Even though any predicted interactome is imperfect, the protein network presented here can provide a valuable basis to explore biological processes and to guide wet-lab experiments in fission yeast and beyond. Our predicted protein interactions are freely available through PInt, an online resource on our website (www.bahlerlab.info/PInt). PMID:22540037

  12. CoMOGrad and PHOG: From Computer Vision to Fast and Accurate Protein Tertiary Structure Retrieval

    PubMed Central

    Karim, Rezaul; Aziz, Mohd. Momin Al; Shatabda, Swakkhar; Rahman, M. Sohel; Mia, Md. Abul Kashem; Zaman, Farhana; Rakin, Salman

    2015-01-01

    The number of entries in a structural database of proteins is increasing day by day. Methods for retrieving protein tertiary structures from such a large database have turned out to be the key to comparative analysis of structures, which plays an important role in understanding proteins and their functions. In this paper, we present fast and accurate methods for the retrieval of proteins having tertiary structures similar to a query protein from a large database. Our proposed methods borrow ideas from the field of computer vision. The speed and accuracy of our methods come from the two newly introduced features, the co-occurrence matrix of the oriented gradient and the pyramid histogram of oriented gradient, and the use of Euclidean distance as the distance measure. Experimental results clearly indicate the superiority of our approach in both running time and accuracy. Our method is readily available for use from this website: http://research.buet.ac.bd:8080/Comograd/. PMID:26293226
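
    The retrieval step, ranking database entries by Euclidean distance between fixed-length feature vectors, can be sketched as below. The three-dimensional vectors and the names protA, protB and protC are placeholders; computing the actual CoMOGrad or PHOG features is outside the scope of this sketch.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def retrieve(query, database, k=2):
    """Return the names of the k database entries closest to the query."""
    ranked = sorted(database.items(), key=lambda item: euclidean(query, item[1]))
    return [name for name, _ in ranked[:k]]


# Placeholder feature vectors standing in for precomputed structure descriptors.
database = {
    "protA": [0.10, 0.90, 0.30],
    "protB": [0.80, 0.20, 0.50],
    "protC": [0.15, 0.85, 0.35],
}
hits = retrieve([0.12, 0.88, 0.32], database, k=2)
print(hits)
```

    Since the descriptors are computed once per structure, a query reduces to one distance evaluation per database entry, which is consistent with the speed emphasis in the abstract.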

  13. Distance scaling method for accurate prediction of slowly varying magnetic fields in satellite missions

    NASA Astrophysics Data System (ADS)

    Zacharias, Panagiotis P.; Chatzineofytou, Elpida G.; Spantideas, Sotirios T.; Capsalis, Christos N.

    2016-07-01

    In the present work, the determination of the magnetic behavior of localized magnetic sources from near-field measurements is examined. The distance power law of the magnetic field fall-off is used in various cases to accurately predict the magnetic signature of an equipment under test (EUT) consisting of multiple alternating current (AC) magnetic sources. Therefore, parameters concerning the location of the observation points (magnetometers) are studied towards this scope. The results clearly show that these parameters are independent of the EUT's size and layout. Additionally, the techniques developed in the present study enable the placing of the magnetometers close to the EUT, thus achieving high signal-to-noise ratio (SNR). Finally, the proposed method is verified by real measurements, using a mobile phone as an EUT.
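
    One way to picture the use of a distance power law is to fit B(r) = k / r^n to near-field readings and extrapolate to a larger distance. The sketch below does this with a least-squares line in log-log space; the fitting procedure, function names and synthetic data are assumptions for illustration and are not the authors' algorithm.

```python
import math


def fit_power_law(distances, fields):
    """Fit B(r) = k * r**(-n) by linear least squares on log B versus log r."""
    xs = [math.log(r) for r in distances]
    ys = [math.log(b) for b in fields]
    count = len(xs)
    mean_x, mean_y = sum(xs) / count, sum(ys) / count
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (k, n)


def extrapolate(k, n, r):
    """Predicted field magnitude at distance r."""
    return k * r ** (-n)


# Synthetic dipole-like data: field magnitude falling off as 1/r^3.
distances = [0.2, 0.3, 0.4, 0.5]                 # metres, hypothetical
fields = [5.0e-6 / r ** 3 for r in distances]    # tesla, hypothetical
k, n = fit_power_law(distances, fields)
far_field = extrapolate(k, n, 1.0)
print(f"fitted exponent n = {n:.3f}, predicted field at 1 m = {far_field:.3e} T")
```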

  14. A fast and accurate method to predict 2D and 3D aerodynamic boundary layer flows

    NASA Astrophysics Data System (ADS)

    Bijleveld, H. A.; Veldman, A. E. P.

    2014-12-01

    A quasi-simultaneous interaction method is applied to predict 2D and 3D aerodynamic flows. This method is suitable for offshore wind turbine design software as it is a very accurate and computationally reasonably cheap method. This study shows the results for a NACA 0012 airfoil. The two applied solvers converge to the experimental values when the grid is refined. We also show that in separation the eigenvalues remain positive thus avoiding the Goldstein singularity at separation. In 3D we show a flow over a dent in which separation occurs. A rotating flat plate is used to show the applicability of the method for rotating flows. The shown capabilities of the method indicate that the quasi-simultaneous interaction method is suitable for design methods for offshore wind turbine blades.

  15. Exchange-Hole Dipole Dispersion Model for Accurate Energy Ranking in Molecular Crystal Structure Prediction.

    PubMed

    Whittleton, Sarah R; Otero-de-la-Roza, A; Johnson, Erin R

    2017-02-14

    Accurate energy ranking is a key facet to the problem of first-principles crystal-structure prediction (CSP) of molecular crystals. This work presents a systematic assessment of B86bPBE-XDM, a semilocal density functional combined with the exchange-hole dipole moment (XDM) dispersion model, for energy ranking using 14 compounds from the first five CSP blind tests. Specifically, the set of crystals studied comprises 11 rigid, planar compounds and 3 co-crystals. The experimental structure was correctly identified as the lowest in lattice energy for 12 of the 14 total crystals. One of the exceptions is 4-hydroxythiophene-2-carbonitrile, for which the experimental structure was correctly identified once a quasi-harmonic estimate of the vibrational free-energy contribution was included, evidencing the occasional importance of thermal corrections for accurate energy ranking. The other exception is an organic salt, where charge-transfer error (also called delocalization error) is expected to cause the base density functional to be unreliable. Provided the choice of base density functional is appropriate and an estimate of temperature effects is used, XDM-corrected density-functional theory is highly reliable for the energetic ranking of competing crystal structures.

  16. Year 2 Report: Protein Function Prediction Platform

    SciTech Connect

    Zhou, C E

    2012-04-27

    Upon completion of our second year of development in a 3-year development cycle, we have completed a prototype protein structure-function annotation and function prediction system: Protein Function Prediction (PFP) platform (v.0.5). We have met our milestones for Years 1 and 2 and are positioned to continue development in completion of our original statement of work, or a reasonable modification thereof, in service to DTRA Programs involved in diagnostics and medical countermeasures research and development. The PFP platform is a multi-scale computational modeling system for protein structure-function annotation and function prediction. As of this writing, PFP is the only existing fully automated, high-throughput, multi-scale modeling, whole-proteome annotation platform, and represents a significant advance in the field of genome annotation (Fig. 1). PFP modules perform protein functional annotations at the sequence, systems biology, protein structure, and atomistic levels of biological complexity (Fig. 2). Because these approaches provide orthogonal means of characterizing proteins and suggesting protein function, PFP processing maximizes the protein functional information that can currently be gained by computational means. Comprehensive annotation of pathogen genomes is essential for bio-defense applications in pathogen characterization, threat assessment, and medical countermeasure design and development in that it can short-cut the time and effort required to select and characterize protein biomarkers.

  17. Predicting protein-peptide interactions from scratch

    NASA Astrophysics Data System (ADS)

    Yan, Chengfei; Xu, Xianjin; Zou, Xiaoqin; Zou lab Team

    Protein-peptide interactions play an important role in many cellular processes. The ability to predict protein-peptide complex structures is valuable for mechanistic investigation and therapeutic development. Due to the high flexibility of peptides and lack of templates for homology modeling, predicting protein-peptide complex structures is extremely challenging. Recently, we have developed a novel docking framework for protein-peptide structure prediction. Specifically, given the sequence of a peptide and a 3D structure of the protein, initial conformations of the peptide are built through protein threading. Then, the peptide is globally and flexibly docked onto the protein using a novel iterative approach. Finally, the sampled modes are scored and ranked by a statistical potential-based energy scoring function that was derived for protein-peptide interactions from statistical mechanics principles. Our docking methodology has been tested on the Peptidb database and compared with other protein-peptide docking methods. Systematic analysis shows significantly improved results compared to the performances of the existing methods. Our method is computationally efficient and suitable for large-scale applications. NSF CAREER Award 0953839 (XZ) and NIH R01GM109980 (XZ).

  18. Measuring solar reflectance Part I: Defining a metric that accurately predicts solar heat gain

    SciTech Connect

    Levinson, Ronnen; Akbari, Hashem; Berdahl, Paul

    2010-05-14

    Solar reflectance can vary with the spectral and angular distributions of incident sunlight, which in turn depend on surface orientation, solar position and atmospheric conditions. A widely used solar reflectance metric based on the ASTM Standard E891 beam-normal solar spectral irradiance underestimates the solar heat gain of a spectrally selective 'cool colored' surface because this irradiance contains a greater fraction of near-infrared light than typically found in ordinary (unconcentrated) global sunlight. At mainland U.S. latitudes, this metric R_{E891BN} can underestimate the annual peak solar heat gain of a typical roof or pavement (slope ≤ 5:12 [23°]) by as much as 89 W m⁻², and underestimate its peak surface temperature by up to 5 K. Using R_{E891BN} to characterize roofs in a building energy simulation can exaggerate the economic value N of annual cool-roof net energy savings by as much as 23%. We define clear-sky air mass one global horizontal ('AM1GH') solar reflectance R_{g,0}, a simple and easily measured property that more accurately predicts solar heat gain. R_{g,0} predicts the annual peak solar heat gain of a roof or pavement to within 2 W m⁻², and overestimates N by no more than 3%. R_{g,0} is well suited to rating the solar reflectances of roofs, pavements and walls. We show in Part II that R_{g,0} can be easily and accurately measured with a pyranometer, a solar spectrophotometer or version 6 of the Solar Spectrum Reflectometer.

  19. Accurate prediction of wall shear stress in a stented artery: Newtonian versus non-Newtonian models.

    PubMed

    Mejia, Juan; Mongrain, Rosaire; Bertrand, Olivier F

    2011-07-01

    A significant amount of evidence linking wall shear stress to neointimal hyperplasia has been reported in the literature. As a result, numerical and experimental models have been created to study the influence of stent design on wall shear stress. Traditionally, blood has been assumed to behave as a Newtonian fluid, but recently that assumption has been challenged. The use of a linear model, however, can reduce computational cost and allow the use of Newtonian fluids (e.g., glycerine and water) instead of a blood analog fluid in an experimental setup. Therefore, it is of interest whether a linear model can be used to accurately predict the wall shear stress caused by a non-Newtonian fluid such as blood within a stented arterial segment. The present work compares the resulting wall shear stress obtained using two linear and one nonlinear model under the same flow waveform. All numerical models are fully three-dimensional, transient, and incorporate a realistic stent geometry. It is shown that traditional linear models (based on blood's lowest viscosity limit, 3.5 Pa s) underestimate the wall shear stress within a stented arterial segment, which can lead to an overestimation of the risk of restenosis. The second linear model, which uses a characteristic viscosity (based on an average strain rate, 4.7 Pa s), results in higher wall shear stress levels, but which are still substantially below those of the nonlinear model. It is therefore shown that nonlinear models result in more accurate predictions of wall shear stress within a stented arterial segment.

  20. Point-of-care cardiac troponin test accurately predicts heat stroke severity in rats.

    PubMed

    Audet, Gerald N; Quinn, Carrie M; Leon, Lisa R

    2015-11-15

    Heat stroke (HS) remains a significant public health concern. Despite the substantial threat posed by HS, there is still no field or clinical test of HS severity. We suggested previously that circulating cardiac troponin (cTnI) could serve as a robust biomarker of HS severity after heating. In the present study, we hypothesized that a cTnI point-of-care test (ctPOC) could be used to predict severity and organ damage at the onset of HS. Conscious male Fischer 344 rats (n = 16) continuously monitored for heart rate (HR), blood pressure (BP), and core temperature (Tc) (radiotelemetry) were heated to maximum Tc (Tc,Max) of 41.9 ± 0.1°C and recovered undisturbed for 24 h at an ambient temperature of 20°C. Blood samples were taken at Tc,Max and 24 h after heat via submandibular bleed and analyzed on the ctPOC test. POC cTnI band intensity was ranked using a simple four-point scale via two blinded observers and compared with cTnI levels measured by a clinical blood analyzer. Blood was also analyzed for biomarkers of systemic organ damage. HS severity, as previously defined using HR, BP, and recovery Tc profile during heat exposure, correlated strongly with cTnI (R² = 0.69) at Tc,Max. POC cTnI band intensity ranking accurately predicted cTnI levels (R² = 0.64) and HS severity (R² = 0.83). Five markers of systemic organ damage also correlated with ctPOC score (albumin, alanine aminotransferase, blood urea nitrogen, cholesterol, and total bilirubin; R² > 0.4). This suggests that cTnI POC tests can accurately determine HS severity and could serve as simple, portable, cost-effective HS field tests.

  1. Chemical shift prediction for denatured proteins.

    PubMed

    Prestegard, James H; Sahu, Sarata C; Nkari, Wendy K; Morris, Laura C; Live, David; Gruta, Christian

    2013-02-01

    While chemical shift prediction has played an important role in aspects of protein NMR that include identification of secondary structure, generation of torsion angle constraints for structure determination, and assignment of resonances in spectra of intrinsically disordered proteins, interest has arisen more recently in using it in alternate assignment strategies for crosspeaks in ¹H-¹⁵N HSQC spectra of sparsely labeled proteins. One such approach involves correlation of crosspeaks in the spectrum of the native protein with those observed in the spectrum of the denatured protein, followed by assignment of the peaks in the latter spectrum. As in the case of disordered proteins, predicted chemical shifts can aid in these assignments. Some previously developed empirical formulas for chemical shift prediction have depended on basis data sets of 20 pentapeptides. In each case the central residue was varied among the 20 common amino acids, with the flanking residues held constant throughout the given series. However, previous choices of solvent conditions and flanking residues make the parameters in these formulas less than ideal for general application to denatured proteins. Here, we report ¹H and ¹⁵N shifts for a set of alanine based pentapeptides under the low pH urea denaturing conditions that are more appropriate for sparse label assignments. New parameters have been derived and a Perl script was created to facilitate comparison with other parameter sets. A small, but significant, improvement in shift predictions for denatured ubiquitin is demonstrated.

  2. PSAQ™ standards for accurate MS-based quantification of proteins: from the concept to biomedical applications.

    PubMed

    Picard, Guillaume; Lebert, Dorothée; Louwagie, Mathilde; Adrait, Annie; Huillet, Céline; Vandenesch, François; Bruley, Christophe; Garin, Jérôme; Jaquinod, Michel; Brun, Virginie

    2012-10-01

    Absolute protein quantification, i.e. determining protein concentrations in biological samples, is essential to our understanding of biological and physiopathological phenomena. Protein quantification methods based on the use of antibodies are very effective and widely used. However, over the last ten years, absolute protein quantification by mass spectrometry has attracted considerable interest, particularly for the study of systems biology and as part of biomarker development. This interest is mainly linked to the high multiplexing capacity of MS analysis, and to the availability of stable-isotope-labelled standards for quantification. This article describes the details of how to produce, control the quality and use a specific type of standard: Protein Standard Absolute Quantification (PSAQ™) standards. These standards are whole isotopically labelled proteins, analogues of the proteins to be assayed. PSAQ standards can be added early during sample treatment, thus they can correct for protein losses during sample prefractionation and for incomplete sample digestion. Because of this, quantification of target proteins is very accurate and precise using these standards. To illustrate the advantages of the PSAQ method, and to contribute to the increase in its use, selected applications in the biomedical field are detailed here.

  3. A Critical Review for Developing Accurate and Dynamic Predictive Models Using Machine Learning Methods in Medicine and Health Care.

    PubMed

    Alanazi, Hamdan O; Abdullah, Abdul Hanan; Qureshi, Kashif Naseer

    2017-04-01

    Recently, Artificial Intelligence (AI) has been used widely in the medicine and health care sector. In machine learning, classification or prediction is a major field of AI. Today, the study of existing predictive models based on machine learning methods is extremely active. Doctors need accurate predictions for the outcomes of their patients' diseases. In addition, for accurate predictions, timing is another significant factor that influences treatment decisions. In this paper, existing predictive models in medicine and health care are critically reviewed. Furthermore, the most famous machine learning methods are explained, and the confusion between a statistical approach and machine learning is clarified. A review of related literature reveals that the predictions of existing predictive models differ even when the same dataset is used. Therefore, existing predictive models are essential, and current methods must be improved.

  4. A Simple and Accurate Model to Predict Responses to Multi-electrode Stimulation in the Retina.

    PubMed

    Maturana, Matias I; Apollo, Nicholas V; Hadjinicolaou, Alex E; Garrett, David J; Cloherty, Shaun L; Kameneva, Tatiana; Grayden, David B; Ibbotson, Michael R; Meffin, Hamish

    2016-04-01

    Implantable electrode arrays are widely used in therapeutic stimulation of the nervous system (e.g. cochlear, retinal, and cortical implants). Currently, most neural prostheses use serial stimulation (i.e. one electrode at a time) despite this severely limiting the repertoire of stimuli that can be applied. Methods to reliably predict the outcome of multi-electrode stimulation have not been available. Here, we demonstrate that a linear-nonlinear model accurately predicts neural responses to arbitrary patterns of stimulation using in vitro recordings from single retinal ganglion cells (RGCs) stimulated with a subretinal multi-electrode array. In the model, the stimulus is projected onto a low-dimensional subspace and then undergoes a nonlinear transformation to produce an estimate of spiking probability. The low-dimensional subspace is estimated using principal components analysis, which gives the neuron's electrical receptive field (ERF), i.e. the electrodes to which the neuron is most sensitive. Our model suggests that stimulation proportional to the ERF yields a higher efficacy given a fixed amount of power when compared to equal amplitude stimulation on up to three electrodes. We find that the model captures the responses of all the cells recorded in the study, suggesting that it will generalize to most cell types in the retina. The model is computationally efficient to evaluate and, therefore, appropriate for future real-time applications including stimulation strategies that make use of recorded neural activity to improve the stimulation strategy.
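
    The structure of a linear-nonlinear model can be sketched in a few lines: a linear projection of the stimulus onto the electrical receptive field (ERF), followed by a static nonlinearity. The logistic nonlinearity, the one-dimensional subspace, the example ERF weights and the use of summed squared amplitudes as "power" are all assumptions made for illustration; the abstract does not specify these details.

```python
import math


def ln_response(stimulus, erf, gain=1.0, threshold=0.0):
    """Spiking probability from a linear projection followed by a sigmoid."""
    drive = sum(s * w for s, w in zip(stimulus, erf))            # linear stage
    return 1.0 / (1.0 + math.exp(-gain * (drive - threshold)))   # nonlinear stage


# Hypothetical unit-norm ERF over three electrodes.
erf = [0.8, 0.6, 0.0]

# Two stimuli with the same power (sum of squared amplitudes equal to 4).
matched = [2.0 * w for w in erf]        # proportional to the ERF
equal = [math.sqrt(4.0 / 3.0)] * 3      # equal amplitude on all three electrodes

p_matched = ln_response(matched, erf)
p_equal = ln_response(equal, erf)
print(f"ERF-matched: {p_matched:.3f}, equal amplitude: {p_equal:.3f}")
```

    In this toy setting the ERF-matched stimulus produces the larger drive for a fixed power, which mirrors the efficacy observation reported in the abstract.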

  5. A Simple and Accurate Model to Predict Responses to Multi-electrode Stimulation in the Retina

    PubMed Central

    Maturana, Matias I.; Apollo, Nicholas V.; Hadjinicolaou, Alex E.; Garrett, David J.; Cloherty, Shaun L.; Kameneva, Tatiana; Grayden, David B.; Ibbotson, Michael R.; Meffin, Hamish

    2016-01-01

    Implantable electrode arrays are widely used in therapeutic stimulation of the nervous system (e.g. cochlear, retinal, and cortical implants). Currently, most neural prostheses use serial stimulation (i.e. one electrode at a time) despite this severely limiting the repertoire of stimuli that can be applied. Methods to reliably predict the outcome of multi-electrode stimulation have not been available. Here, we demonstrate that a linear-nonlinear model accurately predicts neural responses to arbitrary patterns of stimulation using in vitro recordings from single retinal ganglion cells (RGCs) stimulated with a subretinal multi-electrode array. In the model, the stimulus is projected onto a low-dimensional subspace and then undergoes a nonlinear transformation to produce an estimate of spiking probability. The low-dimensional subspace is estimated using principal components analysis, which gives the neuron’s electrical receptive field (ERF), i.e. the electrodes to which the neuron is most sensitive. Our model suggests that stimulation proportional to the ERF yields a higher efficacy given a fixed amount of power when compared to equal amplitude stimulation on up to three electrodes. We find that the model captures the responses of all the cells recorded in the study, suggesting that it will generalize to most cell types in the retina. The model is computationally efficient to evaluate and, therefore, appropriate for future real-time applications including stimulation strategies that make use of recorded neural activity to improve the stimulation strategy. PMID:27035143

  6. Fast and accurate pressure-drop prediction in straightened atherosclerotic coronary arteries.

    PubMed

    Schrauwen, Jelle T C; Koeze, Dion J; Wentzel, Jolanda J; van de Vosse, Frans N; van der Steen, Anton F W; Gijsen, Frank J H

    2015-01-01

    Atherosclerotic disease progression in coronary arteries is influenced by wall shear stress. To compute patient-specific wall shear stress, computational fluid dynamics (CFD) is required. In this study we propose a method for computing the pressure-drop in regions proximal and distal to a plaque, which can serve as a boundary condition in CFD. As a first step towards exploring the proposed method we investigated ten straightened coronary arteries. First, the flow fields were calculated with CFD and velocity profiles were fitted on the results. Second, the Navier-Stokes equation was simplified and solved with the found velocity profiles to obtain a pressure-drop estimate (Δp (1)). Next, Δp (1) was compared to the pressure-drop from CFD (Δp CFD) as a validation step. Finally, the velocity profiles, and thus the pressure-drop were predicted based on geometry and flow, resulting in Δp geom. We found that Δp (1) adequately estimated Δp CFD with velocity profiles that have one free parameter β. This β was successfully related to geometry and flow, resulting in an excellent agreement between Δp CFD and Δp geom: 3.9 ± 4.9% difference at Re = 150. We showed that this method can quickly and accurately predict pressure-drop on the basis of geometry and flow in straightened coronary arteries that are mildly diseased.

  7. Accurate load prediction by BEM with airfoil data from 3D RANS simulations

    NASA Astrophysics Data System (ADS)

    Schneider, Marc S.; Nitzsche, Jens; Hennings, Holger

    2016-09-01

    In this paper, two methods for the extraction of airfoil coefficients from 3D CFD simulations of a wind turbine rotor are investigated, and these coefficients are used to improve the load prediction of a BEM code. The coefficients are extracted from a number of steady RANS simulations, using either averaging of velocities in annular sections, or an inverse BEM approach for determination of the induction factors in the rotor plane. It is shown that these 3D rotor polars are able to capture the rotational augmentation at the inner part of the blade as well as the load reduction by 3D effects close to the blade tip. They are used as input to a simple BEM code and the results of this BEM with 3D rotor polars are compared to the predictions of BEM with 2D airfoil coefficients plus common empirical corrections for stall delay and tip loss. While BEM with 2D airfoil coefficients produces a very different radial distribution of loads than the RANS simulation, the BEM with 3D rotor polars manages to reproduce the loads from RANS very accurately for a variety of load cases, as long as the blade pitch angle is not too different from the cases from which the polars were extracted.

  8. Accurate Mass Assignment of Native Protein Complexes Detected by Electrospray Mass Spectrometry

    PubMed Central

    Liepold, Lars O.; Oltrogge, Luke M.; Suci, Peter; Douglas, Trevor; Young, Mark J.

    2009-01-01

    Correct charge state assignment is crucial to assigning an accurate mass to supramolecular complexes analyzed by electrospray mass spectrometry. Conventional charge state assignment techniques fall short of reliably and unambiguously predicting the correct charge state for many supramolecular complexes. We provide an explanation of the shortcomings of the conventional techniques and have developed a robust charge state assignment method that is applicable to all spectra. PMID:19103497

  9. Predicting protein folds with fold-specific PSSM libraries.

    PubMed

    Hong, Yoojin; Chintapalli, Sree Vamsee; Ko, Kyung Dae; Bhardwaj, Gaurav; Zhang, Zhenhai; van Rossum, Damian; Patterson, Randen L

    2011-01-01

    Accurately assigning folds for divergent protein sequences is a major obstacle to structural studies. Herein, we outline an effective method for fold recognition using sets of PSSMs, each of which is constructed for different protein folds. Our analyses demonstrate that FSL (Fold-specific Position Specific Scoring Matrix Libraries) can predict/relate structures of highly divergent proteins given only their amino acid sequences. This ability to detect distant relationships is dependent on low-identity sequence alignments obtained from FSL. Results from our experiments demonstrate that FSL perform well in recognizing folds from the "twilight-zone" SABmark dataset. Further, this method is capable of accurate fold prediction in newly determined structures. We suggest that by building complete PSSM libraries for all unique folds within the Protein Data Bank (PDB), FSL can be used to rapidly and reliably annotate a large subset of protein folds at proteomic level. The related programs and fold-specific PSSMs for our FSL are publicly available at: http://ccp.psu.edu/download/FSLv1.0/.
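
    The idea of scoring a sequence against a library of fold-specific PSSMs can be sketched as follows. This is a minimal illustration, not the FSL implementation: it swaps in simple ungapped window scoring for the alignment procedure used by the authors, and the names `best_window_score` and `assign_fold` and the toy matrices are hypothetical.

```python
def best_window_score(sequence, pssm):
    """Best ungapped score of `sequence` against one PSSM.

    `pssm` is a list of dicts, one per column, mapping residue -> score.
    Residues missing from a column score 0 (an assumption of this sketch).
    """
    width = len(pssm)
    if len(sequence) < width:
        return float("-inf")
    return max(
        sum(pssm[i].get(sequence[start + i], 0) for i in range(width))
        for start in range(len(sequence) - width + 1)
    )


def assign_fold(sequence, library):
    """Pick the fold whose PSSM set gives the highest score.

    `library` maps a fold name to a list of PSSMs built for that fold.
    """
    scores = {
        fold: max(best_window_score(sequence, pssm) for pssm in pssms)
        for fold, pssms in library.items()
    }
    return max(scores, key=scores.get), scores


# Toy library with two hypothetical folds and two-column PSSMs.
toy_library = {
    "fold_a": [[{"A": 2, "C": -1}, {"C": 3}]],
    "fold_b": [[{"G": 2}, {"G": 2}]],
}
fold, scores = assign_fold("TACG", toy_library)
```

    Keeping every per-fold score, rather than only the winner, makes it possible to apply a significance cut-off later.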

  10. Accurate First-Principles Spectra Predictions for Planetological and Astrophysical Applications at Various T-Conditions

    NASA Astrophysics Data System (ADS)

    Rey, M.; Nikitin, A. V.; Tyuterev, V.

    2014-06-01

    Knowledge of near infrared intensities of rovibrational transitions of polyatomic molecules is essential for the modeling of various planetary atmospheres, brown dwarfs and for other astrophysical applications 1,2,3. For example, to analyze exoplanets, atmospheric models have been developed, thus creating the need for accurate spectroscopic data. Consequently, the spectral characterization of such planetary objects relies on the necessity of having adequate and reliable molecular data in extreme conditions (temperature, optical path length, pressure). On the other hand, in the modeling of astrophysical opacities, millions of lines are generally involved and the line-by-line extraction is clearly not feasible in laboratory measurements. It is thus suggested that this large amount of data could be interpreted only by reliable theoretical predictions. There exist essentially two theoretical approaches for the computation and prediction of spectra. The first one is based on empirically-fitted effective spectroscopic models. Another way for computing energies, line positions and intensities is based on global variational calculations using ab initio surfaces. They do not yet reach the spectroscopic accuracy stricto sensu but implicitly account for all intramolecular interactions including resonance couplings in a wide spectral range. The final aim of this work is to provide reliable predictions which could be quantitatively accurate with respect to the precision of available observations and as complete as possible. All this thus requires extensive first-principles quantum mechanical calculations essentially based on three necessary ingredients which are (i) accurate intramolecular potential energy surface and dipole moment surface components well-defined in a large range of vibrational displacements and (ii) efficient computational methods combined with suitable choices of coordinates to account for molecular symmetry properties and to achieve a good numerical

  11. A review of protein function prediction under machine learning perspective.

    PubMed

    Bernardes, Juliana S; Pedreira, Carlos E

    2013-08-01

    Protein function prediction is one of the most challenging problems in the post-genomic era. The number of newly identified proteins has been exponentially increasing with the advances of the high-throughput techniques. However, the functional characterization of these new proteins has not increased in the same proportion. To fill this gap, a large number of computational methods have been proposed in the literature. Early approaches have explored homology relationships to associate known functions to the newly discovered proteins. Nevertheless, these approaches tend to fail when a new protein is considerably different (divergent) from previously known ones. Accordingly, more accurate approaches that use expressive data representations and exploit sophisticated computational techniques are required. Regarding these points, this review provides a comprehensible description of machine learning approaches that are currently applied to protein function prediction problems. We start by defining several problems involved in understanding protein function aspects, and describing how machine learning can be applied to these problems. We aim to expose, in a systematic framework, the role of these techniques in protein function inference, sometimes difficult to follow up due to the rapid evolution of the field. With this purpose in mind, we highlight the most representative contributions, the recent advancements, and provide an insightful categorization and classification of machine learning methods in functional proteomics.

  12. Accurate and Robust Genomic Prediction of Celiac Disease Using Statistical Learning

    PubMed Central

    Abraham, Gad; Tye-Din, Jason A.; Bhalala, Oneil G.; Kowalczyk, Adam; Zobel, Justin; Inouye, Michael

    2014-01-01

    Practical application of genomic-based risk stratification to clinical diagnosis is appealing yet performance varies widely depending on the disease and genomic risk score (GRS) method. Celiac disease (CD), a common immune-mediated illness, is strongly genetically determined and requires specific HLA haplotypes. HLA testing can exclude diagnosis but has low specificity, providing little information suitable for clinical risk stratification. Using six European cohorts, we provide a proof-of-concept that statistical learning approaches which simultaneously model all SNPs can generate robust and highly accurate predictive models of CD based on genome-wide SNP profiles. The high predictive capacity replicated both in cross-validation within each cohort (AUC of 0.87–0.89) and in independent replication across cohorts (AUC of 0.86–0.9), despite differences in ethnicity. The models explained 30–35% of disease variance and up to ∼43% of heritability. The GRS's utility was assessed in different clinically relevant settings. Comparable to HLA typing, the GRS can be used to identify individuals without CD with ≥99.6% negative predictive value; however, unlike HLA typing, fine-scale stratification of individuals into categories of higher risk for CD can identify those that would benefit from more invasive and costly definitive testing. The GRS is flexible and its performance can be adapted to the clinical situation by adjusting the threshold cut-off. Despite explaining a minority of disease heritability, our findings indicate a genomic risk score provides clinically relevant information to improve upon current diagnostic pathways for CD and support further studies evaluating the clinical utility of this approach in CD and other complex diseases. PMID:24550740
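
    The effect of moving the threshold cut-off on negative predictive value can be illustrated with a small sketch. The function name, the toy scores and the labels below are hypothetical and are not data from the study; the code only shows the standard definition NPV = TN / (TN + FN) applied to a score threshold.

```python
def negative_predictive_value(scores, has_disease, threshold):
    """NPV among individuals whose risk score falls below `threshold`.

    Individuals with score < threshold are called negative (an assumption
    of this sketch); NPV is the fraction of them who are truly disease-free.
    """
    true_neg = sum(1 for s, d in zip(scores, has_disease) if s < threshold and not d)
    false_neg = sum(1 for s, d in zip(scores, has_disease) if s < threshold and d)
    if true_neg + false_neg == 0:
        raise ValueError("no individuals fall below the threshold")
    return true_neg / (true_neg + false_neg)


# Toy cohort: made-up risk scores and disease status.
toy_scores = [0.1, 0.2, 0.3, 0.8, 0.9]
toy_status = [False, False, True, True, True]

npv_loose = negative_predictive_value(toy_scores, toy_status, threshold=0.5)
npv_strict = negative_predictive_value(toy_scores, toy_status, threshold=0.25)
```

    Lowering the cut-off in this toy cohort excludes the one affected individual from the negative group, which raises the NPV at the cost of calling fewer people negative.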

  13. Energy expenditure during level human walking: seeking a simple and accurate predictive solution.

    PubMed

    Ludlow, Lindsay W; Weyand, Peter G

    2016-03-01

    Accurate prediction of the metabolic energy that walking requires can inform numerous health, bodily status, and fitness outcomes. We adopted a two-step approach to identifying a concise, generalized equation for predicting level human walking metabolism. Using literature-aggregated values we compared 1) the predictive accuracy of three literature equations: American College of Sports Medicine (ACSM), Pandolf et al., and Height-Weight-Speed (HWS); and 2) the goodness-of-fit possible from one- vs. two-component descriptions of walking metabolism. Literature metabolic rate values (n = 127; speed range = 0.4 to 1.9 m/s) were aggregated from 25 subject populations (n = 5-42) whose means spanned a 1.8-fold range of heights and a 4.2-fold range of weights. Population-specific resting metabolic rates (V̇o2 rest) were determined using standardized equations. Our first finding was that the ACSM and Pandolf et al. equations underpredicted nearly all 127 literature-aggregated values. Consequently, their standard errors of estimate (SEE) were nearly four times greater than those of the HWS equation (4.51 and 4.39 vs. 1.13 ml O2·kg^-1·min^-1, respectively). For our second comparison, empirical best-fit relationships for walking metabolism were derived from the data set in one- and two-component forms for three V̇o2-speed model types: linear (∝V^1.0), exponential (∝V^2.0), and exponential/height (∝V^2.0/Ht). We found that the proportion of variance (R^2) accounted for, when averaged across the three model types, was substantially lower for one- vs. two-component versions (0.63 ± 0.1 vs. 0.90 ± 0.03) and the predictive errors were nearly twice as great (SEE = 2.22 vs. 1.21 ml O2·kg^-1·min^-1). Our final analysis identified the following concise, generalized equation for predicting level human walking metabolism: V̇o2 total = V̇o2 rest + 3.85 + 5.97·V^2/Ht (where V is measured in m/s, Ht in meters, and V̇o2 in ml O2·kg^-1·min^-1).
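
    The generalized equation at the end of this abstract can be evaluated directly. The sketch below uses the coefficients 3.85 and 5.97 as stated in the abstract; the function name and the example inputs (including the resting value of 3.5) are hypothetical and chosen only for illustration.

```python
def walking_vo2_total(vo2_rest, speed_m_s, height_m):
    """Predicted total metabolic rate for level walking.

    Implements: VO2 total = VO2 rest + 3.85 + 5.97 * V^2 / Ht
    with V in m/s, Ht in meters, and VO2 in ml O2 per kg per min.
    """
    if height_m <= 0:
        raise ValueError("height must be positive")
    if speed_m_s < 0:
        raise ValueError("speed must be non-negative")
    return vo2_rest + 3.85 + 5.97 * speed_m_s ** 2 / height_m


# Hypothetical walker: resting rate 3.5, 1.7 m tall, walking at 1.0 m/s.
example_rate = walking_vo2_total(vo2_rest=3.5, speed_m_s=1.0, height_m=1.7)
```

    Because speed enters as V^2/Ht, the speed-dependent term grows faster for shorter individuals at the same walking speed.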

  14. Using protein binding site prediction to improve protein docking.

    PubMed

    Huang, Bingding; Schroeder, Michael

    2008-10-01

    Predicting protein interaction interfaces and protein complexes are two important related problems. For interface prediction, there are a number of tools, such as PPI-Pred, PPISP, PINUP, Promate, and SPPIDER, which predict enzyme-inhibitor interfaces with success rates of 23% to 55% and other interfaces with 10% to 28% on a benchmark dataset of 62 complexes. Here, we develop metaPPI, a meta server for interface prediction. It significantly improves prediction success rates to 70% for enzyme-inhibitor and 44% for other interfaces. As shown with Promate, predicted interfaces can be used to improve protein docking. Here, we follow this idea using the meta server instead of individual predictions. We confirm that filtering with predicted interfaces significantly improves candidate generation in rigid-body docking based on shape complementarity. Finally, we show that the initial ranking of candidate solutions in rigid-body docking can be further improved for the class of enzyme-inhibitor complexes by a geometrical scoring which rewards deep pockets. A web server of metaPPI is available at scoppi.tu-dresden.de/metappi. The source code of our docking algorithm BDOCK is also available at www.biotec.tu-dresden.de/~bhuang/bdock.

  15. Can radiation therapy treatment planning system accurately predict surface doses in postmastectomy radiation therapy patients?

    SciTech Connect

    Wong, Sharon; Back, Michael; Tan, Poh Wee; Lee, Khai Mun; Baggarley, Shaun; Lu, Jaide Jay

    2012-07-01

    Skin doses have been an important factor in the dose prescription for breast radiotherapy. Recent advances in radiotherapy treatment techniques, such as intensity-modulated radiation therapy (IMRT) and new treatment schemes such as hypofractionated breast therapy have made the precise determination of the surface dose necessary. Detailed information of the dose at various depths of the skin is also critical in designing new treatment strategies. The purpose of this work was to assess the accuracy of surface dose calculation by a clinically used treatment planning system and those measured by thermoluminescence dosimeters (TLDs) in a customized chest wall phantom. This study involved the construction of a chest wall phantom for skin dose assessment. Seven TLDs were distributed throughout each right chest wall phantom to give adequate representation of measured radiation doses. Point doses from the CMS XiO® treatment planning system (TPS) were calculated for each relevant TLD position and the results were correlated. There was no significant difference between the absorbed dose measured by TLD and the doses calculated by the TPS (p > 0.05, 1-tailed). Dose accuracy of up to 2.21% was found. The deviations from the calculated absorbed doses were overall larger (3.4%) when wedges and bolus were used. 3D radiotherapy TPS is a useful and accurate tool to assess the accuracy of surface dose. Our studies have shown that radiation treatment accuracy expressed as a comparison between calculated doses (by TPS) and measured doses (by TLD dosimetry) can be accurately predicted for tangential treatment of the chest wall after mastectomy.

  16. Neural network definitions of highly predictable protein secondary structure classes

    SciTech Connect

    Lapedes, A.; Steeg, E.; Farber, R.

    1994-02-01

    We use two co-evolving neural networks to determine new classes of protein secondary structure which are significantly more predictable from local amino sequence than the conventional secondary structure classification. Accurate prediction of the conventional secondary structure classes: alpha helix, beta strand, and coil, from primary sequence has long been an important problem in computational molecular biology. Neural networks have been a popular method to attempt to predict these conventional secondary structure classes. Accuracy has been disappointingly low. The algorithm presented here uses neural networks to simultaneously examine both sequence and structure data, and to evolve new classes of secondary structure that can be predicted from sequence with significantly higher accuracy than the conventional classes. These new classes have both similarities to, and differences with the conventional alpha helix, beta strand and coil.

  17. Critical Features of Fragment Libraries for Protein Structure Prediction.

    PubMed

    Trevizani, Raphael; Custódio, Fábio Lima; Dos Santos, Karina Baptista; Dardenne, Laurent Emmanuel

    2017-01-01

    The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. Particularly, we analyze the effect of using secondary structure prediction guiding fragments selection, different fragments sizes and the effect of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than the longer. Libraries composed of multiple fragment lengths generate even better structures, where longer fragments show to be more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can be critical guidelines for the use of fragment libraries in protein structure prediction.

  18. Critical Features of Fragment Libraries for Protein Structure Prediction

    PubMed Central

    dos Santos, Karina Baptista

    2017-01-01

    The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. Particularly, we analyze the effect of using secondary structure prediction guiding fragments selection, different fragments sizes and the effect of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than the longer. Libraries composed of multiple fragment lengths generate even better structures, where longer fragments show to be more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can be critical guidelines for the use of fragment libraries in protein structure prediction. PMID:28085928

  19. TIMP2•IGFBP7 biomarker panel accurately predicts acute kidney injury in high-risk surgical patients

    PubMed Central

    Gunnerson, Kyle J.; Shaw, Andrew D.; Chawla, Lakhmir S.; Bihorac, Azra; Al-Khafaji, Ali; Kashani, Kianoush; Lissauer, Matthew; Shi, Jing; Walker, Michael G.; Kellum, John A.

    2016-01-01

    BACKGROUND Acute kidney injury (AKI) is an important complication in surgical patients. Existing biomarkers and clinical prediction models underestimate the risk for developing AKI. We recently reported data from two trials of 728 and 408 critically ill adult patients in whom urinary TIMP2•IGFBP7 (NephroCheck, Astute Medical) was used to identify patients at risk of developing AKI. Here we report a preplanned analysis of surgical patients from both trials to assess whether urinary tissue inhibitor of metalloproteinase 2 (TIMP-2) and insulin-like growth factor–binding protein 7 (IGFBP7) accurately identify surgical patients at risk of developing AKI. STUDY DESIGN We enrolled adult surgical patients at risk for AKI who were admitted to one of 39 intensive care units across Europe and North America. The primary end point was moderate-severe AKI (equivalent to KDIGO [Kidney Disease Improving Global Outcomes] stages 2–3) within 12 hours of enrollment. Biomarker performance was assessed using the area under the receiver operating characteristic curve, integrated discrimination improvement, and category-free net reclassification improvement. RESULTS A total of 375 patients were included in the final analysis of whom 35 (9%) developed moderate-severe AKI within 12 hours. The area under the receiver operating characteristic curve for [TIMP-2]•[IGFBP7] alone was 0.84 (95% confidence interval, 0.76–0.90; p < 0.0001). Biomarker performance was robust in sensitivity analysis across predefined subgroups (urgency and type of surgery). CONCLUSION For postoperative surgical intensive care unit patients, a single urinary TIMP2•IGFBP7 test accurately identified patients at risk for developing AKI within the ensuing 12 hours and its inclusion in clinical risk prediction models significantly enhances their performance. LEVEL OF EVIDENCE Prognostic study, level I. PMID:26816218

  20. Structure-based constitutive model can accurately predict planar biaxial properties of aortic wall tissue.

    PubMed

    Polzer, S; Gasser, T C; Novak, K; Man, V; Tichy, M; Skacel, P; Bursa, J

    2015-03-01

    Structure-based constitutive models might help in exploring mechanisms by which arterial wall histology is linked to wall mechanics. This study aims to validate a recently proposed structure-based constitutive model. Specifically, the model's ability to predict mechanical biaxial response of porcine aortic tissue with predefined collagen structure was tested. Histological slices from porcine thoracic aorta wall (n=9) were automatically processed to quantify the collagen fiber organization, and mechanical testing identified the non-linear properties of the wall samples (n=18) over a wide range of biaxial stretches. Histological and mechanical experimental data were used to identify the model parameters of a recently proposed multi-scale constitutive description for arterial layers. The model predictive capability was tested with respect to interpolation and extrapolation. Collagen in the media was predominantly aligned in circumferential direction (planar von Mises distribution with concentration parameter bM=1.03 ± 0.23), and its coherence decreased gradually from the luminal to the abluminal tissue layers (inner media, b=1.54 ± 0.40; outer media, b=0.72 ± 0.20). In contrast, the collagen in the adventitia was aligned almost isotropically (bA=0.27 ± 0.11), and no features, such as families of coherent fibers, were identified. The applied constitutive model captured the aorta biaxial properties accurately (coefficient of determination R^2=0.95 ± 0.03) over the entire range of biaxial deformations and with physically meaningful model parameters. Good predictive properties, well outside the parameter identification space, were observed (R^2=0.92 ± 0.04). Multi-scale constitutive models equipped with realistic micro-histological data can predict macroscopic non-linear aorta wall properties. Collagen largely defines already low strain properties of media, which explains the origin of wall anisotropy seen at this strain level. The structure and mechanical

  1. Hierarchical Ensemble Methods for Protein Function Prediction

    PubMed Central

    2014-01-01

    Protein function prediction is a complex multiclass multilabel classification problem, characterized by multiple issues such as the incompleteness of the available annotations, the integration of multiple sources of high dimensional biomolecular data, the imbalance of several functional classes, and the difficulty of univocally determining negative examples. Moreover, the hierarchical relationships between functional classes that characterize both the Gene Ontology and FunCat taxonomies motivate the development of hierarchy-aware prediction methods that showed significantly better performances than hierarchy-unaware “flat” prediction methods. In this paper, we provide a comprehensive review of hierarchical methods for protein function prediction based on ensembles of learning machines. According to this general approach, a separate learning machine is trained to learn a specific functional term and then the resulting predictions are assembled in a “consensus” ensemble decision, taking into account the hierarchical relationships between classes. The main hierarchical ensemble methods proposed in the literature are discussed in the context of existing computational methods for protein function prediction, highlighting their characteristics, advantages, and limitations. Open problems of this exciting research area of computational biology are finally considered, outlining novel perspectives for future research. PMID:25937954
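
    One simple way to combine per-term predictions while respecting the class hierarchy is to cap each term's score by the scores of its ancestors, so that a child term is never predicted more confidently than its parent. The sketch below illustrates that idea only; it is not presented as any specific method from the review, it assumes a tree-shaped taxonomy with one parent per term, and the term names and scores are hypothetical.

```python
def enforce_hierarchy(flat_scores, parent):
    """Make per-term scores consistent with a tree-shaped taxonomy.

    `flat_scores` maps term -> score from an independent ("flat") classifier.
    `parent` maps term -> its parent term (root terms are absent).
    Each term's final score is the minimum over itself and all ancestors.
    """
    consistent = {}

    def resolve(term):
        if term in consistent:
            return consistent[term]
        score = flat_scores[term]
        parent_term = parent.get(term)
        if parent_term is not None:
            score = min(score, resolve(parent_term))
        consistent[term] = score
        return score

    for term in flat_scores:
        resolve(term)
    return consistent


# Hypothetical three-level branch of a functional taxonomy.
toy_flat = {"root": 0.9, "binding": 0.4, "dna_binding": 0.7}
toy_parent = {"binding": "root", "dna_binding": "binding"}
toy_consistent = enforce_hierarchy(toy_flat, toy_parent)
```

    In the toy example the flat classifier is more confident in the child term than in its parent; the capped scores remove that inconsistency.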

  2. Cloud prediction of protein structure and function with PredictProtein for Debian.

    PubMed

    Kaján, László; Yachdav, Guy; Vicedo, Esmeralda; Steinegger, Martin; Mirdita, Milot; Angermüller, Christof; Böhm, Ariane; Domke, Simon; Ertl, Julia; Mertes, Christian; Reisinger, Eva; Staniewski, Cedric; Rost, Burkhard

    2013-01-01

    We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome.

  3. Cloud Prediction of Protein Structure and Function with PredictProtein for Debian

    PubMed Central

    Kaján, László; Yachdav, Guy; Vicedo, Esmeralda; Steinegger, Martin; Mirdita, Milot; Angermüller, Christof; Böhm, Ariane; Domke, Simon; Ertl, Julia; Mertes, Christian; Reisinger, Eva; Rost, Burkhard

    2013-01-01

    We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome. PMID:23971032

  4. Predicting accurate fluorescent spectra for high molecular weight polycyclic aromatic hydrocarbons using density functional theory

    NASA Astrophysics Data System (ADS)

    Powell, Jacob; Heider, Emily C.; Campiglia, Andres; Harper, James K.

    2016-10-01

    The ability of density functional theory (DFT) methods to predict accurate fluorescence spectra for polycyclic aromatic hydrocarbons (PAHs) is explored. Two methods, PBE0 and CAM-B3LYP, are evaluated both in the gas phase and in solution. Spectra for several of the most toxic PAHs are predicted and compared to experiment, including three isomers of C24H14 and a PAH containing heteroatoms. Unusually high-resolution experimental spectra are obtained for comparison by analyzing each PAH at 4.2 K in an n-alkane matrix. All theoretical spectra visually conform to the profiles of the experimental data but are systematically offset by a small amount. Specifically, when solvent is included the PBE0 functional overestimates peaks by 16.1 ± 6.6 nm while CAM-B3LYP underestimates the same transitions by 14.5 ± 7.6 nm. These calculated spectra can be empirically corrected to decrease the uncertainties to 6.5 ± 5.1 and 5.7 ± 5.1 nm for the PBE0 and CAM-B3LYP methods, respectively. A comparison of computed spectra in the gas phase indicates that the inclusion of n-octane shifts peaks by +11 nm on average and this change is roughly equivalent for PBE0 and CAM-B3LYP. An automated approach for comparing spectra is also described that minimizes residuals between a given theoretical spectrum and all available experimental spectra. This approach identifies the correct spectrum in all cases and excludes approximately 80% of the incorrect spectra, demonstrating that an automated search of theoretical libraries of spectra may eventually become feasible.
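
    The automated comparison described at the end of this abstract, which minimizes residuals between a theoretical spectrum and each available experimental spectrum, can be sketched as a nearest-match search. The sketch assumes all spectra are intensity lists sampled on the same wavelength grid and uses a sum of squared residuals; the authors' actual residual definition is not given in the abstract, and the function names and toy spectra are hypothetical.

```python
def sum_squared_residuals(spectrum_a, spectrum_b):
    """Sum of squared intensity differences on a shared wavelength grid."""
    if len(spectrum_a) != len(spectrum_b):
        raise ValueError("spectra must be sampled on the same grid")
    return sum((a - b) ** 2 for a, b in zip(spectrum_a, spectrum_b))


def best_match(theoretical, experimental_library):
    """Name of the experimental spectrum with the smallest residual."""
    return min(
        experimental_library,
        key=lambda name: sum_squared_residuals(theoretical, experimental_library[name]),
    )


# Toy library of two made-up experimental spectra (normalized intensities).
toy_experimental = {
    "compound_A": [0.0, 1.0, 0.2],
    "compound_B": [1.0, 0.1, 0.0],
}
toy_theoretical = [0.1, 0.9, 0.3]
matched = best_match(toy_theoretical, toy_experimental)
```

    A systematic wavelength offset such as the one reported for PBE0 and CAM-B3LYP would need to be corrected before a comparison of this kind; that step is omitted here.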

  5. New consensus definition for acute kidney injury accurately predicts 30-day mortality in cirrhosis with infection

    PubMed Central

    Wong, Florence; O’Leary, Jacqueline G; Reddy, K Rajender; Patton, Heather; Kamath, Patrick S; Fallon, Michael B; Garcia-Tsao, Guadalupe; Subramanian, Ram M.; Malik, Raza; Maliakkal, Benedict; Thacker, Leroy R; Bajaj, Jasmohan S

    2015-01-01

    Background & Aims A consensus conference proposed that cirrhosis-associated acute kidney injury (AKI) be defined as an increase in serum creatinine by >50% from the stable baseline value in <6 months or by ≥0.3 mg/dL in <48 hrs. We prospectively evaluated the ability of these criteria to predict mortality within 30 days among hospitalized patients with cirrhosis and infection. Methods 337 patients with cirrhosis who were admitted with or developed an infection in hospital (56% men; 56±10 y old; model for end-stage liver disease score, 20±8) were followed. We compared data on 30-day mortality, hospital length-of-stay, and organ failure between patients with and without AKI. Results 166 (49%) developed AKI during hospitalization, based on the consensus criteria. Patients who developed AKI had higher admission Child-Pugh (11.0±2.1 vs 9.6±2.1; P<.0001), and MELD scores (23±8 vs 17±7; P<.0001), and lower mean arterial pressure (81±16 mmHg vs 85±15 mmHg; P<.01) than those who did not. Also higher amongst patients with AKI were mortality in ≤30 days (34% vs 7%), intensive care unit transfer (46% vs 20%), ventilation requirement (27% vs 6%), and shock (31% vs 8%); AKI patients also had longer hospital stays (17.8±19.8 days vs 13.3±31.8 days) (all P<.001). 56% of AKI episodes were transient, 28% persistent, and 16% resulted in dialysis. Mortality was 80% among those without renal recovery, higher compared to partial (40%) or complete recovery (15%), or AKI-free patients (7%; P<.0001). Conclusions 30-day mortality is 10-fold higher among infected hospitalized cirrhotic patients with irreversible AKI than those without AKI. The consensus definition of AKI accurately predicts 30-day mortality, length of hospital stay, and organ failure. PMID:23999172
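
    The two creatinine criteria quoted in this abstract can be written as a short check. This is an illustrative sketch of the stated thresholds only; the function name, the argument layout, and the rounding of the difference to two decimals (added to avoid floating-point artifacts) are assumptions, and timing windows are taken as given by the caller rather than verified.

```python
def meets_consensus_aki(baseline_scr, current_scr, recent_scr=None):
    """Check the consensus AKI criteria described in the abstract.

    baseline_scr: stable baseline serum creatinine (mg/dL) from the
                  previous 6 months.
    current_scr:  current serum creatinine (mg/dL).
    recent_scr:   optional value (mg/dL) measured less than 48 hours
                  before the current one.
    """
    # Criterion 1: increase of more than 50% over the stable baseline.
    relative_rise = current_scr > 1.5 * baseline_scr
    # Criterion 2: absolute rise of at least 0.3 mg/dL within 48 hours.
    absolute_rise = (
        recent_scr is not None and round(current_scr - recent_scr, 2) >= 0.3
    )
    return relative_rise or absolute_rise


# Hypothetical values for illustration.
case_relative = meets_consensus_aki(baseline_scr=1.0, current_scr=1.6)
case_absolute = meets_consensus_aki(baseline_scr=1.0, current_scr=1.2, recent_scr=0.9)
case_neither = meets_consensus_aki(baseline_scr=1.0, current_scr=1.1, recent_scr=0.9)
```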

  6. Models to predict intestinal absorption of therapeutic peptides and proteins.

    PubMed

    Antunes, Filipa; Andrade, Fernanda; Ferreira, Domingos; Nielsen, Hanne Morck; Sarmento, Bruno

    2013-01-01

    Prediction of human intestinal absorption is a major goal in the design, optimization, and selection of drugs intended for oral delivery, in particular proteins, which possess intrinsically poor transport across intestinal epithelium. There are various techniques currently employed to evaluate the extent of protein absorption in the different phases of drug discovery and development. Screening protocols to evaluate protein absorption include a range of preclinical methodologies like in silico, in vitro, in situ, ex vivo and in vivo. It is the careful and critical use of these techniques that can help to identify drug candidates, which most probably will be well absorbed from the human intestinal tract. It is well recognized that the human intestinal permeability cannot be accurately predicted based on a single preclinical method. However, the present social and scientific concerns about animal welfare as well as the pharmaceutical industry's need for rapid, cheap and reliable models predicting bioavailability give reasons for using methods providing an appropriate correlation between results of in vivo and in vitro drug absorption. The aim of this review is to describe and compare in silico, in vitro, in situ, ex vivo and in vivo methods used to predict human intestinal absorption, giving special attention to the intestinal absorption of therapeutic peptides and proteins.

  7. Fast and Accurate Accessible Surface Area Prediction Without a Sequence Profile.

    PubMed

    Faraggi, Eshel; Kouza, Maksim; Zhou, Yaoqi; Kloczkowski, Andrzej

    2017-01-01

    A fast accessible surface area (ASA) predictor is presented. In this new approach no residue mutation profiles generated by multiple sequence alignments are used as inputs. Instead, we use only single sequence information and global features such as single-residue and two-residue compositions of the chain. The resulting predictor is both far more efficient than sequence alignment based predictors and of comparable accuracy to them. Introduction of the global inputs significantly helps achieve this comparable accuracy. The predictor, termed ASAquick, is found to perform similarly well for so-called easy and hard cases indicating generalizability and possible usability for de-novo protein structure prediction. The source code and Linux executables for ASAquick are available from Research and Information Systems at http://mamiris.com and from the Battelle Center for Mathematical Medicine at http://mathmed.org .
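
    As a rough illustration of the global inputs mentioned above, single-residue and two-residue compositions can be computed directly from a sequence string. The sketch below is not the ASAquick implementation; the function name, the 20-letter alphabet ordering, and the normalisation are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, alphabetical by letter


def composition_features(sequence):
    """Return 20 single-residue plus 400 two-residue frequencies (420 values)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    single = Counter(seq)
    # Overlapping adjacent pairs: positions (0,1), (1,2), ...
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))

    n_single = len(seq)
    n_pairs = max(len(seq) - 1, 1)  # avoid division by zero for length-1 chains

    single_freq = [single[a] / n_single for a in AMINO_ACIDS]
    pair_freq = [pairs[a + b] / n_pairs for a, b in product(AMINO_ACIDS, repeat=2)]
    return single_freq + pair_freq


features = composition_features("ACDA")
```

    Non-standard letters (for example X) still count toward the chain length here but get no entry of their own in the vector; a real predictor may treat them differently.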

  8. Sequence features accurately predict genome-wide MeCP2 binding in vivo

    PubMed Central

    Rube, H. Tomas; Lee, Wooje; Hejna, Miroslav; Chen, Huaiyang; Yasui, Dag H.; Hess, John F.; LaSalle, Janine M.; Song, Jun S.; Gong, Qizhi

    2016-01-01

    Methyl-CpG binding protein 2 (MeCP2) is critical for proper brain development and expressed at near-histone levels in neurons, but the mechanism of its genomic localization remains poorly understood. Using high-resolution MeCP2-binding data, we show that DNA sequence features alone can predict binding with 88% accuracy. Integrating MeCP2 binding and DNA methylation in a probabilistic graphical model, we demonstrate that previously reported genome-wide association with methylation is in part due to MeCP2's affinity to GC-rich chromatin, a result replicated using published data. Furthermore, MeCP2 co-localizes with nucleosomes. Finally, MeCP2 binding downstream of promoters correlates with increased expression in Mecp2-deficient neurons. PMID:27008915

  9. Fast and Accurate Prediction of Numerical Relativity Waveforms from Binary Black Hole Coalescences Using Surrogate Models.

    PubMed

    Blackman, Jonathan; Field, Scott E; Galley, Chad R; Szilágyi, Béla; Scheel, Mark A; Tiglio, Manuel; Hemberger, Daniel A

    2015-09-18

    Simulating a binary black hole coalescence by solving Einstein's equations is computationally expensive, requiring days to months of supercomputing time. Using reduced order modeling techniques, we construct an accurate surrogate model, which is evaluated in a millisecond to a second, for numerical relativity (NR) waveforms from nonspinning binary black hole coalescences with mass ratios in [1, 10] and durations corresponding to about 15 orbits before merger. We assess the model's uncertainty and show that our modeling strategy predicts NR waveforms not used for the surrogate's training with errors nearly as small as the numerical error of the NR code. Our model includes all spherical-harmonic _{-2}Y_{ℓm} waveform modes resolved by the NR code up to ℓ=8. We compare our surrogate model to effective one body waveforms from 50M_{⊙} to 300M_{⊙} for advanced LIGO detectors and find that the surrogate is always more faithful (by at least an order of magnitude in most cases).

  10. A high order accurate finite element algorithm for high Reynolds number flow prediction

    NASA Technical Reports Server (NTRS)

    Baker, A. J.

    1978-01-01

    A Galerkin-weighted residuals formulation is employed to establish an implicit finite element solution algorithm for generally nonlinear initial-boundary value problems. Solution accuracy, and convergence rate with discretization refinement, are quantized in several error norms, by a systematic study of numerical solutions to several nonlinear parabolic and a hyperbolic partial differential equation characteristic of the equations governing fluid flows. Solutions are generated using selective linear, quadratic and cubic basis functions. Richardson extrapolation is employed to generate a higher-order accurate solution to facilitate isolation of truncation error in all norms. Extension of the mathematical theory underlying accuracy and convergence concepts for linear elliptic equations is predicted for equations characteristic of laminar and turbulent fluid flows at nonmodest Reynolds number. The nondiagonal initial-value matrix structure introduced by the finite element theory is determined intrinsic to improved solution accuracy and convergence. A factored Jacobian iteration algorithm is derived and evaluated to yield a consequential reduction in both computer storage and execution CPU requirements while retaining solution accuracy.
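
    Richardson extrapolation, mentioned above, combines two approximations computed at different discretization sizes to cancel the leading error term. The sketch below illustrates the idea on a central finite-difference derivative rather than on the finite element solutions of the report; the function names and the test problem are assumptions made for illustration.

```python
import math


def richardson(coarse, fine, order, ratio=2.0):
    """Extrapolate from values computed at step h (coarse) and h/ratio (fine).

    Assumes the leading error term scales as h**order.
    """
    factor = ratio ** order
    return (factor * fine - coarse) / (factor - 1.0)


def central_difference(f, x, h):
    """Second-order accurate approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Approximate d/dx sin(x) at x = 1 with steps h and h/2, then extrapolate.
h = 0.1
coarse = central_difference(math.sin, 1.0, h)
fine = central_difference(math.sin, 1.0, h / 2.0)
extrapolated = richardson(coarse, fine, order=2)

exact = math.cos(1.0)
errors = (abs(coarse - exact), abs(fine - exact), abs(extrapolated - exact))
```

    Comparing the three entries of errors shows the extrapolated value to be markedly closer to the exact derivative than either input, which is why the technique helps isolate truncation error.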

  11. Cluster abundance in chameleon f(R) gravity I: toward an accurate halo mass function prediction

    NASA Astrophysics Data System (ADS)

    Cataneo, Matteo; Rapetti, David; Lombriser, Lucas; Li, Baojiu

    2016-12-01

    We refine the mass and environment dependent spherical collapse model of chameleon f(R) gravity by calibrating a phenomenological correction inspired by the parameterized post-Friedmann framework against high-resolution N-body simulations. We employ our method to predict the corresponding modified halo mass function, and provide fitting formulas to calculate the enhancement of the f(R) halo abundance with respect to that of General Relativity (GR) within a precision of ≲5% from the results obtained in the simulations. Similar accuracy can be achieved for the full f(R) mass function on the condition that the modeling of the reference GR abundance of halos is accurate at the percent level. We use our fits to forecast constraints on the additional scalar degree of freedom of the theory, finding that upper bounds competitive with current Solar System tests are within reach of cluster number count analyses from ongoing and upcoming surveys at much larger scales. Importantly, the flexibility of our method allows also for this to be applied to other scalar-tensor theories characterized by a mass and environment dependent spherical collapse.

  12. Accurate prediction of band gaps and optical properties of HfO2

    NASA Astrophysics Data System (ADS)

    Ondračka, Pavel; Holec, David; Nečas, David; Zajíčková, Lenka

    2016-10-01

    We report on optical properties of various polymorphs of hafnia predicted within the framework of density functional theory. The full potential linearised augmented plane wave method was employed together with the Tran-Blaha modified Becke-Johnson potential (TB-mBJ) for exchange and local density approximation for correlation. Unit cells of monoclinic, cubic and tetragonal crystalline, and a simulated annealing-based model of amorphous hafnia were fully relaxed with respect to internal positions and lattice parameters. Electronic structures and band gaps for monoclinic, cubic, tetragonal and amorphous hafnia were calculated using three different TB-mBJ parametrisations and the results were critically compared with the available experimental and theoretical reports. Conceptual differences between a straightforward comparison of experimental measurements to a calculated band gap on the one hand and to a whole electronic structure (density of electronic states) on the other hand, were pointed out, suggesting the latter should be used whenever possible. Finally, dielectric functions were calculated at two levels, using the random phase approximation without local field effects and with a more accurate Bethe-Salpeter equation (BSE) to account for excitonic effects. We conclude that a satisfactory agreement with experimental data for HfO2 was obtained only in the latter case.

  13. Accurate prediction of V1 location from cortical folds in a surface coordinate system

    PubMed Central

    Hinds, Oliver P.; Rajendran, Niranjini; Polimeni, Jonathan R.; Augustinack, Jean C.; Wiggins, Graham; Wald, Lawrence L.; Rosas, H. Diana; Potthast, Andreas; Schwartz, Eric L.; Fischl, Bruce

    2008-01-01

    Previous studies demonstrated substantial variability of the location of primary visual cortex (V1) in stereotaxic coordinates when linear volume-based registration is used to match volumetric image intensities (Amunts et al., 2000). However, other qualitative reports of V1 location (Smith, 1904; Stensaas et al., 1974; Rademacher et al., 1993) suggested a consistent relationship between V1 and the surrounding cortical folds. Here, the relationship between folds and the location of V1 is quantified using surface-based analysis to generate a probabilistic atlas of human V1. High-resolution (about 200 μm) magnetic resonance imaging (MRI) at 7 T of ex vivo human cerebral hemispheres allowed identification of the full area via the stria of Gennari: a myeloarchitectonic feature specific to V1. Separate, whole-brain scans were acquired using MRI at 1.5 T to allow segmentation and mesh reconstruction of the cortical gray matter. For each individual, V1 was manually identified in the high-resolution volume and projected onto the cortical surface. Surface-based intersubject registration (Fischl et al., 1999b) was performed to align the primary cortical folds of individual hemispheres to those of a reference template representing the average folding pattern. An atlas of V1 location was constructed by computing the probability of V1 inclusion for each cortical location in the template space. This probabilistic atlas of V1 exhibits low prediction error compared to previous V1 probabilistic atlases built in volumetric coordinates. The increased predictability observed under surface-based registration suggests that the location of V1 is more accurately predicted by the cortical folds than by the shape of the brain embedded in the volume of the skull. In addition, the high quality of this atlas provides direct evidence that surface-based intersubject registration methods are superior to volume-based methods at superimposing functional areas of cortex, and therefore are better

  14. Accurate Prediction of the Dynamical Changes within the Second PDZ Domain of PTP1e

    PubMed Central

    Cilia, Elisa; Vuister, Geerten W.; Lenaerts, Tom

    2012-01-01

    Experimental NMR relaxation studies have shown that peptide binding induces dynamical changes at the side-chain level throughout the second PDZ domain of PTP1e, identifying as such the collection of residues involved in long-range communication. Even though different computational approaches have identified subsets of residues that were qualitatively comparable, no quantitative analysis of the accuracy of these predictions was thus far determined. Here, we show that our information theoretical method produces quantitatively better results with respect to the experimental data than some of these earlier methods. Moreover, it provides a global network perspective on the effect experienced by the different residues involved in the process. We also show that these predictions are consistent within both the human and mouse variants of this domain. Together, these results improve the understanding of intra-protein communication and allostery in PDZ domains, underlining at the same time the necessity of producing similar data sets for further validation of these kinds of methods. PMID:23209399

  15. Accurate protein crystallography at ultra-high resolution: Valence electron distribution in crambin

    PubMed Central

    Jelsch, Christian; Teeter, Martha M.; Lamzin, Victor; Pichon-Pesme, Virginie; Blessing, Robert H.; Lecomte, Claude

    2000-01-01

    The charge density distribution of a protein has been refined experimentally. Diffraction data for a crambin crystal were measured to ultra-high resolution (0.54 Å) at low temperature by using short-wavelength synchrotron radiation. The crystal structure was refined with a model for charged, nonspherical, multipolar atoms to accurately describe the molecular electron density distribution. The refined parameters agree within 25% with our transferable electron density library derived from accurate single crystal diffraction analyses of several amino acids and small peptides. The resulting electron density maps of redistributed valence electrons (deformation maps) compare quantitatively well with a high-level quantum mechanical calculation performed on a monopeptide. This study provides validation for experimentally derived parameters and a window into charge density analysis of biological macromolecules. PMID:10737790

  16. Accurate protein crystallography at ultra-high resolution: valence electron distribution in crambin.

    PubMed

    Jelsch, C; Teeter, M M; Lamzin, V; Pichon-Pesme, V; Blessing, R H; Lecomte, C

    2000-03-28

    The charge density distribution of a protein has been refined experimentally. Diffraction data for a crambin crystal were measured to ultra-high resolution (0.54 Å) at low temperature by using short-wavelength synchrotron radiation. The crystal structure was refined with a model for charged, nonspherical, multipolar atoms to accurately describe the molecular electron density distribution. The refined parameters agree within 25% with our transferable electron density library derived from accurate single crystal diffraction analyses of several amino acids and small peptides. The resulting electron density maps of redistributed valence electrons (deformation maps) compare quantitatively well with a high-level quantum mechanical calculation performed on a monopeptide. This study provides validation for experimentally derived parameters and a window into charge density analysis of biological macromolecules.

  17. Predicting DNA-binding proteins and binding residues by complex structure prediction and application to human proteome.

    PubMed

    Zhao, Huiying; Wang, Jihua; Zhou, Yaoqi; Yang, Yuedong

    2014-01-01

    As more and more protein sequences are uncovered from increasingly inexpensive sequencing techniques, an urgent task is to find their functions. This work presents a highly reliable computational technique for predicting DNA-binding function at the level of protein-DNA complex structures, rather than low-resolution two-state prediction of DNA-binding as most existing techniques do. The method first predicts protein-DNA complex structure by utilizing the template-based structure prediction technique HHblits, followed by binding affinity prediction based on a knowledge-based energy function (Distance-scaled finite ideal-gas reference state for protein-DNA interactions). A leave-one-out cross validation of the method based on 179 DNA-binding and 3797 non-binding protein domains achieves a Matthews correlation coefficient (MCC) of 0.77 with high precision (94%) and high sensitivity (65%). We further found 51% sensitivity for 82 newly determined structures of DNA-binding proteins and 56% sensitivity for the human proteome. In addition, the method provides a reasonably accurate prediction of DNA-binding residues in proteins based on predicted DNA-binding complex structures. Its application to human proteome leads to more than 300 novel DNA-binding proteins; some of these predicted structures were validated by known structures of homologous proteins in APO forms. The method [SPOT-Seq (DNA)] is available as an on-line server at http://sparks-lab.org.
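
    The Matthews correlation coefficient reported above can be checked against the quoted precision and sensitivity. The confusion-matrix counts in this sketch are not given in the abstract; they are assumptions reconstructed here from the rounded figures (179 binding and 3797 non-binding domains, 65% sensitivity, 94% precision), so the result is only approximate.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Assumed counts, reconstructed from the rounded percentages in the abstract:
tp = 116          # about 65% of 179 binding domains
fn = 179 - tp     # 63 binding domains missed
fp = 7            # gives precision 116 / 123, about 94%
tn = 3797 - fp    # 3790 non-binding domains correctly rejected

sensitivity = tp / (tp + fn)
precision = tp / (tp + fp)
score = mcc(tp, fp, tn, fn)  # close to the reported 0.77
```

    Because the negative class is roughly twenty times larger than the positive class, MCC is a more informative summary here than plain accuracy, which would be high even for a predictor that rejected every domain.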

  18. Comparative modeling: the state of the art and protein drug target structure prediction.

    PubMed

    Liu, Tianyun; Tang, Grace W; Capriotti, Emidio

    2011-07-01

    The goal of computational protein structure prediction is to provide three-dimensional (3D) structures with resolution comparable to experimental results. Comparative modeling, which predicts the 3D structure of a protein based on its sequence similarity to homologous structures, is the most accurate computational method for structure prediction. In the last two decades, significant progress has been made on comparative modeling methods. Using the large number of protein structures deposited in the Protein Data Bank (~65,000), automatic prediction pipelines are generating a tremendous number of models (~1.9 million) for sequences whose structures have not been experimentally determined. Accurate models are suitable for a wide range of applications, such as prediction of protein binding sites, prediction of the effect of protein mutations, and structure-guided virtual screening. In particular, comparative modeling has enabled structure-based drug design against protein targets with unknown structures. In this review, we describe the theoretical basis of comparative modeling, the available automatic methods and databases, and the algorithms to evaluate the accuracy of predicted structures. Finally, we discuss relevant applications in the prediction of important drug target proteins, focusing on the G protein-coupled receptor (GPCR) and protein kinase families.

  19. Using support vector machine and evolutionary profiles to predict antifreeze protein sequences.

    PubMed

    Zhao, Xiaowei; Ma, Zhiqiang; Yin, Minghao

    2012-01-01

    Antifreeze proteins (AFPs) are ice-binding proteins. Accurate identification of new AFPs is important in understanding ice-protein interactions and creating novel ice-binding domains in other proteins. In this paper, an accurate method, called AFP_PSSM, has been developed for predicting antifreeze proteins using a support vector machine (SVM) and position specific scoring matrix (PSSM) profiles. This is the first study in which evolutionary information in the form of PSSM profiles has been successfully used for predicting antifreeze proteins. Tested by 10-fold cross validation and independent test, the accuracy of the proposed method reaches 82.67% for the training dataset and 93.01% for the testing dataset, respectively. These results indicate that our predictor is a useful tool for predicting antifreeze proteins. A web server (AFP_PSSM) that implements the proposed predictor is freely available.

  20. Multiple selection filters ensure accurate tail-anchored membrane protein targeting

    PubMed Central

    Rao, Meera; Okreglak, Voytek; Chio, Un Seng; Cho, Hyunju; Walter, Peter; Shan, Shu-ou

    2016-01-01

    Accurate protein localization is crucial to generate and maintain organization in all cells. Achieving accuracy is challenging, as the molecular signals that dictate a protein’s cellular destination are often promiscuous. A salient example is the targeting of an essential class of tail-anchored (TA) proteins, whose sole defining feature is a transmembrane domain near their C-terminus. Here we show that the Guided Entry of Tail-anchored protein (GET) pathway selects TA proteins destined to the endoplasmic reticulum (ER) utilizing distinct molecular steps, including differential binding by the co-chaperone Sgt2 and kinetic proofreading after ATP hydrolysis by the targeting factor Get3. Further, the different steps select for distinct physicochemical features of the TA substrate. The use of multiple selection filters may be general to protein biogenesis pathways that must distinguish correct and incorrect substrates based on minor differences. DOI: http://dx.doi.org/10.7554/eLife.21301.001 PMID:27925580

  1. Prediction of protein function from protein sequence and structure.

    PubMed

    Whisstock, James C; Lesk, Arthur M

    2003-08-01

    The sequence of a genome contains the plans of the possible life of an organism, but implementation of genetic information depends on the functions of the proteins and nucleic acids that it encodes. Many individual proteins of known sequence and structure present challenges to the understanding of their function. In particular, a number of genes responsible for diseases have been identified but their specific functions are unknown. Whole-genome sequencing projects are a major source of proteins of unknown function. Annotation of a genome involves assignment of functions to gene products, in most cases on the basis of amino-acid sequence alone. 3D structure can aid the assignment of function, motivating the challenge of structural genomics projects to make structural information available for novel uncharacterized proteins. Structure-based identification of homologues often succeeds where sequence-alone-based methods fail, because in many cases evolution retains the folding pattern long after sequence similarity becomes undetectable. Nevertheless, prediction of protein function from sequence and structure is a difficult problem, because homologous proteins often have different functions. Many methods of function prediction rely on identifying similarity in sequence and/or structure between a protein of unknown function and one or more well-understood proteins. Alternative methods include inferring conservation patterns in members of a functionally uncharacterized family for which many sequences and structures are known. However, these inferences are tenuous. Such methods provide reasonable guesses at function, but are far from foolproof. It is therefore fortunate that the development of whole-organism approaches and comparative genomics permits other approaches to function prediction when the data are available. These include the use of protein-protein interaction patterns, and correlations between occurrences of related proteins in different organisms, as

  2. Effects of protein conformation in docking: improved pose prediction through protein pocket adaptation

    NASA Astrophysics Data System (ADS)

    Jain, Ajay N.

    2009-06-01

    Computational methods for docking ligands have been shown to be remarkably dependent on precise protein conformation, where acceptable results in pose prediction have been generally possible only in the artificial case of re-docking a ligand into a protein binding site whose conformation was determined in the presence of the same ligand (the "cognate" docking problem). In such cases, on well curated protein/ligand complexes, accurate dockings can be returned as top-scoring over 75% of the time using tools such as Surflex-Dock. A critical application of docking in modeling for lead optimization requires accurate pose prediction for novel ligands, ranging from simple synthetic analogs to very different molecular scaffolds. Typical results for widely used programs in the "cross-docking case" (making use of a single fixed protein conformation) have rates closer to 20% success. By making use of protein conformations from multiple complexes, Surflex-Dock yields an average success rate of 61% across eight pharmaceutically relevant targets. Following docking, protein pocket adaptation and rescoring identifies single pose families that are correct an average of 67% of the time. Consideration of the best of two pose families (from alternate scoring regimes) yields a 75% mean success rate.

  3. Effects of protein conformation in docking: improved pose prediction through protein pocket adaptation.

    PubMed

    Jain, Ajay N

    2009-06-01

    Computational methods for docking ligands have been shown to be remarkably dependent on precise protein conformation, where acceptable results in pose prediction have been generally possible only in the artificial case of re-docking a ligand into a protein binding site whose conformation was determined in the presence of the same ligand (the "cognate" docking problem). In such cases, on well curated protein/ligand complexes, accurate dockings can be returned as top-scoring over 75% of the time using tools such as Surflex-Dock. A critical application of docking in modeling for lead optimization requires accurate pose prediction for novel ligands, ranging from simple synthetic analogs to very different molecular scaffolds. Typical results for widely used programs in the "cross-docking case" (making use of a single fixed protein conformation) have rates closer to 20% success. By making use of protein conformations from multiple complexes, Surflex-Dock yields an average success rate of 61% across eight pharmaceutically relevant targets. Following docking, protein pocket adaptation and rescoring identifies single pose families that are correct an average of 67% of the time. Consideration of the best of two pose families (from alternate scoring regimes) yields a 75% mean success rate.

  4. Raoult’s law revisited: accurately predicting equilibrium relative humidity points for humidity control experiments

    PubMed Central

    Bowler, Michael G.

    2017-01-01

    The humidity surrounding a sample is an important variable in scientific experiments. Biological samples in particular require not just a humid atmosphere but often a relative humidity (RH) that is in equilibrium with a stabilizing solution required to maintain the sample in the same state during measurements. The controlled dehydration of macromolecular crystals can lead to significant increases in crystal order, leading to higher diffraction quality. Devices that can accurately control the humidity surrounding crystals while monitoring diffraction have led to this technique being increasingly adopted, as the experiments become easier and more reproducible. Matching the RH to the mother liquor is the first step in allowing the stable mounting of a crystal. In previous work [Wheeler, Russi, Bowler & Bowler (2012). Acta Cryst. F68, 111–114], the equilibrium RHs were measured for a range of concentrations of the most commonly used precipitants in macromolecular crystallography and it was shown how these related to Raoult’s law for the equilibrium vapour pressure of water above a solution. However, a discrepancy between the measured values and those predicted by theory could not be explained. Here, a more precise humidity control device has been used to determine equilibrium RH points. The new results are in agreement with Raoult’s law. A simple argument in statistical mechanics is also presented, demonstrating that the equilibrium vapour pressure of a solvent is proportional to its mole fraction in an ideal solution: Raoult’s law. The same argument can be extended to the case where the solvent and solute molecules are of different sizes, as is the case with polymers. The results provide a framework for the correct maintenance of the RH surrounding a sample. PMID:28381983
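
    For an ideal solution, the relationship stated above implies that the equilibrium RH is simply the mole fraction of water expressed as a percentage. The following minimal sketch assumes ideal behaviour and solute molecules of similar size to the solvent; the function names are illustrative, and the polymer extension mentioned in the abstract is not covered.

```python
WATER_MOLAR_MASS = 18.015  # g/mol


def water_mole_fraction(water_grams, solute_moles):
    """Mole fraction of water in a solution.

    solute_moles should count dissolved particles, so a fully dissociated
    salt contributes one mole per ion (an idealising assumption).
    """
    water_moles = water_grams / WATER_MOLAR_MASS
    return water_moles / (water_moles + solute_moles)


def equilibrium_rh(mole_fraction_water):
    """Equilibrium relative humidity (%) above an ideal solution (Raoult's law)."""
    return 100.0 * mole_fraction_water


# One mole of a non-dissociating solute dissolved in 1 kg of water.
x_water = water_mole_fraction(1000.0, 1.0)
rh = equilibrium_rh(x_water)  # a little above 98%
```

    Pure water gives an RH of 100%, and every added mole of solute particles lowers the equilibrium RH in proportion to the drop in the water mole fraction.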

  5. Raoult's law revisited: accurately predicting equilibrium relative humidity points for humidity control experiments.

    PubMed

    Bowler, Michael G; Bowler, David R; Bowler, Matthew W

    2017-04-01

    The humidity surrounding a sample is an important variable in scientific experiments. Biological samples in particular require not just a humid atmosphere but often a relative humidity (RH) that is in equilibrium with a stabilizing solution required to maintain the sample in the same state during measurements. The controlled dehydration of macromolecular crystals can lead to significant increases in crystal order, leading to higher diffraction quality. Devices that can accurately control the humidity surrounding crystals while monitoring diffraction have led to this technique being increasingly adopted, as the experiments become easier and more reproducible. Matching the RH to the mother liquor is the first step in allowing the stable mounting of a crystal. In previous work [Wheeler, Russi, Bowler & Bowler (2012). Acta Cryst. F68, 111-114], the equilibrium RHs were measured for a range of concentrations of the most commonly used precipitants in macromolecular crystallography and it was shown how these related to Raoult's law for the equilibrium vapour pressure of water above a solution. However, a discrepancy between the measured values and those predicted by theory could not be explained. Here, a more precise humidity control device has been used to determine equilibrium RH points. The new results are in agreement with Raoult's law. A simple argument in statistical mechanics is also presented, demonstrating that the equilibrium vapour pressure of a solvent is proportional to its mole fraction in an ideal solution: Raoult's law. The same argument can be extended to the case where the solvent and solute molecules are of different sizes, as is the case with polymers. The results provide a framework for the correct maintenance of the RH surrounding a sample.

  6. Predicting β-Turns in Protein Using Kernel Logistic Regression

    PubMed Central

    Elbashir, Murtada Khalafallah; Sheng, Yu; Wang, Jianxin; Wu, FangXiang; Li, Min

    2013-01-01

    A β-turn is a secondary protein structure type that plays a significant role in protein configuration and function. On average 25% of amino acids in protein structures are located in β-turns. It is very important to develop an accurate and efficient method for β-turns prediction. Most of the current successful β-turns prediction methods use support vector machines (SVMs) or neural networks (NNs). The kernel logistic regression (KLR) is a powerful classification technique that has been applied successfully in many classification problems. However, it is often not found in β-turns classification, mainly because it is computationally expensive. In this paper, we used KLR to obtain sparse β-turns prediction in short evolution time. Secondary structure information and position-specific scoring matrices (PSSMs) are utilized as input features. We achieved Qtotal of 80.7% and MCC of 50% on BT426 dataset. These results show that KLR method with the right algorithm can yield performance equivalent to or even better than NNs and SVMs in β-turns prediction. In addition, KLR yields probabilistic outcome and has a well-defined extension to multiclass case. PMID:23509793

  7. High IFIT1 expression predicts improved clinical outcome, and IFIT1 along with MGMT more accurately predicts prognosis in newly diagnosed glioblastoma.

    PubMed

    Zhang, Jin-Feng; Chen, Yao; Lin, Guo-Shi; Zhang, Jian-Dong; Tang, Wen-Long; Huang, Jian-Huang; Chen, Jin-Shou; Wang, Xing-Fu; Lin, Zhi-Xiong

    2016-06-01

    Interferon-induced protein with tetratricopeptide repeat 1 (IFIT1) plays a key role in growth suppression and apoptosis promotion in cancer cells. Interferon was reported to induce the expression of IFIT1 and inhibit the expression of O-6-methylguanine-DNA methyltransferase (MGMT). This study aimed to investigate the expression of IFIT1, the correlation between IFIT1 and MGMT, and their impact on the clinical outcome in newly diagnosed glioblastoma. The expression of IFIT1 and MGMT and their correlation were investigated in the tumor tissues from 70 patients with newly diagnosed glioblastoma. The effects on progression-free survival and overall survival were evaluated. Of 70 cases, 57 (81.4%) tissue samples showed high expression of IFIT1 by immunostaining. The χ² test indicated that the expression of IFIT1 and MGMT was negatively correlated (r = -0.288, P = .016). Univariate and multivariate analyses confirmed high IFIT1 expression as a favorable prognostic indicator for progression-free survival (P = .005 and .017) and overall survival (P = .001 and .001), respectively. Patients with 2 favorable factors (high IFIT1 and low MGMT) had an improved prognosis as compared with others. The results demonstrated significantly increased expression of IFIT1 in newly diagnosed glioblastoma tissue. The negative correlation between IFIT1 and MGMT expression may be triggered by interferon. High IFIT1 can be a predictive biomarker of favorable clinical outcome, and IFIT1 along with MGMT more accurately predicts prognosis in newly diagnosed glioblastoma.

  8. Predicting protein-protein relationships from literature using latent topics.

    PubMed

    Aso, Tatsuya; Eguchi, Koji

    2009-10-01

    This paper investigates applying statistical topic models to extract and predict relationships between biological entities, especially protein mentions. A statistical topic model, Latent Dirichlet Allocation (LDA) is promising; however, it has not been investigated for such a task. In this paper, we apply the state-of-the-art Collapsed Variational Bayesian Inference and Gibbs Sampling inference to estimating the LDA model. We also apply probabilistic Latent Semantic Analysis (pLSA) as a baseline for comparison, and compare them from the viewpoints of log-likelihood, classification accuracy and retrieval effectiveness. We demonstrate through experiments that the Collapsed Variational LDA gives better results than the others, especially in terms of classification accuracy and retrieval effectiveness in the task of the protein-protein relationship prediction.

  9. Prediction of protein-protein interactions: unifying evolution and structure at protein interfaces.

    PubMed

    Tuncbag, Nurcan; Gursoy, Attila; Keskin, Ozlem

    2011-06-01

    The vast majority of the chores in the living cell involve protein-protein interactions. Providing details of protein interactions at the residue level and incorporating them into protein interaction networks are crucial toward the elucidation of a dynamic picture of cells. Despite the rapid increase in the number of structurally known protein complexes, we are still far away from a complete network. Given experimental limitations, computational modeling of protein interactions is a prerequisite to proceed on the way to complete structural networks. In this work, we focus on the question 'how do proteins interact?' rather than 'which proteins interact?' and we review structure-based protein-protein interaction prediction approaches. As a sample approach for modeling protein interactions, PRISM is detailed which combines structural similarity and evolutionary conservation in protein interfaces to infer structures of complexes in the protein interaction network. This will ultimately help us to understand the role of protein interfaces in predicting bound conformations.

  10. Predicting Resistance Mutations Using Protein Design Algorithms

    SciTech Connect

    Frey, K.; Georgiev, I; Donald, B; Anderson, A

    2010-01-01

    Drug resistance resulting from mutations to the target is an unfortunate common phenomenon that limits the lifetime of many of the most successful drugs. In contrast to the investigation of mutations after clinical exposure, it would be powerful to be able to incorporate strategies early in the development process to predict and overcome the effects of possible resistance mutations. Here we present a unique prospective application of an ensemble-based protein design algorithm, K*, to predict potential resistance mutations in dihydrofolate reductase from Staphylococcus aureus using positive design to maintain catalytic function and negative design to interfere with binding of a lead inhibitor. Enzyme inhibition assays show that three of the four highly-ranked predicted mutants are active yet display lower affinity (18-, 9-, and 13-fold) for the inhibitor. A crystal structure of the top-ranked mutant enzyme validates the predicted conformations of the mutated residues and the structural basis of the loss of potency. The use of protein design algorithms to predict resistance mutations could be incorporated in a lead design strategy against any target that is susceptible to mutational resistance.

  11. Protein function prediction using domain families

    PubMed Central

    2013-01-01

    Here we assessed the use of domain families for predicting the functions of whole proteins. These 'functional families' (FunFams) were derived using a protocol that combines sequence clustering with supervised cluster evaluation, relying on available high-quality Gene Ontology (GO) annotation data in the latter step. In essence, the protocol groups domain sequences belonging to the same superfamily into families based on the GO annotations of their parent proteins. An initial test based on enzyme sequences confirmed that the FunFams resemble enzyme (domain) families much better than do families produced by sequence clustering alone. For the CAFA 2011 experiment, we further associated the FunFams with GO terms probabilistically. All target proteins were first submitted to domain superfamily assignment, followed by FunFam assignment and, eventually, function assignment. The latter included an integration step for multi-domain target proteins. The CAFA results put our domain-based approach among the top ten of 31 competing groups and 56 prediction methods, confirming that it outperforms simple pairwise whole-protein sequence comparisons. PMID:23514456

  12. Predictive and comparative analysis of Ebolavirus proteins

    PubMed Central

    Cong, Qian; Pei, Jimin; Grishin, Nick V

    2015-01-01

    Ebolavirus is the pathogen for Ebola Hemorrhagic Fever (EHF). This disease exhibits a high fatality rate and has recently reached a historically epidemic proportion in West Africa. Out of the 5 known Ebolavirus species, only Reston ebolavirus has lost human pathogenicity, while retaining the ability to cause EHF in long-tailed macaque. Significant efforts have been spent to determine the three-dimensional (3D) structures of Ebolavirus proteins, to study their interaction with host proteins, and to identify the functional motifs in these viral proteins. Here, in light of these experimental results, we apply computational analysis to predict the 3D structures and functional sites for Ebolavirus protein domains with unknown structure, including a zinc-finger domain of VP30, the RNA-dependent RNA polymerase catalytic domain and a methyltransferase domain of protein L. In addition, we compare sequences of proteins that interact with Ebolavirus proteins from RESTV-resistant primates with those from RESTV-susceptible monkeys. The host proteins that interact with GP and VP35 show an elevated level of sequence divergence between the RESTV-resistant and RESTV-susceptible species, suggesting that they may be responsible for host specificity. Meanwhile, we detect variable positions in protein sequences that are likely associated with the loss of human pathogenicity in RESTV, map them onto the 3D structures and compare their positions to known functional sites. VP35 and VP30 are significantly enriched in these potential pathogenicity determinants and the clustering of such positions on the surfaces of VP35 and GP suggests possible uncharacterized interaction sites with host proteins that contribute to the virulence of Ebolavirus. PMID:26158395

  13. CASP11--An Evaluation of a Modular BCL::Fold-Based Protein Structure Prediction Pipeline.

    PubMed

    Fischer, Axel W; Heinze, Sten; Putnam, Daniel K; Li, Bian; Pino, James C; Xia, Yan; Lopez, Carlos F; Meiler, Jens

    2016-01-01

    In silico prediction of a protein's tertiary structure remains an unsolved problem. The community-wide Critical Assessment of Protein Structure Prediction (CASP) experiment provides a double-blind study to evaluate improvements in protein structure prediction algorithms. We developed a protein structure prediction pipeline employing a three-stage approach, consisting of low-resolution topology search, high-resolution refinement, and molecular dynamics simulation to predict the tertiary structure of proteins from the primary structure alone or including distance restraints either from predicted residue-residue contacts, nuclear magnetic resonance (NMR) nuclear overhauser effect (NOE) experiments, or mass spectroscopy (MS) cross-linking (XL) data. The protein structure prediction pipeline was evaluated in the CASP11 experiment on twenty regular protein targets as well as thirty-three 'assisted' protein targets, which also had distance restraints available. Although the low-resolution topology search module was able to sample models with a global distance test total score (GDT_TS) value greater than 30% for twelve out of twenty proteins, frequently it was not possible to select the most accurate models for refinement, resulting in a general decay of model quality over the course of the prediction pipeline. In this study, we provide a detailed overall analysis, study one target protein in more detail as it travels through the protein structure prediction pipeline, and evaluate the impact of limited experimental data.

  14. Predicting disease-related proteins based on clique backbone in protein-protein interaction network.

    PubMed

    Yang, Lei; Zhao, Xudong; Tang, Xianglong

    2014-01-01

    Network biology integrates different kinds of data, including physical or functional networks and disease gene sets, to interpret human disease. A clique (maximal complete subgraph) in a protein-protein interaction network is a topological module and possesses inherent biological significance. A disease-related clique is possibly associated with complex diseases. Fully identifying disease components in a clique is conducive to uncovering disease mechanisms. This paper proposes an approach for predicting disease proteins based on cliques in a protein-protein interaction network. To tolerate false positive and negative interactions in protein networks, extending cliques and scoring predicted disease proteins with gene ontology terms are introduced to the clique-based method. The precision of the predicted disease proteins, verified against disease phenotypes, stays consistently above 95%. The predicted disease proteins associated with cliques can partly complement mapping between genotype and phenotype, and provide clues for understanding the pathogenesis of serious diseases.
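
    The clique definition used above (a maximal complete subgraph) can be illustrated with a minimal sketch. It enumerates maximal cliques with the basic Bron-Kerbosch recursion; this is a standard textbook algorithm chosen for illustration, not necessarily the procedure used by the authors, and it omits their clique extension and gene ontology scoring steps. The protein names A to D and the toy network are hypothetical.

```python
def maximal_cliques(adjacency):
    """Enumerate maximal cliques of an undirected graph (basic Bron-Kerbosch).

    adjacency maps each node to the set of its neighbours.
    """
    cliques = []

    def expand(current, candidates, excluded):
        # No node can extend the clique and none was skipped: it is maximal.
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for node in list(candidates):
            neighbours = adjacency[node]
            expand(current | {node}, candidates & neighbours, excluded & neighbours)
            # Mark the node as processed so the same clique is not reported twice.
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(adjacency), set())
    return cliques


# Hypothetical toy interaction network: A, B and C interact pairwise; D binds C only.
toy_network = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
found = set(maximal_cliques(toy_network))
```

    In this toy network the sketch reports two maximal cliques, {A, B, C} and {C, D}. Real interaction networks are far larger, where a pivoting variant of the same recursion is usually preferred.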

  15. Accurate and Efficient Resolution of Overlapping Isotopic Envelopes in Protein Tandem Mass Spectra

    PubMed Central

    Xiao, Kaijie; Yu, Fan; Fang, Houqin; Xue, Bingbing; Liu, Yan; Tian, Zhixin

    2015-01-01

    It has long been an analytical challenge to accurately and efficiently resolve extremely dense overlapping isotopic envelopes (OIEs) in protein tandem mass spectra to confidently identify proteins. Here, we report a computationally efficient method, called OIE_CARE, to resolve OIEs by calculating the relative deviation between the ideal and observed experimental abundance. In the OIE_CARE method, the ideal experimental abundance of a particular overlapping isotopic peak (OIP) is first calculated for all the OIEs sharing this OIP. The relative deviation (RD) of the overall observed experimental abundance of this OIP relative to the summed ideal value is then calculated. The final individual abundance of the OIP for each OIE is the individual ideal experimental abundance multiplied by 1 + RD. Initial studies were performed using higher-energy collisional dissociation tandem mass spectra on myoglobin (with direct infusion) and the intact E. coli proteome (with liquid chromatographic separation). Comprehensive data at the protein and proteome levels, high confidence and good reproducibility were achieved. The resolving method reported here can, in principle, be extended to resolve any envelope-type overlapping data for which the corresponding theoretical reference values are available. PMID:26439836
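
    The arithmetic described in this abstract can be sketched in a few lines. The sketch assumes that the relative deviation is (observed - summed ideal) / summed ideal, which is one reading of the abstract's wording; the function name and the abundance values are hypothetical and are not taken from the OIE_CARE software.

```python
def resolve_overlapping_peak(ideal_abundances, observed_total):
    """Split one overlapping isotopic peak among the envelopes that share it.

    ideal_abundances: ideal abundance of the peak for each overlapping envelope.
    observed_total:   observed abundance of the shared peak.
    Returns (individual abundances, relative deviation).
    """
    summed_ideal = sum(ideal_abundances)
    if summed_ideal <= 0:
        raise ValueError("summed ideal abundance must be positive")
    # Assumed definition: deviation of the observed value from the summed ideal value.
    relative_deviation = (observed_total - summed_ideal) / summed_ideal
    # Each envelope receives its ideal abundance scaled by (1 + RD).
    shares = [ideal * (1 + relative_deviation) for ideal in ideal_abundances]
    return shares, relative_deviation


# Hypothetical peak shared by two envelopes with ideal abundances 60 and 40,
# observed at a total abundance of 110.
shares, rd = resolve_overlapping_peak([60.0, 40.0], 110.0)
```

    Under this reading the individual abundances always add up to the observed abundance, so the observed signal is shared out in proportion to the ideal values.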

  16. FAMBE-pH: a fast and accurate method to compute the total solvation free energies of proteins.

    PubMed

    Vorobjev, Yury N; Vila, Jorge A; Scheraga, Harold A

    2008-09-04

    A fast and accurate method to compute the total solvation free energies of proteins as a function of pH is presented. The method makes use of a combination of approaches, some of which have already appeared in the literature; (i) the Poisson equation is solved with an optimized fast adaptive multigrid boundary element (FAMBE) method; (ii) the electrostatic free energies of the ionizable sites are calculated for their neutral and charged states by using a detailed model of atomic charges; (iii) a set of optimal atomic radii is used to define a precise dielectric surface interface; (iv) a multilevel adaptive tessellation of this dielectric surface interface is achieved by using multisized boundary elements; and (v) 1:1 salt effects are included. The equilibrium proton binding/release is calculated with the Tanford-Schellman integral if the proteins contain more than approximately 20-25 ionizable groups; for a smaller number of ionizable groups, the ionization partition function is calculated directly. The FAMBE method is tested as a function of pH (FAMBE-pH) with three proteins, namely, bovine pancreatic trypsin inhibitor (BPTI), hen egg white lysozyme (HEWL), and bovine pancreatic ribonuclease A (RNaseA). The results are (a) the FAMBE-pH method reproduces the observed pKa's of the ionizable groups of these proteins within an average absolute value of 0.4 pK units and a maximum error of 1.2 pK units and (b) comparison of the calculated total pH-dependent solvation free energy for BPTI, between the exact calculation of the ionization partition function and the Tanford-Schellman integral method, shows agreement within 1.2 kcal/mol. These results indicate that calculation of total solvation free energies with the FAMBE-pH method can provide an accurate prediction of protein conformational stability at a given fixed pH and, if coupled with molecular mechanics or molecular dynamics methods, can also be used for more realistic studies of protein folding, unfolding, and

  17. Mitotic Protein CSPP1 Interacts with CENP-H Protein to Coordinate Accurate Chromosome Oscillation in Mitosis*

    PubMed Central

    Zhu, Lijuan; Wang, Zhikai; Wang, Wenwen; Wang, Chunli; Hua, Shasha; Su, Zeqi; Brako, Larry; Garcia-Barrio, Minerva; Ye, Mingliang; Wei, Xuan; Zou, Hanfa; Ding, Xia; Liu, Lifang; Liu, Xing; Yao, Xuebiao

    2015-01-01

    Mitotic chromosome segregation is orchestrated by the dynamic interaction of spindle microtubules with the kinetochores. During chromosome alignment, kinetochore-bound microtubules undergo dynamic cycles between growth and shrinkage, leading to an oscillatory movement of chromosomes along the spindle axis. Although kinetochore protein CENP-H serves as a molecular control of kinetochore-microtubule dynamics, the mechanistic link between CENP-H and kinetochore microtubules (kMT) has remained less characterized. Here, we show that CSPP1 is a kinetochore protein essential for accurate chromosome movements in mitosis. CSPP1 binds to CENP-H in vitro and in vivo. Suppression of CSPP1 perturbs proper mitotic progression and compromises the satisfaction of spindle assembly checkpoint. In addition, chromosome oscillation is greatly attenuated in CSPP1-depleted cells, similar to what was observed in the CENP-H-depleted cells. Importantly, CSPP1 depletion enhances velocity of kinetochore movement, and overexpression of CSPP1 decreases the speed, suggesting that CSPP1 promotes kMT stability during cell division. Specific perturbation of CENP-H/CSPP1 interaction using a membrane-permeable competing peptide resulted in a transient mitotic arrest and chromosome segregation defect. Based on these findings, we propose that CSPP1 cooperates with CENP-H on kinetochores to serve as a novel regulator of kMT dynamics for accurate chromosome segregation. PMID:26378239

  18. Mitotic Protein CSPP1 Interacts with CENP-H Protein to Coordinate Accurate Chromosome Oscillation in Mitosis.

    PubMed

    Zhu, Lijuan; Wang, Zhikai; Wang, Wenwen; Wang, Chunli; Hua, Shasha; Su, Zeqi; Brako, Larry; Garcia-Barrio, Minerva; Ye, Mingliang; Wei, Xuan; Zou, Hanfa; Ding, Xia; Liu, Lifang; Liu, Xing; Yao, Xuebiao

    2015-11-06

    Mitotic chromosome segregation is orchestrated by the dynamic interaction of spindle microtubules with the kinetochores. During chromosome alignment, kinetochore-bound microtubules undergo dynamic cycles between growth and shrinkage, leading to an oscillatory movement of chromosomes along the spindle axis. Although kinetochore protein CENP-H serves as a molecular control of kinetochore-microtubule dynamics, the mechanistic link between CENP-H and kinetochore microtubules (kMT) has remained less characterized. Here, we show that CSPP1 is a kinetochore protein essential for accurate chromosome movements in mitosis. CSPP1 binds to CENP-H in vitro and in vivo. Suppression of CSPP1 perturbs proper mitotic progression and compromises the satisfaction of spindle assembly checkpoint. In addition, chromosome oscillation is greatly attenuated in CSPP1-depleted cells, similar to what was observed in the CENP-H-depleted cells. Importantly, CSPP1 depletion enhances velocity of kinetochore movement, and overexpression of CSPP1 decreases the speed, suggesting that CSPP1 promotes kMT stability during cell division. Specific perturbation of CENP-H/CSPP1 interaction using a membrane-permeable competing peptide resulted in a transient mitotic arrest and chromosome segregation defect. Based on these findings, we propose that CSPP1 cooperates with CENP-H on kinetochores to serve as a novel regulator of kMT dynamics for accurate chromosome segregation.

  19. Automated selected reaction monitoring software for accurate label-free protein quantification.

    PubMed

    Teleman, Johan; Karlsson, Christofer; Waldemarson, Sofia; Hansson, Karin; James, Peter; Malmström, Johan; Levander, Fredrik

    2012-07-06

    Selected reaction monitoring (SRM) is a mass spectrometry method with documented ability to quantify proteins accurately and reproducibly using labeled reference peptides. However, the use of labeled reference peptides becomes impractical if large numbers of peptides are targeted and when high flexibility is desired when selecting peptides. We have developed a label-free quantitative SRM workflow that relies on a new automated algorithm, Anubis, for accurate peak detection. Anubis efficiently removes interfering signals from contaminating peptides to estimate the true signal of the targeted peptides. We evaluated the algorithm on a published multisite data set and achieved results in line with manual data analysis. In complex peptide mixtures from whole proteome digests of Streptococcus pyogenes we achieved a technical variability across the entire proteome abundance range of 6.5-19.2%, which was considerably below the total variation across biological samples. Our results show that the label-free SRM workflow with automated data analysis is feasible for large-scale biological studies, opening up new possibilities for quantitative proteomics and systems biology.

  20. PQuad: Visualization of Predicted Peptides and Proteins

    SciTech Connect

    Havre, Susan L.; Singhal, Mudita; Payne, Deborah A.; Webb-Robertson, Bobbie-Jo M.

    2004-10-10

    New high-throughput proteomic techniques generate data faster than biologists and bioinformaticists can analyze it. Yet, hidden within this massive and complex data are answers to basic questions about how cells function to support life or respond to disease. Now biologists can take a global or systems approach studying not one or two proteins at a time but whole proteomes comprising all the proteins in a cell. However, the tremendous size and complexity of the high-throughput experiment data make it difficult to process and interpret. Visualization provides powerful analysis capabilities for such enormous and complex data. In this paper, we introduce a novel interactive visualization, PQuad (Peptide Permutation and Protein Prediction), designed for the visual analysis of peptides (protein fragments) identified from high-throughput data. PQuad depicts the experiment peptides in the context of their parent protein and DNA, thereby integrating proteomic and genomic information. A wrapped line metaphor is applied across key resolutions of the data, from a compressed view of an entire chromosome to the actual nucleotide sequence. PQuad provides a difference visualization for comparing peptides from different experimental conditions. We describe the requirements for such a visual analysis tool, the design decisions, and the novel aspects of PQuad.

  1. SnowyOwl: accurate prediction of fungal genes by using RNA-Seq and homology information to select among ab initio models

    PubMed Central

    2014-01-01

    Background Locating the protein-coding genes in novel genomes is essential to understanding and exploiting the genomic information but it is still difficult to accurately predict all the genes. The recent availability of detailed information about transcript structure from high-throughput sequencing of messenger RNA (RNA-Seq) delineates many expressed genes and promises increased accuracy in gene prediction. Computational gene predictors have been intensively developed for and tested in well-studied animal genomes. Hundreds of fungal genomes are now or will soon be sequenced. The differences of fungal genomes from animal genomes and the phylogenetic sparsity of well-studied fungi call for gene-prediction tools tailored to them. Results SnowyOwl is a new gene prediction pipeline that uses RNA-Seq data to train and provide hints for the generation of Hidden Markov Model (HMM)-based gene predictions and to evaluate the resulting models. The pipeline has been developed and streamlined by comparing its predictions to manually curated gene models in three fungal genomes and validated against the high-quality gene annotation of Neurospora crassa; SnowyOwl predicted N. crassa genes with 83% sensitivity and 65% specificity. SnowyOwl gains sensitivity by repeatedly running the HMM gene predictor Augustus with varied input parameters and selectivity by choosing the models with best homology to known proteins and best agreement with the RNA-Seq data. Conclusions SnowyOwl efficiently uses RNA-Seq data to produce accurate gene models in both well-studied and novel fungal genomes. The source code for the SnowyOwl pipeline (in Python) and a web interface (in PHP) is freely available from http://sourceforge.net/projects/snowyowl/. PMID:24980894

  2. Towards more accurate wind and solar power prediction by improving NWP model physics

    NASA Astrophysics Data System (ADS)

    Steiner, Andrea; Köhler, Carmen; von Schumann, Jonas; Ritter, Bodo

    2014-05-01

    The growing importance and successive expansion of renewable energies raise new challenges for decision makers, economists, transmission system operators, scientists and many more. In this interdisciplinary field, the role of Numerical Weather Prediction (NWP) is to reduce the errors and provide an a priori estimate of remaining uncertainties associated with the large share of weather-dependent power sources. For this purpose it is essential to optimize NWP model forecasts with respect to those prognostic variables which are relevant for wind and solar power plants. An improved weather forecast serves as the basis for sophisticated power forecasts. Consequently, energy can be traded on the stock market at the right time and electrical grid stability can be maintained. The German Weather Service (DWD) is currently involved in two projects concerning research in the field of renewable energy, namely ORKA and EWeLiNE. Whereas the latter is in collaboration with the Fraunhofer Institute (IWES), the project ORKA is led by energy & meteo systems (emsys). Both cooperate with German transmission system operators. The goal of the projects is to improve wind and photovoltaic (PV) power forecasts by combining optimized NWP and enhanced power forecast models. In this context, the German Weather Service aims to improve its model system, including the ensemble forecasting system, by working on data assimilation, model physics and statistical post processing. This presentation is focused on the identification of critical weather situations and the associated errors in the German regional NWP model COSMO-DE. First steps leading to improved physical parameterization schemes within the NWP-model are presented. Wind mast measurements reaching up to 200 m height above ground are used for the estimation of the (NWP) wind forecast error at heights relevant for wind energy plants. One particular problem is the daily cycle in wind speed. The transition from stable stratification during

  3. Fast and Accurate Discovery of Degenerate Linear Motifs in Protein Sequences

    PubMed Central

    Levy, Emmanuel D.; Michnick, Stephen W.

    2014-01-01

    Linear motifs mediate a wide variety of cellular functions, which makes their characterization in protein sequences crucial to understanding cellular systems. However, the short length and degenerate nature of linear motifs make their discovery a difficult problem. Here, we introduce MotifHound, an algorithm particularly suited for the discovery of small and degenerate linear motifs. MotifHound performs an exact and exhaustive enumeration of all motifs present in proteins of interest, including all of their degenerate forms, and scores the overrepresentation of each motif based on its occurrence in proteins of interest relative to a background (e.g., proteome) using the hypergeometric distribution. To assess MotifHound, we benchmarked it together with state-of-the-art algorithms. The benchmark consists of 11,880 sets of proteins from S. cerevisiae; in each set, we artificially spiked-in one motif varying in terms of three key parameters, (i) number of occurrences, (ii) length and (iii) the number of degenerate or “wildcard” positions. The benchmark enabled the evaluation of the impact of these three properties on the performance of the different algorithms. The results showed that MotifHound and SLiMFinder were the most accurate in detecting degenerate linear motifs. Interestingly, MotifHound was 15 to 20 times faster at comparable accuracy and performed best in the discovery of highly degenerate motifs. We complemented the benchmark by an analysis of proteins experimentally shown to bind the FUS1 SH3 domain from S. cerevisiae. Using the full-length protein partners as sole information, MotifHound recapitulated most experimentally determined motifs binding to the FUS1 SH3 domain. Moreover, these motifs exhibited properties typical of SH3 binding peptides, e.g., high intrinsic disorder and evolutionary conservation, despite the fact that none of these properties were used as prior information. MotifHound is available (http://michnick.bcm.umontreal.ca or http
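
    The overrepresentation score mentioned in this abstract rests on the hypergeometric distribution. The following is a minimal sketch of such an upper-tail test, assuming the score is the probability of seeing at least the observed number of motif-containing proteins in the set of interest; the function and parameter names are hypothetical and do not come from the MotifHound code.

```python
from math import comb


def hypergeom_tail(hits_in_set, set_size, hits_in_background, background_size):
    """P(X >= hits_in_set) for a hypergeometric draw without replacement.

    background_size:    number of proteins in the background (e.g. proteome).
    hits_in_background: background proteins that contain the motif.
    set_size:           number of proteins of interest.
    hits_in_set:        proteins of interest that contain the motif.
    """
    total = comb(background_size, set_size)
    upper = min(set_size, hits_in_background)
    favourable = sum(
        comb(hits_in_background, i)
        * comb(background_size - hits_in_background, set_size - i)
        for i in range(hits_in_set, upper + 1)
    )
    return favourable / total


# Hypothetical numbers: the motif occurs in 3 of 10 background proteins
# and in 2 of the 4 proteins of interest.
p_value = hypergeom_tail(2, 4, 3, 10)
```

    A smaller tail probability means the motif is more strongly overrepresented in the proteins of interest relative to the background. Exact integer binomials keep the sketch dependency-free; a statistics library would be the usual choice at proteome scale.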

  4. Huntingtin-interacting protein 1-related is required for accurate congression and segregation of chromosomes.

    PubMed

    Park, Sun Joo

    2010-12-01

    Huntingtin-interacting protein 1-related (HIP1r) is known to function in clathrin-mediated endocytosis and regulation of the actin cytoskeleton, which occurs continuously in non-dividing cells. This study reports a new function for HIP1r in mitosis. Green fluorescent protein-fused HIP1r localizes to the mitotic spindles. Depletion of HIP1r by RNA interference induces misalignment of chromosomes and prolonged mitosis, which is associated with decreased proliferation of HIP1r-deficient cells. Chromosome misalignment leads to missegregation and ultimately production of multinucleated cells. Depletion of HIP1r causes persistent activation of the spindle checkpoint in misaligned chromosomes. These findings suggest that HIP1r plays an important role in regulating the attachment of spindle microtubules to chromosomes during mitosis, an event that is required for accurate congression and segregation of chromosomes. This finding may provide new insights that improve the understanding of various human diseases involving HIP1r as well as its fusion genes.

  5. Conformational energy range of ligands in protein crystal structures: The difficult quest for accurate understanding.

    PubMed

    Peach, Megan L; Cachau, Raul E; Nicklaus, Marc C

    2017-02-24

    In this review, we address a fundamental question: What is the range of conformational energies seen in ligands in protein-ligand crystal structures? This value is important biophysically, for better understanding the protein-ligand binding process; and practically, for providing a parameter to be used in many computational drug design methods such as docking and pharmacophore searches. We synthesize a selection of previously reported conflicting results from computational studies of this issue and conclude that high ligand conformational energies really are present in some crystal structures. The main source of disagreement between different analyses appears to be due to divergent treatments of electrostatics and solvation. At the same time, however, for many ligands, a high conformational energy is in error, due to either crystal structure inaccuracies or incorrect determination of the reference state. Aside from simple chemistry mistakes, we argue that crystal structure error may mainly be because of the heuristic weighting of ligand stereochemical restraints relative to the fit of the structure to the electron density. This problem cannot be fixed with improvements to electron density fitting or with simple ligand geometry checks, though better metrics are needed for evaluating ligand and binding site chemistry in addition to geometry during structure refinement. The ultimate solution for accurately determining ligand conformational energies lies in ultrahigh-resolution crystal structures that can be refined without restraints.

  6. A machine learning approach to the accurate prediction of multi-leaf collimator positional errors

    NASA Astrophysics Data System (ADS)

    Carlson, Joel N. K.; Park, Jong Min; Park, So-Yeon; In Park, Jong; Choi, Yunseok; Ye, Sung-Joon

    2016-03-01

    Discrepancies between planned and delivered movements of multi-leaf collimators (MLCs) are an important source of errors in dose distributions during radiotherapy. In this work we used machine learning techniques to train models to predict these discrepancies, assessed the accuracy of the model predictions, and examined the impact these errors have on quality assurance (QA) procedures and dosimetry. Predictive leaf motion parameters for the models were calculated from the plan files, such as leaf position and velocity, whether the leaf was moving towards or away from the isocenter of the MLC, and many others. Differences in positions between synchronized DICOM-RT planning files and DynaLog files reported during QA delivery were used as a target response for training of the models. The final model is capable of predicting MLC positions during delivery to a high degree of accuracy. For moving MLC leaves, predicted positions were shown to be significantly closer to delivered positions than were planned positions. By incorporating predicted positions into dose calculations in the TPS, increases were shown in gamma passing rates against measured dose distributions recorded during QA delivery. For instance, head and neck plans with 1%/2 mm gamma criteria had an average increase in passing rate of 4.17% (SD = 1.54%). This indicates that the inclusion of predictions during dose calculation leads to a more realistic representation of plan delivery. To assess impact on the patient, dose volumetric histograms (DVH) using delivered positions were calculated for comparison with planned and predicted DVHs. In all cases, predicted dose volumetric parameters were in closer agreement to the delivered parameters than were the planned parameters, particularly for organs at risk on the periphery of the treatment area. By incorporating the predicted positions into the TPS, the treatment planner is given a more realistic view of the dose distribution as it will truly be

  7. An accurate and efficient method to predict the electronic excitation energies of BODIPY fluorescent dyes.

    PubMed

    Wang, Jia-Nan; Jin, Jun-Ling; Geng, Yun; Sun, Shi-Ling; Xu, Hong-Liang; Lu, Ying-Hua; Su, Zhong-Min

    2013-03-15

    Recently, the extreme learning machine neural network (ELMNN) has been proposed as a valid computing method and used successfully to predict nonlinear optical properties (Wang et al., J. Comput. Chem. 2012, 33, 231). In this work, first, we follow this line of work to predict electronic excitation energies using the ELMNN method. Significantly, the root mean square deviation between the predicted and experimental electronic excitation energies of 90 4,4-difluoro-4-bora-3a,4a-diaza-s-indacene (BODIPY) derivatives has been reduced to 0.13 eV. Second, four groups of molecular descriptors are considered when building the computing models. The results show that the quantum chemical descriptors have the closest intrinsic relation with the electronic excitation energy values. Finally, a user-friendly web server (EEEBPre: Prediction of electronic excitation energies for BODIPY dyes), which is freely accessible to the public at http://202.198.129.218, has been built for prediction. This web server can return predicted electronic excitation energy values of BODIPY dyes that are highly consistent with the experimental values. We hope that this web server will be helpful to theoretical and experimental chemists in related research.

  8. Sensor data fusion for accurate cloud presence prediction using Dempster-Shafer evidence theory.

    PubMed

    Li, Jiaming; Luo, Suhuai; Jin, Jesse S

    2010-01-01

    Sensor data fusion technology can be used to extract useful information from multiple sensor observations. It has been widely applied in areas such as target tracking, surveillance, robot navigation, and signal and image processing. This paper introduces a novel data fusion approach in a multiple radiation sensor environment using Dempster-Shafer evidence theory. The methodology is used to predict cloud presence based on the inputs of radiation sensors. Different radiation data have been used for the cloud prediction. The potential application areas of the algorithm include renewable power for virtual power stations, where the prediction of cloud presence is the most challenging issue for photovoltaic output. The algorithm is validated by comparing the predicted cloud presence with the corresponding sunshine occurrence data that were recorded as the benchmark. Our experiments indicate that, compared with approaches using individual sensors, the proposed data fusion approach can increase the correct rate of cloud prediction by ten percent and decrease the unknown rate of cloud prediction by twenty-three percent.
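
    Dempster's rule of combination, the core operation of Dempster-Shafer evidence theory, can be sketched as follows for a two-hypothesis frame. The sensor masses, the hypothesis names and the function name combine are assumptions chosen for illustration; the abstract does not state how the radiation readings are converted into mass assignments.

```python
from itertools import product

# Frame of discernment for this sketch: {"cloud", "clear"}.
CLOUD = frozenset({"cloud"})
CLEAR = frozenset({"clear"})
UNKNOWN = CLOUD | CLEAR  # mass on the whole frame expresses ignorance


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to a mass, and
    the masses of one function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalise so the remaining masses sum to 1.
    return {subset: mass / (1.0 - conflict) for subset, mass in combined.items()}


# Hypothetical evidence from two radiation sensors.
sensor_a = {CLOUD: 0.6, CLEAR: 0.1, UNKNOWN: 0.3}
sensor_b = {CLOUD: 0.5, CLEAR: 0.2, UNKNOWN: 0.3}

fused = combine(sensor_a, sensor_b)
for subset, mass in fused.items():
    print(sorted(subset), round(mass, 3))
```

    With these made-up inputs the mass left on the whole frame after fusion is smaller than for either sensor alone, which is the mechanism by which combining sources can lower an "unknown" rate.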

  9. Protein design algorithms predict viable resistance to an experimental antifolate.

    PubMed

    Reeve, Stephanie M; Gainza, Pablo; Frey, Kathleen M; Georgiev, Ivelin; Donald, Bruce R; Anderson, Amy C

    2015-01-20

    Methods to accurately predict potential drug target mutations in response to early-stage leads could drive the design of more resilient first generation drug candidates. In this study, a structure-based protein design algorithm (K* in the OSPREY suite) was used to prospectively identify single-nucleotide polymorphisms that confer resistance to an experimental inhibitor effective against dihydrofolate reductase (DHFR) from Staphylococcus aureus. Four of the top-ranked mutations in DHFR were found to be catalytically competent and resistant to the inhibitor. Selection of resistant bacteria in vitro reveals that two of the predicted mutations arise in the background of a compensatory mutation. Using enzyme kinetics, microbiology, and crystal structures of the complexes, we determined the fitness of the mutant enzymes and strains, the structural basis of resistance, and the compensatory relationship of the mutations. To our knowledge, this work illustrates the first application of protein design algorithms to prospectively predict viable resistance mutations that arise in bacteria under antibiotic pressure.

  10. Prediction of Peptide and Protein Propensity for Amyloid Formation

    PubMed Central

    Família, Carlos; Dennison, Sarah R.; Quintas, Alexandre; Phoenix, David A.

    2015-01-01

    Understanding which peptides and proteins have the potential to undergo amyloid formation and what driving forces are responsible for amyloid-like fiber formation and stabilization remains limited. This is mainly because proteins that can undergo structural changes, which lead to amyloid formation, are quite diverse and share no obvious sequence or structural homology, despite the structural similarity found in the fibrils. To address these issues, a novel approach based on recursive feature selection and feed-forward neural networks was undertaken to identify key features highly correlated with the self-assembly problem. This approach allowed the identification of seven physicochemical and biochemical properties of the amino acids highly associated with the self-assembly of peptides and proteins into amyloid-like fibrils (normalized frequency of β-sheet, normalized frequency of β-sheet from LG, weights for β-sheet at the window position of 1, isoelectric point, atom-based hydrophobic moment, helix termination parameter at position j+1 and ΔG° values for peptides extrapolated in 0 M urea). Moreover, these features enabled the development of a new predictor (available at http://cran.r-project.org/web/packages/appnn/index.html) capable of accurately and reliably predicting the amyloidogenic propensity from the polypeptide sequence alone, with a prediction accuracy of 84.9% against an external validation dataset of sequences with experimental in vitro evidence of amyloid formation. PMID:26241652

  11. Prediction of Peptide and Protein Propensity for Amyloid Formation.

    PubMed

    Família, Carlos; Dennison, Sarah R; Quintas, Alexandre; Phoenix, David A

    2015-01-01

    Understanding which peptides and proteins have the potential to undergo amyloid formation and what driving forces are responsible for amyloid-like fiber formation and stabilization remains limited. This is mainly because proteins that can undergo structural changes, which lead to amyloid formation, are quite diverse and share no obvious sequence or structural homology, despite the structural similarity found in the fibrils. To address these issues, a novel approach based on recursive feature selection and feed-forward neural networks was undertaken to identify key features highly correlated with the self-assembly problem. This approach allowed the identification of seven physicochemical and biochemical properties of the amino acids highly associated with the self-assembly of peptides and proteins into amyloid-like fibrils (normalized frequency of β-sheet, normalized frequency of β-sheet from LG, weights for β-sheet at the window position of 1, isoelectric point, atom-based hydrophobic moment, helix termination parameter at position j+1 and ΔG° values for peptides extrapolated in 0 M urea). Moreover, these features enabled the development of a new predictor (available at http://cran.r-project.org/web/packages/appnn/index.html) capable of accurately and reliably predicting the amyloidogenic propensity from the polypeptide sequence alone, with a prediction accuracy of 84.9% against an external validation dataset of sequences with experimental in vitro evidence of amyloid formation.

  12. 3D protein structure prediction using Imperialist Competitive algorithm and half sphere exposure prediction.

    PubMed

    Khaji, Erfan; Karami, Masoumeh; Garkani-Nejad, Zahra

    2016-02-21

    Predicting the native structure of proteins based on half-sphere exposure and contact numbers has been studied extensively in recent years. Online predictors of these vectors and of the secondary structures of amino acid sequences have made it possible to design a function for the folding process. By choosing variant structures and directions for each secondary structure, a random conformation can be generated, and a potential function can then be assigned. Minimizing the potential function using meta-heuristic algorithms is the final step in finding the native structure of a given amino acid sequence. In this work, the Imperialist Competitive algorithm was used to accelerate the minimization. Moreover, we applied an adaptive procedure to apply revolutionary changes. Finally, we considered a more accurate tool for prediction of secondary structure. The results of the computational experiments on a standard benchmark show the superiority of the new algorithm over previous methods with a similar potential function.

  13. Prediction of zinc finger DNA binding protein.

    PubMed

    Nakata, K

    1995-04-01

    Using the neural network algorithm with back-propagation training procedure, we analysed the zinc finger DNA binding protein sequences. We incorporated the characteristic patterns around the zinc finger motifs of the TFIIIA type (Cys-X2-5-Cys-X12-13-His-X2-5-His) and the steroid hormone receptor type (Cys-X2-5-Cys-X12-15-Cys-X2-5-Cys-X15-16-Cys-X4-5-Cys-X8-10-Cys-X2-3-Cys) in the neural network algorithm. The patterns used in the neural network were the amino acid pattern, the electric charge and polarity pattern, the side-chain chemical property and subproperty patterns, the hydrophobicity and hydrophilicity patterns and the secondary structure propensity pattern. Two consecutive patterns were also considered. Each pattern was incorporated in the single layer perceptron algorithm and the combinations of patterns were considered in the two-layer perceptron algorithm. As for the TFIIIA type zinc finger DNA binding motifs, the prediction results of the two-layer perceptron algorithm reached up to 96.9% discrimination, and the prediction results of the discriminant analysis using the combination of several characters reached up to 97.0%. As for the steroid hormone receptor type zinc finger, the prediction results of neural network algorithm and the discriminant analyses reached up to 96.0%.
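
    The TFIIIA-type spacing pattern quoted above (Cys-X2-5-Cys-X12-13-His-X2-5-His) translates directly into a regular expression over one-letter amino acid codes. The sketch below only locates candidate motifs by spacing; it is not the neural network classifier described in the abstract, and the function name and the test sequence are made up for illustration.

```python
import re

# Cys, 2-5 residues, Cys, 12-13 residues, His, 2-5 residues, His.
# "X" in the motif notation means any residue, written here as [A-Z].
TFIIIA_PATTERN = re.compile(r"C[A-Z]{2,5}C[A-Z]{12,13}H[A-Z]{2,5}H")


def find_tfiiia_motifs(sequence):
    """Return (start_index, matched_text) for non-overlapping matches.

    `sequence` is a protein sequence in one-letter code; indices are
    0-based. Overlapping candidates are not reported by this sketch.
    """
    return [(m.start(), m.group()) for m in TFIIIA_PATTERN.finditer(sequence.upper())]


# Illustrative sequence written for this example, not taken from the source.
example = "MAYKCPECGKSFSQKSNLQKHQRTHTG"
for start, motif in find_tfiiia_motifs(example):
    print(start, motif)
```

    A spacing match like this is only a first filter; the abstract's point is that additional residue-property patterns around the motif are needed to discriminate true zinc finger sequences.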

  14. Empirical approaches to more accurately predict benthic-pelagic coupling in biogeochemical ocean models

    NASA Astrophysics Data System (ADS)

    Dale, Andy; Stolpovsky, Konstantin; Wallmann, Klaus

    2016-04-01

    The recycling and burial of biogenic material in the sea floor plays a key role in the regulation of ocean chemistry. Proper consideration of these processes in ocean biogeochemical models is becoming increasingly recognized as an important step in model validation and prediction. However, the rate of organic matter remineralization in sediments and the benthic flux of redox-sensitive elements are difficult to predict a priori. In this communication, examples of empirical benthic flux models that can be coupled to earth system models to predict sediment-water exchange in the open ocean are presented. Large uncertainties hindering further progress in this field include knowledge of the reactivity of organic carbon reaching the sediment, the importance of episodic variability in bottom water chemistry and particle rain rates (for both the deep-sea and margins) and the role of benthic fauna. How do we meet the challenge?

  15. An endometrial gene expression signature accurately predicts recurrent implantation failure after IVF

    PubMed Central

    Koot, Yvonne E. M.; van Hooff, Sander R.; Boomsma, Carolien M.; van Leenen, Dik; Groot Koerkamp, Marian J. A.; Goddijn, Mariëtte; Eijkemans, Marinus J. C.; Fauser, Bart C. J. M.; Holstege, Frank C. P.; Macklon, Nick S.

    2016-01-01

    The primary limiting factor for effective IVF treatment is successful embryo implantation. Recurrent implantation failure (RIF) is a condition whereby couples fail to achieve pregnancy despite consecutive embryo transfers. Here we describe the collection of gene expression profiles from mid-luteal phase endometrial biopsies (n = 115) from women experiencing RIF and healthy controls. Using a signature discovery set (n = 81) we identify a signature containing 303 genes predictive of RIF. Independent validation in 34 samples shows that the gene signature predicts RIF with 100% positive predictive value (PPV). The strength of the RIF associated expression signature also stratifies RIF patients into distinct groups with different subsequent implantation success rates. Exploration of the expression changes suggests that RIF is primarily associated with reduced cellular proliferation. The gene signature will be of value in counselling and guiding further treatment of women who fail to conceive upon IVF and suggests new avenues for developing intervention. PMID:26797113

  16. Dynamics of Flexible MLI-type Debris for Accurate Orbit Prediction

    DTIC Science & Technology

    2014-09-01

    SUBJECT TERMS: EOARD, orbital debris, HAMR objects, multi-layered insulation, orbital dynamics, orbit predictions, orbital propagation...illustration are orbital debris [Source: NASA...piece of space junk (a paint fleck) during the STS-7 mission (Photo: NASA Orbital Debris Program Office

  17. Hippocampus neuronal metabolic gene expression outperforms whole tissue data in accurately predicting Alzheimer's disease progression.

    PubMed

    Stempler, Shiri; Waldman, Yedael Y; Wolf, Lior; Ruppin, Eytan

    2012-09-01

    Numerous metabolic alterations are associated with the impairment of brain cells in Alzheimer's disease (AD). Here we use gene expression microarrays of both whole hippocampus tissue and hippocampal neurons of AD patients to investigate the ability of metabolic gene expression to predict AD progression and its cognitive decline. We find that the prediction accuracy of different AD stages is markedly higher when using neuronal expression data (0.9) than when using whole tissue expression (0.76). Furthermore, the metabolic genes' expression is shown to be as effective in predicting AD severity as the entire gene list. Remarkably, a regression model from hippocampal metabolic gene expression leads to a marked correlation of 0.57 with the Mini-Mental State Examination cognitive score. Notably, the expression of top predictive neuronal genes in AD is significantly higher than that of other metabolic genes in the brains of healthy subjects. All together, the analyses point to a subset of metabolic genes that is strongly associated with normal brain functioning and whose disruption plays a major role in AD.

  18. Predicting repeat self-harm in children--how accurate can we expect to be?

    PubMed

    Chitsabesan, Prathiba; Harrington, Richard; Harrington, Valerie; Tomenson, Barbara

    2003-01-01

    The main objective of the study was to find which variables predict repetition of deliberate self-harm in children. The study is based on a group of children who took part in a randomized control trial investigating the effects of a home-based family intervention for children who had deliberately poisoned themselves. These children had a range of baseline and outcome measures collected on two occasions (two and six months follow-up). Outcome data were collected from 149 (92 %) of the initial 162 children over the six months. Twenty-three children made a further deliberate self-harm attempt within the follow-up period. A number of variables at baseline were found to be significantly associated with repeat self-harm. Parental mental health and a history of previous attempts were the strongest predictors. A model of prediction of further deliberate self-harm combining these significant individual variables produced a high positive predictive value (86 %) but had low sensitivity (28 %). Predicting repeat self-harm in children is difficult, even with a comprehensive series of assessments over multiple time points, and we need to adapt services with this in mind. We propose a model of service provision which takes these findings into account.
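
    The contrast between a high positive predictive value and a low sensitivity can be made concrete with a small calculation. The counts below are hypothetical values chosen only to show the pattern; they are not the study's actual classification table, and the function names are illustrative.

```python
def positive_predictive_value(true_pos, false_pos):
    """Share of children flagged by the model who did go on to repeat."""
    return true_pos / (true_pos + false_pos)


def sensitivity(true_pos, false_neg):
    """Share of children who repeated that the model flagged in advance."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts: the model flags 7 children, 6 of whom repeat,
# while 17 children who repeat are not flagged.
tp, fp, fn = 6, 1, 17

print(f"PPV: {positive_predictive_value(tp, fp):.1%}")
print(f"sensitivity: {sensitivity(tp, fn):.1%}")
```

    A model of this kind is usually right when it raises a flag but misses most of the children who go on to repeat, which is why the abstract treats prediction as difficult despite the high positive predictive value.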

  19. TOPPER: topology prediction of transmembrane protein based on evidential reasoning.

    PubMed

    Deng, Xinyang; Liu, Qi; Hu, Yong; Deng, Yong

    2013-01-01

    The topology prediction of transmembrane proteins is a hot research field in bioinformatics and molecular biology. It is a typical pattern recognition problem. Various prediction algorithms have been developed to predict transmembrane protein topology, since the experimental techniques are restricted by many stringent conditions. Usually, these individual prediction algorithms depend on various principles such as the hydrophobicity or charges of residues. In this paper, an evidential topology prediction method for transmembrane proteins is proposed based on evidential reasoning, which is called TOPPER (topology prediction of transmembrane protein based on evidential reasoning). In the proposed method, the prediction results of multiple individual prediction algorithms are transformed into BPAs (basic probability assignments) according to the confusion matrix. Then, the final prediction result is obtained by combining the individual predictions based on Dempster's rule of combination. The experimental results show that the proposed method is superior to the individual prediction algorithms, which illustrates the effectiveness of the proposed method.
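
    One plausible way to turn a predictor's output into a basic probability assignment using its confusion matrix is to normalise the matrix column for the predicted label. The abstract does not give TOPPER's exact formula, so the construction below is an assumption, as are the label names, the counts and the function name.

```python
def bpa_from_confusion(confusion, labels, predicted_label):
    """Derive a BPA over singleton classes from a confusion matrix.

    confusion[i][j] counts residues whose true class is labels[i] and
    which the predictor assigned to labels[j]. Given that the predictor
    outputs `predicted_label`, the mass on each true class is its share
    of that column. This is an assumed construction, not TOPPER's own.
    """
    j = labels.index(predicted_label)
    column = [row[j] for row in confusion]
    total = sum(column)
    if total == 0:
        raise ValueError("predictor never produced this label in validation")
    return {label: count / total for label, count in zip(labels, column)}


# Hypothetical topology states and validation counts for one predictor.
labels = ["inside", "membrane", "outside"]
confusion = [
    [80, 15, 5],   # true inside
    [10, 85, 5],   # true membrane
    [5, 10, 85],   # true outside
]

bpa = bpa_from_confusion(confusion, labels, "membrane")
print(bpa)
```

    BPAs built this way for several predictors could then be merged with Dempster's rule of combination, as the abstract describes for the final prediction.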

  20. Accurate prediction of the optical rotation and NMR properties for highly flexible chiral natural products.

    PubMed

    Hashmi, Muhammad Ali; Andreassend, Sarah K; Keyzers, Robert A; Lein, Matthias

    2016-09-21

    Despite advances in electronic structure theory the theoretical prediction of spectroscopic properties remains a computational challenge. This is especially true for natural products that exhibit very large conformational freedom and hence need to be sampled over many different accessible conformations. We report a strategy, which is able to predict NMR chemical shifts and more elusive properties like the optical rotation with great precision, through step-wise incremental increases of the conformational degrees of freedom. The application of this method is demonstrated for 3-epi-xestoaminol C, a chiral natural compound with a long, linear alkyl chain of 14 carbon atoms. Experimental NMR and [α]D values are reported to validate the results of the density functional theory calculations.

  1. Unprecedently Large-Scale Kinase Inhibitor Set Enabling the Accurate Prediction of Compound–Kinase Activities: A Way toward Selective Promiscuity by Design?

    PubMed Central

    2016-01-01

    Drug discovery programs frequently target members of the human kinome and try to identify small molecule protein kinase inhibitors, primarily for cancer treatment, with additional indications being increasingly investigated. One of the challenges is controlling the inhibitors' degree of selectivity, assessed by in vitro profiling against panels of protein kinases. We manually extracted, compiled, and standardized such profiles published in the literature: we collected 356 908 data points corresponding to 482 protein kinases, 2106 inhibitors, and 661 patents. We then analyzed this data set in terms of kinome coverage, reproducibility of results, popularity, and degree of selectivity of both kinases and inhibitors. We used the data set to create robust proteochemometric models capable of predicting kinase activity (the ligand–target space was modeled with an externally validated RMSE of 0.41 ± 0.02 log units and R0² of 0.74 ± 0.03), in order to account for missing or unreliable measurements. The influence on the prediction quality of parameters such as number of measurements, Murcko scaffold frequency or inhibitor type was assessed. Interpretation of the models highlighted inhibitor and kinase properties correlated with higher affinities, and an analysis in the context of kinase crystal structures was performed. Overall, the quality of the models allows the accurate prediction of kinase-inhibitor activities and their structural interpretation, thus paving the way for the rational design of compounds with a targeted selectivity profile. PMID:27482722

  2. Robust and Accurate Modeling Approaches for Migraine Per-Patient Prediction from Ambulatory Data.

    PubMed

    Pagán, Josué; De Orbe, M Irene; Gago, Ana; Sobrado, Mónica; Risco-Martín, José L; Mora, J Vivancos; Moya, José M; Ayala, José L

    2015-06-30

    Migraine is one of the most wide-spread neurological disorders, and its medical treatment represents a high percentage of the costs of health systems. In some patients, characteristic symptoms that precede the headache appear. However, they are nonspecific, and their prediction horizon is unknown and pretty variable; hence, these symptoms are almost useless for prediction, and they are not useful to advance the intake of drugs to be effective and neutralize the pain. To solve this problem, this paper sets up a realistic monitoring scenario where hemodynamic variables from real patients are monitored in ambulatory conditions with a wireless body sensor network (WBSN). The acquired data are used to evaluate the predictive capabilities and robustness against noise and failures in sensors of several modeling approaches. The obtained results encourage the development of per-patient models based on state-space models (N4SID) that are capable of providing average forecast windows of 47 min and a low rate of false positives.

  3. Accurate structure prediction of peptide–MHC complexes for identifying highly immunogenic antigens

    SciTech Connect

    Park, Min-Sun; Park, Sung Yong; Miller, Keith R.; Collins, Edward J.; Lee, Ha Youn

    2013-11-01

    Designing an optimal HIV-1 vaccine faces the challenge of identifying antigens that induce a broad immune capacity. One factor controlling the breadth of T cell responses is the surface morphology of a peptide–MHC complex. Here, we present an in silico protocol for predicting peptide–MHC structure. A robust signature of a conformational transition was identified during all-atom molecular dynamics, which results in a model with high accuracy. A large test set was used in constructing our protocol, and we went a step further using a blind test with a wild-type peptide and two highly immunogenic mutants, which predicted substantial conformational changes in both mutants. The center residues at position five of the analogs were configured to be accessible to solvent, forming a prominent surface, while the residue of the wild-type peptide was predicted to point laterally toward the side of the binding cleft. We then experimentally determined the structures of the blind test set using high-resolution X-ray crystallography, which verified the predicted conformational changes. Our observation strongly supports a positive association between the surface morphology of a peptide–MHC complex and its immunogenicity. Our study offers the prospect of enhancing the immunogenicity of vaccines by identifying MHC binding immunogens.

  4. Fast and accurate numerical method for predicting gas chromatography retention time.

    PubMed

    Claumann, Carlos Alberto; Wüst Zibetti, André; Bolzan, Ariovaldo; Machado, Ricardo A F; Pinto, Leonel Teixeira

    2015-08-07

    Predictive modeling for gas chromatography compound retention depends on the retention factor (k_i) and on the flow of the mobile phase. Thus, different approaches for determining an analyte k_i in column chromatography have been developed. The main one is based on the thermodynamic properties of the component and on the characteristics of the stationary phase. These models can be used to estimate the parameters and to optimize the programming of temperatures, in gas chromatography, for the separation of compounds. Different authors have proposed the use of numerical methods for solving these models, but these methods demand greater computational time. Hence, a new method for solving the predictive modeling of analyte retention time is presented. This algorithm is an alternative to traditional methods because it transforms its attainments into root determination problems within defined intervals. The proposed approach allows for t_r calculation, with accuracy determined by the user of the methods, and significant reductions in computational time; it can also be used to evaluate the performance of other prediction methods.
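
    The idea of casting retention time as a root-finding problem on a bounded interval can be sketched with plain bisection. This is not the authors' algorithm. It assumes a commonly used formulation in which the analyte elutes when the integral of 1 / (t_M (1 + k(t))) over time reaches 1, with t_M the hold-up time and k(t) the retention factor along the temperature program; the k(t) functions, parameter values and function names are hypothetical.

```python
import math


def fraction_travelled(t, k_of_t, t_hold_up, steps=2000):
    """Fraction of the column traversed by time t (trapezoidal rule)."""
    h = t / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight / (t_hold_up * (1.0 + k_of_t(i * h)))
    return total * h


def retention_time(k_of_t, t_hold_up, t_low, t_high, tol=1e-6):
    """Bisection for the time at which the travelled fraction equals 1.

    [t_low, t_high] must bracket the root; `tol` is the user-chosen
    width of the final interval.
    """
    def residual(t):
        return fraction_travelled(t, k_of_t, t_hold_up) - 1.0

    if residual(t_low) > 0.0 or residual(t_high) < 0.0:
        raise ValueError("interval does not bracket the retention time")
    while t_high - t_low > tol:
        mid = 0.5 * (t_low + t_high)
        if residual(mid) < 0.0:
            t_low = mid
        else:
            t_high = mid
    return 0.5 * (t_low + t_high)


# Isothermal check: constant k gives the closed form t_r = t_M * (1 + k).
isothermal = retention_time(lambda t: 4.0, t_hold_up=1.0, t_low=0.0, t_high=20.0)

# Hypothetical temperature program: k decays as the oven heats up.
programmed = retention_time(lambda t: 4.0 * math.exp(-0.2 * t), 1.0, 0.0, 20.0)
print(isothermal, programmed)
```

    The tolerance argument plays the role of the user-determined accuracy mentioned in the abstract; a faster root finder could be substituted without changing the problem statement.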

  5. Revisiting the blind tests in crystal structure prediction: accurate energy ranking of molecular crystals.

    PubMed

    Asmadi, Aldi; Neumann, Marcus A; Kendrick, John; Girard, Pascale; Perrin, Marc-Antoine; Leusen, Frank J J

    2009-12-24

    In the 2007 blind test of crystal structure prediction hosted by the Cambridge Crystallographic Data Centre (CCDC), a hybrid DFT/MM method correctly ranked each of the four experimental structures as having the lowest lattice energy of all the crystal structures predicted for each molecule. The work presented here further validates this hybrid method by optimizing the crystal structures (experimental and submitted) of the first three CCDC blind tests held in 1999, 2001, and 2004. Except for the crystal structures of compound IX, all structures were reminimized and ranked according to their lattice energies. The hybrid method computes the lattice energy of a crystal structure as the sum of the DFT total energy and a van der Waals (dispersion) energy correction. Considering all four blind tests, the crystal structure with the lowest lattice energy corresponds to the experimentally observed structure for 12 out of 14 molecules. Moreover, good geometrical agreement is observed between the structures determined by the hybrid method and those measured experimentally. In comparison with the correct submissions made by the blind test participants, all hybrid optimized crystal structures (apart from compound II) have the smallest calculated root mean squared deviations from the experimentally observed structures. It is predicted that a new polymorph of compound V exists under pressure.
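
    The ranking step described above, where each candidate's lattice energy is the sum of a DFT total energy and a van der Waals (dispersion) correction, can be sketched as a simple sort. The structure names and energy values below are invented for illustration and the class and function names are assumptions; computing the two energy terms themselves is outside the scope of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateStructure:
    name: str
    dft_energy: float             # relative DFT total energy, kJ/mol (hypothetical)
    dispersion_correction: float  # van der Waals correction, kJ/mol (hypothetical)

    @property
    def lattice_energy(self):
        """Lattice energy as the sum of the two contributions."""
        return self.dft_energy + self.dispersion_correction


def rank_by_lattice_energy(candidates):
    """Return candidates ordered from lowest to highest lattice energy."""
    return sorted(candidates, key=lambda c: c.lattice_energy)


# Invented candidates; a DFT-only ranking would put "polymorph_B" first.
candidates = [
    CandidateStructure("polymorph_A", dft_energy=-40.0, dispersion_correction=-85.0),
    CandidateStructure("polymorph_B", dft_energy=-45.0, dispersion_correction=-70.0),
    CandidateStructure("polymorph_C", dft_energy=-38.0, dispersion_correction=-80.0),
]

for rank, c in enumerate(rank_by_lattice_energy(candidates), start=1):
    print(rank, c.name, c.lattice_energy)
```

    The invented numbers are arranged so that the dispersion term changes the ordering, which illustrates why the correction matters for ranking molecular crystals.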

  6. Robust and Accurate Modeling Approaches for Migraine Per-Patient Prediction from Ambulatory Data

    PubMed Central

    Pagán, Josué; Irene De Orbe, M.; Gago, Ana; Sobrado, Mónica; Risco-Martín, José L.; Vivancos Mora, J.; Moya, José M.; Ayala, José L.

    2015-01-01

    Migraine is one of the most wide-spread neurological disorders, and its medical treatment represents a high percentage of the costs of health systems. In some patients, characteristic symptoms that precede the headache appear. However, they are nonspecific, and their prediction horizon is unknown and pretty variable; hence, these symptoms are almost useless for prediction, and they are not useful to advance the intake of drugs to be effective and neutralize the pain. To solve this problem, this paper sets up a realistic monitoring scenario where hemodynamic variables from real patients are monitored in ambulatory conditions with a wireless body sensor network (WBSN). The acquired data are used to evaluate the predictive capabilities and robustness against noise and failures in sensors of several modeling approaches. The obtained results encourage the development of per-patient models based on state-space models (N4SID) that are capable of providing average forecast windows of 47 min and a low rate of false positives. PMID:26134103

  7. Accurate prediction of drug-induced liver injury using stem cell-derived populations.

    PubMed

    Szkolnicka, Dagmara; Farnworth, Sarah L; Lucendo-Villarin, Baltasar; Storck, Christopher; Zhou, Wenli; Iredale, John P; Flint, Oliver; Hay, David C

    2014-02-01

    Despite major progress in the knowledge and management of human liver injury, there are millions of people suffering from chronic liver disease. Currently, the only cure for end-stage liver disease is orthotopic liver transplantation; however, this approach is severely limited by organ donation. Alternative approaches to restoring liver function have therefore been pursued, including the use of somatic and stem cell populations. Although such approaches are essential in developing scalable treatments, there is also an imperative to develop predictive human systems that more effectively study and/or prevent the onset of liver disease and decompensated organ function. We used a renewable human stem cell resource, from defined genetic backgrounds, and drove them through developmental intermediates to yield highly active, drug-inducible, and predictive human hepatocyte populations. Most importantly, stem cell-derived hepatocytes displayed equivalence to primary adult hepatocytes, following incubation with known hepatotoxins. In summary, we have developed a serum-free, scalable, and shippable cell-based model that faithfully predicts the potential for human liver injury. Such a resource has direct application in human modeling and, in the future, could play an important role in developing renewable cell-based therapies.

  8. RPI-Bind: a structure-based method for accurate identification of RNA-protein binding sites.

    PubMed

    Luo, Jiesi; Liu, Liang; Venkateswaran, Suresh; Song, Qianqian; Zhou, Xiaobo

    2017-04-04

    RNA and protein interactions play crucial roles in multiple biological processes, and these interactions are significantly influenced by the structures and sequences of protein and RNA molecules. In this study, we first performed an analysis of RNA-protein interacting complexes and identified interface properties of sequences and structures, which reveal the diverse nature of the binding sites. With these observations, we built a three-step prediction model, namely RPI-Bind, for the identification of RNA-protein binding regions using the sequences and structures of both proteins and RNAs. The three steps include 1) the prediction of RNA binding regions on protein, 2) the prediction of protein binding regions on RNA, and 3) the prediction of interacting regions on both RNA and protein simultaneously, with the results from steps 1) and 2). Compared with existing methods, most of which employ only sequences, our model significantly improves the prediction accuracy at each of the three steps. In particular, our model outperforms catRAPID by >20% at the third step. All of these results indicate the importance of structures in RNA-protein interactions, and suggest that the RPI-Bind model is a powerful theoretical framework for studying RNA-protein interactions.

  9. CombFunc: predicting protein function using heterogeneous data sources.

    PubMed

    Wass, Mark N; Barton, Geraint; Sternberg, Michael J E

    2012-07-01

    Only a small fraction of known proteins have been functionally characterized, making protein function prediction essential to propose annotations for uncharacterized proteins. In recent years many function prediction methods have been developed using various sources of biological data from protein sequence and structure to gene expression data. Here we present the CombFunc web server, which makes Gene Ontology (GO)-based protein function predictions. CombFunc incorporates ConFunc, our existing function prediction method, with other approaches for function prediction that use protein sequence, gene expression and protein-protein interaction data. In benchmarking on a set of 1686 proteins CombFunc obtains precision and recall of 0.71 and 0.64 respectively for gene ontology molecular function terms. For biological process GO terms precision of 0.74 and recall of 0.41 is obtained. CombFunc is available at http://www.sbg.bio.ic.ac.uk/combfunc.
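
    Precision and recall pairs such as those reported above are often summarised by their harmonic mean, the F1 score. The abstract does not report F1; the values computed below are derived here from the quoted precision and recall only, and the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract for the two GO categories.
molecular_function = f1_score(0.71, 0.64)
biological_process = f1_score(0.74, 0.41)

print(f"molecular function F1: {molecular_function:.3f}")
print(f"biological process F1: {biological_process:.3f}")
```

    The lower combined score for biological process terms reflects the lower recall, even though precision is slightly higher there.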

  10. Developing algorithms for predicting protein-protein interactions of homology modeled proteins.

    SciTech Connect

    Martin, Shawn Bryan; Sale, Kenneth L.; Faulon, Jean-Loup Michel; Roe, Diana C.

    2006-01-01

    The goal of this project was to examine the protein-protein docking problem, especially as it relates to homology-based structures, identify the key bottlenecks in current software tools, and evaluate and prototype new algorithms that may be developed to address these bottlenecks. This report describes the current challenges in the protein-protein docking problem: correctly predicting the binding site for the protein-protein interaction and correctly placing the sidechains. Two different and complementary approaches are taken that can help with the protein-protein docking problem. The first approach is to predict interaction sites prior to docking, and uses bioinformatics studies of protein-protein interactions to predict these interaction sites. The second approach is to improve validation of predicted complexes after docking, and uses an improved scoring function for evaluating proposed docked poses, incorporating a solvation term. This scoring function demonstrates significant improvement over current state-of-the-art functions. Initial studies on both these approaches are promising, and argue for full development of these algorithms.

  11. Structure prediction of magnetosome-associated proteins

    PubMed Central

    Nudelman, Hila; Zarivach, Raz

    2014-01-01

    Magnetotactic bacteria (MTB) are Gram-negative bacteria that can navigate along geomagnetic fields. This ability is a result of a unique intracellular organelle, the magnetosome. These organelles are composed of membrane-enclosed magnetite (Fe3O4) or greigite (Fe3S4) crystals ordered into chains along the cell. Magnetosome formation, assembly, and magnetic nano-crystal biomineralization are controlled by magnetosome-associated proteins (MAPs). Most MAP-encoding genes are located in a conserved genomic region – the magnetosome island (MAI). The MAI appears to be conserved in all MTB that were analyzed so far, although the MAI size and organization differ between species. It was shown that MAI deletion leads to a non-magnetic phenotype, further highlighting its important role in magnetosome formation. Today, about 28 proteins are known to be involved in magnetosome formation, but the structures and functions of most MAPs are unknown. To reveal the structure–function relationship of MAPs, we used bioinformatics tools to build homology models as a way to understand their possible role in magnetosome formation. Here we present an overview of predicted 3D structural models for all known Magnetospirillum gryphiswaldense strain MSR-1 MAPs. PMID:24523717

  12. Can tritiated water-dilution space accurately predict total body water in chukar partridges

    SciTech Connect

    Crum, B.G.; Williams, J.B.; Nagy, K.A.

    1985-11-01

    Total body water (TBW) volumes determined from the dilution space of injected tritiated water have consistently overestimated actual water volumes (determined by desiccation to constant mass) in reptiles and mammals, but results for birds are controversial. We investigated potential errors in both the dilution method and the desiccation method in an attempt to resolve this controversy. Tritiated water dilution yielded an accurate measurement of water mass in vitro. However, in vivo, this method yielded a 4.6% overestimate of the amount of water (3.1% of live body mass) in chukar partridges, apparently largely because of loss of tritium from body water to sites of dissociable hydrogens on body solids. An additional source of overestimation (approximately 2% of body mass) was loss of tritium to the solids in blood samples during distillation of blood to obtain pure water for tritium analysis. Measuring tritium activity in plasma samples avoided this problem but required measurement of, and correction for, the dry matter content in plasma. Desiccation to constant mass by lyophilization or oven-drying also overestimated the amount of water actually in the bodies of chukar partridges by 1.4% of body mass, because these values included water adsorbed onto the outside of feathers. When desiccating defeathered carcasses, oven-drying at 70 degrees C yielded TBW values identical to those obtained from lyophilization, but TBW was overestimated (0.5% of body mass) by drying at 100 degrees C due to loss of organic substances as well as water.

  13. Does preoperative cross-sectional imaging accurately predict main duct involvement in intraductal papillary mucinous neoplasm?

    PubMed

    Barron, M R; Roch, A M; Waters, J A; Parikh, J A; DeWitt, J M; Al-Haddad, M A; Ceppa, E P; House, M G; Zyromski, N J; Nakeeb, A; Pitt, H A; Schmidt, C Max

    2014-03-01

    Main pancreatic duct (MPD) involvement is a well-demonstrated risk factor for malignancy in intraductal papillary mucinous neoplasm (IPMN). Preoperative radiographic determination of IPMN type is heavily relied upon in oncologic risk stratification. We hypothesized that radiographic assessment of MPD involvement in IPMN is an accurate predictor of pathological MPD involvement. Data regarding all patients undergoing resection for IPMN at a single academic institution between 1992 and 2012 were gathered prospectively. Retrospective analysis of imaging and pathologic data was undertaken. Preoperative classification of IPMN type was based on cross-sectional imaging (MRI/magnetic resonance cholangiopancreatography (MRCP) and/or CT). Three hundred sixty-two patients underwent resection for IPMN. Of these, 334 had complete data for analysis. Of 164 suspected branch duct (BD) IPMN, 34 (20.7%) demonstrated MPD involvement on final pathology. Of 170 patients with suspicion of MPD involvement, 50 (29.4%) demonstrated no MPD involvement. Of 34 patients with suspected BD-IPMN who were found to have MPD involvement on pathology, 10 (29.4%) had invasive carcinoma. Alternatively, 2/50 (4%) of the patients with suspected MPD involvement who ultimately had isolated BD-IPMN demonstrated invasive carcinoma. Preoperative radiographic IPMN type did not correlate with final pathology in 25% of the patients. In addition, risk of invasive carcinoma correlates with pathologic presence of MPD involvement.
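
    The percentages quoted in this abstract can be re-derived from the reported counts. The short sketch below does only that arithmetic; the variable and function names are illustrative and not taken from the paper, and pooling the two groups to obtain the overall mismatch rate is an assumption about how the reported 25% was computed.

```python
# Counts as reported in the abstract above (names are illustrative).
suspected_bd = 164       # imaging suggested branch-duct IPMN
bd_with_mpd = 34         # of those, pathology showed MPD involvement
suspected_mpd = 170      # imaging suggested MPD involvement
mpd_without_mpd = 50     # of those, pathology showed no MPD involvement


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


bd_mismatch = pct(bd_with_mpd, suspected_bd)
mpd_mismatch = pct(mpd_without_mpd, suspected_mpd)
# Assumption: the overall rate pools both groups of patients.
overall_mismatch = pct(bd_with_mpd + mpd_without_mpd,
                       suspected_bd + suspected_mpd)

print(bd_mismatch, mpd_mismatch, overall_mismatch)
```

    Under that pooling assumption the overall figure rounds to the roughly 25% stated in the abstract.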

  14. Size-extensivity-corrected multireference configuration interaction schemes to accurately predict bond dissociation energies of oxygenated hydrocarbons

    SciTech Connect

    Oyeyemi, Victor B.; Krisiloff, David B.; Keith, John A.; Libisch, Florian; Pavone, Michele; Carter, Emily A.

    2014-01-28

    Oxygenated hydrocarbons play important roles in combustion science as renewable fuels and additives, but many details about their combustion chemistry remain poorly understood. Although many methods exist for computing accurate electronic energies of molecules at equilibrium geometries, a consistent description of entire combustion reaction potential energy surfaces (PESs) requires multireference correlated wavefunction theories. Here we use bond dissociation energies (BDEs) as a foundational metric to benchmark methods based on multireference configuration interaction (MRCI) for several classes of oxygenated compounds (alcohols, aldehydes, carboxylic acids, and methyl esters). We compare results from multireference singles and doubles configuration interaction to those utilizing a posteriori and a priori size-extensivity corrections, benchmarked against experiment and coupled cluster theory. We demonstrate that size-extensivity corrections are necessary for chemically accurate BDE predictions even in relatively small molecules and furnish examples of unphysical BDE predictions resulting from using too-small orbital active spaces. We also outline the specific challenges in using MRCI methods for carbonyl-containing compounds. The resulting complete basis set extrapolated, size-extensivity-corrected MRCI scheme produces BDEs generally accurate to within 1 kcal/mol, laying the foundation for this scheme's use on larger molecules and for more complex regions of combustion PESs.

  15. Size-extensivity-corrected multireference configuration interaction schemes to accurately predict bond dissociation energies of oxygenated hydrocarbons

    NASA Astrophysics Data System (ADS)

    Oyeyemi, Victor B.; Krisiloff, David B.; Keith, John A.; Libisch, Florian; Pavone, Michele; Carter, Emily A.

    2014-01-01

    Oxygenated hydrocarbons play important roles in combustion science as renewable fuels and additives, but many details about their combustion chemistry remain poorly understood. Although many methods exist for computing accurate electronic energies of molecules at equilibrium geometries, a consistent description of entire combustion reaction potential energy surfaces (PESs) requires multireference correlated wavefunction theories. Here we use bond dissociation energies (BDEs) as a foundational metric to benchmark methods based on multireference configuration interaction (MRCI) for several classes of oxygenated compounds (alcohols, aldehydes, carboxylic acids, and methyl esters). We compare results from multireference singles and doubles configuration interaction to those utilizing a posteriori and a priori size-extensivity corrections, benchmarked against experiment and coupled cluster theory. We demonstrate that size-extensivity corrections are necessary for chemically accurate BDE predictions even in relatively small molecules and furnish examples of unphysical BDE predictions resulting from using too-small orbital active spaces. We also outline the specific challenges in using MRCI methods for carbonyl-containing compounds. The resulting complete basis set extrapolated, size-extensivity-corrected MRCI scheme produces BDEs generally accurate to within 1 kcal/mol, laying the foundation for this scheme's use on larger molecules and for more complex regions of combustion PESs.

  16. Prediction of protein-protein interaction sites from weakly homologous template structures using meta-threading and machine learning.

    PubMed

    Maheshwari, Surabhi; Brylinski, Michal

    2015-01-01

    The identification of protein-protein interactions is vital for understanding protein function, elucidating interaction mechanisms, and for practical applications in drug discovery. With the exponentially growing protein sequence data, fully automated computational methods that predict interactions between proteins are becoming essential components of system-level function inference. A thorough analysis of protein complex structures demonstrated that binding site locations as well as the interfacial geometry are highly conserved across evolutionarily related proteins. Because the conformational space of protein-protein interactions is highly covered by experimental structures, sensitive protein threading techniques can be used to identify suitable templates for the accurate prediction of interfacial residues. Toward this goal, we developed eFindSite(PPI), an algorithm that uses the three-dimensional structure of a target protein, evolutionarily remotely related templates and machine learning techniques to predict binding residues. Using crystal structures, the average sensitivity (specificity) of eFindSite(PPI) in interfacial residue prediction is 0.46 (0.92). For weakly homologous protein models, these values only slightly decrease to 0.40-0.43 (0.91-0.92) demonstrating that eFindSite(PPI) performs well not only using experimental data but also tolerates structural imperfections in computer-generated structures. In addition, eFindSite(PPI) detects specific molecular interactions at the interface; for instance, it correctly predicts approximately one half of hydrogen bonds and aromatic interactions, as well as one third of salt bridges and hydrophobic contacts. Comparative benchmarks against several dimer datasets show that eFindSite(PPI) outperforms other methods for protein-binding residue prediction. It also features a carefully tuned confidence estimation system, which is particularly useful in large-scale applications using raw genomic data.
eFindSite(PPI) is
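
    The abstract above reports per-residue sensitivity and specificity. As a reminder of how these two measures are conventionally defined, here is a minimal sketch; the function name and the confusion-matrix counts are hypothetical and are not data from the paper or part of eFindSite(PPI).

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Standard definitions from confusion-matrix counts.

    sensitivity = TP / (TP + FN): fraction of true interface residues found
    specificity = TN / (TN + FP): fraction of non-interface residues rejected
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative residue")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts chosen only for illustration.
sens, spec = sensitivity_specificity(tp=46, fn=54, tn=92, fp=8)
print(sens, spec)
```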

  17. Computational methods toward accurate RNA structure prediction using coarse-grained and all-atom models.

    PubMed

    Krokhotin, Andrey; Dokholyan, Nikolay V

    2015-01-01

    Computational methods can provide significant insights into RNA structure and dynamics, bridging the gap in our understanding of the relationship between structure and biological function. Simulations enrich and enhance our understanding of data derived on the bench, as well as provide feasible alternatives to costly or technically challenging experiments. Coarse-grained computational models of RNA are especially important in this regard, as they allow analysis of events occurring in timescales relevant to RNA biological function, which are inaccessible through experimental methods alone. We have developed a three-bead coarse-grained model of RNA for discrete molecular dynamics simulations. This model is efficient in de novo prediction of short RNA tertiary structure, starting from RNA primary sequences of less than 50 nucleotides. To complement this model, we have incorporated additional base-pairing constraints and have developed a bias potential reliant on data obtained from hydroxyl probing experiments that guide RNA folding to its correct state. By introducing experimentally derived constraints to our computer simulations, we are able to make reliable predictions of RNA tertiary structures up to a few hundred nucleotides. Our refined model exemplifies a valuable benefit achieved through integration of computation and experimental methods.

  18. A novel neural response algorithm for protein function prediction

    PubMed Central

    2012-01-01

    Background: Large amounts of data are being generated by high-throughput genome sequencing methods, but the rate of experimental functional characterization falls far behind. To fill the gap between the number of sequences and their annotations, fast and accurate automated annotation methods are required. Many methods, such as GOblet, GOFigure, and Gotcha, are designed based on the BLAST search. Unfortunately, the sequence coverage of these methods is low as they cannot detect remote homologues. Adding to this, the lack of annotation specificity advocates the need to improve automated protein function prediction. Results: We designed a novel automated protein functional assignment method based on the neural response algorithm, which simulates the neuronal behavior of the visual cortex in the human brain. Firstly, we predict the most similar target protein for a given query protein and thereby assign its GO term to the query sequence. When assessed on the test set, our method ranked the actual leaf GO term among the top 5 probable GO terms with an accuracy of 86.93%. Conclusions: The proposed algorithm is the first instance of the neural response algorithm being used in the biological domain. The use of HMM profiles along with the secondary structure information to define the neural response gives our method an edge over other available methods on annotation accuracy. Results of the 5-fold cross validation and the comparison with PFP and FFPred servers indicate the prominent performance of our method. The program, the dataset, and help files are available at http://www.jjwanglab.org/NRProF/. PMID:23046521

  19. Prognostic breast cancer signature identified from 3D culture model accurately predicts clinical outcome across independent datasets

    SciTech Connect

    Martin, Katherine J.; Patrick, Denis R.; Bissell, Mina J.; Fournier, Marcia V.

    2008-10-20

    One of the major tenets in breast cancer research is that early detection is vital for patient survival by increasing treatment options. To that end, we have previously used a novel unsupervised approach to identify a set of genes whose expression predicts prognosis of breast cancer patients. The predictive genes were selected in a well-defined three dimensional (3D) cell culture model of non-malignant human mammary epithelial cell morphogenesis as down-regulated during breast epithelial cell acinar formation and cell cycle arrest. Here we examine the ability of this gene signature (3D-signature) to predict prognosis in three independent breast cancer microarray datasets having 295, 286, and 118 samples, respectively. Our results show that the 3D-signature accurately predicts prognosis in three unrelated patient datasets. At 10 years, the probability of positive outcome was 52, 51, and 47 percent in the group with a poor-prognosis signature and 91, 75, and 71 percent in the group with a good-prognosis signature for the three datasets, respectively (Kaplan-Meier survival analysis, p<0.05). Hazard ratios for poor outcome were 5.5 (95% CI 3.0 to 12.2, p<0.0001), 2.4 (95% CI 1.6 to 3.6, p<0.0001) and 1.9 (95% CI 1.1 to 3.2, p = 0.016) and remained significant for the two larger datasets when corrected for estrogen receptor (ER) status. Hence the 3D-signature accurately predicts breast cancer outcome in both ER-positive and ER-negative tumors, though individual genes differed in their prognostic ability in the two subtypes. Genes that were prognostic in ER+ patients are AURKA, CEP55, RRM2, EPHA2, FGFBP1, and VRK1, while genes prognostic in ER- patients include ACTB, FOXM1 and SERPINE2 (Kaplan-Meier p<0.05). Multivariable Cox regression analysis in the largest dataset showed that the 3D-signature was a strong independent factor in predicting breast cancer outcome. The 3D-signature accurately predicts breast cancer outcome across multiple datasets and holds prognostic

  20. Combining multiple regression and principal component analysis for accurate predictions for column ozone in Peninsular Malaysia

    NASA Astrophysics Data System (ADS)

    Rajab, Jasim M.; MatJafri, M. Z.; Lim, H. S.

    2013-06-01

    This study addresses columnar ozone modelling in Peninsular Malaysia. A data set of eight atmospheric parameters [air surface temperature (AST), carbon monoxide (CO), methane (CH4), water vapour (H2Ovapour), skin surface temperature (SSKT), atmosphere temperature (AT), relative humidity (RH), and mean surface pressure (MSP)], retrieved from NASA's Atmospheric Infrared Sounder (AIRS) for the entire period (2003-2008), was used to develop models to predict the value of columnar ozone (O3) in the study area. A combined method, which uses multiple regression together with principal component analysis (PCA) modelling, was used to predict columnar ozone and to improve the prediction accuracy. Separate analyses were carried out for the north east monsoon (NEM) and south west monsoon (SWM) seasons. O3 was negatively correlated with CH4, H2Ovapour, RH, and MSP, whereas it was positively correlated with CO, AST, SSKT, and AT during both the NEM and SWM seasons. Multiple regression analysis was used to fit the columnar ozone data using the atmospheric parameters as predictors. A variable selection method based on high loadings of varimax-rotated principal components was used to select subsets of the predictor variables to be included in the linear regression model. It was found that an increase in the columnar O3 value is associated with an increase in the values of AST, SSKT, AT, and CO and with a drop in the levels of CH4, H2Ovapour, RH, and MSP. Fitting the best models for the columnar O3 value using eight of the independent variables gave about the same values of R (≈0.93) and R2 (≈0.86) for both the NEM and SWM seasons. The common variables that appeared in both regression equations were SSKT, CH4 and RH, and the principal precursor of the columnar O3 value in both the NEM and SWM seasons was SSKT.

  1. How Accurate Is the Prediction of Maximal Oxygen Uptake with Treadmill Testing?

    PubMed Central

    Wicks, John R.; Oldridge, Neil B.

    2016-01-01

    Background: Cardiorespiratory fitness measured by treadmill testing has prognostic significance in determining mortality with cardiovascular and other chronic disease states. The accuracy of a recently developed method for estimating maximal oxygen uptake (VO2peak), the heart rate index (HRI), is dependent only on heart rate (HR) and was tested against oxygen uptake (VO2), either measured or predicted from conventional treadmill parameters (speed, incline, protocol time). Methods: The HRI equation, METs = 6 × HRI − 5, where HRI = maximal HR/resting HR, provides a surrogate measure of VO2peak. Forty large scale treadmill studies were identified through a systematic search using MEDLINE, Google Scholar and Web of Science in which VO2peak was either measured (TM-VO2meas; n = 20) or predicted (TM-VO2pred; n = 20) based on treadmill parameters. All studies were required to have reported group mean data of both resting and maximal HRs for determination of HR index-derived oxygen uptake (HRI-VO2). Results: The 20 studies with measured VO2 (TM-VO2meas) involved 11,477 participants (median 337), with a total of 105,044 participants (median 3,736) in the 20 studies with predicted VO2 (TM-VO2pred). A difference of only 0.4% was seen between mean (±SD) VO2peak for TM-VO2meas and HRI-VO2 (6.51±2.25 METs and 6.54±2.28, respectively; p = 0.84). In contrast, there was a highly significant 21.1% difference between mean (±SD) TM-VO2pred and HRI-VO2 (8.12±1.85 METs and 6.71±1.92, respectively; p<0.001). Conclusion: Although mean TM-VO2meas and HRI-VO2 were almost identical, mean TM-VO2pred was more than 20% greater than mean HRI-VO2. PMID:27875547
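
    The HRI equation quoted in the Methods is simple enough to evaluate directly. The sketch below implements only that formula; the function name and the example heart rates are hypothetical and do not come from the studies reviewed.

```python
def hri_mets(max_hr, resting_hr):
    """Estimate peak METs from the heart rate index (HRI).

    Implements the equation quoted in the abstract:
        METs = 6 * HRI - 5, with HRI = maximal HR / resting HR.
    """
    if resting_hr <= 0:
        raise ValueError("resting heart rate must be positive")
    hri = max_hr / resting_hr
    return 6.0 * hri - 5.0


# Hypothetical example: maximal HR 180 bpm, resting HR 60 bpm gives HRI = 3.
example_mets = hri_mets(180, 60)
print(example_mets)  # 13.0
```

    Because the estimate depends only on the ratio of the two heart rates, it needs no treadmill speed, incline, or protocol time, which is the property the abstract emphasises.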

  2. Structure-Based Prediction of Unstable Regions in Proteins: Applications to Protein Misfolding Diseases

    NASA Astrophysics Data System (ADS)

    Guest, Will; Cashman, Neil; Plotkin, Steven

    2009-03-01

    Protein misfolding is a necessary step in the pathogenesis of many diseases, including Creutzfeldt-Jakob disease (CJD) and familial amyotrophic lateral sclerosis (fALS). Identifying unstable structural elements in their causative proteins elucidates the early events of misfolding and presents targets for inhibition of the disease process. An algorithm was developed to calculate the Gibbs free energy of unfolding for all sequence-contiguous regions of a protein using three methods to parameterize energy changes: a modified Gō model, changes in solvent-accessible surface area, and solution of the Poisson-Boltzmann equation. The entropic effects of disulfide bonds and post-translational modifications are treated analytically. It incorporates a novel method for finding local dielectric constants inside a protein to accurately handle charge effects. We have predicted the unstable parts of prion protein and superoxide dismutase 1, the proteins involved in CJD and fALS respectively, and have used these regions as epitopes to prepare antibodies that are specific to the misfolded conformation and show promise as therapeutic agents.

  3. A Foundation for the Accurate Prediction of the Soft Error Vulnerability of Scientific Applications

    SciTech Connect

    Bronevetsky, G; de Supinski, B; Schulz, M

    2009-02-13

    Understanding the soft error vulnerability of supercomputer applications is critical as these systems are using ever larger numbers of devices that have decreasing feature sizes and, thus, increasing frequency of soft errors. As many large scale parallel scientific applications use BLAS and LAPACK linear algebra routines, the soft error vulnerability of these methods constitutes a large fraction of the applications' overall vulnerability. This paper analyzes the vulnerability of these routines to soft errors by characterizing how their outputs are affected by injected errors and by evaluating several techniques for predicting how errors propagate from the input to the output of each routine. The resulting error profiles can be used to understand the fault vulnerability of full applications that use these routines.

  4. Simplified versus geometrically accurate models of forefoot anatomy to predict plantar pressures: A finite element study.

    PubMed

    Telfer, Scott; Erdemir, Ahmet; Woodburn, James; Cavanagh, Peter R

    2016-01-25

    Integration of patient-specific biomechanical measurements into the design of therapeutic footwear has been shown to improve clinical outcomes in patients with diabetic foot disease. The addition of numerical simulations intended to optimise intervention design may help to build on these advances; however, at present the time and labour required to generate and run personalised models of foot anatomy restrict their routine clinical utility. In this study we developed second-generation personalised simple finite element (FE) models of the forefoot with varying geometric fidelities. Plantar pressure predictions from barefoot, shod, and shod with insole simulations using simplified models were compared to those obtained from CT-based FE models incorporating more detailed representations of bone and tissue geometry. A simplified model including representations of metatarsals based on simple geometric shapes, embedded within a contoured soft tissue block with outer geometry acquired from a 3D surface scan, was found to provide pressure predictions closest to the more complex model, with mean differences of 13.3 kPa (SD 13.4), 12.52 kPa (SD 11.9) and 9.6 kPa (SD 9.3) for barefoot, shod, and insole conditions respectively. The simplified model design could be produced in <1 h compared to >3 h in the case of the more detailed model, and solved on average 24% faster. FE models of the forefoot based on simplified geometric representations of the metatarsal bones and soft tissue surface geometry from 3D surface scans may potentially provide a simulation approach with improved clinical utility; however, further validity testing around a range of therapeutic footwear types is required.

  5. Development of a method to accurately calculate the Dpb and quickly predict the strength of a chemical bond

    NASA Astrophysics Data System (ADS)

    Du, Xia; Zhao, Dong-Xia; Yang, Zhong-Zhi

    2013-02-01

    A new approach to characterize and measure bond strength has been developed. First, we propose a method to accurately calculate the potential acting on an electron in a molecule (PAEM) at the saddle point along a chemical bond in situ, denoted by Dpb. Then, a direct method to quickly evaluate bond strength is established. We choose some familiar molecules as models for benchmarking this method. As a practical application, the Dpb of base pairs in DNA along C-H and N-H bonds are obtained for the first time. All results show that C7-H of A-T and C8-H of G-C are the relatively weak bonds that are the injured positions in DNA damage. The significance of this work is twofold: (i) a method is developed to calculate the Dpb of various sizable molecules in situ quickly and accurately; (ii) this work demonstrates the feasibility of quickly predicting bond strength in macromolecules.

  6. Fast and accurate prediction for aerodynamic forces and moments acting on satellites flying in Low-Earth Orbit

    NASA Astrophysics Data System (ADS)

    Jin, Xuhon; Huang, Fei; Hu, Pengju; Cheng, Xiaoli

    2016-11-01

    A fundamental prerequisite for satellites operating in a Low Earth Orbit (LEO) is the availability of fast and accurate predictions of non-gravitational aerodynamic forces, which arise in the free molecular flow regime. However, conventional computational methods such as the analytical integral method and the direct simulation Monte Carlo (DSMC) technique either fail to handle flow shadowing and multiple reflections or are computationally expensive. This work develops a general computer program for the accurate calculation of aerodynamic forces in the free molecular flow regime using the test particle Monte Carlo (TPMC) method, and the program is used to calculate the non-gravitational aerodynamic forces acting on the Gravity field and steady-state Ocean Circulation Explorer (GOCE) satellite for different freestream conditions and gas-surface interaction models.

  7. CombFunc: predicting protein function using heterogeneous data sources

    PubMed Central

    Wass, Mark N.; Barton, Geraint; Sternberg, Michael J. E.

    2012-01-01

    Only a small fraction of known proteins have been functionally characterized, making protein function prediction essential to propose annotations for uncharacterized proteins. In recent years many function prediction methods have been developed using various sources of biological data from protein sequence and structure to gene expression data. Here we present the CombFunc web server, which makes Gene Ontology (GO)-based protein function predictions. CombFunc incorporates ConFunc, our existing function prediction method, with other approaches for function prediction that use protein sequence, gene expression and protein–protein interaction data. In benchmarking on a set of 1686 proteins CombFunc obtains precision and recall of 0.71 and 0.64 respectively for gene ontology molecular function terms. For biological process GO terms precision of 0.74 and recall of 0.41 is obtained. CombFunc is available at http://www.sbg.bio.ic.ac.uk/combfunc. PMID:22641853

  8. Simplified risk score models accurately predict the risk of major in-hospital complications following percutaneous coronary intervention.

    PubMed

    Resnic, F S; Ohno-Machado, L; Selwyn, A; Simon, D I; Popma, J J

    2001-07-01

    The objectives of this analysis were to develop and validate simplified risk score models for predicting the risk of major in-hospital complications after percutaneous coronary intervention (PCI) in the era of widespread stenting and use of glycoprotein IIb/IIIa antagonists. We then sought to compare the performance of these simplified models with those of full logistic regression and neural network models. From January 1, 1997 to December 31, 1999, data were collected on 4,264 consecutive interventional procedures at a single center. Risk score models were derived from multiple logistic regression models using the first 2,804 cases and then validated on the final 1,460 cases. The area under the receiver operating characteristic (ROC) curve for the risk score model that predicted death was 0.86 compared with 0.85 for the multiple logistic model and 0.83 for the neural network model (validation set). For the combined end points of death, myocardial infarction, or bypass surgery, the corresponding areas under the ROC curves were 0.74, 0.78, and 0.81, respectively. Previously identified risk factors were confirmed in this analysis. The use of stents was associated with a decreased risk of in-hospital complications. Thus, risk score models can accurately predict the risk of major in-hospital complications after PCI. Their discriminatory power is comparable to those of logistic models and neural network models. Accurate bedside risk stratification may be achieved with these simple models.
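
    The models in this abstract are compared by the area under the ROC curve. As a reminder of what that number measures, the sketch below computes it as the probability that a randomly chosen case with the outcome receives a higher risk score than a randomly chosen case without it, counting ties as one half. The function name and the toy scores are hypothetical; this is not the authors' risk score or their data.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted risk for each case (higher means riskier)
    labels: 1 if the complication occurred, 0 otherwise
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive case ranked above negative case
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data for illustration only.
toy_auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
print(toy_auc)  # 1.0
```

    The pairwise form is quadratic in the number of cases, which is acceptable for a sketch but would normally be replaced by a rank-based computation on data sets of several thousand procedures.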

  9. Protein function prediction using neighbor relativity in protein-protein interaction network.

    PubMed

    Moosavi, Sobhan; Rahgozar, Masoud; Rahimi, Amir

    2013-04-01

    There is a large gap between the number of discovered proteins and the number of functionally annotated ones. Due to the high cost of determining protein function by wet-lab research, function prediction has become a major task for computational biology and bioinformatics. Some studies use protein interaction information to predict function for un-annotated proteins. In this paper, we propose a novel approach called "Neighbor Relativity Coefficient" (NRC), based on interaction network topology, which estimates the functional similarity between two proteins. NRC is calculated for each pair of proteins based on their graph-based features including distance, common neighbors and the number of paths between them. In order to ascribe function to an un-annotated protein, NRC estimates a weight for each neighbor to transfer its annotation to the unknown protein. Finally, the unknown protein is annotated with the top-scoring transferred functions. We also investigate the effect of using different coefficients for various types of functions. The proposed method has been evaluated on Saccharomyces cerevisiae and Homo sapiens interaction networks. The performance analysis demonstrates that NRC yields better results than previous protein function prediction approaches that use interaction networks.

  10. A comprehensive overview of computational protein disorder prediction methods

    PubMed Central

    Deng, Xin; Eickholt, Jesse

    2013-01-01

    Over the past decade there has been a growing acknowledgement that a large proportion of proteins within most proteomes contain disordered regions. Disordered regions are segments of the protein chain which do not adopt a stable structure. Recognition of disordered regions in a protein is of great importance for protein structure prediction, protein structure determination and function annotation as these regions have a close relationship with protein expression and functionality. As a result, a great many protein disorder prediction methods have been developed so far. Here, we present an overview of current protein disorder prediction methods including an analysis of their advantages and shortcomings. In order to help users to select alternative tools under different circumstances, we also evaluate 23 disorder predictors on the benchmark data of the most recent round of the Critical Assessment of protein Structure Prediction (CASP) and assess their accuracy using several complementary measures. PMID:21874190

  11. Accurate prediction of the refractive index of polymers using first principles and data modeling

    NASA Astrophysics Data System (ADS)

    Afzal, Mohammad Atif Faiz; Cheng, Chong; Hachmann, Johannes

    Organic polymers with a high refractive index (RI) have recently attracted considerable interest due to their potential application in optical and optoelectronic devices. The ability to tailor the molecular structure of polymers is the key to increasing the accessible RI values. Our work concerns the creation of predictive in silico models for the optical properties of organic polymers, the screening of large-scale candidate libraries, and the mining of the resulting data to extract the underlying design principles that govern their performance. This work was set up to guide our experimentalist partners and allow them to target the most promising candidates. Our model is based on the Lorentz-Lorenz equation and thus includes the polarizability and number density values for each candidate. For the former, we performed a detailed benchmark study of different density functionals, basis sets, and the extrapolation scheme towards the polymer limit. For the number density we devised an exceedingly efficient machine learning approach to correlate the polymer structure and the packing fraction in the bulk material. We validated the proposed RI model against the experimentally known RI values of 112 polymers. We could show that the proposed combination of physical and data modeling is both successful and highly economical to characterize a wide range of organic polymers, which is a prerequisite for virtual high-throughput screening.
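
    The abstract states that the model rests on the Lorentz-Lorenz equation, which links refractive index n to polarizability α and number density N. A minimal sketch of that relation is given below, assuming the Gaussian-units form (n² − 1)/(n² + 2) = (4π/3)·N·α with α expressed as a polarizability volume. The function name and the example inputs are hypothetical, and the authors' actual workflow (density functional calculations and a machine-learned packing fraction) is not reproduced here.

```python
import math


def lorentz_lorenz_n(alpha_cm3, number_density_per_cm3):
    """Refractive index from the Lorentz-Lorenz equation (Gaussian units).

    Solves (n^2 - 1) / (n^2 + 2) = x for n, where
    x = (4*pi/3) * N * alpha, which gives n^2 = (1 + 2x) / (1 - x).
    """
    x = 4.0 * math.pi * number_density_per_cm3 * alpha_cm3 / 3.0
    if not 0.0 <= x < 1.0:
        raise ValueError("N * alpha is outside the physically valid range")
    return math.sqrt((1.0 + 2.0 * x) / (1.0 - x))


# Hypothetical inputs: alpha = 1e-23 cm^3 per repeat unit, N = 5e21 cm^-3.
n_example = lorentz_lorenz_n(1.0e-23, 5.0e21)
print(round(n_example, 3))
```

    Because n depends on the product N·α, errors in the predicted number density and in the computed polarizability affect the result in the same way, which is consistent with the abstract treating both quantities as separate modeling targets.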

  12. The human skin/chick chorioallantoic membrane model accurately predicts the potency of cosmetic allergens.

    PubMed

    Slodownik, Dan; Grinberg, Igor; Spira, Ram M; Skornik, Yehuda; Goldstein, Ronald S

    2009-04-01

    The current standard method for predicting contact allergenicity is the murine local lymph node assay (LLNA). Public objection to the use of animals in testing of cosmetics makes the development of a system that does not use sentient animals highly desirable. The chorioallantoic membrane (CAM) of the chick egg has been extensively used for the growth of normal and transformed mammalian tissues. The CAM is not innervated, and embryos are sacrificed before the development of pain perception. The aim of this study was to determine whether the sensitization phase of contact dermatitis to known cosmetic allergens can be quantified using CAM-engrafted human skin and how these results compare with published EC3 data obtained with the LLNA. We studied six common molecules used in allergen testing and quantified migration of epidermal Langerhans cells (LC) as a measure of their allergic potency. All agents with known allergic potential induced statistically significant migration of LC. The data obtained correlated well with published data for these allergens generated using the LLNA test. The human-skin CAM model therefore has great potential as an inexpensive, non-radioactive, in vivo alternative to the LLNA, which does not require the use of sentient animals. In addition, this system has the advantage of testing the allergic response of human, rather than animal skin.

  13. Searching for Computational Strategies to Accurately Predict pKas of Large Phenolic Derivatives.

    PubMed

    Rebollar-Zepeda, Aida Mariana; Campos-Hernández, Tania; Ramírez-Silva, María Teresa; Rojas-Hernández, Alberto; Galano, Annia

    2011-08-09

    Twenty-two reaction schemes have been tested, within the cluster-continuum model including up to seven explicit water molecules. They have been used in conjunction with nine different methods, within the density functional theory and with second-order Møller-Plesset. The quality of the pKa predictions was found to be strongly dependent on the chosen scheme, while only moderately influenced by the method of calculation. We recommend the E1 reaction scheme [HA + OH(-) (3H2O) ↔ A(-) (H2O) + 3H2O], since it yields mean unsigned errors (MUE) lower than 1 unit of pKa for most of the tested functionals. The best pKa values obtained from this reaction scheme are those involving calculations with PBE0 (MUE = 0.77), TPSS (MUE = 0.82), BHandHLYP (MUE = 0.82), and B3LYP (MUE = 0.86) functionals. This scheme has the additional advantage, compared to the proton exchange method, which also gives very small values of MUE, of being experiment independent. It should be kept in mind, however, that these recommendations are valid within the cluster-continuum model, using the polarizable continuum model in conjunction with the united atom Hartree-Fock cavity and the strategy based on thermodynamic cycles. Changes in any of these aspects of the methodology used may lead to different outcomes.
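
    To make the arithmetic behind a scheme such as E1 concrete, the sketch below converts a reaction free energy into a pKa using mass-action algebra: with K the equilibrium constant of HA + OH(-)(3H2O) ↔ A(-)(H2O) + 3H2O, Ka = K*Kw/[H2O]^3, so pKa = ΔG/(RT ln 10) + pKw + 3 log10[H2O]. The water concentration of 55.55 mol/L, pKw = 14, and the function name are assumptions for this example; standard-state conventions vary between studies, and the abstract does not state which the authors used.

```python
import math

R_KCAL = 1.98720425864083e-3   # gas constant in kcal/(mol*K)
WATER_MOLARITY = 55.55         # mol/L, assumed concentration of liquid water


def pka_from_e1(delta_g_kcal: float, temperature: float = 298.15,
                pkw: float = 14.0) -> float:
    """pKa from the free energy (kcal/mol) of HA + OH-(3H2O) <-> A-(H2O) + 3H2O."""
    # -log10(K) for the E1 reaction itself.
    minus_log_k = delta_g_kcal / (R_KCAL * temperature * math.log(10.0))
    # Ka = K * Kw / [H2O]^3  =>  pKa = -log10(K) + pKw + 3*log10([H2O]).
    return minus_log_k + pkw + 3.0 * math.log10(WATER_MOLARITY)
```

    At 298.15 K, roughly 1.36 kcal/mol of error in ΔG shifts the pKa by one unit, which puts the reported MUE values below 1 pKa unit in perspective.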

  14. Towards Relaxing the Spherical Solar Radiation Pressure Model for Accurate Orbit Predictions

    NASA Astrophysics Data System (ADS)

    Lachut, M.; Bennett, J.

    2016-09-01

    The well-known cannonball model has been used ubiquitously to capture the effects of atmospheric drag and solar radiation pressure on satellites and/or space debris for decades. While it lends itself naturally to spherical objects, its validity in the case of non-spherical objects has been debated heavily for years throughout the space situational awareness community. One of the leading motivations to improve orbit predictions by relaxing the spherical assumption is the ongoing demand for more robust and reliable conjunction assessments. In this study, we explore the orbit propagation of a flat plate in a near-GEO orbit under the influence of solar radiation pressure, using a Lambertian BRDF model. Consequently, this approach will account for the spin rate and orientation of the object, which is typically determined in practice using a light curve analysis. Here, simulations will be performed which systematically reduce the spin rate to demonstrate the point at which the spherical model no longer describes the orbital elements of the spinning plate. Further understanding of this threshold would provide insight into when a higher fidelity model should be used, thus resulting in improved orbit propagations. Therefore, the work presented here is of particular interest to organizations and researchers that maintain their own catalog, and/or perform conjunction analyses.
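
    For readers unfamiliar with the cannonball model, a minimal sketch of its solar radiation pressure term follows: the object is treated as a sphere, so the acceleration depends only on a reflectivity coefficient and the area-to-mass ratio, not on attitude or spin. The constant of about 4.56e-6 N/m^2 at 1 AU is the commonly quoted approximate value; the function and parameter names are assumptions for this example, and the flat-plate BRDF model studied in the abstract is not shown.

```python
SOLAR_PRESSURE_1AU = 4.56e-6  # N/m^2, approximate solar radiation pressure at 1 AU


def cannonball_srp_acceleration(c_r: float, area_m2: float, mass_kg: float,
                                sun_distance_au: float = 1.0) -> float:
    """Magnitude (m/s^2) of the SRP acceleration, directed away from the Sun.

    c_r: radiation pressure coefficient (about 1 for an absorber, up to 2 for a mirror).
    """
    if mass_kg <= 0.0 or sun_distance_au <= 0.0:
        raise ValueError("mass and Sun distance must be positive")
    # Pressure falls off with the inverse square of the distance to the Sun.
    pressure = SOLAR_PRESSURE_1AU / sun_distance_au ** 2
    return pressure * c_r * area_m2 / mass_kg
```

    Nothing in this expression depends on orientation, which is the simplification that a spinning flat plate eventually violates as its spin rate drops.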

  15. Towards Accurate Prediction of Turbulent, Three-Dimensional, Recirculating Flows with the NCC

    NASA Technical Reports Server (NTRS)

    Iannetti, A.; Tacina, R.; Jeng, S.-M.; Cai, J.

    2001-01-01

    The National Combustion Code (NCC) was used to calculate the steady state, nonreacting flow field of a prototype Lean Direct Injection (LDI) swirler. This configuration used nine groups of eight holes drilled at a thirty-five degree angle to induce swirl. These nine groups created swirl in the same direction, or a corotating pattern. The static pressure drop across the holes was fixed at approximately four percent. Computations were performed on one quarter of the geometry, because the geometry is considered rotationally periodic every ninety degrees. The final computational grid used was approximately 2.26 million tetrahedral cells, and a cubic nonlinear k - epsilon model was used to model turbulence. The NCC results were then compared to time averaged Laser Doppler Velocimetry (LDV) data. The LDV measurements were performed on the full geometry, but four ninths of the geometry was measured. One-, two-, and three-dimensional representations of both flow fields are presented. The NCC computations compare both qualitatively and quantitatively well to the LDV data, but differences exist downstream. The comparison is encouraging, and shows that NCC can be used for future injector design studies. To improve the flow prediction accuracy of turbulent, three-dimensional, recirculating flow fields with the NCC, recommendations are given.

  16. The development and verification of a highly accurate collision prediction model for automated noncoplanar plan delivery

    SciTech Connect

    Yu, Victoria Y.; Tran, Angelia; Nguyen, Dan; Cao, Minsong; Ruan, Dan; Low, Daniel A.; Sheng, Ke

    2015-11-15

    attributed to phantom setup errors due to the slightly deformable and flexible phantom extremities. The estimated site-specific safety buffer distance with 0.001% probability of collision for (gantry-to-couch, gantry-to-phantom) was (1.23 cm, 3.35 cm), (1.01 cm, 3.99 cm), and (2.19 cm, 5.73 cm) for treatment to the head, lung, and prostate, respectively. Automated delivery to all three treatment sites was completed in 15 min and collision free using a digital Linac. Conclusions: An individualized collision prediction model for the purpose of noncoplanar beam delivery was developed and verified. With the model, the study has demonstrated the feasibility of predicting deliverable beams for an individual patient and then guiding fully automated noncoplanar treatment delivery. This work motivates development of clinical workflows and quality assurance procedures to allow more extensive use and automation of noncoplanar beam geometries.

  17. The development and verification of a highly accurate collision prediction model for automated noncoplanar plan delivery

    PubMed Central

    Yu, Victoria Y.; Tran, Angelia; Nguyen, Dan; Cao, Minsong; Ruan, Dan; Low, Daniel A.; Sheng, Ke

    2015-01-01

    attributed to phantom setup errors due to the slightly deformable and flexible phantom extremities. The estimated site-specific safety buffer distance with 0.001% probability of collision for (gantry-to-couch, gantry-to-phantom) was (1.23 cm, 3.35 cm), (1.01 cm, 3.99 cm), and (2.19 cm, 5.73 cm) for treatment to the head, lung, and prostate, respectively. Automated delivery to all three treatment sites was completed in 15 min and collision free using a digital Linac. Conclusions: An individualized collision prediction model for the purpose of noncoplanar beam delivery was developed and verified. With the model, the study has demonstrated the feasibility of predicting deliverable beams for an individual patient and then guiding fully automated noncoplanar treatment delivery. This work motivates development of clinical workflows and quality assurance procedures to allow more extensive use and automation of noncoplanar beam geometries. PMID:26520735

  18. How Accurate Are the Anthropometry Equations in Iranian Military Men in Predicting Body Composition?

    PubMed Central

    Shakibaee, Abolfazl; Faghihzadeh, Soghrat; Alishiri, Gholam Hossein; Ebrahimpour, Zeynab; Faradjzadeh, Shahram; Sobhani, Vahid; Asgari, Alireza

    2015-01-01

    Background: The body composition varies according to different life styles (i.e., caloric intake and caloric expenditure). Therefore, it is wise to record military personnel’s body composition periodically and encourage those who abide by the regulations. Different methods have been introduced for body composition assessment: invasive and non-invasive. Amongst them, the Jackson and Pollock equation is the most popular. Objectives: The recommended anthropometric prediction equations for assessing men’s body composition were compared with the dual-energy X-ray absorptiometry (DEXA) gold standard to develop a modified equation to assess body composition and obesity quantitatively among Iranian military men. Patients and Methods: A total of 101 military men aged 23 - 52 years old with a mean age of 35.5 years were recruited and evaluated in the present study (average height, 173.9 cm and weight, 81.5 kg). The body-fat percentages of subjects were assessed both with anthropometric assessment and DEXA scan. The data obtained from these two methods were then compared using multiple regression analysis. Results: The mean and standard deviation of body fat percentage of the DEXA assessment was 21.2 ± 4.3 and body fat percentage obtained from three Jackson and Pollock 3-, 4- and 7-site equations were 21.1 ± 5.8, 22.2 ± 6.0 and 20.9 ± 5.7, respectively. There was a strong correlation between these three equations and DEXA (R² = 0.98). Conclusions: The mean percentage of body fat obtained from the three equations of Jackson and Pollock was very close to that of body fat obtained from DEXA; however, we suggest using a modified Jackson-Pollock 3-site equation for volunteer military men because the 3-site equation analysis method is simpler and faster than other methods. PMID:26715964
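
    As a worked illustration of the 3-site approach discussed above, the sketch below uses the coefficients commonly cited for the original Jackson-Pollock 3-site equation for men (chest, abdomen, and thigh skinfolds) together with the Siri conversion from body density to percent fat. These coefficients come from the general literature, not from this abstract, and the modified equation the authors propose is not given here, so none of this should be read as their model.

```python
def body_density_jp3_men(chest_mm: float, abdomen_mm: float,
                         thigh_mm: float, age_years: float) -> float:
    """Body density (g/cm^3) from the commonly cited Jackson-Pollock 3-site equation for men."""
    s = chest_mm + abdomen_mm + thigh_mm  # sum of the three skinfolds, in mm
    return 1.10938 - 0.0008267 * s + 0.0000016 * s ** 2 - 0.0002574 * age_years


def body_fat_percent_siri(density: float) -> float:
    """Percent body fat from body density using the Siri equation."""
    return 495.0 / density - 450.0
```

    For example, `body_fat_percent_siri(body_density_jp3_men(15, 25, 20, 35))` evaluates to roughly 18.5, in the same range as the cohort means reported in the abstract.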

  19. Deformation, Failure, and Fatigue Life of SiC/Ti-15-3 Laminates Accurately Predicted by MAC/GMC

    NASA Technical Reports Server (NTRS)

    Bednarcyk, Brett A.; Arnold, Steven M.

    2002-01-01

    NASA Glenn Research Center's Micromechanics Analysis Code with Generalized Method of Cells (MAC/GMC) (ref.1) has been extended to enable fully coupled macro-micro deformation, failure, and fatigue life predictions for advanced metal matrix, ceramic matrix, and polymer matrix composites. Because of the multiaxial nature of the code's underlying micromechanics model, GMC--which allows the incorporation of complex local inelastic constitutive models--MAC/GMC finds its most important application in metal matrix composites, like the SiC/Ti-15-3 composite examined here. Furthermore, since GMC predicts the microscale fields within each constituent of the composite material, submodels for local effects such as fiber breakage, interfacial debonding, and matrix fatigue damage can and have been built into MAC/GMC. The present application of MAC/GMC highlights the combination of these features, which has enabled the accurate modeling of the deformation, failure, and life of titanium matrix composites.

  20. Industrial Compositional Streamline Simulation for Efficient and Accurate Prediction of Gas Injection and WAG Processes

    SciTech Connect

    Margot Gerritsen

    2008-10-31

    Gas-injection processes are widely and increasingly used for enhanced oil recovery (EOR). In the United States, for example, EOR production by gas injection accounts for approximately 45% of total EOR production and has tripled since 1986. The understanding of the multiphase, multicomponent flow taking place in any displacement process is essential for successful design of gas-injection projects. Due to complex reservoir geometry, reservoir fluid properties and phase behavior, the design of accurate and efficient numerical simulations for the multiphase, multicomponent flow governing these processes is nontrivial. In this work, we developed, implemented and tested a streamline based solver for gas injection processes that is computationally very attractive: as compared to traditional Eulerian solvers in use by industry it computes solutions with a computational speed orders of magnitude higher and a comparable accuracy provided that cross-flow effects do not dominate. We contributed to the development of compositional streamline solvers in three significant ways: improvement of the overall framework allowing improved streamline coverage and partial streamline tracing, amongst others; parallelization of the streamline code, which significantly improves wall clock time; and development of new compositional solvers that can be implemented along streamlines as well as in existing Eulerian codes used by industry. We designed several novel ideas in the streamline framework. First, we developed an adaptive streamline coverage algorithm. Adding streamlines locally can reduce computational costs by concentrating computational efforts where needed, and reduce mapping errors. Adapting streamline coverage effectively controls mass balance errors that mostly result from the mapping from streamlines to pressure grid. We also introduced the concept of partial streamlines: streamlines that do not necessarily start and/or end at wells. 
This allows more efficient coverage and avoids

  1. Easy-to-use, general, and accurate multi-Kinect calibration and its application to gait monitoring for fall prediction.

    PubMed

    Staranowicz, Aaron N; Ray, Christopher; Mariottini, Gian-Luca

    2015-01-01

    Falls are the most common cause of unintentional injury and death in older adults. Many clinics, hospitals, and health-care providers are urgently seeking accurate, low-cost, and easy-to-use technology to predict falls before they happen, e.g., by monitoring the human walking pattern (or "gait"). Despite the wide popularity of Microsoft's Kinect and the plethora of solutions for gait monitoring, no strategy has been proposed to date to allow non-expert users to calibrate the cameras, which is essential to accurately fuse the body motion observed by each camera in a single frame of reference. In this paper, we present a novel multi-Kinect calibration algorithm that has advanced features when compared to existing methods: 1) it is easy to use, 2) it can be used in any generic Kinect arrangement, and 3) it provides accurate calibration. Extensive real-world experiments have been conducted to validate our algorithm and to compare its performance against other multi-Kinect calibration approaches, especially to show the improved estimate of gait parameters. Finally, a MATLAB Toolbox has been made publicly available for the entire research community.

  2. Absolute Measurements of Macrophage Migration Inhibitory Factor and Interleukin-1-β mRNA Levels Accurately Predict Treatment Response in Depressed Patients

    PubMed Central

    Ferrari, Clarissa; Uher, Rudolf; Bocchio-Chiavetto, Luisella; Riva, Marco Andrea; Pariante, Carmine M.

    2016-01-01

    Background: Increased levels of inflammation have been associated with a poorer response to antidepressants in several clinical samples, but these findings have been limited by low reproducibility of biomarker assays across laboratories, difficulty in predicting response probability on an individual basis, and unclear molecular mechanisms. Methods: Here we measured absolute mRNA values (a reliable quantitation of number of molecules) of Macrophage Migration Inhibitory Factor and interleukin-1β in a previously published sample from a randomized controlled trial comparing escitalopram vs nortriptyline (GENDEP) as well as in an independent, naturalistic replication sample. We then used linear discriminant analysis to calculate mRNA values cutoffs that best discriminated between responders and nonresponders after 12 weeks of antidepressants. As Macrophage Migration Inhibitory Factor and interleukin-1β might be involved in different pathways, we constructed a protein-protein interaction network by the Search Tool for the Retrieval of Interacting Genes/Proteins. Results: We identified cutoff values for the absolute mRNA measures that accurately predicted response probability on an individual basis, with positive predictive values and specificity for nonresponders of 100% in both samples (negative predictive value=82% to 85%, sensitivity=52% to 61%). Using network analysis, we identified different clusters of targets for these 2 cytokines, with Macrophage Migration Inhibitory Factor interacting predominantly with pathways involved in neurogenesis, neuroplasticity, and cell proliferation, and interleukin-1β interacting predominantly with pathways involved in the inflammasome complex, oxidative stress, and neurodegeneration. Conclusion: We believe that these data provide a clinically suitable approach to the personalization of antidepressant therapy: patients who have absolute mRNA values above the suggested cutoffs could be directed toward earlier access to more

  3. Construction of ontology augmented networks for protein complex prediction.

    PubMed

    Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian

    2013-01-01

    Protein complexes are of great importance in understanding the principles of cellular organization and function. The increase in available protein-protein interaction data, gene ontology and other resources makes it possible to develop computational methods for protein complex prediction. Most existing methods focus mainly on the topological structure of protein-protein interaction networks, and largely ignore the gene ontology annotation information. In this article, we constructed ontology augmented networks with protein-protein interaction data and gene ontology, which effectively unified the topological structure of protein-protein interaction networks and the similarity of gene ontology annotations into unified distance measures. After constructing ontology augmented networks, a novel method (clustering based on ontology augmented networks) was proposed to predict protein complexes, which was capable of taking into account the topological structure of the protein-protein interaction network, as well as the similarity of gene ontology annotations. Our method was applied to two different yeast protein-protein interaction datasets and predicted many well-known complexes. The experimental results showed that (i) ontology augmented networks and the unified distance measure can effectively combine the structure closeness and gene ontology annotation similarity; (ii) our method is valuable in predicting protein complexes and has higher F1 and accuracy compared to other competing methods.

  4. A cross-race effect in metamemory: Predictions of face recognition are more accurate for members of our own race

    PubMed Central

    Hourihan, Kathleen L.; Benjamin, Aaron S.; Liu, Xiping

    2012-01-01

    The Cross-Race Effect (CRE) in face recognition is the well-replicated finding that people are better at recognizing faces from their own race, relative to other races. The CRE reveals systematic limitations on eyewitness identification accuracy and suggests that some caution is warranted in evaluating cross-race identification. The CRE is a problem because jurors value eyewitness identification highly in verdict decisions. In the present paper, we explore how accurate people are in predicting their ability to recognize own-race and other-race faces. Caucasian and Asian participants viewed photographs of Caucasian and Asian faces, and made immediate judgments of learning during study. An old/new recognition test replicated the CRE: both groups displayed superior discriminability of own-race faces, relative to other-race faces. Importantly, relative metamnemonic accuracy was also greater for own-race faces, indicating that the accuracy of predictions about face recognition is influenced by race. This result indicates another source of concern when eliciting or evaluating eyewitness identification: people are less accurate in judging whether they will or will not recognize a face when that face is of a different race than they are. This new result suggests that a witness’s claim of being likely to recognize a suspect from a lineup should be interpreted with caution when the suspect is of a different race than the witness. PMID:23162788

  5. A Weibull statistics-based lignocellulose saccharification model and a built-in parameter accurately predict lignocellulose hydrolysis performance.

    PubMed

    Wang, Mingyu; Han, Lijuan; Liu, Shasha; Zhao, Xuebing; Yang, Jinghua; Loh, Soh Kheang; Sun, Xiaomin; Zhang, Chenxi; Fang, Xu

    2015-09-01

    Renewable energy from lignocellulosic biomass has been deemed an alternative to depleting fossil fuels. In order to improve this technology, we aim to develop robust mathematical models for the enzymatic lignocellulose degradation process. By analyzing 96 groups of previously published and newly obtained lignocellulose saccharification results and fitting them to a Weibull distribution, we discovered Weibull statistics can accurately predict lignocellulose saccharification data, regardless of the type of substrates, enzymes and saccharification conditions. A mathematical model for enzymatic lignocellulose degradation was subsequently constructed based on Weibull statistics. Further analysis of the mathematical structure of the model and experimental saccharification data showed the significance of the two parameters in this model. In particular, the λ value, defined as the characteristic time, represents the overall performance of the saccharification system. This suggestion was further supported by statistical analysis of experimental saccharification data and analysis of the glucose production levels when λ and n values change. In conclusion, the constructed Weibull statistics-based model can accurately predict lignocellulose hydrolysis behavior and we can use the λ parameter to assess the overall performance of enzymatic lignocellulose degradation. Advantages and potential applications of the model and the λ value in saccharification performance assessment were discussed.
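
    The abstract does not print the model equation, so the sketch below assumes the standard two-parameter Weibull cumulative form, y(t) = 1 - exp(-(t/λ)^n), as one plausible reading of a "Weibull statistics-based" conversion curve. The function name and the interpretation of y as fractional conversion are assumptions for this example.

```python
import math


def weibull_conversion(t: float, lam: float, n: float) -> float:
    """Fractional conversion at time t under an assumed Weibull cumulative form.

    lam: characteristic time (same unit as t); n: shape parameter.
    """
    if t < 0.0 or lam <= 0.0 or n <= 0.0:
        raise ValueError("t must be >= 0 and lam, n must be > 0")
    return 1.0 - math.exp(-((t / lam) ** n))
```

    Under this form the conversion at t = λ is 1 - 1/e (about 63%) regardless of n, which is why λ works as a single-number summary of overall performance: a smaller λ means the system reaches that level sooner.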

  6. PSI/TM-Coffee: a web server for fast and accurate multiple sequence alignments of regular and transmembrane proteins using homology extension on reduced databases

    PubMed Central

    Floden, Evan W.; Tommaso, Paolo D.; Chatzou, Maria; Magis, Cedrik; Notredame, Cedric; Chang, Jia-Ming

    2016-01-01

    The PSI/TM-Coffee web server performs multiple sequence alignment (MSA) of proteins by combining homology extension with a consistency based alignment approach. Homology extension is performed with Position Specific Iterative (PSI) BLAST searches against a choice of redundant and non-redundant databases. The main novelty of this server is to allow databases of reduced complexity to rapidly perform homology extension. This server also gives the possibility to use transmembrane proteins (TMPs) reference databases to allow even faster homology extension on this important category of proteins. Aside from an MSA, the server also outputs topological prediction of TMPs using the HMMTOP algorithm. Previous benchmarking of the method has shown this approach outperforms the most accurate alignment methods such as MSAProbs, Kalign, PROMALS, MAFFT, ProbCons and PRALINE™. The web server is available at http://tcoffee.crg.cat/tmcoffee. PMID:27106060

  7. Shrinking the Psoriasis Assessment Gap: Early Gene-Expression Profiling Accurately Predicts Response to Long-Term Treatment.

    PubMed

    Correa da Rosa, Joel; Kim, Jaehwan; Tian, Suyan; Tomalin, Lewis E; Krueger, James G; Suárez-Fariñas, Mayte

    2017-02-01

    There is an "assessment gap" between the moment a patient's response to treatment is biologically determined and when a response can actually be determined clinically. Patients' biochemical profiles are a major determinant of clinical outcome for a given treatment. It is therefore feasible that molecular-level patient information could be used to decrease the assessment gap. Thanks to clinically accessible biopsy samples, high-quality molecular data for psoriasis patients are widely available. Psoriasis is therefore an excellent disease for testing the prospect of predicting treatment outcome from molecular data. Our study shows that gene-expression profiles of psoriasis skin lesions, taken in the first 4 weeks of treatment, can be used to accurately predict (>80% area under the receiver operating characteristic curve) the clinical endpoint at 12 weeks. This could decrease the psoriasis assessment gap by 2 months. We present two distinct prediction modes: a universal predictor, aimed at forecasting the efficacy of untested drugs, and specific predictors aimed at forecasting clinical response to treatment with four specific drugs: etanercept, ustekinumab, adalimumab, and methotrexate. We also develop two forms of prediction: one from detailed, platform-specific data and one from platform-independent, pathway-based data. We show that key biomarkers are associated with responses to drugs and doses and thus provide insight into the biology of pathogenesis reversion.
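
    The ">80% area under the receiver operating characteristic curve" figure can be read as a ranking probability, and the sketch below shows that interpretation directly: the AUC equals the fraction of (responder, nonresponder) pairs in which the responder receives the higher score, with ties counted as one half. The function name and the 0/1 label encoding are assumptions for this example; the authors' predictors and data are not reproduced.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 for the positive class (e.g., responder), 0 for the negative class.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Count pairs where the positive outranks the negative; ties count half.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

    The pairwise loop is quadratic in sample size, which is fine for cohort-sized data; a rank-based formulation would be preferable for large sets.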

  8. Knowledge-guided docking: accurate prospective prediction of bound configurations of novel ligands using Surflex-Dock

    NASA Astrophysics Data System (ADS)

    Cleves, Ann E.; Jain, Ajay N.

    2015-06-01

    Prediction of the bound configuration of small-molecule ligands that differ substantially from the cognate ligand of a protein co-crystal structure is much more challenging than re-docking the cognate ligand. Success rates for cross-docking in the range of 20-30 % are common. We present an approach that uses structural information known prior to a particular cutoff-date to make predictions on ligands whose bound structures were determined later. The knowledge-guided docking protocol was tested on a set of ten protein targets using a total of 949 ligands. The benchmark data set, called PINC ("PINC Is Not Cognate"), is publicly available. Protein pocket similarity was used to choose representative structures for ensemble-docking. The docking protocol made use of known ligand poses prior to the cutoff-date, both to help guide the configurational search and to adjust the rank of predicted poses. Overall, the top-scoring pose family was correct over 60 % of the time, with the top-two pose families approaching a 75 % success rate. Correct poses among all those predicted were identified nearly 90 % of the time. The largest improvements came from the use of molecular similarity to improve ligand pose rankings and the strategy for identifying representative protein structures. With the exception of a single outlier target, the knowledge-guided docking protocol produced results matching the quality of cognate-ligand re-docking, but it did so on a very challenging temporally-segregated cross-docking benchmark.

  9. Knowledge-guided docking: accurate prospective prediction of bound configurations of novel ligands using Surflex-Dock.

    PubMed

    Cleves, Ann E; Jain, Ajay N

    2015-06-01

    Prediction of the bound configuration of small-molecule ligands that differ substantially from the cognate ligand of a protein co-crystal structure is much more challenging than re-docking the cognate ligand. Success rates for cross-docking in the range of 20-30 % are common. We present an approach that uses structural information known prior to a particular cutoff-date to make predictions on ligands whose bound structures were determined later. The knowledge-guided docking protocol was tested on a set of ten protein targets using a total of 949 ligands. The benchmark data set, called PINC ("PINC Is Not Cognate"), is publicly available. Protein pocket similarity was used to choose representative structures for ensemble-docking. The docking protocol made use of known ligand poses prior to the cutoff-date, both to help guide the configurational search and to adjust the rank of predicted poses. Overall, the top-scoring pose family was correct over 60 % of the time, with the top-two pose families approaching a 75 % success rate. Correct poses among all those predicted were identified nearly 90 % of the time. The largest improvements came from the use of molecular similarity to improve ligand pose rankings and the strategy for identifying representative protein structures. With the exception of a single outlier target, the knowledge-guided docking protocol produced results matching the quality of cognate-ligand re-docking, but it did so on a very challenging temporally-segregated cross-docking benchmark.

  10. Identification of Extracellular Segments by Mass Spectrometry Improves Topology Prediction of Transmembrane Proteins

    PubMed Central

    Langó, Tamás; Róna, Gergely; Hunyadi-Gulyás, Éva; Turiák, Lilla; Varga, Julia; Dobson, László; Várady, György; Drahos, László; Vértessy, Beáta G.; Medzihradszky, Katalin F.; Szakács, Gergely; Tusnády, Gábor E.

    2017-01-01

    Transmembrane proteins play a crucial role in signaling, ion transport, nutrient uptake, as well as in maintaining the dynamic equilibrium between the internal and external environment of cells. Despite their important biological functions and abundance, less than 2% of all determined structures are transmembrane proteins. Given the persisting technical difficulties associated with high resolution structure determination of transmembrane proteins, additional methods, including computational and experimental techniques, remain vital in promoting our understanding of their topologies, 3D structures, functions and interactions. Here we report a method for the high-throughput determination of extracellular segments of transmembrane proteins based on the identification of surface labeled and biotin captured peptide fragments by LC/MS/MS. We show that reliable identification of extracellular protein segments increases the accuracy and reliability of existing topology prediction algorithms. Using the experimental topology data as constraints, our improved prediction tool provides accurate and reliable topology models for hundreds of human transmembrane proteins. PMID:28211907

  11. Identification of Extracellular Segments by Mass Spectrometry Improves Topology Prediction of Transmembrane Proteins.

    PubMed

    Langó, Tamás; Róna, Gergely; Hunyadi-Gulyás, Éva; Turiák, Lilla; Varga, Julia; Dobson, László; Várady, György; Drahos, László; Vértessy, Beáta G; Medzihradszky, Katalin F; Szakács, Gergely; Tusnády, Gábor E

    2017-02-13

    Transmembrane proteins play a crucial role in signaling, ion transport, nutrient uptake, as well as in maintaining the dynamic equilibrium between the internal and external environment of cells. Despite their important biological functions and abundance, less than 2% of all determined structures are transmembrane proteins. Given the persisting technical difficulties associated with high resolution structure determination of transmembrane proteins, additional methods, including computational and experimental techniques, remain vital in promoting our understanding of their topologies, 3D structures, functions and interactions. Here we report a method for the high-throughput determination of extracellular segments of transmembrane proteins based on the identification of surface labeled and biotin captured peptide fragments by LC/MS/MS. We show that reliable identification of extracellular protein segments increases the accuracy and reliability of existing topology prediction algorithms. Using the experimental topology data as constraints, our improved prediction tool provides accurate and reliable topology models for hundreds of human transmembrane proteins.

  12. Improved method for predicting protein fold patterns with ensemble classifiers.

    PubMed

    Chen, W; Liu, X; Huang, Y; Jiang, Y; Zou, Q; Lin, C

    2012-01-27

    Protein folding is recognized as a critical problem in the field of biophysics in the 21st century. Predicting protein-folding patterns is challenging due to the complex structure of proteins. In an attempt to solve this problem, we employed ensemble classifiers to improve prediction accuracy. In our experiments, 188-dimensional features were extracted based on the composition and physical-chemical property of proteins and 20-dimensional features were selected using a coupled position-specific scoring matrix. Compared with traditional prediction methods, these methods were superior in terms of prediction accuracy. The 188-dimensional feature-based method achieved 71.2% accuracy in five cross-validations. The accuracy rose to 77% when we used a 20-dimensional feature vector. These methods were used on recent data, with 54.2% accuracy. Source codes and dataset, together with web server and software tools for prediction, are available at: http://datamining.xmu.edu.cn/main/~cwc/ProteinPredict.html.

  13. Prediction of DNA-binding proteins from relational features

    PubMed Central

    2012-01-01

    Background The process of protein-DNA binding has an essential role in the biological processing of genetic information. We use relational machine learning to predict DNA-binding propensity of proteins from their structures. Automatically discovered structural features are able to capture some characteristic spatial configurations of amino acids in proteins. Results Prediction based only on structural relational features already achieves competitive results to existing methods based on physicochemical properties on several protein datasets. Predictive performance is further improved when structural features are combined with physicochemical features. Moreover, the structural features provide some insights not revealed by physicochemical features. Our method is able to detect common spatial substructures. We demonstrate this in experiments with zinc finger proteins. Conclusions We introduced a novel approach for DNA-binding propensity prediction using relational machine learning which could potentially be used also for protein function prediction in general. PMID:23146001

  14. Essential protein identification based on essential protein-protein interaction prediction by Integrated Edge Weights.

    PubMed

    Jiang, Yuexu; Wang, Yan; Pang, Wei; Chen, Liang; Sun, Huiyan; Liang, Yanchun; Blanzieri, Enrico

    2015-07-15

    Essential proteins play a crucial role in cellular survival and development. Experimentally, essential proteins are identified by gene knockouts or RNA interference, which are expensive and often fatal to the target organisms. Given this, an alternative yet important approach to essential protein identification is computational prediction. Existing computational methods predict essential proteins based on their relative densities in a protein-protein interaction (PPI) network. Degree, betweenness, and other appropriate criteria are often used to measure the relative density. However, no matter what criterion is used, a protein is actually ordered by the attributes of this protein per se. In this research, we present a novel computational method, Integrated Edge Weights (IEW), which first ranks protein-protein interactions by integrating their edge weights, then identifies sub PPI networks consisting of the highly-ranked edges, and finally regards the nodes in these sub networks as essential proteins. We evaluated IEW on three model organisms: Saccharomyces cerevisiae (S. cerevisiae), Escherichia coli (E. coli), and Caenorhabditis elegans (C. elegans). The experimental results showed that IEW achieved better performance than the state-of-the-art methods in terms of precision-recall and Jackknife measures. We also demonstrated that IEW is a robust and effective method, which can retrieve biologically significant modules through its highly-ranked protein-protein interactions for S. cerevisiae, E. coli, and C. elegans. We believe that, with sufficient data provided, IEW can be applied to essential protein identification in any other organism. A website about IEW can be accessed from http://digbio.missouri.edu/IEW/index.html.
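
    The edge-first ranking described in this abstract can be illustrated with a minimal sketch. It assumes each interaction already carries a single integrated weight; how IEW actually combines its edge weights is not given in the abstract, so the `edge_weights` input, the `top_k` cutoff and the function name are hypothetical.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def essential_candidates(edge_weights: Dict[Edge, float], top_k: int) -> Set[str]:
    """Return the proteins touched by the top_k highest-weighted interactions.

    edge_weights is assumed to hold one already-integrated weight per edge;
    the integration step itself is not specified in the abstract.
    """
    # Rank interactions (edges), not proteins, by their weight.
    ranked = sorted(edge_weights.items(), key=lambda item: item[1], reverse=True)
    nodes: Set[str] = set()
    # The sub-network is the set of highly ranked edges; its nodes are the candidates.
    for (u, v), _weight in ranked[:top_k]:
        nodes.update((u, v))
    return nodes


# Toy network with made-up weights.
toy_weights = {("A", "B"): 0.9, ("B", "C"): 0.8, ("C", "D"): 0.2, ("D", "E"): 0.1}
candidates = essential_candidates(toy_weights, top_k=2)
print(sorted(candidates))
```

    Ranking edges rather than nodes means a protein is selected because of the interactions it takes part in, which is the contrast with degree or betweenness scoring that the abstract draws.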

  15. Accurate prediction of unsteady and time-averaged pressure loads using a hybrid Reynolds-Averaged/large-eddy simulation technique

    NASA Astrophysics Data System (ADS)

    Bozinoski, Radoslav

    Significant research has been performed over the last several years on understanding the unsteady aerodynamics of various fluid flows. Much of this work has focused on quantifying the unsteady, three-dimensional flow field effects which have proven vital to the accurate prediction of many fluid and aerodynamic problems. Up until recently, engineers have predominantly relied on steady-state simulations to analyze the inherently three-dimensional flow structures that are prevalent in many of today's "real-world" problems. Increases in computational capacity and the development of efficient numerical methods can change this and allow for the solution of the unsteady Reynolds-Averaged Navier-Stokes (RANS) equations for practical three-dimensional aerodynamic applications. An integral part of this capability has been the performance and accuracy of the turbulence models coupled with advanced parallel computing techniques. This report begins with a brief literature survey of the role fully three-dimensional, unsteady, Navier-Stokes solvers have on the current state of numerical analysis. Next, the process of creating a baseline three-dimensional Multi-Block FLOw procedure called MBFLO3 is presented. Solutions for an inviscid circular arc bump, laminar flat plate, laminar cylinder, and turbulent flat plate are then presented. Results show good agreement with available experimental, numerical, and theoretical data. Scalability data for the parallel version of MBFLO3 is presented and shows efficiencies of 90% and higher for processes of no less than 100,000 computational grid points. Next, the description and implementation techniques used for several turbulence models are presented. Following the successful implementation of the URANS and DES procedures, the validation data for separated, non-reattaching flows over a NACA 0012 airfoil, wall-mounted hump, and a wing-body junction geometry are presented. 
Results for the NACA 0012 showed significant improvement in flow predictions

  16. Disorder Prediction Methods, Their Applicability to Different Protein Targets and Their Usefulness for Guiding Experimental Studies

    PubMed Central

    Atkins, Jennifer D.; Boateng, Samuel Y.; Sorensen, Thomas; McGuffin, Liam J.

    2015-01-01

    The role and function of a given protein is dependent on its structure. In recent years, however, numerous studies have highlighted the importance of unstructured, or disordered regions in governing a protein’s function. Disordered proteins have been found to play important roles in pivotal cellular functions, such as DNA binding and signalling cascades. Studying proteins with extended disordered regions is often problematic as they can be challenging to express, purify and crystallise. This means that interpretable experimental data on protein disorder is hard to generate. As a result, predictive computational tools have been developed with the aim of predicting the level and location of disorder within a protein. Currently, over 60 prediction servers exist, utilizing different methods for classifying disorder and different training sets. Here we review several good performing, publicly available prediction methods, comparing their application and discussing how disorder prediction servers can be used to aid the experimental solution of protein structure. The use of disorder prediction methods allows us to adopt a more targeted approach to experimental studies by accurately identifying the boundaries of ordered protein domains so that they may be investigated separately, thereby increasing the likelihood of their successful experimental solution. PMID:26287166

  17. Accurate design of co-assembling multi-component protein nanomaterials.

    PubMed

    King, Neil P; Bale, Jacob B; Sheffler, William; McNamara, Dan E; Gonen, Shane; Gonen, Tamir; Yeates, Todd O; Baker, David

    2014-06-05

    The self-assembly of proteins into highly ordered nanoscale architectures is a hallmark of biological systems. The sophisticated functions of these molecular machines have inspired the development of methods to engineer self-assembling protein nanostructures; however, the design of multi-component protein nanomaterials with high accuracy remains an outstanding challenge. Here we report a computational method for designing protein nanomaterials in which multiple copies of two distinct subunits co-assemble into a specific architecture. We use the method to design five 24-subunit cage-like protein nanomaterials in two distinct symmetric architectures and experimentally demonstrate that their structures are in close agreement with the computational design models. The accuracy of the method and the number and variety of two-component materials that it makes accessible suggest a route to the construction of functional protein nanomaterials tailored to specific applications.

  18. Protein corona fingerprinting predicts the cellular interaction of gold and silver nanoparticles.

    PubMed

    Walkey, Carl D; Olsen, Jonathan B; Song, Fayi; Liu, Rong; Guo, Hongbo; Olsen, D Wesley H; Cohen, Yoram; Emili, Andrew; Chan, Warren C W

    2014-03-25

    Using quantitative models to predict the biological interactions of nanoparticles will accelerate the translation of nanotechnology. Here, we characterized the serum protein corona 'fingerprint' formed around a library of 105 surface-modified gold nanoparticles. Applying a bioinformatics-inspired approach, we developed a multivariate model that uses the protein corona fingerprint to predict cell association 50% more accurately than a model that uses parameters describing nanoparticle size, aggregation state, and surface charge. Our model implicates a set of hyaluronan-binding proteins as mediators of nanoparticle-cell interactions. This study establishes a framework for developing a comprehensive database of protein corona fingerprints and biological responses for multiple nanoparticle types. Such a database can be used to develop quantitative relationships that predict the biological responses to nanoparticles and will aid in uncovering the fundamental mechanisms of nano-bio interactions.

  19. Accurate prediction of polarised high order electrostatic interactions for hydrogen bonded complexes using the machine learning method kriging

    NASA Astrophysics Data System (ADS)

    Hughes, Timothy J.; Kandathil, Shaun M.; Popelier, Paul L. A.

    2015-02-01

    As intermolecular interactions such as the hydrogen bond are electrostatic in origin, rigorous treatment of this term within force field methodologies should be mandatory. We present a method capable of accurately reproducing such interactions for seven van der Waals complexes. It uses atomic multipole moments up to hexadecupole moment mapped to the positions of the nuclear coordinates by the machine learning method kriging. Models were built at three levels of theory: HF/6-31G**, B3LYP/aug-cc-pVDZ and M06-2X/aug-cc-pVDZ. The quality of the kriging models was measured by their ability to predict the electrostatic interaction energy between atoms in external test examples for which the true energies are known. At all levels of theory, >90% of test cases for small van der Waals complexes were predicted within 1 kJ mol⁻¹, decreasing to 60-70% of test cases for larger base pair complexes. Models built on moments obtained at B3LYP and M06-2X level generally outperformed those at HF level. For all systems the individual interactions were predicted with a mean unsigned error of less than 1 kJ mol⁻¹.

  20. Accurate prediction of polarised high order electrostatic interactions for hydrogen bonded complexes using the machine learning method kriging.

    PubMed

    Hughes, Timothy J; Kandathil, Shaun M; Popelier, Paul L A

    2015-02-05

    As intermolecular interactions such as the hydrogen bond are electrostatic in origin, rigorous treatment of this term within force field methodologies should be mandatory. We present a method capable of accurately reproducing such interactions for seven van der Waals complexes. It uses atomic multipole moments up to hexadecupole moment mapped to the positions of the nuclear coordinates by the machine learning method kriging. Models were built at three levels of theory: HF/6-31G**, B3LYP/aug-cc-pVDZ and M06-2X/aug-cc-pVDZ. The quality of the kriging models was measured by their ability to predict the electrostatic interaction energy between atoms in external test examples for which the true energies are known. At all levels of theory, >90% of test cases for small van der Waals complexes were predicted within 1 kJ mol⁻¹, decreasing to 60-70% of test cases for larger base pair complexes. Models built on moments obtained at B3LYP and M06-2X level generally outperformed those at HF level. For all systems the individual interactions were predicted with a mean unsigned error of less than 1 kJ mol⁻¹.

  1. DBD-Hunter: a knowledge-based method for the prediction of DNA-protein interactions.

    PubMed

    Gao, Mu; Skolnick, Jeffrey

    2008-07-01

    The structures of DNA-protein complexes have illuminated the diversity of DNA-protein binding mechanisms shown by different protein families. This lack of generality could pose a great challenge for predicting DNA-protein interactions. To address this issue, we have developed a knowledge-based method, DNA-binding Domain Hunter (DBD-Hunter), for identifying DNA-binding proteins and associated binding sites. The method combines structural comparison and the evaluation of a statistical potential, which we derive to describe interactions between DNA base pairs and protein residues. We demonstrate that DBD-Hunter is an accurate method for predicting DNA-binding function of proteins, and that DNA-binding protein residues can be reliably inferred from the corresponding templates if identified. In benchmark tests on approximately 4000 proteins, our method achieved an accuracy of 98% and a precision of 84%, which significantly outperforms three previous methods. We further validate the method on DNA-binding protein structures determined in DNA-free (apo) state. We show that the accuracy of our method is only slightly affected on apo-structures compared to the performance on holo-structures cocrystallized with DNA. Finally, we apply the method to approximately 1700 structural genomics targets and predict that 37 targets with previously unknown function are likely to be DNA-binding proteins. DBD-Hunter is freely available at http://cssb.biology.gatech.edu/skolnick/webservice/DBD-Hunter/.

  2. System and methods for predicting transmembrane domains in membrane proteins and mining the genome for recognizing G-protein coupled receptors

    DOEpatents

    Trabanino, Rene J; Vaidehi, Nagarajan; Hall, Spencer E; Goddard, William A; Floriano, Wely

    2013-02-05

    The invention provides computer-implemented methods and apparatus implementing a hierarchical protocol using multiscale molecular dynamics and molecular modeling methods to predict the presence of transmembrane regions in proteins, such as G-Protein Coupled Receptors (GPCR), and protein structural models generated according to the protocol. The protocol features a coarse grain sampling method, such as hydrophobicity analysis, to provide a fast and accurate procedure for predicting transmembrane regions. Methods and apparatus of the invention are useful to screen protein or polynucleotide databases for encoded proteins with transmembrane regions, such as GPCRs.
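
    The coarse-grained hydrophobicity analysis mentioned in this record can be illustrated with a sliding-window hydropathy scan. The sketch below is not the patented protocol: it assumes the widely used Kyte-Doolittle scale with its conventional 19-residue window and 1.6 cutoff, and the function names are hypothetical.

```python
from typing import List

# Kyte-Doolittle hydropathy index per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq: str, window: int = 19) -> List[float]:
    """Mean hydropathy of every full-length window along the sequence."""
    scores = [KYTE_DOOLITTLE[residue] for residue in seq]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(seq: str, window: int = 19, threshold: float = 1.6) -> List[int]:
    """Start positions of windows whose mean hydropathy reaches the threshold."""
    return [
        start
        for start, mean in enumerate(window_hydropathy(seq, window))
        if mean >= threshold
    ]


# Artificial sequence: a 19-residue leucine stretch flanked by aspartate.
toy_sequence = "D" * 10 + "L" * 19 + "D" * 10
print(candidate_tm_windows(toy_sequence))
```

    Windows that pass the cutoff cluster around the hydrophobic stretch; a fuller pipeline would merge overlapping windows into segments before any finer-grained modeling.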

  3. OMPcontact: An Outer Membrane Protein Inter-Barrel Residue Contact Prediction Method.

    PubMed

    Zhang, Li; Wang, Han; Yan, Lun; Su, Lingtao; Xu, Dong

    2017-03-01

    Of the two transmembrane protein types, outer membrane proteins (OMPs) perform diverse important biochemical functions, including substrate transport and passive nutrient uptake. Hence their 3D structures are expected to reveal these functions. Because experimental structures are scarce, predicted 3D structures are better suited to OMP research, and the inter-barrel residue contact is becoming one of the most remarkable features, improving prediction accuracy by describing the structural information of OMPs. To predict OMP structures accurately, we explored an OMP inter-barrel residue contact prediction method: OMPcontact. Multiple OMP-specific features were integrated in the method, including residue evolutionary covariation, topology-based transmembrane segment relative residue position, OMP lipid layer accessibility, and residue evolution conservation. These features describe the properties of a residue pair in different respects: sequential, structural, evolutionary, and biochemical. Within a 3-residue sliding window, a Support Vector Machine (SVM) could accurately determine the inter-barrel contact residue pairs using the above features. A 5-fold cross-validation process was applied to test OMPcontact performance against a non-redundant set of 75 OMPs. The tests compared four evolutionary covariation methods and screened for those best suited to inter-barrel contact prediction. The results showed that our method not only performed the prediction efficiently, but also scored the likelihood of contact for residue pairs reliably. This is expected to improve OMP tertiary structure prediction. Therefore, OMPcontact will be helpful in compiling a structural census of outer membrane proteins.

  4. All-atom 3D structure prediction of transmembrane β-barrel proteins from sequences

    PubMed Central

    Hayat, Sikander; Sander, Chris; Marks, Debora S.

    2015-01-01

    Transmembrane β-barrels (TMBs) carry out major functions in substrate transport and protein biogenesis but experimental determination of their 3D structure is challenging. Encouraged by successful de novo 3D structure prediction of globular and α-helical membrane proteins from sequence alignments alone, we developed an approach to predict the 3D structure of TMBs. The approach combines the maximum-entropy evolutionary coupling method for predicting residue contacts (EVfold) with a machine-learning approach (boctopus2) for predicting β-strands in the barrel. In a blinded test for 19 TMB proteins of known structure that have a sufficient number of diverse homologous sequences available, this combined method (EVfold_bb) predicts hydrogen-bonded residue pairs between adjacent β-strands at an accuracy of ∼70%. This accuracy is sufficient for the generation of all-atom 3D models. In the transmembrane barrel region, the average 3D structure accuracy [template-modeling (TM) score] of top-ranked models is 0.54 (ranging from 0.36 to 0.85), with a higher (44%) number of residue pairs in correct strand–strand registration than in earlier methods (18%). Although the nonbarrel regions are predicted less accurately overall, the evolutionary couplings identify some highly constrained loop residues and, for FecA protein, the barrel including the structure of a plug domain can be accurately modeled (TM score = 0.68). Lower prediction accuracy tends to be associated with insufficient sequence information and we therefore expect increasing numbers of β-barrel families to become accessible to accurate 3D structure prediction as the number of available sequences increases. PMID:25858953

  5. All-atom 3D structure prediction of transmembrane β-barrel proteins from sequences.

    PubMed

    Hayat, Sikander; Sander, Chris; Marks, Debora S; Elofsson, Arne

    2015-04-28

    Transmembrane β-barrels (TMBs) carry out major functions in substrate transport and protein biogenesis but experimental determination of their 3D structure is challenging. Encouraged by successful de novo 3D structure prediction of globular and α-helical membrane proteins from sequence alignments alone, we developed an approach to predict the 3D structure of TMBs. The approach combines the maximum-entropy evolutionary coupling method for predicting residue contacts (EVfold) with a machine-learning approach (boctopus2) for predicting β-strands in the barrel. In a blinded test for 19 TMB proteins of known structure that have a sufficient number of diverse homologous sequences available, this combined method (EVfold_bb) predicts hydrogen-bonded residue pairs between adjacent β-strands at an accuracy of ∼70%. This accuracy is sufficient for the generation of all-atom 3D models. In the transmembrane barrel region, the average 3D structure accuracy [template-modeling (TM) score] of top-ranked models is 0.54 (ranging from 0.36 to 0.85), with a higher (44%) number of residue pairs in correct strand-strand registration than in earlier methods (18%). Although the nonbarrel regions are predicted less accurately overall, the evolutionary couplings identify some highly constrained loop residues and, for FecA protein, the barrel including the structure of a plug domain can be accurately modeled (TM score = 0.68). Lower prediction accuracy tends to be associated with insufficient sequence information and we therefore expect increasing numbers of β-barrel families to become accessible to accurate 3D structure prediction as the number of available sequences increases.

  6. Predicting the orientation of protein G B1 on hydrophobic surfaces using Monte Carlo simulations

    PubMed Central

    Harrison, Elisa T.; Weidner, Tobias; Castner, David G.; Interlandi, Gianluca

    2016-01-01

    A Monte Carlo algorithm was developed to predict the most likely orientations of protein G B1, an immunoglobulin G (IgG) antibody-binding domain of protein G, adsorbed onto a hydrophobic surface. At each Monte Carlo step, the protein was rotated and translated as a rigid body. The assumption about rigidity was supported by quartz crystal microbalance with dissipation monitoring experiments, which indicated that protein G B1 adsorbed on a polystyrene surface with its native structure conserved and showed that its IgG antibody-binding activity was retained. The Monte Carlo simulations predicted that protein G B1 is likely adsorbed onto a hydrophobic surface in two different orientations, characterized as two mutually exclusive sets of amino acids contacting the surface. This was consistent with sum frequency generation (SFG) vibrational spectroscopy results. In fact, theoretical SFG spectra calculated from an equal combination of the two predicted orientations exhibited reasonable agreement with measured spectra of protein G B1 on polystyrene surfaces. Also, in explicit solvent molecular dynamics simulations, protein G B1 maintained its predicted orientation in three out of four runs. This work shows that using a Monte Carlo approach can provide an accurate estimate of a protein orientation on a hydrophobic surface, which complements experimental surface analysis techniques and provides an initial system to study the interaction between a protein and a surface in molecular dynamics simulations. PMID:27923271
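
    The rigid-body sampling described above can be sketched as a toy example. The abstract does not state the acceptance rule or the energy function, so this sketch assumes the standard Metropolis criterion and a made-up two-variable energy (height `z` above the surface and tilt angle `theta`) with two equivalent orientation minima; every name and parameter here is hypothetical.

```python
import math
import random
from typing import Tuple


def toy_energy(z: float, theta: float) -> float:
    """Made-up energy: a harmonic well at z = 1 plus two orientation minima (theta = 0, pi)."""
    return (z - 1.0) ** 2 - math.cos(2.0 * theta)


def metropolis_search(n_steps: int, kT: float = 0.5, seed: int = 0) -> Tuple[float, float, float]:
    """Sample rigid-body moves and return the lowest (energy, z, theta) visited."""
    rng = random.Random(seed)
    z, theta = 3.0, 1.0  # arbitrary starting pose
    energy = toy_energy(z, theta)
    best = (energy, z, theta)
    for _ in range(n_steps):
        # One rigid-body move: a small translation and a small rotation.
        z_new = z + rng.uniform(-0.2, 0.2)
        theta_new = (theta + rng.uniform(-0.3, 0.3)) % (2.0 * math.pi)
        e_new = toy_energy(z_new, theta_new)
        # Metropolis rule: always accept downhill, accept uphill with Boltzmann probability.
        if e_new <= energy or rng.random() < math.exp(-(e_new - energy) / kT):
            z, theta, energy = z_new, theta_new, e_new
            if energy < best[0]:
                best = (energy, z, theta)
    return best


best_energy, best_z, best_theta = metropolis_search(5000)
print(best_energy, best_z, best_theta)
```

    Running the search from several seeds or starting poses and grouping the low-energy results by `theta` is one simple way to see more than one preferred orientation emerge from such a landscape.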

  7. Investigation and prediction of protein precipitation by polyethylene glycol using quantitative structure-activity relationship models.

    PubMed

    Hämmerling, Frank; Ladd Effio, Christopher; Andris, Sebastian; Kittelmann, Jörg; Hubbuch, Jürgen

    2017-01-10

    Precipitation of proteins is considered to be an effective purification method for proteins and has proven its potential to replace costly chromatography processes. Besides salts and polyelectrolytes, polymers, such as polyethylene glycol (PEG), are commonly used for precipitation applications under mild conditions. Process development, however, for protein precipitation steps still is based mainly on heuristic approaches and high-throughput experimentation due to a lack of understanding of the underlying mechanisms. In this work we apply quantitative structure-activity relationships (QSARs) to model two parameters, the discontinuity point m* and the β-value, that describe the complete precipitation curve of a protein under defined conditions. The generated QSAR models are sensitive to the protein type, pH, and ionic strength. It was found that the discontinuity point m* is mainly dependent on protein molecular structure properties and electrostatic surface properties, whereas the β-value is influenced by the variance in electrostatics and hydrophobicity on the protein surface. The models for m* and the β-value exhibit a good correlation between observed and predicted data with a coefficient of determination of R² ≥ 0.90 and, hence, are able to accurately predict precipitation curves for proteins. The predictive capabilities were demonstrated for a set of combinations of protein type, pH, and ionic strength not included in the generation of the models and good agreement between predicted and experimental data was achieved.

  8. A survey of computational intelligence techniques in protein function prediction.

    PubMed

    Tiwari, Arvind Kumar; Srivastava, Rajeev

    2014-01-01

    During the past, there was a massive growth of knowledge of unknown proteins with the advancement of high throughput microarray technologies. Protein function prediction is the most challenging problem in bioinformatics. In the past, the homology based approaches were used to predict the protein function, but they failed when a new protein was different from the previous one. Therefore, to alleviate the problems associated with homology based traditional approaches, numerous computational intelligence techniques have been proposed in the recent past. This paper presents a state-of-the-art comprehensive review of various computational intelligence techniques for protein function predictions using sequence, structure, protein-protein interaction network, and gene expression data used in wide areas of applications such as prediction of DNA and RNA binding sites, subcellular localization, enzyme functions, signal peptides, catalytic residues, nuclear/G-protein coupled receptors, membrane proteins, and pathway analysis from gene expression datasets. This paper also summarizes the result obtained by many researchers to solve these problems by using computational intelligence techniques with appropriate datasets to improve the prediction performance. The summary shows that ensemble classifiers and integration of multiple heterogeneous data are useful for protein function prediction.

  9. ENZPRED-enzymatic protein class predicting by machine learning.

    PubMed

    Dave, Kirtan; Panchal, Hetalkumar

    2013-01-01

    Recent times have seen a flood of biological data into the scientific community. As large amounts of data from genome and other sequencing projects become available, in silico approaches to data collection and prediction have become a priority, and progress in sequencing technologies has brought an exponential rise in the number of newly found enzymes. Commonly, the function of such enzymes is determined by experiments that can be time consuming and costly, so new approaches are needed to determine the functions of the proteins these genes encode. The protein parameters that can be used for enzyme/non-enzyme classification include sequence features such as amino acid composition, dipeptide composition, grand average of hydropathicity (GRAVY), and the probabilities of being in an alpha helix, a beta sheet, or a turn. We show how large-scale computational analysis can help to address this challenge with the help of Java and a support vector machine library. In this paper, a recently developed machine learning algorithm referred to as the svm library Learning Machine is used to classify protein sequences into six main enzyme classes, using data downloaded from a public domain database. Comparative studies were carried out on the different kernel methods available in the SVM library, namely (1) radial basis function (RBF) and (2) polynomial. Results show that the RBF method takes less time in training and gives more accurate results than the other kernel methods. The classification accuracy of RBF is also higher than that of various other methods with respect to the available sequence data.
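
    Two of the sequence features named in this abstract, amino acid composition and dipeptide composition, are simple frequency vectors. The sketch below shows one common way to compute them; the function names and the choice to skip dipeptides containing non-standard residues are assumptions, and this is not the ENZPRED code.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def amino_acid_composition(seq: str) -> List[float]:
    """Fraction of the sequence made up by each of the 20 standard residues."""
    return [seq.count(residue) / len(seq) for residue in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> List[float]:
    """Fraction of adjacent residue pairs matching each of the 400 dipeptides."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = len(seq) - 1  # number of adjacent pairs in the sequence
    for start in range(total):
        pair = seq[start:start + 2]
        if pair in counts:  # pairs with non-standard residues are ignored
            counts[pair] += 1
    return [counts[pair] / total for pair in DIPEPTIDES]


def sequence_features(seq: str) -> List[float]:
    """Concatenate both compositions into one 420-dimensional feature vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


features = sequence_features("ACDA")
print(len(features))
```

    A fixed-length vector of this kind can be passed to any classifier that expects numeric input, including an SVM with an RBF or polynomial kernel.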

  10. A novel fibrosis index comprising a non-cholesterol sterol accurately predicts HCV-related liver cirrhosis.

    PubMed

    Ydreborg, Magdalena; Lisovskaja, Vera; Lagging, Martin; Brehm Christensen, Peer; Langeland, Nina; Buhl, Mads Rauning; Pedersen, Court; Mørch, Kristine; Wejstål, Rune; Norkrans, Gunnar; Lindh, Magnus; Färkkilä, Martti; Westin, Johan

    2014-01-01

    Diagnosis of liver cirrhosis is essential in the management of chronic hepatitis C virus (HCV) infection. Liver biopsy is invasive and thus entails a risk of complications as well as a potential risk of sampling error. Therefore, non-invasive diagnostic tools are preferable. The aim of the present study was to create a model for accurate prediction of liver cirrhosis based on patient characteristics and biomarkers of liver fibrosis, including a panel of non-cholesterol sterols reflecting cholesterol synthesis and absorption and secretion. We evaluated variables with potential predictive significance for liver fibrosis in 278 patients originally included in a multicenter phase III treatment trial for chronic HCV infection. A stepwise multivariate logistic model selection was performed with liver cirrhosis, defined as Ishak fibrosis stage 5-6, as the outcome variable. A new index, referred to as Nordic Liver Index (NoLI) in the paper, was based on the model: Log-odds (predicting cirrhosis) = -12.17 + (age × 0.11) + (BMI (kg/m²) × 0.23) + (D7-lathosterol (μg/100 mg cholesterol) × (-0.013)) + (Platelet count (×10⁹/L) × (-0.018)) + (Prothrombin-INR × 3.69). The area under the ROC curve (AUROC) for prediction of cirrhosis was 0.91 (95% CI 0.86-0.96). The index was validated in a separate cohort of 83 patients and the AUROC for this cohort was similar (0.90; 95% CI: 0.82-0.98). In conclusion, the new index may complement other methods in diagnosing cirrhosis in patients with chronic HCV infection.
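
    The log-odds formula quoted in this abstract can be evaluated directly, and the standard logistic function turns the result into a probability. The sketch below uses the published coefficients; the function names and the example input values are hypothetical and are not taken from the study.

```python
import math


def noli_log_odds(age: float, bmi: float, lathosterol: float,
                  platelets: float, inr: float) -> float:
    """Log-odds of cirrhosis using the coefficients quoted in the abstract.

    Units follow the abstract: age in years, BMI in kg/m^2, D7-lathosterol in
    ug per 100 mg cholesterol, platelet count in 10^9/L, prothrombin as INR.
    """
    return (-12.17
            + 0.11 * age
            + 0.23 * bmi
            - 0.013 * lathosterol
            - 0.018 * platelets
            + 3.69 * inr)


def noli_probability(age: float, bmi: float, lathosterol: float,
                     platelets: float, inr: float) -> float:
    """Convert the log-odds to a probability with the logistic function."""
    log_odds = noli_log_odds(age, bmi, lathosterol, platelets, inr)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Made-up input values chosen only to exercise the formula.
example = dict(age=50, bmi=25, lathosterol=100, platelets=200, inr=1.0)
print(noli_log_odds(**example), noli_probability(**example))
```

    For these inputs the terms sum to -12.17 + 5.5 + 5.75 - 1.3 - 3.6 + 3.69 = -2.13, so the predicted probability falls below 0.5; the signs show that higher age, BMI and INR raise the score while higher lathosterol and platelet counts lower it.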

  11. Prediction of Local Quality of Protein Structure Models Considering Spatial Neighbors in Graphical Models

    PubMed Central

    Shin, Woong-Hee; Kang, Xuejiao; Zhang, Jian; Kihara, Daisuke

    2017-01-01

    Protein tertiary structure prediction methods have matured in recent years. However, some proteins defy accurate prediction due to factors such as inadequate template structures. While existing model quality assessment methods predict global model quality relatively well, there is substantial room for improvement in local quality assessment, i.e. assessment of the error at each residue position in a model. Local quality is very important information for practical applications of structure models such as interpreting/designing site-directed mutagenesis of proteins. We have developed a novel local quality assessment method for protein tertiary structure models. The method, named Graph-based Model Quality assessment method (GMQ), explicitly considers the predicted quality of spatially neighboring residues using a graph representation of a query protein structure model. GMQ uses a conditional random field as the core of its algorithm, and performs a binary prediction of the quality of each residue in a model, indicating if a residue position is likely to be within an error cutoff or not. The accuracy of GMQ was improved by considering larger graphs to include quality information of more surrounding residues. Moreover, we found that using different edge weights in graphs reflecting different secondary structures further improves the accuracy. GMQ showed competitive performance on a benchmark for quality assessment of structure models from the Critical Assessment of Techniques for Protein Structure Prediction (CASP). PMID:28074879

  12. Prediction of Local Quality of Protein Structure Models Considering Spatial Neighbors in Graphical Models.

    PubMed

    Shin, Woong-Hee; Kang, Xuejiao; Zhang, Jian; Kihara, Daisuke

    2017-01-11

    Protein tertiary structure prediction methods have matured in recent years. However, some proteins defy accurate prediction due to factors such as inadequate template structures. While existing model quality assessment methods predict global model quality relatively well, there is substantial room for improvement in local quality assessment, i.e. assessment of the error at each residue position in a model. Local quality is very important information for practical applications of structure models such as interpreting/designing site-directed mutagenesis of proteins. We have developed a novel local quality assessment method for protein tertiary structure models. The method, named Graph-based Model Quality assessment method (GMQ), explicitly considers the predicted quality of spatially neighboring residues using a graph representation of a query protein structure model. GMQ uses a conditional random field as the core of its algorithm, and performs a binary prediction of the quality of each residue in a model, indicating if a residue position is likely to be within an error cutoff or not. The accuracy of GMQ was improved by considering larger graphs to include quality information of more surrounding residues. Moreover, we found that using different edge weights in graphs reflecting different secondary structures further improves the accuracy. GMQ showed competitive performance on a benchmark for quality assessment of structure models from the Critical Assessment of Techniques for Protein Structure Prediction (CASP).

  13. Can we accurately quantify nanoparticle associated proteins when constructing high-affinity MRI molecular imaging probes?

    PubMed

    Rimkus, Gabriella; Bremer-Streck, Sibylle; Grüttner, Cordula; Kaiser, Werner Alois; Hilger, Ingrid

    2011-01-01

    Targeted magnetic resonance contrast agents (e.g. iron oxide nanoparticles) have the potential to become highly selective imaging tools. In this context, quantification of the coupled amount of protein is essential for the design of antibody- or antibody fragment-conjugated nanoparticles. Nevertheless, the presence of magnetic iron oxide nanoparticles is still an unsolved problem for this task. The aim of the present work was to clarify whether proteins can be reliably quantified directly in the presence of magnetic iron oxide nanoparticles without the use of fluorescence or radioactivity. Protein quantification via Bradford was not influenced by the presence of magnetic iron oxide nanoparticles (0-17.2 mmol Fe l⁻¹). In contrast, the bicinchoninic acid based assay was distinctly affected by the presence of nanoparticle-iron in suspension (0.1-17.2 mmol Fe l⁻¹), although the influence was linear. This observation allowed for adequate mathematical corrections with known iron content of a given nanoparticle. The applicability of our approach was demonstrated by the determination of bovine serum albumin (BSA) content coupled to dextran-coated magnetic nanoparticles, which was found with the QuantiPro Bicinchoninic acid assay to be 1.5 ± 0.2 µg BSA per 1 mg nanoparticle. Both the Bradford and bicinchoninic acid protein assays allow for direct quantification of proteins in the presence of iron oxide containing magnetic nanoparticles, without the need for the introduction of radioactivity or fluorescence modules. Thus in future it should be possible to make more precise estimations about the coupled protein amount in high-affinity targeted MRI probes for the identification of specific molecules in living organisms, an aspect which is lacking in corresponding works published so far. Additionally, the present protein coupling procedures can be drastically improved by our proposed protein quantification method.
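
    Because the abstract reports that the iron interference in the bicinchoninic acid assay is linear, one way such a correction could look is sketched below. This is an assumption-laden illustration, not the authors' procedure: it supposes the interference is calibrated on protein-free nanoparticle suspensions of known iron concentration, fits a straight line by ordinary least squares, and subtracts the fitted background from a sample reading. All function names and numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def iron_corrected_signal(apparent_signal, iron_mmol_per_l, slope, intercept):
    """Subtract the fitted iron background from an apparent assay signal."""
    return apparent_signal - (slope * iron_mmol_per_l + intercept)


# Hypothetical calibration: assay signal of protein-free suspensions
# at known iron concentrations (mmol Fe per litre).
iron_levels = [0.0, 5.0, 10.0]
blank_signals = [0.0, 0.1, 0.2]
slope, intercept = fit_line(iron_levels, blank_signals)

# Hypothetical sample with 5 mmol Fe per litre and an apparent signal of 0.5.
corrected = iron_corrected_signal(0.5, 5.0, slope, intercept)
```

    The corrected signal would then be converted to a protein amount with an ordinary standard curve, a step not shown here.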

  14. Protein structure prediction from sequence variation

    PubMed Central

    Marks, Debora S; Hopf, Thomas A; Sander, Chris

    2015-01-01

    Genomic sequences contain rich evolutionary information about functional constraints on macromolecules such as proteins. This information can be efficiently mined to detect evolutionary couplings between residues in proteins and address the long-standing challenge to compute protein three-dimensional structures from amino acid sequences. Substantial progress has recently been made on this problem owing to the explosive growth in available sequences and the application of global statistical methods. In addition to three-dimensional structure, the improved understanding of covariation may help identify functional residues involved in ligand binding, protein-complex formation and conformational changes. We expect computation of covariation patterns to complement experimental structural biology in elucidating the full spectrum of protein structures, their functional interactions and evolutionary dynamics. PMID:23138306

  15. A Web-Accessible Protein Structure Prediction Pipeline

    DTIC Science & Technology

    2009-06-01

    almost all of these programs, the input is a protein position-specific substitution matrix ( PSSM ), in addition to the protein sequence itself. We...calculate a single PSSM for each of the input proteins using PSI-BLAST[8] and the NR database. Once the structural properties are predicted using the

  16. Predictive characterization of hypothetical proteins in Staphylococcus aureus NCTC 8325

    PubMed Central

    School, Kuana; Marklevitz, Jessica; K. Schram, William; K. Harris, Laura

    2016-01-01

    Staphylococcus aureus is one of the most common hospital acquired infections. It colonizes immunocompromised patients and with the number of antibiotic resistant strains increasing, medicine needs new treatment options. Understanding more about the proteins this organism uses would further this goal. Hypothetical proteins are sequences thought to encode a functional protein but for which little to no evidence of that function exists. About half of the genomic proteins in reference strain S. aureus NCTC 8325 are hypothetical. Since annotation of these proteins can lead to new therapeutic targets, a high demand to characterize hypothetical proteins is present. This work examines 35 hypothetical proteins from the chromosome of S. aureus NCTC 8325. Examination includes physiochemical characterization; sequence homology; structural homology; domain recognition; structure modeling; active site depiction; predicted protein-protein interactions; protein-chemical interactions; protein localization; protein stability; and protein solubility. The examination revealed some hypothetical proteins related to virulent domains and protein-protein interactions including superoxide dismutase, O-antigen, bacterial ferric iron reductase and siderophore synthesis. Yet other hypothetical proteins appear to be metabolic or transport proteins including ABC transporters, major facilitator superfamily, S-adenosylmethionine decarboxylase, and GTPases. Progress evaluating some hypothetical proteins, particularly the smaller ones, was incomplete due to limited homology and structural information in public repositories. These data characterizing hypothetical proteins will contribute to the scientific understanding of S. aureus by identifying potential drug targets and aiding in future drug discovery. PMID:28149057

  17. Text Mining Improves Prediction of Protein Functional Sites

    PubMed Central

    Cohn, Judith D.; Ravikumar, Komandur E.

    2012-01-01

    We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions. PMID:22393388

  18. Interrogating noise in protein sequences from the perspective of protein-protein interactions prediction.

    PubMed

    Wang, Yongcui; Ren, Xianwen; Zhang, Chunhua; Deng, Naiyang; Zhang, Xiangsun

    2012-12-21

    The past decades witnessed extensive efforts to study the relationship among proteins. Particularly, sequence-based protein-protein interactions (PPIs) prediction is fundamentally important in speeding up the process of mapping interactomes of organisms. High-throughput experimental methodologies have made the PPIs of many model organisms known, which allows us to apply machine learning methods to learn understandable rules from the available PPIs. Under the machine learning framework, the composition vectors are usually applied to encode proteins as real-value vectors. However, the composition vector value might be highly correlated to the distribution of amino acids, i.e., amino acids which are frequently observed in nature tend to have a large value of composition vectors. Thus a formulation to estimate the noise induced by the background distribution of amino acids may be needed during representations. Here, we introduce two kinds of denoising composition vectors, which were successfully used in construction of phylogenetic trees, to eliminate the noise. When validating these two denoising composition vectors on Escherichia coli (E. coli), Saccharomyces cerevisiae (S. cerevisiae) and human PPIs datasets, surprisingly, the predictive performance is not improved, and is even worse than non-denoised prediction. These results suggest that the noise in phylogenetic tree construction may be valuable information in PPIs prediction.

  19. Estimating the state of a geophysical system with sparse observations: time delay methods to achieve accurate initial states for prediction

    NASA Astrophysics Data System (ADS)

    An, Zhe; Rey, Daniel; Ye, Jingxin; Abarbanel, Henry D. I.

    2017-01-01

    The problem of forecasting the behavior of a complex dynamical system through analysis of observational time-series data becomes difficult when the system expresses chaotic behavior and the measurements are sparse in space and/or time. Despite the fact that this situation is quite typical across many fields, including numerical weather prediction, the issue of whether the available observations are "sufficient" for generating successful forecasts is still not well understood. An analysis by Whartenby et al. (2013) found that in the context of the nonlinear shallow water equations on a β plane, standard nudging techniques require observing approximately 70 % of the full set of state variables. Here we examine the same system using a method introduced by Rey et al. (2014a), which generalizes standard nudging methods to utilize time delayed measurements. We show that in certain circumstances, it provides a sizable reduction in the number of observations required to construct accurate estimates and high-quality predictions. In particular, we find that this estimate of 70 % can be reduced to about 33 % using time delays, and even further if Lagrangian drifter locations are also used as measurements.

  20. Accurate X-Ray Spectral Predictions: An Advanced Self-Consistent-Field Approach Inspired by Many-Body Perturbation Theory

    NASA Astrophysics Data System (ADS)

    Liang, Yufeng; Vinson, John; Pemmaraju, Sri; Drisdell, Walter S.; Shirley, Eric L.; Prendergast, David

    2017-03-01

    Constrained-occupancy delta-self-consistent-field (ΔSCF) methods and many-body perturbation theories (MBPT) are two strategies for obtaining electronic excitations from first principles. Using the two distinct approaches, we study the O 1s core excitations that have become increasingly important for characterizing transition-metal oxides and understanding strong electronic correlation. The ΔSCF approach, in its current single-particle form, systematically underestimates the pre-edge intensity for chosen oxides, despite its success in weakly correlated systems. By contrast, the Bethe-Salpeter equation within MBPT predicts much better line shapes. This motivates one to reexamine the many-electron dynamics of x-ray excitations. We find that the single-particle ΔSCF approach can be rectified by explicitly calculating many-electron transition amplitudes, producing x-ray spectra in excellent agreement with experiments. This study paves the way to accurately predict x-ray near-edge spectral fingerprints for physics and materials science beyond the Bethe-Salpeter equation.

  1. DNA-binding protein prediction using plant specific support vector machines: validation and application of a new genome annotation tool.

    PubMed

    Motion, Graham B; Howden, Andrew J M; Huitema, Edgar; Jones, Susan

    2015-12-15

    There are currently 151 plants with draft genomes available but levels of functional annotation for putative protein products are low. Therefore, accurate computational predictions are essential to annotate genomes in the first instance, and to provide focus for the more costly and time consuming functional assays that follow. DNA-binding proteins are an important class of proteins that require annotation, but current computational methods are not applicable for genome wide predictions in plant species. Here, we explore the use of species and lineage specific models for the prediction of DNA-binding proteins in plants. We show that a species specific support vector machine model based on Arabidopsis sequence data is more accurate (accuracy 81%) than a generic model (74%), and based on this we develop a plant specific model for predicting DNA-binding proteins. We apply this model to the tomato proteome and demonstrate its ability to perform accurate high-throughput prediction of DNA-binding proteins. In doing so, we have annotated 36 currently uncharacterised proteins by assigning a putative DNA-binding function. Our model is publicly available and we propose it be used in combination with existing tools to help increase annotation levels of DNA-binding proteins encoded in plant genomes.

  2. A scalable and accurate method for classifying protein-ligand binding geometries using a MapReduce approach.

    PubMed

    Estrada, T; Zhang, B; Cicotti, P; Armen, R S; Taufer, M

    2012-07-01

    We present a scalable and accurate method for classifying protein-ligand binding geometries in molecular docking. Our method is a three-step process: the first step encodes the geometry of a three-dimensional (3D) ligand conformation into a single 3D point in the space; the second step builds an octree by assigning an octant identifier to every single point in the space under consideration; and the third step performs an octree-based clustering on the reduced conformation space and identifies the most dense octant. We adapt our method for MapReduce and implement it in Hadoop. The load-balancing, fault-tolerance, and scalability in MapReduce allow screening of very large conformation spaces not approachable with traditional clustering methods. We analyze results for docking trials for 23 protein-ligand complexes for HIV protease, 21 protein-ligand complexes for Trypsin, and 12 protein-ligand complexes for P38alpha kinase. We also analyze cross docking trials for 24 ligands, each docking into 24 protein conformations of the HIV protease, and receptor ensemble docking trials for 24 ligands, each docking in a pool of HIV protease receptors. Our method demonstrates significant improvement over energy-only scoring for the accurate identification of native ligand geometries in all these docking assessments. The advantages of our clustering approach make it attractive for complex applications in real-world drug design efforts. We demonstrate that our method is particularly useful for clustering docking results using a minimal ensemble of representative protein conformational states (receptor ensemble docking), which is now a common strategy to address protein flexibility in molecular docking.
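
    The second and third steps described in this abstract (assigning an octant identifier to each 3D point, then finding the densest octant) can be sketched on a single machine as follows. This is an illustrative sketch only: it assumes the 3D points from the first step are already available, uses a fixed subdivision depth, and does not reproduce the MapReduce/Hadoop implementation. The function names, the digit encoding of octants, and the sample points are all hypothetical.

```python
from collections import Counter


def octant_id(point, lower, upper, depth):
    """Return a string of `depth` digits (0-7) locating `point` in the box [lower, upper)."""
    lo = list(lower)
    hi = list(upper)
    digits = []
    for _ in range(depth):
        digit = 0
        for axis in range(3):
            mid = (lo[axis] + hi[axis]) / 2.0
            if point[axis] >= mid:
                digit |= 1 << axis   # set the bit for this axis
                lo[axis] = mid       # descend into the upper half
            else:
                hi[axis] = mid       # descend into the lower half
        digits.append(str(digit))
    return "".join(digits)


def densest_octant(points, lower, upper, depth):
    """Count points per octant identifier and return (identifier, count) of the fullest one."""
    counts = Counter(octant_id(p, lower, upper, depth) for p in points)
    return counts.most_common(1)[0]


# Hypothetical encoded conformations inside the unit cube.
sample_points = [(0.10, 0.10, 0.10), (0.20, 0.20, 0.20),
                 (0.15, 0.12, 0.22), (0.90, 0.90, 0.90)]
best_id, best_count = densest_octant(sample_points, (0, 0, 0), (1, 1, 1), depth=2)
```

    Counting points per identifier is the part that maps naturally onto a map step (emit an identifier per point) and a reduce step (sum the counts per identifier), which is presumably why the approach suits MapReduce.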

  3. Accurate retention time determination of co-eluting proteins in analytical chromatography by means of spectral data.

    PubMed

    Dismer, Florian; Hansen, Sigrid; Oelmeier, Stefan Alexander; Hubbuch, Jürgen

    2013-03-01

    Chromatography is the method of choice for the separation of proteins, at both analytical and preparative scale. Orthogonal purification strategies for industrial use can easily be implemented by combining different modes of adsorption. Nevertheless, with flexibility comes the freedom of choice and optimal conditions for consecutive steps need to be identified in a robust and reproducible fashion. One way to address this issue is the use of mathematical models that allow for an in silico process optimization. Although this has been shown to work, model parameter estimation for complex feedstocks becomes the bottleneck in process development. An integral part of parameter assessment is the accurate measurement of retention times in a series of isocratic or gradient elution experiments. As high-resolution analytics that can differentiate between proteins are often not readily available, pure protein is mandatory for parameter determination. In this work, we present an approach that has the potential to solve this problem. Based on the uniqueness of UV absorption spectra of proteins, we were able to accurately measure retention times in systems of up to four co-eluting compounds. The presented approach is calibration-free, meaning that prior knowledge of pure component absorption spectra is not required. Actually, pure protein spectra can be determined from co-eluting proteins as part of the methodology. The approach was tested for size-exclusion chromatograms of 38 mixtures of co-eluting proteins. Retention times were determined with an average error of 0.6 s (1.6% of average peak width); approximated and measured pure component spectra showed an average coefficient of correlation of 0.992.

  4. Characterization and Prediction of Protein Flexibility Based on Structural Alphabets

    PubMed Central

    Liu, Bin

    2016-01-01

    Motivation. To assist efforts in determining and exploring the functional properties of proteins, it is desirable to characterize and predict protein flexibilities. Results. In this study, the conformational entropy is used as an indicator of the protein flexibility. We first explore whether the conformational change can capture the protein flexibility. The well-defined decoy structures are converted into one-dimensional series of letters from a structural alphabet. Four different structure alphabets, including the secondary structure in 3-class and 8-class, the PB structure alphabet (16-letter), and the DW structure alphabet (28-letter), are investigated. The conformational entropy is then calculated from the structure alphabet letters. Some of the proteins show high correlation between the conformational entropy and the protein flexibility. We then predict the protein flexibility from basic amino acid sequence. The local structures are predicted by the dual-layer model and the conformational entropy of the predicted class distribution is then calculated. The results show that the conformational entropy is a good indicator of the protein flexibility, but false positives remain a problem. The DW structure alphabet performs the best, which means that more subtle local structures can be captured by a larger number of structure alphabet letters. Overall this study provides a simple and efficient method for the characterization and prediction of the protein flexibility. PMID:27660756
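
    The idea of computing an entropy from structural alphabet letters can be illustrated with a short sketch. It assumes, as one plausible reading of the abstract, that each decoy is encoded as a string of equal length and that the entropy at a residue position is the Shannon entropy of the letters observed at that position across decoys. The function name and the 3-class example strings are hypothetical, and the paper's exact definition may differ.

```python
import math
from collections import Counter


def positional_entropy(encoded_decoys):
    """Shannon entropy (in bits) of the structural letters at each position."""
    length = len(encoded_decoys[0])
    if any(len(s) != length for s in encoded_decoys):
        raise ValueError("all encoded decoys must have the same length")
    entropies = []
    for i in range(length):
        counts = Counter(s[i] for s in encoded_decoys)
        total = sum(counts.values())
        # Sum of -p * log2(p) over the letters seen at this position.
        h = sum(-(c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies


# Hypothetical decoys encoded with a 3-class alphabet (H, E, C).
decoys = ["HHE", "HCE", "HEE", "HCE"]
entropy_profile = positional_entropy(decoys)
```

    In this toy input the first and last positions never change, so their entropy is zero, while the middle position sees three different letters and gets the highest value; under the abstract's premise it would be the most flexible candidate.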

  5. A large-scale evaluation of computational protein function prediction.

    PubMed

    Radivojac, Predrag; Clark, Wyatt T; Oron, Tal Ronnen; Schnoes, Alexandra M; Wittkop, Tobias; Sokolov, Artem; Graim, Kiley; Funk, Christopher; Verspoor, Karin; Ben-Hur, Asa; Pandey, Gaurav; Yunes, Jeffrey M; Talwalkar, Ameet S; Repo, Susanna; Souza, Michael L; Piovesan, Damiano; Casadio, Rita; Wang, Zheng; Cheng, Jianlin; Fang, Hai; Gough, Julian; Koskinen, Patrik; Törönen, Petri; Nokso-Koivisto, Jussi; Holm, Liisa; Cozzetto, Domenico; Buchan, Daniel W A; Bryson, Kevin; Jones, David T; Limaye, Bhakti; Inamdar, Harshal; Datta, Avik; Manjari, Sunitha K; Joshi, Rajendra; Chitale, Meghana; Kihara, Daisuke; Lisewski, Andreas M; Erdin, Serkan; Venner, Eric; Lichtarge, Olivier; Rentzsch, Robert; Yang, Haixuan; Romero, Alfonso E; Bhat, Prajwal; Paccanaro, Alberto; Hamp, Tobias; Kaßner, Rebecca; Seemayer, Stefan; Vicedo, Esmeralda; Schaefer, Christian; Achten, Dominik; Auer, Florian; Boehm, Ariane; Braun, Tatjana; Hecht, Maximilian; Heron, Mark; Hönigschmid, Peter; Hopf, Thomas A; Kaufmann, Stefanie; Kiening, Michael; Krompass, Denis; Landerer, Cedric; Mahlich, Yannick; Roos, Manfred; Björne, Jari; Salakoski, Tapio; Wong, Andrew; Shatkay, Hagit; Gatzmann, Fanny; Sommer, Ingolf; Wass, Mark N; Sternberg, Michael J E; Škunca, Nives; Supek, Fran; Bošnjak, Matko; Panov, Panče; Džeroski, Sašo; Šmuc, Tomislav; Kourmpetis, Yiannis A I; van Dijk, Aalt D J; ter Braak, Cajo J F; Zhou, Yuanpeng; Gong, Qingtian; Dong, Xinran; Tian, Weidong; Falda, Marco; Fontana, Paolo; Lavezzo, Enrico; Di Camillo, Barbara; Toppo, Stefano; Lan, Liang; Djuric, Nemanja; Guo, Yuhong; Vucetic, Slobodan; Bairoch, Amos; Linial, Michal; Babbitt, Patricia C; Brenner, Steven E; Orengo, Christine; Rost, Burkhard; Mooney, Sean D; Friedberg, Iddo

    2013-03-01

    Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.

  6. A large-scale evaluation of computational protein function prediction

    PubMed Central

    Radivojac, Predrag; Clark, Wyatt T; Ronnen Oron, Tal; Schnoes, Alexandra M; Wittkop, Tobias; Sokolov, Artem; Graim, Kiley; Funk, Christopher; Verspoor, Karin; Ben-Hur, Asa; Pandey, Gaurav; Yunes, Jeffrey M; Talwalkar, Ameet S; Repo, Susanna; Souza, Michael L; Piovesan, Damiano; Casadio, Rita; Wang, Zheng; Cheng, Jianlin; Fang, Hai; Gough, Julian; Koskinen, Patrik; Törönen, Petri; Nokso-Koivisto, Jussi; Holm, Liisa; Cozzetto, Domenico; Buchan, Daniel W A; Bryson, Kevin; Jones, David T; Limaye, Bhakti; Inamdar, Harshal; Datta, Avik; Manjari, Sunitha K; Joshi, Rajendra; Chitale, Meghana; Kihara, Daisuke; Lisewski, Andreas M; Erdin, Serkan; Venner, Eric; Lichtarge, Olivier; Rentzsch, Robert; Yang, Haixuan; Romero, Alfonso E; Bhat, Prajwal; Paccanaro, Alberto; Hamp, Tobias; Kassner, Rebecca; Seemayer, Stefan; Vicedo, Esmeralda; Schaefer, Christian; Achten, Dominik; Auer, Florian; Böhm, Ariane; Braun, Tatjana; Hecht, Maximilian; Heron, Mark; Hönigschmid, Peter; Hopf, Thomas; Kaufmann, Stefanie; Kiening, Michael; Krompass, Denis; Landerer, Cedric; Mahlich, Yannick; Roos, Manfred; Björne, Jari; Salakoski, Tapio; Wong, Andrew; Shatkay, Hagit; Gatzmann, Fanny; Sommer, Ingolf; Wass, Mark N; Sternberg, Michael J E; Škunca, Nives; Supek, Fran; Bošnjak, Matko; Panov, Panče; Džeroski, Sašo; Šmuc, Tomislav; Kourmpetis, Yiannis A I; van Dijk, Aalt D J; ter Braak, Cajo J F; Zhou, Yuanpeng; Gong, Qingtian; Dong, Xinran; Tian, Weidong; Falda, Marco; Fontana, Paolo; Lavezzo, Enrico; Di Camillo, Barbara; Toppo, Stefano; Lan, Liang; Djuric, Nemanja; Guo, Yuhong; Vucetic, Slobodan; Bairoch, Amos; Linial, Michal; Babbitt, Patricia C; Brenner, Steven E; Orengo, Christine; Rost, Burkhard; Mooney, Sean D; Friedberg, Iddo

    2013-01-01

    Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based Critical Assessment of protein Function Annotation (CAFA) experiment. Fifty-four methods representing the state-of-the-art for protein function prediction were evaluated on a target set of 866 proteins from eleven organisms. Two findings stand out: (i) today’s best protein function prediction algorithms significantly outperformed widely-used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is significant need for improvement of currently available tools. PMID:23353650

  7. Prediction of Protein Structure Using Surface Accessibility Data

    PubMed Central

    Hartlmüller, Christoph; Göbl, Christoph

    2016-01-01

    An approach to the de novo structure prediction of proteins is described that relies on surface accessibility data from NMR paramagnetic relaxation enhancements by a soluble paramagnetic compound (sPRE). This method exploits the distance‐to‐surface information encoded in the sPRE data in the chemical shift‐based CS‐Rosetta de novo structure prediction framework to generate reliable structural models. For several proteins, it is demonstrated that surface accessibility data is an excellent measure of the correct protein fold in the early stages of the computational folding algorithm and significantly improves accuracy and convergence of the standard Rosetta structure prediction approach. PMID:27560616

  8. Prediction of Protein-Protein Interaction Sites with Machine-Learning-Based Data-Cleaning and Post-Filtering Procedures.

    PubMed

    Liu, Guang-Hui; Shen, Hong-Bin; Yu, Dong-Jun

    2016-04-01

    Accurately predicting protein-protein interaction sites (PPIs) is currently a hot topic because it has been demonstrated to be very useful for understanding disease mechanisms and designing drugs. Machine-learning-based computational approaches have been broadly utilized and demonstrated to be useful for PPI prediction. However, directly applying traditional machine learning algorithms, which often assume that samples in different classes are balanced, often leads to poor performance because of the severe class imbalance that exists in the PPI prediction problem. In this study, we propose a novel method for improving PPI prediction performance by relieving the severity of class imbalance using a data-cleaning procedure and reducing predicted false positives with a post-filtering procedure: First, a machine-learning-based data-cleaning procedure is applied to remove those marginal targets, which may potentially have a negative effect on training a model with a clear classification boundary, from the majority samples to relieve the severity of class imbalance in the original training dataset; then, a prediction model is trained on the cleaned dataset; finally, an effective post-filtering procedure is further used to reduce potential false positive predictions. Stringent cross-validation and independent validation tests on benchmark datasets demonstrated the efficacy of the proposed method, which exhibits highly competitive performance compared with existing state-of-the-art sequence-based PPIs predictors and should supplement existing PPI prediction methods.

  9. iStable: off-the-shelf predictor integration for predicting protein stability changes

    PubMed Central

    2013-01-01

    Background. Mutation of a single amino acid residue can cause changes in a protein, which could then lead to a loss of protein function. Predicting protein stability changes can provide several possible candidates for novel protein design. Although many prediction tools are available, the conflicting prediction results from different tools could cause confusion to users. Results. We proposed an integrated predictor, iStable, with grid computing architecture constructed by using sequence information and prediction results from different element predictors. In the learning model, several machine learning methods were evaluated, and the support vector machine was adopted as an integrator rather than simply choosing the majority answer given by the element predictors. Furthermore, the role played by the sequence information was analyzed in our model, and a window size of 11 was determined. In addition, iStable is available with two different input types: structural and sequential. After training and cross-validation, iStable has better performance than all of the element predictors on several datasets. Under different classifications and conditions for validation, this study has also shown better overall performance in different types of secondary structures, relative solvent accessibility circumstances, protein memberships in different superfamilies, and experimental conditions. Conclusions. The trained and validated version of iStable provides an accurate approach for prediction of protein stability changes. iStable is freely available online at: http://predictor.nchu.edu.tw/iStable. PMID:23369171

  10. Composition-based effective chain length for prediction of protein folding rates

    NASA Astrophysics Data System (ADS)

    Chang, Le; Wang, Jun; Wang, Wei

    2010-11-01

    Folding rate prediction is a useful way to find the key factors affecting folding kinetics of proteins. Structural information is more or less required in the present prediction methods, which limits the application of these methods to various proteins. In this work, an “effective length” is defined solely based on the composition of a protein, namely, the number of specific types of amino acids in a protein. A physical theory based on a minimalist model is employed to describe the relation between the folding rates and the effective length of proteins. Based on the resultant relationship between folding rates and effective length, the optimal sets of amino acids are found through the enumeration over all possible combinations of amino acids. This optimal set achieves a high correlation (with the coefficient of 0.84) between the folding rates and the optimal effective length. The features of these amino acids are consistent with our model and landscape theory. Further comparisons between our effective length and other factors are carried out. The effective length is physically consistent with structure-based prediction methods and has the best predictability for folding rates. These results all suggest that both entropy and energetics contribute importantly to folding kinetics. The ability to accurately and efficiently predict folding rates from composition enables the analysis of the kinetics for various kinds of proteins. The underlying physics in our method may be helpful to stimulate further understanding on the effects of various amino acids in folding dynamics.

  11. How accurate are leukocyte indices and C-reactive protein for diagnosis of neonatal sepsis?

    PubMed Central

    da Silva, Orlando; Ohlsson, Arne

    1998-01-01

    Early diagnosis of neonatal sepsis is often difficult to make. Treatment on the basis of clinical suspicion and risk factors may result in overtreatment. A previous review of the usefulness of C-reactive protein and leukocyte indices concluded that these test results should be interpreted with caution. The present paper reviews and, when appropriate, revises, in light of new information, the conclusions reached in the previous systematic review of the topic. PMID:20401235

  12. Measurements of accurate x-ray scattering data of protein solutions using small stationary sample cells

    SciTech Connect

    Hong Xinguo; Hao Quan

    2009-01-15

    In this paper, we report a method of precise in situ x-ray scattering measurements on protein solutions using small stationary sample cells. Although reduction in the radiation damage induced by intense synchrotron radiation sources is indispensable for the correct interpretation of scattering data, there is still a lack of effective methods to overcome radiation-induced aggregation and extract scattering profiles free from chemical or structural damage. It is found that radiation-induced aggregation mainly begins on the surface of the sample cell and grows along the beam path; the diameter of the damaged region is comparable to the x-ray beam size. Radiation-induced aggregation can be effectively avoided by using a two-dimensional scan (2D mode), with an interval as small as 1.5 times the beam size, at low temperature (e.g., 4 °C). A radiation sensitive protein, bovine hemoglobin, was used to test the method. A standard deviation of less than 5% in the small angle region was observed from a series of nine spectra recorded in 2D mode, in contrast to the intensity variation seen using the conventional stationary technique, which can exceed 100%. Wide-angle x-ray scattering data were collected at a standard macromolecular diffraction station using the same data collection protocol and showed a good signal/noise ratio (better than the reported data on the same protein using a flow cell). The results indicate that this method is an effective approach for obtaining precise measurements of protein solution scattering.

  13. Measurements of accurate x-ray scattering data of protein solutions using small stationary sample cells

    NASA Astrophysics Data System (ADS)

    Hong, Xinguo; Hao, Quan

    2009-01-01

    In this paper, we report a method of precise in situ x-ray scattering measurements on protein solutions using small stationary sample cells. Although reduction in the radiation damage induced by intense synchrotron radiation sources is indispensable for the correct interpretation of scattering data, there is still a lack of effective methods to overcome radiation-induced aggregation and extract scattering profiles free from chemical or structural damage. It is found that radiation-induced aggregation mainly begins on the surface of the sample cell and grows along the beam path; the diameter of the damaged region is comparable to the x-ray beam size. Radiation-induced aggregation can be effectively avoided by using a two-dimensional scan (2D mode), with an interval as small as 1.5 times the beam size, at low temperature (e.g., 4 °C). A radiation sensitive protein, bovine hemoglobin, was used to test the method. A standard deviation of less than 5% in the small angle region was observed from a series of nine spectra recorded in 2D mode, in contrast to the intensity variation seen using the conventional stationary technique, which can exceed 100%. Wide-angle x-ray scattering data were collected at a standard macromolecular diffraction station using the same data collection protocol and showed a good signal/noise ratio (better than the reported data on the same protein using a flow cell). The results indicate that this method is an effective approach for obtaining precise measurements of protein solution scattering.

  14. HOPE: a homotopy optimization method for protein structure prediction.

    PubMed

    Dunlavy, Daniel M; O'Leary, Dianne P; Klimov, Dmitri; Thirumalai, D

    2005-12-01

    We use a homotopy optimization method, HOPE, to minimize the potential energy associated with a protein model. The method uses the minimum energy conformation of one protein as a template to predict the lowest energy structure of a query sequence. This objective is achieved by following a path of conformations determined by a homotopy between the potential energy functions for the two proteins. Ensembles of solutions are produced by perturbing conformations along the path, increasing the likelihood of predicting correct structures. Successful results are presented for pairs of homologous proteins, where HOPE is compared to a variant of Newton's method and to simulated annealing.

  15. Protein Structure Prediction Using String Kernels

    DTIC Science & Technology

    2006-03-03

    ...consists of 4352 sequences from SCOP version 1.53 extracted from the Astral database, grouped into families and superfamilies. The dataset is processed

  16. Identifying the singleplex and multiplex proteins based on transductive learning for protein subcellular localization prediction.

    PubMed

    Cao, Junzhe; Liu, Wenqi; He, Jianjun; Gu, Hong

    2013-07-01

    A new method is proposed to identify whether a query protein is singleplex or multiplex, in order to improve the quality of protein subcellular localization prediction. Based on the transductive learning technique, this approach uses information from both the query proteins and known proteins to estimate the number of subcellular locations of every query protein, so that singleplex and multiplex proteins can be recognized and distinguished. Each query protein is then handled by a targeted single-label or multi-label predictor to achieve a high-accuracy prediction result. We assess the performance of the proposed approach by applying it to three groups of protein sequence datasets. Simulation experiments show that the proposed approach can effectively identify singleplex and multiplex proteins. A comparison also verifies the reliability of this method for enhancing the power of protein subcellular localization prediction.

  17. PPIevo: protein-protein interaction prediction from PSSM based evolutionary information.

    PubMed

    Zahiri, Javad; Yaghoubi, Omid; Mohammad-Noori, Morteza; Ebrahimpour, Reza; Masoudi-Nejad, Ali

    2013-10-01

    Protein-protein interactions regulate a variety of cellular processes. There is a great need for computational methods as a complement to experimental methods with which to predict protein interactions due to the existence of many limitations involved in experimental techniques. Here, we introduce a novel evolutionary based feature extraction algorithm for protein-protein interaction (PPI) prediction. The algorithm is called PPIevo and extracts the evolutionary feature from Position-Specific Scoring Matrix (PSSM) of protein with known sequence. The algorithm does not depend on the protein annotations, and the features are based on the evolutionary history of the proteins. This enables the algorithm to have more power for predicting protein-protein interaction than many sequence based algorithms. Results on the HPRD database show better performance and robustness of the proposed method. They also reveal that the negative dataset selection could lead to an acute performance overestimation which is the principal drawback of the available methods.

  18. Affinity regression predicts the recognition code of nucleic acid binding proteins

    PubMed Central

    Pelossof, Raphael; Singh, Irtisha; Yang, Julie L.; Weirauch, Matthew T.; Hughes, Timothy R.; Leslie, Christina S.

    2016-01-01

    Predicting the affinity profiles of nucleic acid-binding proteins directly from the protein sequence is a major unsolved problem. We present a statistical approach for learning the recognition code of a family of transcription factors (TFs) or RNA-binding proteins (RBPs) from high-throughput binding assays. Our method, called affinity regression, trains on protein binding microarray (PBM) or RNAcompete experiments to learn an interaction model between proteins and nucleic acids, using only protein domain and probe sequences as inputs. By training on mouse homeodomain PBM profiles, our model correctly identifies residues that confer DNA-binding specificity and accurately predicts binding motifs for an independent set of divergent homeodomains. Similarly, learning from RNAcompete profiles for diverse RBPs, our model can predict the binding affinities of held-out proteins and identify key RNA-binding residues. More broadly, we envision applying our method to model and predict biological interactions in any setting where there is a high-throughput ‘affinity’ readout. PMID:26571099

  19. A Software Pipeline for Protein Structure Prediction

    DTIC Science & Technology

    2006-11-01

    programs, such as GenTHREADER (Jones 1999), FUGUE (Shi, Blundell et al. 2001), and 3D-PSSM (Kelley, MacCallum et al. 2000), use hybrid approaches...annotation using structural profiles in the program 3D-PSSM, J Mol Biol, 299, 499-520. Kim, D., D. Xu, et al., 2003: PROSPECT II: protein structure

  20. Simple Learned Weighted Sums of Inferior Temporal Neuronal Firing Rates Accurately Predict Human Core Object Recognition Performance

    PubMed Central

    Hong, Ha; Solomon, Ethan A.; DiCarlo, James J.

    2015-01-01

    database of images for evaluating object recognition performance. We used multielectrode arrays to characterize hundreds of neurons in the visual ventral stream of nonhuman primates and measured the object recognition performance of >100 human observers. Remarkably, we found that simple learned weighted sums of firing rates of neurons in monkey inferior temporal (IT) cortex accurately predicted human performance. Although previous work led us to expect that IT would outperform V4, we were surprised by the quantitative precision with which simple IT-based linking hypotheses accounted for human behavior. PMID:26424887

  1. Simple Learned Weighted Sums of Inferior Temporal Neuronal Firing Rates Accurately Predict Human Core Object Recognition Performance.

    PubMed

    Majaj, Najib J; Hong, Ha; Solomon, Ethan A; DiCarlo, James J

    2015-09-30

    database of images for evaluating object recognition performance. We used multielectrode arrays to characterize hundreds of neurons in the visual ventral stream of nonhuman primates and measured the object recognition performance of >100 human observers. Remarkably, we found that simple learned weighted sums of firing rates of neurons in monkey inferior temporal (IT) cortex accurately predicted human performance. Although previous work led us to expect that IT would outperform V4, we were surprised by the quantitative precision with which simple IT-based linking hypotheses accounted for human behavior.

  2. Protein localization prediction using random walks on graphs

    PubMed Central

    2013-01-01

    Background: Understanding the localization of proteins in cells is vital to characterizing their functions and possible interactions. As a result, identifying the (sub)cellular compartment within which a protein is located becomes an important problem in protein classification. This classification issue thus involves predicting labels in a dataset with a limited number of labeled data points available. By utilizing a graph representation of protein data, random walk techniques have performed well in sequence classification and functional prediction; however, this method has not yet been applied to protein localization. Accordingly, we propose a novel classifier in the site prediction of proteins based on random walks on a graph. Results: We propose a graph theory model for predicting protein localization using data generated in yeast and gram-negative (Gneg) bacteria. We tested the performance of our classifier on the two datasets, optimizing the model training parameters by varying the laziness values and the number of steps taken during the random walk. Using 10-fold cross-validation, we achieved an accuracy of above 61% for yeast data and about 93% for gram-negative bacteria. Conclusions: This study presents a new classifier derived from the random walk technique and applies this classifier to investigate the cellular localization of proteins. The prediction accuracy and additional validation demonstrate an improvement over previous methods, such as support vector machine (SVM)-based classifiers. PMID:23815126
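
    The two tuning parameters named in this abstract, laziness and number of steps, can be made concrete with a minimal sketch of a lazy random walk on a weighted graph. This is a generic illustration and not the authors' classifier: the function names, the dense adjacency-matrix representation, and the scoring rule (summing walk probability over labeled nodes) are all assumptions.

```python
def lazy_walk_distribution(adjacency, start, laziness, steps):
    """Probability of being at each node after `steps` lazy random-walk steps.

    At every step the walker stays put with probability `laziness`, otherwise
    it moves to a neighbour chosen in proportion to the edge weight.
    """
    n = len(adjacency)
    dist = [0.0] * n
    dist[start] = 1.0
    for _ in range(steps):
        nxt = [laziness * p for p in dist]
        for i, p in enumerate(dist):
            degree = sum(adjacency[i])
            if degree == 0:
                nxt[i] += (1 - laziness) * p  # isolated node: mass stays in place
                continue
            for j, weight in enumerate(adjacency[i]):
                if weight:
                    nxt[j] += (1 - laziness) * p * weight / degree
        dist = nxt
    return dist

def predict_location(adjacency, labels, query, laziness=0.5, steps=3):
    """Pick the label whose labeled nodes collect the most walk probability."""
    dist = lazy_walk_distribution(adjacency, query, laziness, steps)
    scores = {}
    for node, label in labels.items():
        scores[label] = scores.get(label, 0.0) + dist[node]
    return max(scores, key=scores.get)
```

    In this sketch, `adjacency[i][j]` would hold a similarity between proteins `i` and `j`, and `labels` maps the indices of proteins with known localization to their compartments.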

  3. Structure-based Methods for Computational Protein Functional Site Prediction

    PubMed Central

    Dukka, B KC

    2013-01-01

    With the advent of high-throughput sequencing techniques and structural genomics projects, the number of gene and protein sequences keeps increasing, and computational methods to annotate these genes and proteins have become even more indispensable. Proteins are important macromolecules, and the study of protein function is an important problem in structural bioinformatics. This paper discusses a number of methods to predict protein functional sites, focusing especially on protein ligand binding site prediction. Initially, a short overview is presented on recent advances in methods for the selection of homologous sequences. Furthermore, a few recent structure-based approaches and sequence-and-structure-based approaches for protein functional site prediction are discussed in detail. PMID:24688745

  4. Finding the “Dark Matter” in Human and Yeast Protein Network Prediction and Modelling

    PubMed Central

    Lees, Jon G.; Reid, Adam J.; Yeats, Corin; Clegg, Andrew B.; Sanchez-Jimenez, Francisca; Orengo, Christine

    2010-01-01

    Accurate modelling of biological systems requires a deeper and more complete knowledge about the molecular components and their functional associations than we currently have. Traditionally, new knowledge on protein associations generated by experiments has played a central role in systems modelling, in contrast to generally less trusted bio-computational predictions. However, we will not achieve realistic modelling of complex molecular systems if the current experimental designs lead to biased screenings of real protein networks and leave large, functionally important areas poorly characterised. To assess the likelihood of this, we have built comprehensive network models of the yeast and human proteomes by using a meta-statistical integration of diverse computationally predicted protein association datasets. We have compared these predicted networks against combined experimental datasets from seven biological resources at different levels of statistical significance. These eukaryotic predicted networks resemble all the topological and noise features of the experimentally inferred networks in both species, and we also show that this observation is not due to random behaviour. In addition, the topology of the predicted networks contains information on true protein associations, beyond the constitutive first order binary predictions. We also observe that most of the reliable predicted protein associations are experimentally uncharacterised in our models, constituting the hidden or “dark matter” of networks by analogy to astronomical systems. Some of this dark matter shows enrichment of particular functions and contains key functional elements of protein networks, such as hubs associated with important functional areas like the regulation of Ras protein signal transduction in human cells. Thus, characterising this large and functionally important dark matter, elusive to established experimental designs, may be crucial for modelling biological systems. In any case

  5. Accurate ab initio predictions of ionization energies and heats of formation for the 2-propyl, phenyl, and benzyl radicals

    NASA Astrophysics Data System (ADS)

    Lau, K.-C.; Ng, C. Y.

    2006-01-01

    The ionization energies (IEs) for the 2-propyl (2-C3H7), phenyl (C6H5), and benzyl (C6H5CH2) radicals have been calculated by the wave-function-based ab initio CCSD(T)/CBS approach, which involves the approximation to the complete basis set (CBS) limit at the coupled cluster level with single and double excitations plus quasiperturbative triple excitation [CCSD(T)]. The zero-point vibrational energy correction, the core-valence electronic correction, and the scalar relativistic effect correction have been also made in these calculations. Although a precise IE value for the 2-C3H7 radical has not been directly determined before due to the poor Franck-Condon factor for the photoionization transition at the ionization threshold, the experimental value deduced indirectly using other known energetic data is found to be in good accord with the present CCSD(T)/CBS prediction. The comparison between the predicted value through the focal-point analysis and the highly precise experimental value for the IE(C6H5CH2) determined in the previous pulsed field ionization photoelectron (PFI-PE) study shows that the CCSD(T)/CBS method is capable of providing an accurate IE prediction for C6H5CH2, achieving an error limit of 35 meV. The benchmarking of the CCSD(T)/CBS IE(C6H5CH2) prediction suggests that the CCSD(T)/CBS IE(C6H5) prediction obtained here has a similar accuracy of 35 meV. Taking into account this error limit for the CCSD(T)/CBS prediction and the experimental uncertainty, the CCSD(T)/CBS IE(C6H5) value is also consistent with the IE(C6H5) reported in the previous HeI photoelectron measurement. Furthermore, the present study provides support for the conclusion that the CCSD(T)/CBS approach with high-level energy corrections can be used to provide reliable IE predictions for C3-C7 hydrocarbon radicals with an uncertainty of +/-35 meV. Employing the atomization scheme, we have also computed the 0 K (298 K) heats of formation in kJ/mol at the CCSD(T)/CBS level for 2-C3H7

  6. Urinary Excretion of Liver Type Fatty Acid Binding Protein Accurately Reflects the Degree of Tubulointerstitial Damage

    PubMed Central

    Yokoyama, Takeshi; Kamijo-Ikemori, Atsuko; Sugaya, Takeshi; Hoshino, Seiko; Yasuda, Takashi; Kimura, Kenjiro

    2009-01-01

    To investigate the relationship between liver-type fatty acid-binding protein (L-FABP), a biomarker of chronic kidney disease, in the kidney and the degree of tubulointerstitial damage, folic acid (FA)-induced nephropathy was studied in a mouse model system. As renal L-FABP is not expressed in wild-type mice, human L-FABP (hL-FABP) transgenic mice were used in this study. hL-FABP is expressed in the renal proximal tubules of the transgenic mice that were injected intraperitoneally with FA in NaHCO3 (the FA group) or only NaHCO3 (the control group) and oral saline solution daily during the experimental period. The FA group developed severe tubulointerstitial damage with the infiltration of macrophages and the deposition of type I collagen on days 3 and 7 and recovered to the control level on day 14. The gene and protein expression levels of hL-FABP in the kidney were significantly enhanced on days 3 and 7. Urinary hL-FABP in the FA group was elevated on days 3 and 7 and decreased to the control level on day 14. The protein expression levels of hL-FABP in both the kidney and urine significantly correlated with the degree of tubulointerstitial damage, the infiltration of macrophages, and the deposition of type I collagen. In conclusion, renal expression and urinary excretion of hL-FABP significantly reflected the severity of tubulointerstitial damage in FA-induced nephropathy. PMID:19435794

  7. Accurate determination of the diffusion coefficient of proteins by Fourier analysis with whole column imaging detection.

    PubMed

    Zarabadi, Atefeh S; Pawliszyn, Janusz

    2015-02-17

    Analysis in the frequency domain is considered a powerful tool to elicit precise information from spectroscopic signals. In this study, the Fourier transformation technique is employed to determine the diffusion coefficient (D) of a number of proteins in the frequency domain. Analytical approaches are investigated for determination of D from both experimental and data treatment viewpoints. The diffusion process is modeled to calculate diffusion coefficients based on the Fourier transformation solution to Fick's law equation, and its results are compared to time domain results. The simulations characterize optimum spatial and temporal conditions and demonstrate the noise tolerance of the method. The proposed model is validated by its application for the electropherograms from the diffusion path of a set of proteins. Real-time dynamic scanning is conducted to monitor dispersion by employing whole column imaging detection technology in combination with capillary isoelectric focusing (CIEF) and the imaging plug flow (iPF) experiment. These experimental techniques provide different peak shapes, which are utilized to demonstrate the Fourier transformation ability in extracting diffusion coefficients out of irregular shape signals. Experimental results confirmed that the Fourier transformation procedure substantially enhanced the accuracy of the determined values compared to those obtained in the time domain.
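
    The frequency-domain idea can be sketched from the standard result that, under Fick's second law, each spatial Fourier mode of a freely diffusing profile decays as exp(-D k² t), so D follows from the ratio of one mode's amplitude at two times. The code below checks this on a synthetic Gaussian peak. It is a minimal sketch under that assumption and not the authors' procedure; the function names, the grid, and every numeric value are illustrative.

```python
import cmath
import math

def gaussian_profile(xs, sigma0, diffusion, t):
    """Concentration of a Gaussian peak that has diffused freely for time t."""
    var = sigma0 ** 2 + 2 * diffusion * t
    return [math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var) for x in xs]

def fourier_amplitude(xs, cs, k):
    """Magnitude of the spatial Fourier component at wavenumber k (Riemann sum)."""
    dx = xs[1] - xs[0]
    return abs(sum(c * cmath.exp(-1j * k * x) for x, c in zip(xs, cs)) * dx)

def estimate_diffusion(xs, c_early, c_late, k, dt):
    """Estimate D from the decay of one Fourier mode between two profiles."""
    ratio = fourier_amplitude(xs, c_early, k) / fourier_amplitude(xs, c_late, k)
    return math.log(ratio) / (k * k * dt)

# Synthetic check with made-up values (arbitrary units).
length, n_points, true_d = 20.0, 256, 0.1
xs = [-length / 2 + i * length / n_points for i in range(n_points)]
k = 2 * math.pi * 2 / length  # second harmonic of the sampling window
early = gaussian_profile(xs, 0.5, true_d, 0.0)
late = gaussian_profile(xs, 0.5, true_d, 1.0)
estimated_d = estimate_diffusion(xs, early, late, k, 1.0)
```

    Because the estimate uses only the amplitude of a chosen mode, it does not depend on where the peak sits in the window, which is one reason a frequency-domain treatment can tolerate irregular peak positions.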

  8. Protein Structure and Function Prediction Using I-TASSER.

    PubMed

    Yang, Jianyi; Zhang, Yang

    2015-12-17

    I-TASSER is a hierarchical protocol for automated protein structure prediction and structure-based function annotation. Starting from the amino acid sequence of target proteins, I-TASSER first generates full-length atomic structural models from multiple threading alignments and iterative structural assembly simulations followed by atomic-level structure refinement. The biological functions of the protein, including ligand-binding sites, enzyme commission number, and gene ontology terms, are then inferred from known protein function databases based on sequence and structure profile comparisons. I-TASSER is freely available as both an on-line server and a stand-alone package. This unit describes how to use the I-TASSER protocol to generate structure and function prediction and how to interpret the prediction results, as well as alternative approaches for further improving the I-TASSER modeling quality for distant-homologous and multi-domain protein targets.

  9. Using Bacteria to Determine Protein Kinase Specificity and Predict Target Substrates

    PubMed Central

    Lubner, Joshua M.; Church, George M.; Husson, Robert N.; Schwartz, Daniel

    2012-01-01

    The identification of protein kinase targets remains a significant bottleneck for our understanding of signal transduction in normal and diseased cellular states. Kinases recognize their substrates in part through sequence motifs on substrate proteins, which, to date, have most effectively been elucidated using combinatorial peptide library approaches. Here, we present and demonstrate the ProPeL method for easy and accurate discovery of kinase specificity motifs through the use of native bacterial proteomes that serve as in vivo libraries for thousands of simultaneous phosphorylation reactions. Using recombinant kinases expressed in E. coli followed by mass spectrometry, the approach accurately recapitulated the well-established motif preferences of human basophilic (Protein Kinase A) and acidophilic (Casein Kinase II) kinases. These motifs, derived for PKA and CK II using only bacterial sequence data, were then further validated by utilizing them in conjunction with the scan-x software program to computationally predict known human phosphorylation sites with high confidence. PMID:23300758

  10. A Support Vector Machine model for the prediction of proteotypic peptides for accurate mass and time proteomics

    SciTech Connect

    Webb-Robertson, Bobbie-Jo M.; Cannon, William R.; Oehmen, Christopher S.; Shah, Anuj R.; Gurumoorthi, Vidhya; Lipton, Mary S.; Waters, Katrina M.

    2008-07-01

    Motivation: The standard approach to identifying peptides based on accurate mass and elution time (AMT) compares these profiles obtained from a high resolution mass spectrometer to a database of peptides previously identified from tandem mass spectrometry (MS/MS) studies. It would be advantageous, with respect to both accuracy and cost, to only search for those peptides that are detectable by MS (proteotypic). Results: We present a Support Vector Machine (SVM) model that uses a simple descriptor space based on 35 properties of amino acid content, charge, hydrophilicity, and polarity for the quantitative prediction of proteotypic peptides. Using three independently derived AMT databases (Shewanella oneidensis, Salmonella typhimurium, Yersinia pestis) for training and validation within and across species, the SVM resulted in an average accuracy measure of ~0.8 with a standard deviation of less than 0.025. Furthermore, we demonstrate that these results are achievable with a small set of 12 variables and can achieve high proteome coverage. Availability: http://omics.pnl.gov/software/STEPP.php

  11. DEPTH: a web server to compute depth and predict small-molecule binding cavities in proteins.

    PubMed

    Tan, Kuan Pern; Varadarajan, Raghavan; Madhusudhan, M S

    2011-07-01

    Depth measures the extent of atom/residue burial within a protein. It correlates with properties such as protein stability, hydrogen exchange rate, protein-protein interaction hot spots, post-translational modification sites and sequence variability. Our server, DEPTH, accurately computes depth and solvent-accessible surface area (SASA) values. We show that depth can be used to predict small molecule ligand binding cavities in proteins. Often, some of the residues lining a ligand binding cavity are both deep and solvent exposed. Using the depth-SASA pair values for a residue, its likelihood to form part of a small molecule binding cavity is estimated. The parameters of the method were calibrated over a training set of 900 high-resolution X-ray crystal structures of single-domain proteins bound to small molecules (molecular weight <1.5 kDa). The prediction accuracy of DEPTH is comparable to that of other geometry-based prediction methods including LIGSITE, SURFNET and Pocket-Finder (all with Matthews correlation coefficient of ∼0.4) over a testing set of 225 single and multi-chain protein structures. Users have the option of tuning several parameters to detect cavities of different sizes, for example, geometrically flat binding sites. The input to the server is a protein 3D structure in PDB format. The users have the option of tuning the values of four parameters associated with the computation of residue depth and the prediction of binding cavities. The computed depths, SASA and binding cavity predictions are displayed in 2D plots and mapped onto 3D representations of the protein structure using Jmol. Links are provided to download the outputs. Our server is useful for all structural analysis based on residue depth and SASA, such as guiding site-directed mutagenesis experiments and small molecule docking exercises, in the context of protein functional annotation and drug discovery.
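
    For readers unfamiliar with the metric quoted in this abstract, the Matthews correlation coefficient can be computed from a confusion matrix as sketched below. The formula is the standard definition; the function name and the convention of returning 0.0 when a marginal total is zero are choices made for this sketch, and the counts in the usage note are made up.

```python
import math

def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention used here when any marginal total is zero
    return (tp * tn - fp * fn) / denominator
```

    With hypothetical counts of 70 true positives, 70 true negatives, 30 false positives and 30 false negatives, the coefficient is 0.4, the level the abstract reports for the geometry-based methods.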

  12. Accurate prediction of secreted substrates and identification of a conserved putative secretion signal for type III secretion systems

    SciTech Connect

    Samudrala, Ram; Heffron, Fred; McDermott, Jason E.

    2009-04-24

    The type III secretion system is an essential component for virulence in many Gram-negative bacteria. Though components of the secretion system apparatus are conserved, its substrates, effector proteins, are not. We have used a machine learning approach to identify new secreted effectors. The method integrates evolutionary measures, such as the pattern of homologs in a range of other organisms, and sequence-based features, such as G+C content, amino acid composition and the N-terminal 30 residues of the protein sequence. The method was trained on known effectors from Salmonella typhimurium and validated on a corresponding set of effectors from Pseudomonas syringae, after eliminating effectors with detectable sequence similarity. The method was able to identify all of the known effectors in P. syringae with a specificity of 84% and sensitivity of 82%. The reciprocal validation, training on P. syringae and validating on S. typhimurium, gave similar results with a specificity of 86% when the sensitivity level was 87%. These results show that type III effectors in disparate organisms share common features. We found that maximal performance is attained by including an N-terminal sequence of only 30 residues, which agrees with previous studies indicating that this region contains the secretion signal. We then used the method to define the most important residues in this putative secretion signal. Finally, we present novel predictions of secreted effectors in S. typhimurium, some of which have been experimentally validated, and apply the method to predict secreted effectors in the genetically intractable human pathogen Chlamydia trachomatis. This approach is a novel and effective way to identify secreted effectors in a broad range of pathogenic bacteria for further experimental characterization and provides insight into the nature of the type III secretion signal.
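
    One of the sequence-based features named in this abstract, the amino acid composition of the N-terminal 30 residues, and the two reported performance measures can be sketched as follows. The function names are hypothetical and this is not the authors' feature pipeline; the counts in the usage note are invented to match the quoted percentages.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def n_terminal_composition(sequence: str, n_residues: int = 30) -> dict:
    """Fraction of each standard amino acid in the first `n_residues` residues.

    Non-standard symbols are ignored in the output, so the fractions sum to 1
    only when the N-terminal segment contains standard residues exclusively.
    """
    head = sequence[:n_residues].upper()
    if not head:
        raise ValueError("sequence is empty")
    return {aa: head.count(aa) / len(head) for aa in AMINO_ACIDS}

def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple:
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    return tp / (tp + fn), tn / (tn + fp)
```

    For example, hypothetical counts of 82 true positives, 18 false negatives, 84 true negatives and 16 false positives give a sensitivity of 0.82 and a specificity of 0.84.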

  13. HMMpTM: improving transmembrane protein topology prediction using phosphorylation and glycosylation site prediction.

    PubMed

    Tsaousis, Georgios N; Bagos, Pantelis G; Hamodrakas, Stavros J

    2014-02-01

    During the last two decades a large number of computational methods have been developed for predicting transmembrane protein topology. Current predictors rely on topogenic signals in the protein sequence, such as the distribution of positively charged residues in extra-membrane loops and the existence of N-terminal signals. However, phosphorylation and glycosylation are post-translational modifications (PTMs) that occur in a compartment-specific manner and therefore the presence of a phosphorylation or glycosylation site in a transmembrane protein provides topological information. We examine the combination of phosphorylation and glycosylation site prediction with transmembrane protein topology prediction. We report the development of a Hidden Markov Model based method, capable of predicting the topology of transmembrane proteins and the existence of kinase specific phosphorylation and N/O-linked glycosylation sites along the protein sequence. Our method integrates a novel feature in transmembrane protein topology prediction, which results in improved performance for topology prediction and reliable prediction of phosphorylation and glycosylation sites. The method is freely available at http://bioinformatics.biol.uoa.gr/HMMpTM.

  14. Dual X-ray absorptiometry accurately predicts carcass composition from live sheep and chemical composition of live and dead sheep.

    PubMed

    Pearce, K L; Ferguson, M; Gardner, G; Smith, N; Greef, J; Pethick, D W

    2009-01-01

    Fifty Merino wethers (liveweight range 44 to 81 kg, average 58.6 kg) were lot fed for 42 d and scanned by dual X-ray absorptiometry (DXA) both as live animals and as whole carcasses (carcass weight range 15 to 32 kg, average 22.9 kg), producing measures of total tissue, lean, fat and bone content. The carcasses were subsequently boned out into saleable cuts, and the weights and yields of boned out muscle, fat and bone were recorded. Chemical lean (protein + water) was highly correlated with DXA carcass lean (r² = 0.90, RSD = 0.674 kg) and moderately correlated with DXA live lean (r² = 0.72, RSD = 1.05 kg). Chemical fat was moderately correlated with DXA carcass fat (r² = 0.86, RSD = 0.42 kg) and DXA live fat (r² = 0.70, RSD = 0.71 kg). DXA carcass and live animal bone were not well correlated with chemical ash (both r² = 0.38, RSD = 0.3). DXA carcass lean was moderately well predicted from DXA live lean with the inclusion of bodyweight in the regression (r² = 0.82, RSD = 0.87 kg). DXA carcass fat was well predicted from DXA live fat (r² = 0.86, RSD = 0.54 kg). DXA carcass lean and DXA carcass fat, with the inclusion of carcass weight in the regression, significantly predicted boned out muscle weight (r² = 0.97, RSD = 0.32 kg) and fat weight (r² = 0.92, RSD = 0.34 kg), respectively. DXA live lean and DXA live fat, with the inclusion of bodyweight, predicted boned out muscle weight (r² = 0.83, RSD = 0.75 kg) and fat weight (r² = 0.86, RSD = 0.46 kg), respectively, with moderate accuracy. DXA carcass and live lean and fat did not predict boned out muscle and fat yield as well as they predicted weight. DXA will find future use in determining body composition of live animals and carcasses in research experiments, and it also has potential as an online carcass grading system.

  15. Computational Prediction of RNA-Binding Proteins and Binding Sites

    PubMed Central

    Si, Jingna; Cui, Jing; Cheng, Jin; Wu, Rongling

    2015-01-01

    Protein–RNA interactions play vital roles in many cellular processes such as protein synthesis, sequence encoding, RNA transfer, and gene regulation at the transcriptional and post-transcriptional levels. Approximately 6%–8% of all proteins are RNA-binding proteins (RBPs). Distinguishing these RBPs or their binding residues is a major aim of structural biology. Previously, a number of experimental methods were developed for the determination of protein–RNA interactions. However, these experimental methods are expensive, time-consuming, and labor-intensive. Alternatively, researchers have developed many computational approaches to predict RBPs and protein–RNA binding sites by combining various machine learning methods with abundant sequence and/or structural features. There are three kinds of computational approaches: prediction from protein sequence, prediction from protein structure, and protein–RNA docking. In this paper, we review all existing studies of predictions of RNA-binding sites and RBPs and complexes, including data sets used in different approaches, sequence and structural features used in several predictors, prediction method classifications, performance comparisons, evaluation methods, and future directions. PMID:26540053

  16. Functional classification of protein 3D structures from predicted local interaction sites.

    PubMed

    Parasuram, Ramya; Lee, Joslynn S; Yin, Pengcheng; Somarowthu, Srinivas; Ondrechen, Mary Jo

    2010-12-01

    A new approach to the functional classification of protein 3D structures is described with application to some examples from structural genomics. This approach is based on functional site prediction with THEMATICS and POOL. THEMATICS employs calculated electrostatic potentials of the query structure. POOL is a machine learning method that utilizes THEMATICS features and has been shown to predict accurate, precise, highly localized interaction sites. Extension to the functional classification of structural genomics proteins is now described. Predicted functionally important residues are structurally aligned with those of proteins with previously characterized biochemical functions. A 3D structure match at the predicted local functional site then serves as a more reliable predictor of biochemical function than an overall structure match. Annotation is confirmed for a structural genomics protein with the ribulose phosphate binding barrel (RPBB) fold. A putative glucoamylase from Bacteroides fragilis (PDB ID 3eu8) is shown to be probably not a glucoamylase. Finally, a structural genomics protein from Streptomyces coelicolor annotated as an enoyl-CoA hydratase (PDB ID 3g64) is shown to be misannotated. Its predicted active site does not match those of the well-characterized enoyl-CoA hydratases of similar structure but rather bears closer resemblance to that of a dehalogenase with a similar fold.

  17. Engineering Genes for Predictable Protein Expression

    PubMed Central

    Gustafsson, Claes; Minshull, Jeremy; Govindarajan, Sridhar; Ness, Jon; Villalobos, Alan; Welch, Mark

    2013-01-01

    The DNA sequence used to encode a polypeptide can have dramatic effects on its expression. Lack of readily available tools has until recently inhibited meaningful experimental investigation of this phenomenon. Advances in synthetic biology and the application of modern engineering approaches now provide the tools for systematic analysis of the sequence variables affecting heterologous expression of recombinant proteins. We here discuss how these new tools are being applied and how they circumvent the constraints of previous approaches, highlighting some of the surprising and promising results emerging from the developing field of gene engineering. PMID:22425659

  18. Engineering genes for predictable protein expression.

    PubMed

    Gustafsson, Claes; Minshull, Jeremy; Govindarajan, Sridhar; Ness, Jon; Villalobos, Alan; Welch, Mark

    2012-05-01

    The DNA sequence used to encode a polypeptide can have dramatic effects on its expression. Lack of readily available tools has until recently inhibited meaningful experimental investigation of this phenomenon. Advances in synthetic biology and the application of modern engineering approaches now provide the tools for systematic analysis of the sequence variables affecting heterologous expression of recombinant proteins. We here discuss how these new tools are being applied and how they circumvent the constraints of previous approaches, highlighting some of the surprising and promising results emerging from the developing field of gene engineering.

  19. A Prediction Model for Membrane Proteins Using Moments Based Features

    PubMed Central

    Butt, Ahmad Hassan; Khan, Sher Afzal; Jamil, Hamza; Rasool, Nouman; Khan, Yaser Daanial

    2016-01-01

    The most expedient unit of the human body is its cell. Encapsulated within the cell are many infinitesimal entities and molecules which are protected by a cell membrane. The proteins that are associated with this lipid-based bilayer cell membrane are known as membrane proteins and are considered to play a significant role. These membrane proteins exhibit their effect in cellular activities inside and outside of the cell. According to scientists in pharmaceutical organizations, these membrane proteins perform key tasks in drug interactions. In this study, a technique is presented that is based on various computationally intelligent methods used for the prediction of membrane proteins without the experimental use of mass spectrometry. Statistical moments were used to extract features, and a Multilayer Neural Network was trained using backpropagation for the prediction of membrane proteins. Results show that the proposed technique performs better than existing methodologies. PMID:26966690

  20. Using gross energy improves metabolizable energy predictive equations for pet foods whereas undigested protein and fiber content predict stool quality.

    PubMed

    Hall, Jean A; Melendez, Lynda D; Jewell, Dennis E

    2013-01-01

    Because animal studies are labor intensive, predictive equations are used extensively for calculating metabolizable energy (ME) concentrations of dog and cat pet foods. The objective of this retrospective review of digestibility studies, which were conducted over a 7-year period and based upon Association of American Feed Control Officials (AAFCO) feeding protocols, was to compare the accuracy and precision of equations developed from these animal feeding studies to commonly used predictive equations. Feeding studies in dogs and cats (331 and 227 studies, respectively) showed that equations using modified Atwater factors accurately predict ME concentrations in dog and cat pet foods (r²= 0.97 and 0.98, respectively). The National Research Council (NRC) equations also accurately predicted ME concentrations in pet foods (r² = 0.97 for dog and cat foods). For dogs, these equations resulted in an average estimate of ME within 0.16% and 2.24% of the actual ME measured (equations using modified Atwater factors and NRC equations, respectively); for cats these equations resulted in an average estimate of ME within 1.57% and 1.80% of the actual ME measured. However, better predictions of dietary ME in dog and cat pet foods were achieved using equations based on analysis of gross energy (GE) and new factors for moisture, protein, fat and fiber. When this was done there was less than 0.01% difference between the measured ME and the average predicted ME (r² = 0.99 and 1.00 in dogs and cats, respectively) whereas the absolute value of the difference between measured and predicted was reduced by approximately 50% in dogs and 60% in cats. Stool quality, which was measured by stool score, was influenced positively when dietary protein digestibility was high and fiber digestibility was low. In conclusion, using GE improves predictive equations for ME content of dog and cat pet foods. Nondigestible protein and fiber content of diets predicts stool quality.
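
    As a rough illustration of what a modified Atwater equation looks like, the sketch below uses the commonly cited modified Atwater factors of 3.5, 8.5 and 3.5 kcal/g for protein, fat and nitrogen-free extract (NFE). These values are an assumption brought in for illustration; the abstract does not state the factors, and the GE-based equations and "new factors" it reports are not reproduced here. Function names and the example composition are hypothetical.

```python
def nitrogen_free_extract(moisture, protein, fat, fiber, ash):
    """NFE by difference; all inputs in g per 100 g of food as fed."""
    return 100.0 - (moisture + protein + fat + fiber + ash)

def me_modified_atwater(protein, fat, nfe):
    """Estimated metabolizable energy in kcal per 100 g as fed.

    Factors 3.5 / 8.5 / 3.5 kcal/g are the commonly cited modified
    Atwater values (assumed here, not taken from the abstract).
    """
    return 3.5 * protein + 8.5 * fat + 3.5 * nfe

# Hypothetical dry dog food composition (g per 100 g as fed)
nfe = nitrogen_free_extract(moisture=10.0, protein=25.0, fat=15.0, fiber=3.0, ash=7.0)
print(nfe, me_modified_atwater(25.0, 15.0, nfe))
```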

  1. Accurate and reproducible detection of proteins in water using an extended-gate type organic transistor biosensor

    NASA Astrophysics Data System (ADS)

    Minamiki, Tsukuru; Minami, Tsuyoshi; Kurita, Ryoji; Niwa, Osamu; Wakida, Shin-ichi; Fukuda, Kenjiro; Kumaki, Daisuke; Tokito, Shizuo

    2014-06-01

    In this Letter, we describe an accurate antibody detection method using a fabricated extended-gate type organic field-effect-transistor (OFET), which can be operated at below 3 V. The protein-sensing portion of the designed device is the gate electrode functionalized with streptavidin. Streptavidin possesses high molecular recognition ability for biotin, which specifically allows for the detection of biotinylated proteins. Here, we attempted to detect biotinylated immunoglobulin G (IgG) and observed a shift of threshold voltage of the OFET upon the addition of the antibody in an aqueous solution with a competing bovine serum albumin interferent. The detection limit for the biotinylated IgG was 8 nM, which indicates the potential utility of the designed device in healthcare applications.

  2. A time accurate prediction of the viscous flow in a turbine stage including a rotor in motion

    NASA Astrophysics Data System (ADS)

    Shavalikul, Akamol

    accurate flow characteristics in the NGV domain and the rotor domain with less computational time and computer memory requirements. In contrast, the time accurate flow simulation can predict all unsteady flow characteristics occurring in the turbine stage, but with high computational resource requirements. (Abstract shortened by UMI.)

  3. Assessing Predicted Contacts for Building Protein Three-Dimensional Models.

    PubMed

    Adhikari, Badri; Bhattacharya, Debswapna; Cao, Renzhi; Cheng, Jianlin

    2017-01-01

    Recent successes of contact-guided protein structure prediction methods have revived interest in solving the long-standing problem of ab initio protein structure prediction. With homology modeling failing for many protein sequences that do not have templates, contact-guided structure prediction has shown promise, and consequently, contact prediction has gained a lot of interest recently. Although a few dozen contact prediction tools are already available as web servers and downloadables, not enough research has been done towards using existing measures like precision and recall to evaluate these contacts with the goal of building three-dimensional models. Moreover, when we do not have a native structure for a set of predicted contacts, the only analysis we can perform is a simple contact map visualization of the predicted contacts. A wider and more rigorous assessment of the predicted contacts is needed in order to build tertiary structure models. This chapter discusses instructions and protocols for using tools and applying techniques in order to assess predicted contacts for building three-dimensional models.
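
    The precision and recall measures mentioned in this abstract can be sketched as follows for the case where a native contact set is available. The data layout (residue-index pairs with a confidence score) and the optional top-N cut-off are assumptions made for illustration; they are not taken from the chapter.

```python
def contact_precision_recall(predicted, native, top_n=None):
    """Precision and recall of predicted contacts against native contacts.

    predicted: list of (i, j, score) tuples, higher score = more confident.
    native:    iterable of (i, j) residue pairs in contact in the native structure.
    top_n:     if given, evaluate only the top_n highest-scoring predictions.
    """
    ranked = sorted(predicted, key=lambda c: c[2], reverse=True)
    if top_n is not None:
        ranked = ranked[:top_n]
    # Treat contacts as unordered pairs
    pairs = {(min(i, j), max(i, j)) for i, j, _ in ranked}
    native_pairs = {(min(i, j), max(i, j)) for i, j in native}
    true_pos = len(pairs & native_pairs)
    precision = true_pos / len(pairs) if pairs else 0.0
    recall = true_pos / len(native_pairs) if native_pairs else 0.0
    return precision, recall

# Hypothetical predictions and native contacts
predicted = [(1, 10, 0.9), (2, 20, 0.8), (3, 30, 0.1)]
native = {(1, 10), (3, 30), (5, 50), (7, 70)}
print(contact_precision_recall(predicted, native, top_n=2))
```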

  4. De Novo Structure Prediction of Globular Proteins Aided by Sequence Variation-Derived Contacts

    PubMed Central

    Kosciolek, Tomasz; Jones, David T.

    2014-01-01

    The advent of high accuracy residue-residue intra-protein contact prediction methods enabled a significant boost in the quality of de novo structure predictions. Here, we investigate the potential benefits of combining a well-established fragment-based folding algorithm – FRAGFOLD, with PSICOV, a contact prediction method which uses sparse inverse covariance estimation to identify co-varying sites in multiple sequence alignments. Using a comprehensive set of 150 diverse globular target proteins, up to 266 amino acids in length, we are able to address the effectiveness and some limitations of such approaches to globular proteins in practice. Overall we find that using fragment assembly with both statistical potentials and predicted contacts is significantly better than either statistical potentials or contacts alone. Results show up to nearly 80% of correct predictions (TM-score ≥0.5) within analysed dataset and a mean TM-score of 0.54. Unsuccessful modelling cases emerged either from conformational sampling problems, or insufficient contact prediction accuracy. Nevertheless, a strong dependency of the quality of final models on the fraction of satisfied predicted long-range contacts was observed. This not only highlights the importance of these contacts on determining the protein fold, but also (combined with other ensemble-derived qualities) provides a powerful guide as to the choice of correct models and the global quality of the selected model. A proposed quality assessment scoring function achieves 0.93 precision and 0.77 recall for the discrimination of correct folds on our dataset of decoys. These findings suggest the approach is well-suited for blind predictions on a variety of globular proteins of unknown 3D structure, provided that enough homologous sequences are available to construct a large and accurate multiple sequence alignment for the initial contact prediction step. PMID:24637808
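
    For readers unfamiliar with the TM-score threshold used in this abstract, here is a minimal sketch of the standard TM-score formula evaluated for one fixed superposition. The full TM-score is the maximum over superpositions, which this sketch does not search for; the per-residue distances are assumed to be given, and the function names are hypothetical.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in Angstrom) of the TM-score."""
    if target_length <= 18:
        # The formula gives a non-positive d0 for such short chains
        raise ValueError("d0 formula is not positive for chains this short")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8

def tm_score(distances, target_length):
    """TM-score for one superposition.

    distances: distance (Angstrom) between each aligned residue pair.
    target_length: number of residues in the target (native) structure.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

def is_correct_fold(score, threshold=0.5):
    """The abstract counts a prediction as correct when TM-score >= 0.5."""
    return score >= threshold

# Hypothetical model: 100 aligned residues, each 2 Angstrom from its native position
score = tm_score([2.0] * 100, 100)
print(round(score, 3), is_correct_fold(score))
```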

  5. De novo structure prediction of globular proteins aided by sequence variation-derived contacts.

    PubMed

    Kosciolek, Tomasz; Jones, David T

    2014-01-01

    The advent of high accuracy residue-residue intra-protein contact prediction methods enabled a significant boost in the quality of de novo structure predictions. Here, we investigate the potential benefits of combining a well-established fragment-based folding algorithm--FRAGFOLD, with PSICOV, a contact prediction method which uses sparse inverse covariance estimation to identify co-varying sites in multiple sequence alignments. Using a comprehensive set of 150 diverse globular target proteins, up to 266 amino acids in length, we are able to address the effectiveness and some limitations of such approaches to globular proteins in practice. Overall we find that using fragment assembly with both statistical potentials and predicted contacts is significantly better than either statistical potentials or contacts alone. Results show up to nearly 80% of correct predictions (TM-score ≥0.5) within analysed dataset and a mean TM-score of 0.54. Unsuccessful modelling cases emerged either from conformational sampling problems, or insufficient contact prediction accuracy. Nevertheless, a strong dependency of the quality of final models on the fraction of satisfied predicted long-range contacts was observed. This not only highlights the importance of these contacts on determining the protein fold, but also (combined with other ensemble-derived qualities) provides a powerful guide as to the choice of correct models and the global quality of the selected model. A proposed quality assessment scoring function achieves 0.93 precision and 0.77 recall for the discrimination of correct folds on our dataset of decoys. These findings suggest the approach is well-suited for blind predictions on a variety of globular proteins of unknown 3D structure, provided that enough homologous sequences are available to construct a large and accurate multiple sequence alignment for the initial contact prediction step.

  6. Protein function prediction using guilty by association from interaction networks.

    PubMed

    Piovesan, Damiano; Giollo, Manuel; Ferrari, Carlo; Tosatto, Silvio C E

    2015-12-01

    Protein function prediction from sequence using the Gene Ontology (GO) classification is useful in many biological problems. It has recently attracted increasing interest, thanks in part to the Critical Assessment of Function Annotation (CAFA) challenge. In this paper, we introduce Guilty by Association on STRING (GAS), a tool to predict protein function exploiting protein-protein interaction networks without sequence similarity. The assumption is that whenever a protein interacts with other proteins, it is part of the same biological process and located in the same cellular compartment. GAS retrieves interaction partners of a query protein from the STRING database and measures enrichment of the associated functional annotations to generate a sorted list of putative functions. A performance evaluation based on CAFA metrics and a fair comparison with optimized BLAST similarity searches is provided. The consensus of GAS and BLAST is shown to improve overall performance. The PPI approach is shown to outperform similarity searches for biological process and cellular compartment GO predictions. Moreover, an analysis of the best practices to exploit protein-protein interaction networks is also provided.
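
    The guilt-by-association idea described here can be sketched as an enrichment ranking over the annotations of a query protein's interaction partners. The abstract does not say which enrichment statistic GAS uses, so the hypergeometric tail below is an assumption, and the toy dictionaries stand in for data that GAS retrieves from STRING. All names are hypothetical.

```python
from math import comb

def enrichment_p(k, n, K, N):
    """P(X >= k): chance that at least k of n sampled proteins carry a term
    that K of the N background proteins carry (hypergeometric tail)."""
    return sum(comb(K, x) * comb(N - K, n - x)
               for x in range(k, min(n, K) + 1)) / comb(N, n)

def rank_functions(partners, annotations):
    """Rank terms seen among the partners, most enriched (smallest p) first.

    annotations maps every background protein to its set of terms.
    """
    partners = [p for p in partners if p in annotations]
    N, n = len(annotations), len(partners)
    background, observed = {}, {}
    for terms in annotations.values():
        for term in terms:
            background[term] = background.get(term, 0) + 1
    for protein in partners:
        for term in annotations[protein]:
            observed[term] = observed.get(term, 0) + 1
    scored = [(enrichment_p(k, n, background[term], N), term)
              for term, k in observed.items()]
    return [(term, pval) for pval, term in sorted(scored)]

# Toy background of six proteins; A, B and C interact with the query
annotations = {"A": {"term_x"}, "B": {"term_x"}, "C": {"term_y"},
               "D": {"term_y"}, "E": {"term_y"}, "F": set()}
print(rank_functions(["A", "B", "C"], annotations))
```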

  7. JPPRED: Prediction of Types of J-Proteins from Imbalanced Data Using an Ensemble Learning Method

    PubMed Central

    Zhang, Lina; Zhang, Chengjin; Gao, Rui; Yang, Runtao

    2015-01-01

    Different types of J-proteins perform distinct functions in chaperone processes and disease development. Accurate identification of types of J-proteins will provide significant clues to reveal the mechanism of J-proteins and contribute to developing drugs for diseases. In this study, an ensemble predictor called JPPRED for J-protein prediction is proposed with hybrid features, including split amino acid composition (SAAC), pseudo amino acid composition (PseAAC), and position specific scoring matrix (PSSM). To deal with the imbalanced benchmark dataset, the synthetic minority oversampling technique (SMOTE) and an undersampling technique are applied. The average sensitivity of JPPRED based on the above-mentioned individual feature spaces lies in the range of 0.744–0.851, indicating the discriminative power of these features. In addition, JPPRED yields the highest average sensitivity of 0.875 using the hybrid feature spaces of SAAC, PseAAC, and PSSM. Compared to individual base classifiers, JPPRED obtains more balanced and better performance for each type of J-protein. To evaluate the prediction performance objectively, JPPRED is compared with a previous study. Encouragingly, JPPRED obtains balanced performance for each type of J-protein, which is significantly superior to that of the existing method. It is anticipated that JPPRED can be a potential candidate for J-protein prediction. PMID:26587542
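
    The oversampling step named in this abstract can be illustrated with a bare-bones sketch of the SMOTE idea: a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. This is a teaching sketch, not the JPPRED pipeline; the parameter names, the choice of k, and the toy feature vectors are assumptions.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic samples from a list of minority feature tuples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x by squared Euclidean distance
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from x to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic

# Toy two-dimensional minority class
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
for point in smote(minority, n_new=3):
    print(point)
```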

  8. Blind Test of Physics-Based Prediction of Protein Structures

    PubMed Central

    Shell, M. Scott; Ozkan, S. Banu; Voelz, Vincent; Wu, Guohong Albert; Dill, Ken A.

    2009-01-01

    We report here a multiprotein blind test of a computer method to predict native protein structures based solely on an all-atom physics-based force field. We use the AMBER 96 potential function with an implicit (GB/SA) model of solvation, combined with replica-exchange molecular-dynamics simulations. Coarse conformational sampling is performed using the zipping and assembly method (ZAM), an approach that is designed to mimic the putative physical routes of protein folding. ZAM was applied to the folding of six proteins, from 76 to 112 monomers in length, in CASP7, a community-wide blind test of protein structure prediction. Because these predictions have about the same level of accuracy as typical bioinformatics methods, and do not utilize information from databases of known native structures, this work opens up the possibility of predicting the structures of membrane proteins, synthetic peptides, or other foldable polymers, for which there is little prior knowledge of native structures. This approach may also be useful for predicting physical protein folding routes, non-native conformations, and other physical properties from amino acid sequences. PMID:19186130

  9. WeFold: A Coopetition for Protein Structure Prediction

    PubMed Central

    Khoury, George A.; Liwo, Adam; Khatib, Firas; Zhou, Hongyi; Chopra, Gaurav; Bacardit, Jaume; Bortot, Leandro O.; Faccioli, Rodrigo A.; Deng, Xin; He, Yi; Krupa, Pawel; Li, Jilong; Mozolewska, Magdalena A.; Sieradzan, Adam K.; Smadbeck, James; Wirecki, Tomasz; Cooper, Seth; Flatten, Jeff; Xu, Kefan; Baker, David; Cheng, Jianlin; Delbem, Alexandre C. B.; Floudas, Christodoulos A.; Keasar, Chen; Levitt, Michael; Popović, Zoran; Scheraga, Harold A.; Skolnick, Jeffrey; Crivelli, Silvia N.; Players, Foldit

    2014-01-01

    The protein structure prediction problem continues to elude scientists. Despite the introduction of many methods, only modest gains were made over the last decade for certain classes of prediction targets. To address this challenge, a social-media based worldwide collaborative effort, named WeFold, was undertaken by thirteen labs. During the collaboration, the labs were simultaneously competing with each other. Here, we present the first attempt at “coopetition” in scientific research applied to the protein structure prediction and refinement problems. The coopetition was made possible by allowing the participating labs to contribute different components of their protein structure prediction pipelines and create new hybrid pipelines that they tested during CASP10. This manuscript describes both successes and areas needing improvement as identified throughout the first WeFold experiment and discusses the efforts that are underway to advance this initiative. A footprint of all contributions and structures is publicly accessible at http://www.wefold.org. PMID:24677212

  10. Effect of the quality of the interaction data on predicting protein function from protein-protein interactions.

    PubMed

    Ni, Qing-Shan; Wang, Zheng-Zhi; Li, Gang-Guo; Wang, Guang-Yun; Zhao, Ying-Jie

    2009-03-01

    Protein function prediction is an important issue in the post-genomic era. When protein function is deduced from protein interaction data, traditional methods treat each interaction sample equally, and the quality of the interaction samples is seldom taken into account. In this paper, we investigate the effect of the quality of protein-protein interaction data on predicting protein function. Moreover, two improved methods, the weight neighbour counting method (WNC) and the weight chi-square method (WCHI), are proposed by incorporating the quality of interaction samples into the neighbour counting method (NC) and the chi-square method (CHI). Experimental results show that the quality of interaction samples seriously affects the performance of protein function prediction methods. It is also demonstrated that the WNC and WCHI methods outperform the NC and CHI methods in protein function prediction when example weights are chosen properly.
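
    A minimal sketch of the contrast between plain and weighted neighbour counting follows. The abstract does not give the exact WNC formula, so summing per-interaction reliability weights is an assumption, as are the function name, the data layout, and the toy values.

```python
from collections import defaultdict

def neighbour_counting(query, interactions, functions, weights=None):
    """Score candidate functions for `query` from its interaction partners.

    interactions: list of (protein_a, protein_b) pairs.
    functions:    dict protein -> set of known functions.
    weights:      optional dict (protein_a, protein_b) -> reliability in [0, 1];
                  when omitted, every interaction counts as 1 (plain NC).
    """
    scores = defaultdict(float)
    for a, b in interactions:
        if query not in (a, b):
            continue
        neighbour = b if a == query else a
        w = 1.0 if weights is None else weights.get((a, b), 0.0)
        for f in functions.get(neighbour, ()):
            scores[f] += w
    # Highest score first; ties broken alphabetically
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy network: the two interactions supporting f1 are of low reliability
interactions = [("Q", "P1"), ("Q", "P2"), ("P3", "Q")]
functions = {"P1": {"f1"}, "P2": {"f1"}, "P3": {"f2"}}
weights = {("Q", "P1"): 0.2, ("Q", "P2"): 0.1, ("P3", "Q"): 0.9}
print(neighbour_counting("Q", interactions, functions))           # f1 ranks first
print(neighbour_counting("Q", interactions, functions, weights))  # f2 ranks first
```

    In this toy case the ranking flips once interaction quality is considered, which is the kind of effect the abstract reports.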

  11. Predicting multisite protein subcellular locations: progress and challenges.

    PubMed

    Du, Pufeng; Xu, Chao

    2013-06-01

    In the last two decades, predicting protein subcellular locations has become a hot topic in bioinformatics. A number of algorithms and online services have been developed to computationally assign a subcellular location to a given protein sequence. With the progress of many proteome projects, more and more proteins are annotated with more than one subcellular location. However, multisite prediction has only been considered in a handful of recent studies, in which there are several common challenges. In this special report, the authors discuss what these challenges are, why these challenges are important and how the existing studies gave their solutions. Finally, a vision of the future of predicting multisite protein subcellular locations is given.

  12. A predictive biophysical model of translational coupling to coordinate and control protein expression in bacterial operons

    PubMed Central

    Tian, Tian; Salis, Howard M.

    2015-01-01

    Natural and engineered genetic systems require the coordinated expression of proteins. In bacteria, translational coupling provides a genetically encoded mechanism to control expression level ratios within multi-cistronic operons. We have developed a sequence-to-function biophysical model of translational coupling to predict expression level ratios in natural operons and to design synthetic operons with desired expression level ratios. To quantitatively measure ribosome re-initiation rates, we designed and characterized 22 bi-cistronic operon variants with systematically modified intergenic distances and upstream translation rates. We then derived a thermodynamic free energy model to calculate de novo initiation rates as a result of ribosome-assisted unfolding of intergenic RNA structures. The complete biophysical model has only five free parameters, but was able to accurately predict downstream translation rates for 120 synthetic bi-cistronic and tri-cistronic operons with rationally designed intergenic regions and systematically increased upstream translation rates. The biophysical model also accurately predicted the translation rates of the nine protein atp operon, compared to ribosome profiling measurements. Altogether, the biophysical model quantitatively predicts how translational coupling controls protein expression levels in synthetic and natural bacterial operons, providing a deeper understanding of an important post-transcriptional regulatory mechanism and offering the ability to rationally engineer operons with desired behaviors. PMID:26117546

  13. Protein-protein interactions prediction based on iterative clique extension with gene ontology filtering.

    PubMed

    Yang, Lei; Tang, Xianglong

    2014-01-01

    Cliques (maximal complete subnets) in protein-protein interaction (PPI) networks are an important resource used to analyze protein complexes and functional modules. Clique-based methods of predicting PPIs compensate for deficiencies in the data from biological experiments. However, clique-based prediction methods depend only on the topology of the network, and false-positive and false-negative interactions in a network usually interfere with prediction. Therefore, we propose a method that combines clique-based prediction with gene ontology (GO) annotations to overcome this shortcoming and improve the accuracy of predictions. According to different GO correcting rules, we generate two predicted interaction sets which guarantee the quality and quantity of predicted protein interactions. The proposed method is applied to the PPI network from the Database of Interacting Proteins (DIP), and most of the predicted interactions are verified by another biological database, BioGRID. The predicted protein interactions are appended to the original protein network, which leads to clique extension and shows the significance of biological meaning.

  14. Truly Absorbed Microbial Protein Synthesis, Rumen Bypass Protein, Endogenous Protein, and Total Metabolizable Protein from Starchy and Protein-Rich Raw Materials: Model Comparison and Predictions.

    PubMed

    Parand, Ehsan; Vakili, Alireza; Mesgaran, Mohsen Danesh; van Duinkerken, Gert; Yu, Peiqiang

    2015-07-29

    This study was carried out to measure truly absorbed microbial protein synthesis, rumen bypass protein, and endogenous protein loss, as well as total metabolizable protein, from starchy and protein-rich raw feed materials with model comparisons. Predictions by the DVE2010 system as a more mechanistic model were compared with those of two other models, DVE1994 and NRC-2001, that are frequently used in common international feeding practice. DVE1994 predictions for intestinally digestible rumen undegradable protein (ARUP) for starchy concentrates were higher (27 vs 18 g/kg DM, p < 0.05, SEM = 1.2) than predictions by the NRC-2001, whereas there was no difference in predictions for ARUP from protein concentrates among the three models. DVE2010 and NRC-2001 had highest estimations of intestinally digestible microbial protein for starchy (92 g/kg DM in DVE2010 vs 46 g/kg DM in NRC-2001 and 67 g/kg DM in DVE1994, p < 0.05 SEM = 4) and protein concentrates (69 g/kg DM in NRC-2001 vs 31 g/kg DM in DVE1994 and 49 g/kg DM in DVE2010, p < 0.05 SEM = 4), respectively. Potential protein supplies predicted by tested models from starchy and protein concentrates are widely different, and comparable direct measurements are needed to evaluate the actual ability of different models to predict the potential protein supply to dairy cows from different feedstuffs.

  15. Accurate Quantification of Cardiovascular Biomarkers in Serum Using Protein Standard Absolute Quantification (PSAQ™) and Selected Reaction Monitoring

    PubMed Central

    Huillet, Céline; Adrait, Annie; Lebert, Dorothée; Picard, Guillaume; Trauchessec, Mathieu; Louwagie, Mathilde; Dupuis, Alain; Hittinger, Luc; Ghaleh, Bijan; Le Corvoisier, Philippe; Jaquinod, Michel; Garin, Jérôme; Bruley, Christophe; Brun, Virginie

    2012-01-01

    Development of new biomarkers needs to be significantly accelerated to improve diagnostic, prognostic, and toxicity monitoring as well as therapeutic follow-up. Biomarker evaluation is the main bottleneck in this development process. Selected Reaction Monitoring (SRM) combined with stable isotope dilution has emerged as a promising option to speed this step, particularly because of its multiplexing capacities. However, analytical variabilities because of upstream sample handling or incomplete trypsin digestion still need to be resolved. In 2007, we developed the PSAQ™ method (Protein Standard Absolute Quantification), which uses full-length isotope-labeled protein standards to quantify target proteins. In the present study we used clinically validated cardiovascular biomarkers (LDH-B, CKMB, myoglobin, and troponin I) to demonstrate that the combination of PSAQ and SRM (PSAQ-SRM) allows highly accurate biomarker quantification in serum samples. A multiplex PSAQ-SRM assay was used to quantify these biomarkers in clinical samples from myocardial infarction patients. Good correlation between PSAQ-SRM and ELISA assay results was found and demonstrated the consistency between these analytical approaches. Thus, PSAQ-SRM has the capacity to improve both accuracy and reproducibility in protein analysis. This will be a major contribution to efficient biomarker development strategies. PMID:22080464

  16. Machine Learning Approaches for Predicting Protein Complex Similarity.

    PubMed

    Farhoodi, Roshanak; Akbal-Delibas, Bahar; Haspel, Nurit

    2017-01-01

    Discriminating native-like structures from false positives with high accuracy is one of the biggest challenges in protein-protein docking. While there is an agreement on the existence of a relationship between various favorable intermolecular interactions (e.g., Van der Waals, electrostatic, and desolvation forces) and the similarity of a conformation to its native structure, the precise nature of this relationship is not known. Existing protein-protein docking methods typically formulate this relationship as a weighted sum of selected terms and calibrate their weights by using a training set to evaluate and rank candidate complexes. Despite improvements in the predictive power of recent docking methods, producing a large number of false positives by even state-of-the-art methods often leads to failure in predicting the correct binding of many complexes. With the aid of machine learning methods, we tested several approaches that not only rank candidate structures relative to each other but also predict how similar each candidate is to the native conformation. We trained a two-layer neural network, a multilayer neural network, and a network of Restricted Boltzmann Machines against extensive data sets of unbound complexes generated by RosettaDock and PyDock. We validated these methods with a set of refinement candidate structures. We were able to predict the root mean squared deviations (RMSDs) of protein complexes with a very small, often less than 1.5 Å, error margin when trained with structures that have RMSD values of up to 7 Å. In our most recent experiments with the protein samples having RMSD values up to 27 Å, the average prediction error was still relatively small, attesting to the potential of our approach in predicting the correct binding of protein-protein complexes.
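
    The quantity these networks are trained to predict, the root mean squared deviation between a candidate and the native conformation, can be sketched as below. The sketch assumes both structures list the same atoms in the same order and are already superimposed; it does not perform the superposition or reproduce the neural networks. The coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean squared deviation between two equally ordered coordinate lists.

    Each list holds (x, y, z) tuples in Angstrom for the same atoms.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Hypothetical two-atom example: candidate shifted 3 Angstrom along z
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
candidate = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
print(rmsd(native, candidate))
```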

  17. MS-kNN: protein function prediction by integrating multiple data sources

    PubMed Central

    2013-01-01

    Background Protein function determination is a key challenge in the post-genomic era. Experimental determination of protein functions is accurate, but time-consuming and resource-intensive. A cost-effective alternative is to use the known information about sequence, structure, and functional properties of genes and proteins to predict functions using statistical methods. In this paper, we describe the Multi-Source k-Nearest Neighbor (MS-kNN) algorithm for function prediction, which finds k-nearest neighbors of a query protein based on different types of similarity measures and predicts its function by weighted averaging of its neighbors' functions. Specifically, we used 3 data sources to calculate the similarity scores: sequence similarity, protein-protein interactions, and gene expressions. Results We report the results in the context of 2011 Critical Assessment of Function Annotation (CAFA). Prior to CAFA submission deadline, we evaluated our algorithm on 1,302 human test proteins that were represented in all 3 data sources. Using only the sequence similarity information, MS-kNN had term-based Area Under the Curve (AUC) accuracy of Gene Ontology (GO) molecular function predictions of 0.728 when 7,412 human training proteins were used, and 0.819 when 35,622 training proteins from multiple eukaryotic and prokaryotic organisms were used. By aggregating predictions from all three sources, the AUC was further improved to 0.848. A similar result was observed on prediction of GO biological processes. Testing on 595 proteins that were annotated after the CAFA submission deadline showed that overall MS-kNN accuracy was higher than that of baseline algorithms Gotcha and BLAST, which were based solely on sequence similarity information. Since only 10 of the 595 proteins were represented by all 3 data sources, and 66 by two data sources, the difference between 3-source and one-source MS-kNN was rather small. Conclusions Based on our results, we have several useful insights: (1
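
    The multi-source nearest-neighbor idea described in this record can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (knn_scores, ms_knn), the similarity-weighted average within a source, and the equal default weighting across sources are all assumptions made for the example.

```python
from collections import defaultdict


def knn_scores(query_sims, annotations, k):
    """Score function terms for one data source (hypothetical helper).

    query_sims:  dict mapping training protein -> similarity to the query.
    annotations: dict mapping training protein -> set of function terms.
    Returns a dict mapping term -> similarity-weighted fraction of the
    k most similar proteins that carry the term.
    """
    neighbors = sorted(query_sims.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(sim for _, sim in neighbors)
    scores = defaultdict(float)
    if total == 0:
        return dict(scores)
    for protein, sim in neighbors:
        for term in annotations.get(protein, ()):
            scores[term] += sim / total
    return dict(scores)


def ms_knn(sources, annotations, k, weights=None):
    """Average the per-source scores; equal source weights are an assumption."""
    if weights is None:
        weights = {name: 1.0 for name in sources}
    weight_sum = sum(weights[name] for name in sources)
    combined = defaultdict(float)
    for name, sims in sources.items():
        for term, score in knn_scores(sims, annotations, k).items():
            combined[term] += weights[name] * score / weight_sum
    return dict(combined)


# Toy data: two sources that disagree about the query's nearest neighbors.
annotations = {"p1": {"GO:A"}, "p2": {"GO:B"}, "p3": {"GO:A"}}
sources = {
    "sequence": {"p1": 0.9, "p2": 0.1, "p3": 0.5},
    "ppi": {"p1": 0.0, "p2": 1.0, "p3": 0.0},
}
result = ms_knn(sources, annotations, k=2)
```

    In this toy case the sequence source supports only GO:A and the interaction source supports only GO:B, so with equal source weights both terms end up with a combined score of about 0.5.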

  18. Roles for text mining in protein function prediction.

    PubMed

    Verspoor, Karin M

    2014-01-01

    The Human Genome Project has provided science with a hugely valuable resource: the blueprints for life; the specification of all of the genes that make up a human. While the genes have all been identified and deciphered, it is proteins that are the workhorses of the human body: they are essential to virtually all cell functions and are the primary mechanism through which biological function is carried out. Hence in order to fully understand what happens at a molecular level in biological organisms, and eventually to enable development of treatments for diseases where some aspect of a biological system goes awry, we must understand the functions of proteins. However, experimental characterization of protein function cannot scale to the vast amount of DNA sequence data now available. Computational protein function prediction has therefore emerged as a problem at the forefront of modern biology (Radivojac et al., Nat Methods 10(13):221-227, 2013). Within the varied approaches to computational protein function prediction that have been explored, there are several that make use of biomedical literature mining. These methods take advantage of information in the published literature to associate specific proteins with specific protein functions. In this chapter, we introduce two main strategies for doing this: association of function terms, represented as Gene Ontology terms (Ashburner et al., Nat Genet 25(1):25-29, 2000), to proteins based on information in published articles, and a paradigm called LEAP-FS (Literature-Enhanced Automated Prediction of Functional Sites) in which literature mining is used to validate the predictions of an orthogonal computational protein function prediction method.

  19. Computational Prediction of Effector Proteins in Fungi: Opportunities and Challenges

    PubMed Central

    Sonah, Humira; Deshmukh, Rupesh K.; Bélanger, Richard R.

    2016-01-01

    Effector proteins are mostly secretory proteins that stimulate plant infection by manipulating the host response. Identifying fungal effector proteins and understanding their function is of great importance in efforts to curb losses to plant diseases. Recent advances in high-throughput sequencing technologies have facilitated the availability of several fungal genomes and 1000s of transcriptomes. As a result, the growing amount of genomic information has provided great opportunities to identify putative effector proteins in different fungal species. There is little consensus over the annotation and functionality of effector proteins, and mostly small secretory proteins are considered as effector proteins, a concept that tends to overestimate the number of proteins involved in a plant–pathogen interaction. With the characterization of Avr genes, criteria for computational prediction of effector proteins are becoming more efficient. There are 100s of tools available for the identification of conserved motifs, signature sequences and structural features in the proteins. Many pipelines and online servers, which combine several tools, are made available to perform genome-wide identification of effector proteins. In this review, available tools and pipelines, their strength and limitations for effective identification of fungal effector proteins are discussed. We also present an exhaustive list of classically secreted proteins along with their key conserved motifs found in 12 common plant pathogens (11 fungi and one oomycete) through an analytical pipeline. PMID:26904083

  20. Feature Fusion Based SVM Classifier for Protein Subcellular Localization Prediction.

    PubMed

    Rahman, Julia; Mondal, Md Nazrul Islam; Islam, Md Khaled Ben; Hasan, Md Al Mehedi

    2016-12-18

    Because protein subcellular localization matters in many branches of life science and in drug discovery, researchers have focused their attention on predicting it. Effective representation of features from protein sequences plays a vital role in this prediction, especially for machine learning techniques. Single feature representations such as pseudo amino acid composition (PseAAC), physicochemical property models (PPM), and amino acid index distribution (AAID) capture insufficient information from protein sequences. To address this, we propose two feature fusion representations, AAIDPAAC and PPMPAAC, for use with Support Vector Machine classifiers; they fuse PseAAC with AAID and PPM, respectively. We evaluated the performance of both single and fused feature representations on a Gram-negative bacterial dataset. AAIDPAAC achieved at least 3% higher actual accuracy and PPMPAAC 2% higher locative accuracy than the single feature representations.

  1. Predicting Next Year's Resources--Short-Term Enrollment Forecasting for Accurate Budget Planning. AIR Forum Paper 1978.

    ERIC Educational Resources Information Center

    Salley, Charles D.

    Accurate enrollment forecasts are a prerequisite for reliable budget projections. This is because tuition payments make up a significant portion of a university's revenue, and anticipated revenue is the immediate constraint on current operating expenditures. Accurate forecasts are even more critical to revenue projections when a university's…

  2. Improving the accuracy of protein stability predictions with multistate design using a variety of backbone ensembles.

    PubMed

    Davey, James A; Chica, Roberto A

    2014-05-01

    Multistate computational protein design (MSD) with backbone ensembles approximating conformational flexibility can predict higher quality sequences than single-state design with a single fixed backbone. However, it is currently unclear what characteristics of backbone ensembles are required for the accurate prediction of protein sequence stability. In this study, we aimed to improve the accuracy of protein stability predictions made with MSD by using a variety of backbone ensembles to recapitulate the experimentally measured stability of 85 Streptococcal protein G domain β1 sequences. Ensembles tested here include an NMR ensemble as well as those generated by molecular dynamics (MD) simulations, by Backrub motions, and by PertMin, a new method that we developed involving the perturbation of atomic coordinates followed by energy minimization. MSD with the PertMin ensembles resulted in the most accurate predictions by providing the highest number of stable sequences in the top 25, and by correctly binning sequences as stable or unstable with the highest success rate (≈90%) and the lowest number of false positives. The performance of PertMin ensembles is due to the fact that their members closely resemble the input crystal structure and have low potential energy. Conversely, the NMR ensemble as well as those generated by MD simulations at 500 or 1000 K reduced prediction accuracy due to their low structural similarity to the crystal structure. The ensembles tested herein thus represent on- or off-target models of the native protein fold and could be used in future studies to design for desired properties other than stability.

  3. Protein function prediction by massive integration of evolutionary analyses and multiple data sources

    PubMed Central

    2013-01-01

    Background Accurate protein function annotation is a severe bottleneck when utilizing the deluge of high-throughput, next generation sequencing data. Keeping database annotations up-to-date has become a major scientific challenge that requires the development of reliable automatic predictors of protein function. The CAFA experiment provided a unique opportunity to undertake comprehensive 'blind testing' of many diverse approaches for automated function prediction. We report on the methodology we used for this challenge and on the lessons we learnt. Methods Our method integrates into a single framework a wide variety of biological information sources, encompassing sequence, gene expression and protein-protein interaction data, as well as annotations in UniProt entries. The methodology transfers functional categories based on the results from complementary homology-based and feature-based analyses. We generated the final molecular function and biological process assignments by combining the initial predictions in a probabilistic manner, which takes into account the Gene Ontology hierarchical structure. Results We propose a novel scoring function called COmbined Graph-Information Content similarity (COGIC) score for the comparison of predicted functional categories and benchmark data. We demonstrate that our integrative approach provides increased scope and accuracy over both the component methods and the naïve predictors. In line with previous studies, we find that molecular function predictions are more accurate than biological process assignments. Conclusions Overall, the results indicate that there is considerable room for improvement in the field. It still remains for the community to invest a great deal of effort to make automated function prediction a useful and routine component in the toolbox of life scientists. As already witnessed in other areas, community-wide blind testing experiments will be pivotal in establishing standards for the evaluation of

  4. Multi-level machine learning prediction of protein–protein interactions in Saccharomyces cerevisiae

    PubMed Central

    Zubek, Julian; Tatjewski, Marcin; Boniecki, Adam; Mnich, Maciej; Basu, Subhadip

    2015-01-01

    Accurate identification of protein–protein interactions (PPI) is the key step in understanding proteins’ biological functions, which are typically context-dependent. Many existing PPI predictors rely on aggregated features from protein sequences, however only a few methods exploit local information about specific residue contacts. In this work we present a two-stage machine learning approach for prediction of protein–protein interactions. We start with the carefully filtered data on protein complexes available for Saccharomyces cerevisiae in the Protein Data Bank (PDB) database. First, we build linear descriptions of interacting and non-interacting sequence segment pairs based on their inter-residue distances. Secondly, we train machine learning classifiers to predict binary segment interactions for any two short sequence fragments. The final prediction of the protein–protein interaction is done using the 2D matrix representation of all-against-all possible interacting sequence segments of both analysed proteins. The level-I predictor achieves 0.88 AUC for micro-scale, i.e., residue-level prediction. The level-II predictor improves the results further by a more complex learning paradigm. We perform 30-fold macro-scale, i.e., protein-level cross-validation experiment. The level-II predictor using PSIPRED-predicted secondary structure reaches 0.70 precision, 0.68 recall, and 0.70 AUC, whereas other popular methods provide results below 0.6 threshold (recall, precision, AUC). Our results demonstrate that multi-scale sequence features aggregation procedure is able to improve the machine learning results by more than 10% as compared to other sequence representations. Prepared datasets and source code for our experimental pipeline are freely available for download from: http://zubekj.github.io/mlppi/ (open source Python implementation, OS independent). PMID:26157620

  5. Using principal component analysis and support vector machine to predict protein structural class for low-similarity sequences via PSSM.

    PubMed

    Zhang, Shengli; Ye, Feng; Yuan, Xiguo

    2012-01-01

    The accurate identification of protein structural class using only information extracted from the protein sequence is a complicated task in current computational biology. Prediction of protein structural class for low-similarity sequences remains a challenging problem. In this study, a new computational method has been developed to predict protein structural class by fusing sequence information and evolutionary information to represent a protein sample. To evaluate the performance of the proposed method, jackknife cross-validation tests are performed on two widely used benchmark data-sets, 1189 and 25PDB, with sequence similarity lower than 40 and 25%, respectively. Comparison of our results with other methods shows that the proposed method is very promising and may provide a cost-effective alternative for predicting protein structural class, in particular for low-similarity data-sets.

  6. Predicting Protein Hinge Motions and Allostery Using Rigidity Theory

    NASA Astrophysics Data System (ADS)

    Sljoka, Adnan; Bezginov, Alexandr

    2011-11-01

    Understanding how a 3D structure of a protein functions depends on predicting which regions are rigid, and which are flexible. One recent approach models molecules as a structure of fixed units (atoms with their bond angles as rigid units, bonds as hinges) plus biochemical constraints coming from the local geometry. This generates a 'molecular graph' in the theory of combinatorial rigidity. The 6|V|-6 counting condition for 3-dimensional body-hinge structures (modulo molecular theorem), and a fast 'pebble game' algorithm which tracks this count in the multigraph, have led to the development of the program FIRST, for rapid predictions of the flexibility of proteins. In this study we develop a novel protein hinge prediction algorithm via our extension of the pebble game algorithm (relevant regions detection algorithm). We have tested our hinge prediction algorithm on several proteins chosen from the dataset of manually annotated hinges available on the MOLMOV server. Many of our predictions are in very good agreement with this data set. Our algorithms can also predict 'allosteric' interactions in proteins, where binding at one site of a molecule changes the shape or binding at a distant 'active site' of the molecule. We also give some promising results which support the sliding piston-like movement of helices with respect to one another as a plausible mechanism by which GPCRs propagate conformational changes across membranes.
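
    The 6|V|-6 count mentioned in this record can be illustrated with a simple global tally. The sketch below is only a Maxwell-type count under the usual body-hinge assumptions (each rigid body has 6 degrees of freedom, each hinge removes 5, and 6 trivial rigid motions are discounted); it is not the pebble game or the FIRST program, which apply the count to every subgraph. The function name is hypothetical.

```python
def internal_dof_lower_bound(num_bodies, num_hinges):
    """Lower bound on internal degrees of freedom of a body-hinge structure.

    Each body contributes 6 degrees of freedom and each hinge removes at
    most 5, so at least 6|V| - 6 - 5|E| internal motions remain once the
    6 trivial rigid-body motions are discounted. A global count only: it
    cannot detect over-constrained regions the way a pebble game does.
    """
    free = 6 * num_bodies - 6
    removed = 5 * num_hinges
    return max(0, free - removed)


two_bodies = internal_dof_lower_bound(2, 1)   # two bodies joined by one hinge
six_ring = internal_dof_lower_bound(6, 6)     # closed ring of six bodies
seven_ring = internal_dof_lower_bound(7, 7)   # closed ring of seven bodies
```

    Under this count a single hinge leaves one internal motion, a six-membered ring is just rigid, and a seven-membered ring keeps at least one internal degree of freedom.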

  7. Exploiting protein flexibility to predict the location of allosteric sites

    PubMed Central

    2012-01-01

    Background Allostery is one of the most powerful and common ways of regulation of protein activity. However, for most allosteric proteins identified to date the mechanistic details of allosteric modulation are not yet well understood. Uncovering common mechanistic patterns underlying allostery would allow not only a better academic understanding of the phenomena, but it would also streamline the design of novel therapeutic solutions. This relatively unexplored therapeutic potential and the putative advantages of allosteric drugs over classical active-site inhibitors fuel the attention allosteric-drug research is receiving at present. A first step to harness the regulatory potential and versatility of allosteric sites, in the context of drug-discovery and design, would be to detect or predict their presence and location. In this article, we describe a simple computational approach, based on the effect allosteric ligands exert on protein flexibility upon binding, to predict the existence and position of allosteric sites on a given protein structure. Results By querying the literature and a recently available database of allosteric sites, we gathered 213 allosteric proteins with structural information that we further filtered into a non-redundant set of 91 proteins. We performed normal-mode analysis and observed significant changes in protein flexibility upon allosteric-ligand binding in 70% of the cases. These results agree with the current view that allosteric mechanisms are in many cases governed by changes in protein dynamics caused by ligand binding. Furthermore, we implemented an approach that achieves 65% positive predictive value in identifying allosteric sites within the set of predicted cavities of a protein (stricter parameters set, 0.22 sensitivity), by combining the current analysis on dynamics with previous results on structural conservation of allosteric sites. We also analyzed four biological examples in detail, revealing that this simple coarse

  8. Predicting protein subcellular location using digital signal processing.

    PubMed

    Pan, Yu-Xi; Li, Da-Wei; Duan, Yun; Zhang, Zhi-Zhou; Xu, Ming-Qing; Feng, Guo-Yin; He, Lin

    2005-02-01

    The biological functions of a protein are closely related to its attributes in a cell. With the rapid accumulation of newly found protein sequence data in databanks, it is highly desirable to develop an automated method for predicting the subcellular location of proteins. The establishment of such a predictor will expedite the functional determination of newly found proteins and the process of prioritizing genes and proteins identified by genomic efforts as potential molecular targets for drug design. The traditional algorithms for predicting these attributes were based solely on amino acid composition in which no sequence order effect was taken into account. To improve the prediction quality, it is necessary to incorporate such an effect. However, the number of possible patterns in protein sequences is extremely large, posing a formidable difficulty for realizing this goal. To deal with such difficulty, a well-developed tool in digital signal processing named digital Fourier transform (DFT) [1] was introduced. After being translated to a digital signal according to the hydrophobicity of each amino acid, a protein was analyzed by DFT within the frequency domain. A set of frequency spectrum parameters, thus obtained, were regarded as the factors to represent the sequence order effect. A significant improvement in prediction quality was observed by incorporating the frequency spectrum parameters with the conventional amino acid composition. One of the crucial merits of this approach is that many existing tools in mathematics and engineering can be easily applied in the predicting process. It is anticipated that digital signal processing may serve as a useful vehicle for many other protein science areas.
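
    The signal-processing step described in this record can be sketched as follows. This is a minimal illustration under stated assumptions: the Kyte-Doolittle hydropathy scale is used as a stand-in for whatever hydrophobicity values the authors used, the function names are hypothetical, and the power spectrum is normalized by the sequence length for convenience.

```python
import cmath

# Kyte-Doolittle hydropathy values, used here only as a stand-in scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_signal(sequence):
    """Translate an amino acid sequence into a numeric signal."""
    return [KYTE_DOOLITTLE[residue] for residue in sequence]


def dft_power_spectrum(signal):
    """Direct O(n^2) discrete Fourier transform; returns |X_k|^2 / n per frequency k."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


# An alternating signal puts all of its power at the highest frequency (k = n/2).
alternating = dft_power_spectrum([1.0, -1.0, 1.0, -1.0])

# Spectrum of a short, made-up peptide; such values could be appended to an
# amino acid composition vector as extra sequence-order features.
peptide_spectrum = dft_power_spectrum(hydrophobicity_signal("MKLVAILG"))
```

    A periodic hydrophobic pattern shows up as a peak at the matching frequency, which is how the spectrum can carry sequence-order information that plain composition lacks.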

  9. The prediction and characterization of YshA, an unknown outer membrane protein from Salmonella typhimurium

    PubMed Central

    Freeman, Thomas C.; Landry, Samuel J.; Wimley, William C.

    2010-01-01

    We have developed an effective pathway for the prediction and characterization of novel transmembrane β-barrel proteins. The Freeman-Wimley algorithm, which is a highly accurate prediction method based on the physicochemical properties of experimentally characterized transmembrane β barrel (TMBB) structures, was used to predict TMBBs in the genome of Salmonella typhimurium LT2. The previously uncharacterized product of gene yshA was tested as a model for validating the algorithm. YshA is a highly conserved 230-residue protein that is predicted to have 10 transmembrane β-strands and an N-terminal signal sequence. All of the physicochemical and spectroscopic properties exhibited by YshA are consistent with the prediction that it is a TMBB. Specifically, recombinant YshA localizes to the outer membrane when expressed in Escherichia coli; YshA has β-sheet-rich secondary structure with stable tertiary contacts in the presence of detergent micelles or when reconstituted into a lipid bilayer; when in a lipid bilayer, YshA forms a membrane-spanning pore with an effective radius of ~0.7 nm. Taken together, these data substantiate the predictions made by the Freeman-Wimley algorithm by showing that YshA is a TMBB protein. PMID:20863811

  10. Prediction of transmembrane helices from hydrophobic characteristics of proteins.

    PubMed

    Ponnuswamy, P K; Gromiha, M M

    1993-10-01

    Membrane proteins, which need to be embedded in lipid bilayers, have evolved to have amino acid sequences that will fold with a hydrophobic surface in contact with the alkane chains of the lipids and a polar surface in contact with the aqueous phases on both sides of the membrane and the polar head groups of the lipids. It is generally assumed that the characteristics of the aqueous parts of the membrane proteins are similar to those of normal globular proteins, and the embedded parts are highly hydrophobic. In our earlier works, we introduced the concept of 'surrounding hydrophobicity' and developed a hydrophobicity scale for the 20 amino acid residues, and applied it successfully to the study of the family of globular proteins. In this work we use the concept of surrounding hydrophobicity to indicate quantitatively how the aqueous parts of membrane proteins compare with the normal globular proteins, and how rich the embedded parts are in their hydrophobic activity. We then develop a surrounding hydrophobicity scale applicable to membrane proteins, by mixing judiciously the surrounding hydrophobicities observed in the crystals of the membrane protein, photosynthetic reaction center from the bacterium Rhodopseudomonas viridis, porin from Rhodobacter capsulatus and a set of 64 globular proteins. A predictive scheme based on this scale predicts, from amino acid sequence, transmembrane segments in PRC and 26 randomly selected membrane proteins to 80% level of accuracy. This is a much higher predictive power when compared to the existing popular methods. A new procedure to measure the amphipathicity of sequence segments is proposed, and it is used to characterize the transmembrane parts of the sample membrane proteins.
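
    A scale-based predictive scheme of the kind described in this record can be sketched with a sliding-window average. The sketch is an illustration only: it substitutes the Kyte-Doolittle hydropathy scale for the surrounding hydrophobicity scale of the record, and the window of 19 residues and threshold of 1.6 are the values commonly quoted for that scale, treated here as assumptions. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values, a stand-in for the record's own scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_averages(sequence, window=19):
    """Mean hydropathy of every full window along the sequence."""
    values = [KYTE_DOOLITTLE[residue] for residue in sequence]
    if len(values) < window:
        return []
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def predict_segments(sequence, window=19, threshold=1.6):
    """Merge windows at or above the threshold into (start, end) segments.

    Indices are 0-based and half-open, so sequence[start:end] is the
    candidate transmembrane stretch.
    """
    segments = []
    for i, average in enumerate(window_averages(sequence, window)):
        if average < threshold:
            continue
        start, end = i, i + window
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)  # extend the open segment
        else:
            segments.append((start, end))
    return segments


# Synthetic sequence: a 20-residue leucine stretch flanked by aspartates.
toy_sequence = "D" * 10 + "L" * 20 + "D" * 10
toy_segments = predict_segments(toy_sequence)
soluble_segments = predict_segments("D" * 40)
```

    On the synthetic sequence the merged segment covers the leucine stretch plus a few flanking residues, which reflects the blurring that a 19-residue window introduces at segment boundaries.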

  11. Disulfide Connectivity Prediction Based on Modelled Protein 3D Structural Information and Random Forest Regression.

    PubMed

    Yu, Dong-Jun; Li, Yang; Hu, Jun; Yang, Xibei; Yang, Jing-Yu; Shen, Hong-Bin

    2015-01-01

    Disulfide connectivity is an important protein structural characteristic. Accurately predicting disulfide connectivity solely from protein sequence helps to improve the understanding of protein structure and function, especially in the post-genome era, in which large volumes of sequenced proteins without functional annotation are accumulating quickly. In this study, a new feature extracted from the predicted protein 3D structural information is proposed and integrated with traditional features to form discriminative features. Based on the extracted features, a random forest regression model is trained to predict protein disulfide connectivity. We compare the proposed method with popular existing predictors by performing both cross-validation and independent validation tests on benchmark datasets. The experimental results demonstrate the superiority of the proposed method over existing predictors. We believe the superiority of the proposed method benefits from both the good discriminative capability of the newly developed features and the powerful modelling capability of the random forest. The web server implementation, called TargetDisulfide, and the benchmark datasets are freely available at: http://csbio.njust.edu.cn/bioinf/TargetDisulfide for academic use.

  12. Accurate protein structure annotation through competitive diffusion of enzymatic functions over a network of local evolutionary similarities.

    PubMed

    Venner, Eric; Lisewski, Andreas Martin; Erdin, Serkan; Ward, R Matthew; Amin, Shivas R; Lichtarge, Olivier

    2010-12-13

    High-throughput Structural Genomics yields many new protein structures without known molecular function. This study aims to uncover these missing annotations by globally comparing select functional residues across the structural proteome. First, Evolutionary Trace Annotation, or ETA, identifies which proteins have local evolutionary and structural features in common; next, these proteins are linked together into a proteomic network of ETA similarities; then, starting from proteins with known functions, competing functional labels diffuse link-by-link over the entire network. Every node is thus assigned a likelihood z-score for every function, and the most significant one at each node wins and defines its annotation. In high-throughput controls, this competitive diffusion process recovered enzyme activity annotations with 99% and 97% accuracy at half-coverage for the third and fourth Enzyme Commission (EC) levels, respectively. This corresponds to false positive rates 4-fold lower than nearest-neighbor and 5-fold lower than sequence-based annotations. In practice, experimental validation of the predicted carboxylesterase activity in a protein from Staphylococcus aureus illustrated the effectiveness of this approach in the context of an increasingly drug-resistant microbe. This study further links molecular function to a small number of evolutionarily important residues recognizable by Evolutionary Tracing and it points to the specificity and sensitivity of functional annotation by competitive global network diffusion. A web server is at http://mammoth.bcm.tmc.edu/networks.

  13. Predicting protein function by frequent functional association pattern mining in protein interaction networks.

    PubMed

    Cho, Young-Rae; Zhang, Aidong

    2010-01-01

    Predicting protein function from protein interaction networks has been challenging because of the complexity of functional relationships among proteins. Most previous function prediction methods depend on the neighborhood of or the connected paths to known proteins. However, their accuracy has been limited due to the functional inconsistency of interacting proteins. In this paper, we propose a novel approach for function prediction by identifying frequent patterns of functional associations in a protein interaction network. A set of functions that a protein performs is assigned into the corresponding node as a label. A functional association pattern is then represented as a labeled subgraph. Our frequent labeled subgraph mining algorithm efficiently searches the functional association patterns that occur frequently in the network. It iteratively increases the size of frequent patterns by one node at a time by selective joining, and simplifies the network by a priori pruning. Using the yeast protein interaction network, our algorithm found more than 1400 frequent functional association patterns. The function prediction is performed by matching the subgraph, including the unknown protein, with the frequent patterns analogous to it. By leave-one-out cross validation, we show that our approach has better performance than previous link-based methods in terms of prediction accuracy. The frequent functional association patterns generated in this study might become the foundations of advanced analysis for functional behaviors of proteins in a system level.

  14. Predicting protein concentrations with ELISA microarray assays, monotonic splines and Monte Carlo simulation

    SciTech Connect

    Daly, Don S.; Anderson, Kevin K.; White, Amanda M.; Gonzalez, Rachel M.; Varnum, Susan M.; Zangar, Richard C.

    2008-07-14

    Background: A microarray of enzyme-linked immunosorbent assays, or ELISA microarray, predicts simultaneously the concentrations of numerous proteins in a small sample. These predictions, however, are uncertain due to processing error and biological variability. Making sound biological inferences as well as improving the ELISA microarray process require both concentration predictions and credible estimates of their errors. Methods: We present a statistical method based on monotonic spline statistical models, penalized constrained least squares fitting (PCLS) and Monte Carlo simulation (MC) to predict concentrations and estimate prediction errors in ELISA microarray. PCLS restrains the flexible spline to a fit of assay intensity that is a monotone function of protein concentration. With MC, both modeling and measurement errors are combined to estimate prediction error. The spline/PCLS/MC method is compared to a common method using simulated and real ELISA microarray data sets. Results: In contrast to the rigid logistic model, the flexible spline model gave credible fits in almost all test cases including troublesome cases with left and/or right censoring, or other asymmetries. For the real data sets, 61% of the spline predictions were more accurate than their comparable logistic predictions; especially the spline predictions at the extremes of the prediction curve. The relative errors of 50% of comparable spline and logistic predictions differed by less than 20%. Monte Carlo simulation rendered acceptable asymmetric prediction intervals for both spline and logistic models while propagation of error produced symmetric intervals that diverged unrealistically as the standard curves approached horizontal asymptotes. Conclusions: The spline/PCLS/MC method is a flexible, robust alternative to a logistic/NLS/propagation-of-error method to reliably predict protein concentrations and estimate their errors. The spline method simplifies model selection and fitting

  15. PPCM: Combing Multiple Classifiers to Improve Protein-Protein Interaction Prediction

    PubMed Central

    Yao, Jianzhuang; Guo, Hong; Yang, Xiaohan

    2015-01-01

    Determining protein-protein interaction (PPI) in biological systems is of considerable importance, and prediction of PPI has become a popular research area. Although different classifiers have been developed for PPI prediction, no single classifier seems to be able to predict PPI with high confidence. We postulated that by combining individual classifiers the accuracy of PPI prediction could be improved. We developed a method called protein-protein interaction prediction classifiers merger (PPCM), and this method combines output from two PPI prediction tools, GO2PPI and Phyloprof, using Random Forests algorithm. The performance of PPCM was tested by area under the curve (AUC) using an assembled Gold Standard database that contains both positive and negative PPI pairs. Our AUC test showed that PPCM significantly improved the PPI prediction accuracy over the corresponding individual classifiers. We found that additional classifiers incorporated into PPCM could lead to further improvement in the PPI prediction accuracy. Furthermore, cross species PPCM could achieve competitive and even better prediction accuracy compared to the single species PPCM. This study established a robust pipeline for PPI prediction by integrating multiple classifiers using Random Forests algorithm. This pipeline will be useful for predicting PPI in nonmodel species. PMID:26539460

  16. PPCM: Combing Multiple Classifiers to Improve Protein-Protein Interaction Prediction

    DOE PAGES

    Yao, Jianzhuang; Guo, Hong; Yang, Xiaohan

    2015-01-01

    Determining protein-protein interaction (PPI) in biological systems is of considerable importance, and prediction of PPI has become a popular research area. Although different classifiers have been developed for PPI prediction, no single classifier seems to be able to predict PPI with high confidence. We postulated that by combining individual classifiers the accuracy of PPI prediction could be improved. We developed a method called protein-protein interaction prediction classifiers merger (PPCM), and this method combines output from two PPI prediction tools, GO2PPI and Phyloprof, using Random Forests algorithm. The performance of PPCM was tested by area under the curve (AUC) using an assembled Gold Standard database that contains both positive and negative PPI pairs. Our AUC test showed that PPCM significantly improved the PPI prediction accuracy over the corresponding individual classifiers. We found that additional classifiers incorporated into PPCM could lead to further improvement in the PPI prediction accuracy. Furthermore, cross species PPCM could achieve competitive and even better prediction accuracy compared to the single species PPCM. This study established a robust pipeline for PPI prediction by integrating multiple classifiers using Random Forests algorithm. This pipeline will be useful for predicting PPI in nonmodel species.

  17. Prediction of Protein–Protein Interactions by Evidence Combining Methods

    PubMed Central

    Chang, Ji-Wei; Zhou, Yan-Qing; Ul Qamar, Muhammad Tahir; Chen, Ling-Ling; Ding, Yu-Duan

    2016-01-01

    Most cellular functions involve proteins’ features based on their physical interactions with other partner proteins. Sketching a map of protein–protein interactions (PPIs) is therefore an important inception step towards understanding the basics of cell functions. Several experimental techniques operating in vivo or in vitro have made significant contributions to screening a large number of protein interaction partners, especially high-throughput experimental methods. However, computational approaches for PPI prediction supported by rapid accumulation of data generated from experimental techniques, 3D structure definitions, and genome sequencing have boosted the map sketching of PPIs. In this review, we shed light on in silico PPI prediction methods that integrate evidence from multiple sources, including evolutionary relationship, function annotation, sequence/structure features, network topology and text mining. These methods are developed for integration of multi-dimensional evidence, for designing the strategies to predict novel interactions, and for making the results consistent with the increase of prediction coverage and accuracy. PMID:27879651

  18. Toward Relatively General and Accurate Quantum Chemical Predictions of Solid-State (17)O NMR Chemical Shifts in Various Biologically Relevant Oxygen-Containing Compounds.

    PubMed

    Rorick, Amber; Michael, Matthew A; Yang, Liu; Zhang, Yong

    2015-09-03

    Oxygen is an important element in most biologically significant molecules, and experimental solid-state (17)O NMR studies have provided numerous useful structural probes to study these systems. However, computational predictions of solid-state (17)O NMR chemical shift tensor properties are still challenging in many cases, and in particular, each of the prior computational works is basically limited to one type of oxygen-containing system. This work provides the first systematic study of the effects of geometry refinement, method, and basis sets for metal and nonmetal elements in both geometry optimization and NMR property calculations of some biologically relevant oxygen-containing compounds with a good variety of XO bonding groups (X = H, C, N, P, and metal). The experimental range studied is of 1455 ppm, a major part of the reported (17)O NMR chemical shifts in organic and organometallic compounds. A number of computational factors toward relatively general and accurate predictions of (17)O NMR chemical shifts were studied to provide helpful and detailed suggestions for future work. For the studied kinds of oxygen-containing compounds, the best computational approach results in a theory-versus-experiment correlation coefficient (R(2)) value of 0.9880 and a mean absolute deviation of 13 ppm (1.9% of the experimental range) for isotropic NMR shifts and an R(2) value of 0.9926 for all shift-tensor properties. These results shall facilitate future computational studies of (17)O NMR chemical shifts in many biologically relevant systems, and the high accuracy may also help the refinement and determination of active-site structures of some oxygen-containing substrate-bound proteins.

  19. PRISM: a web server and repository for prediction of protein–protein interactions and modeling their 3D complexes

    PubMed Central

    Baspinar, Alper; Cukuroglu, Engin; Nussinov, Ruth; Keskin, Ozlem; Gursoy, Attila

    2014-01-01

    The PRISM web server enables fast and accurate prediction of protein–protein interactions (PPIs). The prediction algorithm is knowledge-based. It combines structural similarity and accounts for evolutionary conservation in the template interfaces. The predicted models are stored in its repository. Given two protein structures, PRISM will provide a structural model of their complex if a matching template interface is available. Users can download the complex structure, retrieve the interface residues and visualize the complex model. The PRISM web server is user friendly, free and open to all users at http://cosbi.ku.edu.tr/prism. PMID:24829450

  20. Population Synthesis in the Blue. IV. Accurate Model Predictions for Lick Indices and UBV Colors in Single Stellar Populations

    NASA Astrophysics Data System (ADS)

    Schiavon, Ricardo P.

    2007-07-01

    We present a new set of model predictions for 16 Lick absorption line indices from Hδ through Fe5335 and UBV colors for single stellar populations with ages ranging between 1 and 15 Gyr, [Fe/H] ranging from -1.3 to +0.3, and variable abundance ratios. The models are based on accurate stellar parameters for the Jones library stars and a new set of fitting functions describing the behavior of line indices as a function of effective temperature, surface gravity, and iron abundance. The abundances of several key elements in the library stars have been obtained from the literature in order to characterize the abundance pattern of the stellar library, thus allowing us to produce model predictions for any set of abundance ratios desired. We develop a method to estimate mean ages and abundances of iron, carbon, nitrogen, magnesium, and calcium that explores the sensitivity of the various indices modeled to those parameters. The models are compared to high-S/N data for Galactic clusters spanning the range of ages, metallicities, and abundance patterns of interest. Essentially all line indices are matched when the known cluster parameters are adopted as input. Comparing the models to high-quality data for galaxies in the nearby universe, we reproduce previous results regarding the enhancement of light elements and the spread in the mean luminosity-weighted ages of early-type galaxies. When the results from the analysis of blue and red indices are contrasted, we find good consistency in the [Fe/H] that is inferred from different Fe indices. Applying our method to estimate mean ages and abundances from stacked SDSS spectra of early-type galaxies brighter than L*, we find mean luminosity-weighted ages of the order of ~8 Gyr and iron abundances slightly below solar. Abundance ratios, [X/Fe], tend to be higher than solar and are positively correlated with galaxy luminosity. Of all elements, nitrogen is the more strongly correlated with galaxy luminosity, which seems to indicate

  1. Plasma proteins predict conversion to dementia from prodromal disease

    PubMed Central

    Hye, Abdul; Riddoch-Contreras, Joanna; Baird, Alison L.; Ashton, Nicholas J.; Bazenet, Chantal; Leung, Rufina; Westman, Eric; Simmons, Andrew; Dobson, Richard; Sattlecker, Martina; Lupton, Michelle; Lunnon, Katie; Keohane, Aoife; Ward, Malcolm; Pike, Ian; Zucht, Hans Dieter; Pepin, Danielle; Zheng, Wei; Tunnicliffe, Alan; Richardson, Jill; Gauthier, Serge; Soininen, Hilkka; Kłoszewska, Iwona; Mecocci, Patrizia; Tsolaki, Magda; Vellas, Bruno; Lovestone, Simon

    2014-01-01

    Background The study aimed to validate previously discovered plasma biomarkers associated with AD, using a design based on imaging measures as surrogate for disease severity and assess their prognostic value in predicting conversion to dementia. Methods Three multicenter cohorts of cognitively healthy elderly, mild cognitive impairment (MCI), and AD participants with standardized clinical assessments and structural neuroimaging measures were used. Twenty-six candidate proteins were quantified in 1148 subjects using multiplex (xMAP) assays. Results Sixteen proteins correlated with disease severity and cognitive decline. Strongest associations were in the MCI group with a panel of 10 proteins predicting progression to AD (accuracy 87%, sensitivity 85%, and specificity 88%). Conclusions We have identified 10 plasma proteins strongly associated with disease severity and disease progression. Such markers may be useful for patient selection for clinical trials and assessment of patients with predisease subjective memory complaints. PMID:25012867
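
    Illustrative sketch (not from the study): the abstract above reports accuracy, sensitivity, and specificity for a protein panel. The snippet below shows how these three quantities follow from a binary confusion matrix. The function name `diagnostic_metrics` and the counts in the example are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn: positives (e.g. converters) classified correctly/incorrectly.
    tn/fp: negatives (e.g. non-converters) classified correctly/incorrectly.
    """
    total = tp + fn + tn + fp
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only to exercise the formulas.
    print(diagnostic_metrics(tp=8, fn=2, tn=9, fp=1))
```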

  2. Plasmodium falciparum parasites lacking histidine-rich protein 2 and 3: a review and recommendations for accurate reporting

    PubMed Central

    2014-01-01

    Malaria rapid diagnostic tests (RDTs) play a critical role in malaria case management, surveillance and case investigations. Test performance is largely determined by design and quality characteristics, such as detection sensitivity, specificity, and thermal stability. However, parasite characteristics such as variable or absent expression of antigens targeted by RDTs can also affect RDT performance. Plasmodium falciparum parasites lacking the PfHRP2 protein, the most common target antigen for detection of P. falciparum, have been reported in some regions. Therefore, accurately mapping the presence and prevalence of P. falciparum parasites lacking pfhrp2 would be an important step so that RDTs targeting alternative antigens, or microscopy, can be preferentially selected for use in such regions. Herein the available evidence and molecular basis for identifying malaria parasites lacking PfHRP2 is reviewed, and a set of recommended procedures to apply for future investigations for parasites lacking PfHRP2, is proposed. PMID:25052298

  3. DUET: a server for predicting effects of mutations on protein stability using an integrated computational approach.

    PubMed

    Pires, Douglas E V; Ascher, David B; Blundell, Tom L

    2014-07-01

    Cancer genome and other sequencing initiatives are generating extensive data on non-synonymous single nucleotide polymorphisms (nsSNPs) in human and other genomes. In order to understand the impacts of nsSNPs on the structure and function of the proteome, as well as to guide protein engineering, accurate in silico methodologies are required to study and predict their effects on protein stability. Despite the diversity of available computational methods in the literature, none has proven accurate and dependable on its own under all scenarios where mutation analysis is required. Here we present DUET, a web server for an integrated computational approach to study missense mutations in proteins. DUET consolidates two complementary approaches (mCSM and SDM) in a consensus prediction, obtained by combining the results of the separate methods in an optimized predictor using Support Vector Machines (SVM). We demonstrate that the proposed method improves overall accuracy of the predictions in comparison with either method individually and performs as well as or better than similar methods. The DUET web server is freely and openly available at http://structure.bioc.cam.ac.uk/duet.

  4. A repeat protein-based DNA polymerase inhibitor for an efficient and accurate gene amplification by PCR.

    PubMed

    Hwang, Da-Eun; Shin, Yong-Keol; Munashingha, Palinda Ruvan; Park, So-Yeon; Seo, Yeon-Soo; Kim, Hak-Sung

    2016-12-01

    A polymerase chain reaction (PCR) using a thermostable DNA polymerase is the most widely applied method in many areas of research, including life sciences, biotechnology, and medical sciences. However, a conventional PCR incurs an amplification of undesired genes mainly owing to non-specifically annealed primers and the formation of a primer-dimer complex. Herein, we present the development of a Taq DNA polymerase-specific repebody, which is a small-sized protein binder composed of leucine rich repeat (LRR) modules, as a thermolabile inhibitor for a precise and accurate gene amplification by PCR. We selected a repebody that specifically binds to the DNA polymerase through a phage display, and increased its affinity to up to 10 nM through a modular evolution approach. The repebody was shown to effectively inhibit DNA polymerase activity at low temperature and undergo thermal denaturation at high temperature, leading to a rapid and full recovery of the polymerase activity, during the initial denaturation step of the PCR. The performance and utility of the repebody was demonstrated through an accurate and efficient amplification of a target gene without nonspecific gene products in both conventional and real-time PCRs. The repebody is expected to be effectively utilized as a thermolabile inhibitor in a PCR. Biotechnol. Bioeng. 2016;113: 2544-2552. © 2016 Wiley Periodicals, Inc.

  5. DisPredict: A Predictor of Disordered Protein Using Optimized RBF Kernel

    PubMed Central

    Iqbal, Sumaiya; Hoque, Md Tamjidul

    2015-01-01

    Intrinsically disordered proteins or regions perform important biological functions through their dynamic conformations during binding. Thus accurate identification of these disordered regions has significant implications in proper annotation of function, induced fold prediction and drug design to combat critical diseases. We introduce DisPredict, a disorder predictor that employs a single support vector machine with RBF kernel and novel features for reliable characterization of protein structure. DisPredict yields effective performance. In addition to 10-fold cross validation, training and testing of DisPredict was conducted with independent test datasets. The results were consistent, with both the training and test error minimal. The use of multiple data sources makes the predictor generic. The datasets used in developing the model include disordered regions of various lengths, which are categorized as short and long, having different compositions and different types of disorder, ranging from fully to partially disordered regions as well as completely ordered regions. Through comparison with other state of the art approaches and case studies, DisPredict is found to be a useful tool with competitive performance. DisPredict is available at https://github.com/tamjidul/DisPredict_v1.0. PMID:26517719

  6. SCRATCH: a protein structure and structural feature prediction server

    PubMed Central

    Cheng, J.; Randall, A. Z.; Sweredoski, M. J.; Baldi, P.

    2005-01-01

    SCRATCH is a server for predicting protein tertiary structure and structural features. The SCRATCH software suite includes predictors for secondary structure, relative solvent accessibility, disordered regions, domains, disulfide bridges, single mutation stability, residue contacts versus average, individual residue contacts and tertiary structure. The user simply provides an amino acid sequence and selects the desired predictions, then submits to the server. Results are emailed to the user. The server is available at . PMID:15980571

  7. TESTLoc: protein subcellular localization prediction from EST data

    PubMed Central

    2010-01-01

    Background The eukaryotic cell has an intricate architecture with compartments and substructures dedicated to particular biological processes. Knowing the subcellular location of proteins not only indicates how bio-processes are organized in different cellular compartments, but also contributes to unravelling the function of individual proteins. Computational localization prediction is possible based on sequence information alone, and has been successfully applied to proteins from virtually all subcellular compartments and all domains of life. However, we realized that current prediction tools do not perform well on partial protein sequences such as those inferred from Expressed Sequence Tag (EST) data, limiting the exploitation of the large and taxonomically most comprehensive body of sequence information from eukaryotes. Results We developed a new predictor, TESTLoc, suited for subcellular localization prediction of proteins based on their partial sequence conceptually translated from ESTs (EST-peptides). Support Vector Machine (SVM) is used as computational method and EST-peptides are represented by different features such as amino acid composition and physicochemical properties. When TESTLoc was applied to the most challenging test case (plant data), it yielded high accuracy (~85%). Conclusions TESTLoc is a localization prediction tool tailored for EST data. It provides a variety of models for the users to choose from, and is available for download at http://megasun.bch.umontreal.ca/~shenyq/TESTLoc/TESTLoc.html PMID:21078192
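
    Illustrative sketch (not TESTLoc's implementation): the abstract above mentions representing EST-peptides by features such as amino acid composition. The snippet below computes a 20-dimensional composition vector for a (possibly partial) peptide in plain Python. The function name, the alphabet ordering, and the choice to skip non-standard symbols are assumptions made here for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, alphabetical by letter


def amino_acid_composition(peptide):
    """Return the fraction of each standard residue in `peptide`.

    Symbols outside the 20-letter alphabet (e.g. 'X' or '*', which can
    appear in conceptual translations) are ignored in this sketch.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    valid = 0
    for residue in peptide.upper():
        if residue in counts:
            counts[residue] += 1
            valid += 1
    if valid == 0:
        raise ValueError("peptide contains no standard residues")
    return [counts[aa] / valid for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Hypothetical partial peptide; the vector could feed a classifier such as an SVM.
    print(amino_acid_composition("MKTAXLLV"))
```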

  8. Prediction of protein function improving sequence remote alignment search by a fuzzy logic algorithm.

    PubMed

    Gómez, Antonio; Cedano, Juan; Espadaler, Jordi; Hermoso, Antonio; Piñol, Jaume; Querol, Enrique

    2008-02-01

    The functional annotation of the new protein sequences represents a major drawback for genomic science. The best way to suggest the function of a protein from its sequence is by finding a related one for which biological information is available. Current alignment algorithms display a list of protein sequence stretches presenting significant similarity to different protein targets, ordered by their respective mathematical scores. However, statistical and biological significance do not always coincide, therefore, the rearrangement of the program output according to more biological characteristics than the mathematical scoring would help functional annotation. A new method that predicts the putative function for the protein integrating the results from the PSI-BLAST program and a fuzzy logic algorithm is described. Several protein sequence characteristics have been checked in their ability to rearrange a PSI-BLAST profile according more to their biological functions. Four of them: amino acid content, matched segment length and hydropathic and flexibility profiles positively contributed, upon being integrated by a fuzzy logic algorithm into a program, BYPASS, to the accurate prediction of the function of a protein from its sequence.

  9. An OGA-Resistant Probe Allows Specific Visualization and Accurate Identification of O-GlcNAc-Modified Proteins in Cells.

    PubMed

    Li, Jing; Wang, Jiajia; Wen, Liuqing; Zhu, He; Li, Shanshan; Huang, Kenneth; Jiang, Kuan; Li, Xu; Ma, Cheng; Qu, Jingyao; Parameswaran, Aishwarya; Song, Jing; Zhao, Wei; Wang, Peng George

    2016-11-18

    O-linked β-N-acetyl-glucosamine (O-GlcNAc) is an essential and ubiquitous post-translational modification present in nucleic and cytoplasmic proteins of multicellular eukaryotes. The metabolic chemical probes such as GlcNAc or GalNAc analogues bearing ketone or azide handles, in conjunction with bioorthogonal reactions, provide a powerful approach for detecting and identifying this modification. However, these chemical probes either enter multiple glycosylation pathways or have low labeling efficiency. Therefore, selective and potent probes are needed to assess this modification. We report here the development of a novel probe, 1,3,6-tri-O-acetyl-2-azidoacetamido-2,4-dideoxy-d-glucopyranose (Ac34dGlcNAz), that can be processed by the GalNAc salvage pathway and transferred by O-GlcNAc transferase (OGT) to O-GlcNAc proteins. Due to the absence of a hydroxyl group at C4, this probe is less incorporated into α/β 4-GlcNAc or GalNAc containing glycoconjugates. Furthermore, the O-4dGlcNAz modification was resistant to the hydrolysis of O-GlcNAcase (OGA), which greatly enhanced the efficiency of incorporation for O-GlcNAcylation. Combined with a click reaction, Ac34dGlcNAz allowed the selective visualization of O-GlcNAc in cells and accurate identification of O-GlcNAc-modified proteins with LC-MS/MS. This probe represents a more potent and selective tool in tracking, capturing, and identifying O-GlcNAc-modified proteins in cells and cell lysates.

  10. Infectious titres of sheep scrapie and bovine spongiform encephalopathy agents cannot be accurately predicted from quantitative laboratory test results.

    PubMed

    González, Lorenzo; Thorne, Leigh; Jeffrey, Martin; Martin, Stuart; Spiropoulos, John; Beck, Katy E; Lockey, Richard W; Vickery, Christopher M; Holder, Thomas; Terry, Linda

    2012-11-01

    It is widely accepted that abnormal forms of the prion protein (PrP) are the best surrogate marker for the infectious agent of prion diseases and, in practice, the detection of such disease-associated (PrP(d)) and/or protease-resistant (PrP(res)) forms of PrP is the cornerstone of diagnosis and surveillance of the transmissible spongiform encephalopathies (TSEs). Nevertheless, some studies question the consistent association between infectivity and abnormal PrP detection. To address this discrepancy, 11 brain samples of sheep affected with natural scrapie or experimental bovine spongiform encephalopathy were selected on the basis of the magnitude and predominant types of PrP(d) accumulation, as shown by immunohistochemical (IHC) examination; contra-lateral hemi-brain samples were inoculated at three different dilutions into transgenic mice overexpressing ovine PrP and were also subjected to quantitative analysis by three biochemical tests (BCTs). Six samples gave 'low' infectious titres (10⁶·⁵ to 10⁶·⁷ LD₅₀ g⁻¹) and five gave 'high titres' (10⁸·¹ to ≥ 10⁸·⁷ LD₅₀ g⁻¹) and, with the exception of the Western blot analysis, those two groups tended to correspond with samples with lower PrP(d)/PrP(res) results by IHC/BCTs. However, no statistical association could be confirmed due to high individual sample variability. It is concluded that although detection of abnormal forms of PrP by laboratory methods remains useful to confirm TSE infection, infectivity titres cannot be predicted from quantitative test results, at least for the TSE sources and host PRNP genotypes used in this study. Furthermore, the near inverse correlation between infectious titres and Western blot results (high protease pre-treatment) argues for a dissociation between infectivity and PrP(res).

  11. Predicting Protein Function via Semantic Integration of Multiple Networks.

    PubMed

    Yu, Guoxian; Fu, Guangyuan; Wang, Jun; Zhu, Hailong

    2016-01-01

    Determining the biological functions of proteins is one of the key challenges in the post-genomic era. The rapid accumulation of large volumes of proteomic and genomic data drives the development of computational models for automatically predicting protein function on a large scale. Recent approaches focus on integrating multiple heterogeneous data sources and they often get better results than methods that use a single data source alone. In this paper, we investigate how to integrate multiple biological data sources with the biological knowledge, i.e., Gene Ontology (GO), for protein function prediction. We propose a method, called SimNet, to Semantically integrate multiple functional association Networks derived from heterogeneous data sources. SimNet first utilizes GO annotations of proteins to capture the semantic similarity between proteins and introduces a semantic kernel based on the similarity. Next, SimNet constructs a composite network, obtained as a weighted summation of individual networks, and aligns the network with the kernel to get the weights assigned to individual networks. Then, it applies a network-based classifier on the composite network to predict protein function. Experimental results on heterogeneous proteomic data sources of Yeast, Human, Mouse, and Fly show that SimNet not only achieves better (or comparable) results than other related competitive approaches, but also takes much less time. The Matlab codes of SimNet are available at https://sites.google.com/site/guoxian85/simnet.
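
    Illustrative sketch (not SimNet's Matlab code): the abstract above describes a composite network formed as a weighted summation of individual networks. The snippet below shows only that summation step for adjacency matrices stored as nested lists. In the paper the weights come from aligning the composite network with a semantic kernel; here they are simply passed in by hand, and the function name `composite_network` is hypothetical.

```python
def composite_network(networks, weights):
    """Return sum_k weights[k] * networks[k] for equally sized square matrices."""
    if not networks or len(networks) != len(weights):
        raise ValueError("need one weight per network")
    n = len(networks[0])
    for net in networks:
        if len(net) != n or any(len(row) != n for row in net):
            raise ValueError("all networks must be n x n with the same n")
    composite = [[0.0] * n for _ in range(n)]
    for net, w in zip(networks, weights):
        for i in range(n):
            for j in range(n):
                composite[i][j] += w * net[i][j]
    return composite


if __name__ == "__main__":
    # Two toy 2-protein association networks with hand-picked weights.
    net_a = [[0, 1], [1, 0]]
    net_b = [[0, 3], [3, 0]]
    print(composite_network([net_a, net_b], [0.25, 0.75]))
```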

  12. Prediction of membrane protein types using maximum variance projection

    NASA Astrophysics Data System (ADS)

    Wang, Tong; Yang, Jie

    2011-05-01

    Predicting membrane protein types has a positive influence on further biological function analysis. To quickly and efficiently annotate the type of an uncharacterized membrane protein is a challenge. In this work, a system based on maximum variance projection (MVP) is proposed to improve the prediction performance of membrane protein types. The feature extraction step is based on a hybridization representation approach by fusing Position-Specific Score Matrix composition. The protein sequences are quantized in a high-dimensional space using this representation strategy. Some problems will be brought when analysing these high-dimensional feature vectors such as high computing time and high classifier complexity. To solve this issue, MVP, a novel dimensionality reduction algorithm is introduced by extracting the essential features from the high-dimensional feature space. Then, a K-nearest neighbour classifier is employed to identify the types of membrane proteins based on their reduced low-dimensional features. As a result, the jackknife and independent dataset test success rates of this model reach 86.1 and 88.4%, respectively, and suggest that the proposed approach is very promising for predicting membrane proteins types.
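
    Illustrative sketch (not the authors' system): the abstract above classifies reduced feature vectors with a K-nearest neighbour classifier and reports a jackknife success rate. The snippet below shows a plain-Python K-nearest neighbour vote and a jackknife (leave-one-out) evaluation. The dimensionality reduction step (MVP) is not reproduced; the toy 2-D points stand in for already-reduced features, and all names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=1):
    """Label `query` by majority vote among its k nearest training points."""
    dists = sorted((math.dist(x, query), i) for i, x in enumerate(train_x))
    votes = Counter(train_y[i] for _, i in dists[:k])
    return votes.most_common(1)[0][0]


def jackknife_success_rate(xs, ys, k=1):
    """Hold out each sample in turn, predict it from the rest, return the hit rate."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if knn_predict(train_x, train_y, xs[i], k) == ys[i]:
            hits += 1
    return hits / len(xs)


if __name__ == "__main__":
    # Toy, already low-dimensional feature vectors for two hypothetical types.
    xs = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    ys = ["type_a", "type_a", "type_a", "type_b", "type_b", "type_b"]
    print(jackknife_success_rate(xs, ys, k=3))
```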

  13. Do Skilled Elementary Teachers Hold Scientific Conceptions and Can They Accurately Predict the Type and Source of Students' Preconceptions of Electric Circuits?

    ERIC Educational Resources Information Center

    Lin, Jing-Wen

    2016-01-01

    Holding scientific conceptions and having the ability to accurately predict students' preconceptions are a prerequisite for science teachers to design appropriate constructivist-oriented learning experiences. This study explored the types and sources of students' preconceptions of electric circuits. First, 438 grade 3 (9 years old) students were…

  14. Computational prediction of methylation types of covalently modified lysine and arginine residues in proteins.

    PubMed

    Deng, Wankun; Wang, Yongbo; Ma, Lili; Zhang, Ying; Ullah, Shahid; Xue, Yu

    2016-05-30

    Protein methylation is an essential posttranslational modification (PTM) that mostly occurs at lysine and arginine residues, and regulates a variety of cellular processes. Owing to the rapid progress in the large-scale identification of methylation sites, the available data set was dramatically expanded, and more attention has been paid to the identification of specific methylation types of modification residues. Here, we briefly summarized the current progress in computational prediction of methylation sites, which provided an accurate, rapid and efficient approach in contrast with labor-intensive experiments. We collected 5421 methyllysines and methylarginines in 2592 proteins from the literature, and classified most of the sites into different types. Data analyses demonstrated that different types of methylated proteins were preferentially involved in different biological processes and pathways, whereas a unique sequence preference was observed for each type of methylation sites. Thus, we developed a predictor of GPS-MSP, which can predict mono-, di- and tri-methylation types for specific lysines, and mono-, symmetric di- and asymmetrical di-methylation types for specific arginines. We critically evaluated the performance of GPS-MSP, and compared it with other existing tools. The satisfying results exhibited that the classification of methylation sites into different types for training can considerably improve the prediction accuracy. Taken together, we anticipate that our study provides a new lead for future computational analysis of protein methylation, and the prediction of methylation types of covalently modified lysine and arginine residues can generate more useful information for further experimental manipulation.

  15. Addressing the Role of Conformational Diversity in Protein Structure Prediction

    PubMed Central

    Parisi, Gustavo; Fornasari, Maria Silvina

    2016-01-01

    Computational modeling of tertiary structures has become of standard use to study proteins that lack experimental characterization. Unfortunately, 3D structure prediction methods and model quality assessment programs often overlook that an ensemble of conformers in equilibrium populates the native state of proteins. In this work we collected sets of publicly available protein models and the corresponding target structures experimentally solved and studied how they describe the conformational diversity of the protein. For each protein, we assessed the quality of the models against known conformers by several standard measures and identified those models ranked best. We found that model rankings are defined by both the selected target conformer and the similarity measure used. 70% of the proteins in our datasets show that different models are structurally closest to different conformers of the same protein target. We observed that model building protocols such as template-based or ab initio approaches describe in similar ways the conformational diversity of the protein, although for template-based methods this description may depend on the sequence similarity between target and template sequences. Taken together, our results support the idea that protein structure modeling could help to identify members of the native ensemble, highlight the importance of considering conformational diversity in protein 3D quality evaluations and endorse the study of the variability of the native structure for a meaningful biological analysis. PMID:27159429

  16. Sequence-Based Prediction of RNA-Binding Proteins Using Random Forest with Minimum Redundancy Maximum Relevance Feature Selection

    PubMed Central

    Ma, Xin; Guo, Jing; Sun, Xiao

    2015-01-01

    The prediction of RNA-binding proteins is one of the most challenging problems in computational biology. Although some studies have investigated this problem, the accuracy of prediction is still not sufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests with the minimum redundancy maximum relevance (mRMR) method, followed by incremental feature selection (IFS). We incorporated conjoint triad features and three novel features: binding propensity (BP), nonbinding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features have important roles in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved the best performance (86.62% accuracy and 0.737 Matthews correlation coefficient). High prediction accuracy and successful prediction performance suggested that our method can be a useful approach to identify RNA-binding proteins from sequence information. PMID:26543860
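
    Illustrative sketch (not from the paper): the abstract above reports a Matthews correlation coefficient (MCC) alongside accuracy. The snippet below computes MCC from binary confusion-matrix counts using its standard definition. The function name and the example counts are hypothetical, and returning 0.0 when the denominator vanishes is a common convention adopted here as an assumption.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Undefined when any marginal total is zero; 0.0 is used by convention.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for binding vs. non-binding predictions.
    print(matthews_corrcoef(tp=40, tn=45, fp=5, fn=10))
```

    Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when binding and non-binding proteins are unevenly represented.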

  17. Proteins and Their Interacting Partners: An Introduction to Protein-Ligand Binding Site Prediction Methods.

    PubMed

    Roche, Daniel Barry; Brackenridge, Danielle Allison; McGuffin, Liam James

    2015-12-15

    Elucidating the biological and biochemical roles of proteins, and subsequently determining their interacting partners, can be difficult and time consuming using in vitro and/or in vivo methods, and consequently the majority of newly sequenced proteins will have unknown structures and functions. However, in silico methods for predicting protein-ligand binding sites and protein biochemical functions offer an alternative practical solution. The characterisation of protein-ligand binding sites is essential for investigating new functional roles, which can impact the major biological research spheres of health, food, and energy security. In this review we discuss the role in silico methods play in 3D modelling of protein-ligand binding sites, along with their role in predicting biochemical functionality. In addition, we describe in detail some of the key alternative in silico prediction approaches that are available, as well as discussing the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and the Continuous Automated Model EvaluatiOn (CAMEO) projects, and their impact on developments in the field. Furthermore, we discuss the importance of protein function prediction methods for tackling 21st century problems.

  18. Aptamer-conjugated live human immune cell based biosensors for the accurate detection of C-reactive protein

    PubMed Central

    Hwang, Jangsun; Seo, Youngmin; Jo, Yeonho; Son, Jaewoo; Choi, Jonghoon

    2016-01-01

    C-reactive protein (CRP) is a pentameric protein that is present in the bloodstream during inflammatory events, e.g., liver failure, leukemia, and/or bacterial infection. The level of CRP indicates the progress and prognosis of certain diseases; it is therefore necessary to measure CRP levels in the blood accurately. The normal concentration of CRP is reported to be 1–3 mg/L. Inflammatory events increase the level of CRP by up to 500 times; accordingly, CRP is a biomarker of acute inflammatory disease. In this study, we demonstrated the preparation of DNA aptamer-conjugated peripheral blood mononuclear cells (Apt-PBMCs) that specifically capture human CRP. Live PBMCs functionalized with aptamers could detect different levels of human CRP by producing immune complexes with reporter antibody. The binding behavior of Apt-PBMCs toward highly concentrated CRP sites was also investigated. The immune responses of Apt-PBMCs were evaluated by measuring TNF-alpha secretion after stimulating the PBMCs with lipopolysaccharides. In summary, engineered Apt-PBMCs have potential applications as live cell based biosensors and for in vitro tracing of CRP secretion sites. PMID:27708384

  19. Aptamer-conjugated live human immune cell based biosensors for the accurate detection of C-reactive protein

    NASA Astrophysics Data System (ADS)

    Hwang, Jangsun; Seo, Youngmin; Jo, Yeonho; Son, Jaewoo; Choi, Jonghoon

    2016-10-01

    C-reactive protein (CRP) is a pentameric protein that is present in the bloodstream during inflammatory events, e.g., liver failure, leukemia, and/or bacterial infection. The level of CRP indicates the progress and prognosis of certain diseases; it is therefore necessary to measure CRP levels in the blood accurately. The normal concentration of CRP is reported to be 1–3 mg/L. Inflammatory events increase the level of CRP by up to 500 times; accordingly, CRP is a biomarker of acute inflammatory disease. In this study, we demonstrated the preparation of DNA aptamer-conjugated peripheral blood mononuclear cells (Apt-PBMCs) that specifically capture human CRP. Live PBMCs functionalized with aptamers could detect different levels of human CRP by producing immune complexes with reporter antibody. The binding behavior of Apt-PBMCs toward highly concentrated CRP sites was also investigated. The immune responses of Apt-PBMCs were evaluated by measuring TNF-alpha secretion after stimulating the PBMCs with lipopolysaccharides. In summary, engineered Apt-PBMCs have potential applications as live cell based biosensors and for in vitro tracing of CRP secretion sites.

  20. Simple and accurate determination of global tau(R) in proteins using (13)C or (15)N relaxation data.

    PubMed

    Mispelter, J; Izadi-Pruneyre, N; Quiniou, E; Adjadj, E

    2000-03-01

    In the study of protein dynamics by (13)C or (15)N relaxation measurements, different models from the Lipari-Szabo formalism are used to determine the motion parameters. The global rotational correlation time tau(R) of the molecule must be estimated prior to the analysis. In this Communication, the authors propose a new approach for determining an accurate value of tau(R) that gives the best fit of R(2) for the whole sequence of the protein, regardless of the different types of motion the atoms may experience. The method first determines the highly structured regions of the sequence. For each corresponding site, the Lipari-Szabo parameters are calculated for R(1) and NOE, using an arbitrary value for tau(R). The chi(2) for R(2), summed over the selected sites, shows a clear minimum as a function of tau(R). This minimum is used to better estimate a proper value for tau(R).

  1. Proteogenomics produces comprehensive and highly accurate protein-coding gene annotation in a complete genome assembly of Malassezia sympodialis.

    PubMed

    Zhu, Yafeng; Engström, Pär G; Tellgren-Roth, Christian; Baudo, Charles D; Kennell, John C; Sun, Sheng; Billmyre, R Blake; Schröder, Markus S; Andersson, Anna; Holm, Tina; Sigurgeirsson, Benjamin; Wu, Guangxi; Sankaranarayanan, Sundar Ram; Siddharthan, Rahul; Sanyal, Kaustuv; Lundeberg, Joakim; Nystedt, Björn; Boekhout, Teun; Dawson, Thomas L; Heitman, Joseph; Scheynius, Annika; Lehtiö, Janne

    2017-01-18

    Complete and accurate genome assembly and annotation is a crucial foundation for comparative and functional genomics. Despite this, few complete eukaryotic genomes are available, and genome annotation remains a major challenge. Here, we present a complete genome assembly of the skin commensal yeast Malassezia sympodialis and demonstrate how proteogenomics can substantially improve gene annotation. Through long-read DNA sequencing, we obtained a gap-free genome assembly for M. sympodialis (ATCC 42132), comprising eight nuclear and one mitochondrial chromosome. We also sequenced and assembled four M. sympodialis clinical isolates, and showed their value for understanding Malassezia reproduction by confirming four alternative allele combinations at the two mating-type loci. Importantly, we demonstrated how proteomics data could be readily integrated with transcriptomics data in standard annotation tools. This increased the number of annotated protein-coding genes by 14% (from 3612 to 4113), compared to using transcriptomics evidence alone. Manual curation further increased the number of protein-coding genes by 9% (to 4493). All of these genes have RNA-seq evidence and 87% were confirmed by proteomics. The M. sympodialis genome assembly and annotation presented here is at a quality yet achieved only for a few eukaryotic organisms, and constitutes an important reference for future host-microbe interaction studies.
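
    The percentage gains quoted in this abstract can be checked directly from the gene counts it gives. This is a small arithmetic sketch; the helper name is an assumption introduced here.

```python
def percent_increase(before: int, after: int) -> float:
    """Relative increase from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# Counts as stated in the abstract.
transcriptomics_only = 3612
with_proteomics = 4113
after_manual_curation = 4493

auto_gain = percent_increase(transcriptomics_only, with_proteomics)
manual_gain = percent_increase(with_proteomics, after_manual_curation)

print(f"proteomics integration: +{auto_gain:.1f}%")
print(f"manual curation:        +{manual_gain:.1f}%")
```

    Both values round to the 14% and 9% stated in the abstract; the second percentage is relative to the 4113 genes obtained after proteomics integration.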

  2. Neurodegenerative diseases: quantitative predictions of protein-RNA interactions.

    PubMed

    Cirillo, Davide; Agostini, Federico; Klus, Petr; Marchese, Domenica; Rodriguez, Silvia; Bolognesi, Benedetta; Tartaglia, Gian Gaetano

    2013-02-01

    Increasing evidence indicates that RNA plays an active role in a number of neurodegenerative diseases. We recently introduced a theoretical framework, catRAPID, to predict the binding ability of protein and RNA molecules. Here, we use catRAPID to investigate ribonucleoprotein interactions linked to inherited intellectual disability, amyotrophic lateral sclerosis, Creutzfeldt-Jakob, Alzheimer's, and Parkinson's diseases. We specifically focus on (1) RNA interactions with fragile X mental retardation protein FMRP; (2) protein sequestration caused by CGG repeats; (3) noncoding transcripts regulated by TAR DNA-binding protein 43 TDP-43; (4) autogenous regulation of TDP-43 and FMRP; (5) iron-mediated expression of amyloid precursor protein APP and α-synuclein; (6) interactions between prions and RNA aptamers. Our results are in striking agreement with experimental evidence and provide new insights into processes associated with neuronal function and misfunction.

  3. OSPREY Predicts Resistance Mutations Using Positive and Negative Computational Protein Design.

    PubMed

    Ojewole, Adegoke; Lowegard, Anna; Gainza, Pablo; Reeve, Stephanie M; Georgiev, Ivelin; Anderson, Amy C; Donald, Bruce R

    2017-01-01

    Drug resistance in protein targets is an increasingly common phenomenon that reduces the efficacy of both existing and new antibiotics. However, knowledge of future resistance mutations during pre-clinical phases of drug development would enable the design of novel antibiotics that are robust against not only known resistant mutants, but also against those that have not yet been clinically observed. Computational structure-based protein design (CSPD) is a transformative field that enables the prediction of protein sequences with desired biochemical properties such as binding affinity and specificity to a target. The use of CSPD to predict previously unseen resistance mutations represents one of the frontiers of computational protein design. In a recent study (Reeve et al. Proc Natl Acad Sci U S A 112(3):749-754, 2015), we used our OSPREY (Open Source Protein REdesign for You) suite of CSPD algorithms to prospectively predict resistance mutations that arise in the active site of the dihydrofolate reductase enzyme from methicillin-resistant Staphylococcus aureus (SaDHFR) in response to selective pressure from an experimental competitive inhibitor. We demonstrated that our top predicted candidates are indeed viable resistant mutants. Since that study, we have significantly enhanced the capabilities of OSPREY with not only improved modeling of backbone flexibility, but also efficient multi-state design, fast sparse approximations, partitioned continuous rotamers for more accurate energy bounds, and a computationally efficient representation of molecular-mechanics and quantum-mechanical energy functions. Here, using SaDHFR as an example, we present a protocol for resistance prediction using the latest version of OSPREY. Specifically, we show how to use a combination of positive and negative design to predict active site escape mutations that maintain the enzyme's catalytic function but selectively ablate binding of an inhibitor.

  4. OSPREY Predicts Resistance Mutations using Positive and Negative Computational Protein Design

    PubMed Central

    Ojewole, Adegoke; Lowegard, Anna; Gainza, Pablo; Reeve, Stephanie M.; Georgiev, Ivelin; Anderson, Amy C.; Donald, Bruce R.

    2016-01-01

    Summary: Drug resistance in protein targets is an increasingly common phenomenon that reduces the efficacy of both existing and new antibiotics. However, knowledge of future resistance mutations during pre-clinical phases of drug development would enable the design of novel antibiotics that are robust against not only known resistant mutants, but also against those that have not yet been clinically observed. Computational structure-based protein design (CSPD) is a transformative field that enables the prediction of protein sequences with desired biochemical properties such as binding affinity and specificity to a target. The use of CSPD to predict previously unseen resistance mutations represents one of the frontiers of computational protein design. In a recent study (1), we used our OSPREY (Open Source Protein REdesign for You) suite of CSPD algorithms to prospectively predict resistance mutations that arise in the active site of the dihydrofolate reductase enzyme from methicillin-resistant Staphylococcus aureus (SaDHFR) in response to selective pressure from an experimental competitive inhibitor. We demonstrated that our top predicted candidates are indeed viable resistant mutants. Since that study, we have significantly enhanced the capabilities of OSPREY with not only improved modeling of backbone flexibility, but also efficient multi-state design, fast sparse approximations, partitioned rotamers for more accurate energy bounds, and a computationally efficient representation of molecular-mechanics and quantum-mechanical energy functions. Here, using SaDHFR as an example, we present a protocol for resistance prediction using the latest version of OSPREY. Specifically, we show how to use a combination of positive and negative design to predict active site escape mutations that maintain the enzyme’s catalytic function but selectively ablate binding of an inhibitor. PMID:27914058

  5. Urinary intestinal fatty acid binding protein predicts necrotizing enterocolitis.

    PubMed

    Gregory, Katherine E; Winston, Abigail B; Yamamoto, Hidemi S; Dawood, Hassan Y; Fashemi, Titilayo; Fichorova, Raina N; Van Marter, Linda J

    2014-06-01

    Necrotizing enterocolitis, characterized by sudden onset and rapid progression, remains the most significant gastrointestinal disorder among premature infants. In seeking a predictive biomarker, we found intestinal fatty acid binding protein, an indicator of enterocyte damage, was substantially increased within three and seven days before the diagnosis of necrotizing enterocolitis.

  6. PROSNET: INTEGRATING HOMOLOGY WITH MOLECULAR NETWORKS FOR PROTEIN FUNCTION PREDICTION

    PubMed Central

    Wang, Sheng; Qu, Meng

    2016-01-01

    Automated annotation of protein function has become a critical task in the post-genomic era. Network-based approaches and homology-based approaches have been widely used and recently tested in large-scale community-wide assessment experiments. It is natural to integrate network data with homology information to further improve the predictive performance. However, integrating these two heterogeneous, high-dimensional and noisy datasets is non-trivial. In this work, we introduce a novel protein function prediction algorithm ProSNet. An integrated heterogeneous network is first built to include molecular networks of multiple species and link together homologous proteins across multiple species. Based on this integrated network, a dimensionality reduction algorithm is introduced to obtain compact low-dimensional vectors to encode proteins in the network. Finally, we develop machine learning classification algorithms that take the vectors as input and make predictions by transferring annotations both within each species and across different species. Extensive experiments on five major species demonstrate that our integration of homology with molecular networks substantially improves the predictive performance over existing approaches. PMID:27896959

  7. SitesIdentify: a protein functional site prediction tool

    PubMed Central

    2009-01-01

    Background: The rate of protein structures being deposited in the Protein Data Bank surpasses the capacity to experimentally characterise them and therefore computational methods to analyse these structures have become increasingly important. Identifying the region of the protein most likely to be involved in function is useful in order to gain information about its potential role. There are many available approaches to predict functional sites, but many are not made available via a publicly-accessible application. Results: Here we present a functional site prediction tool (SitesIdentify), based on combining sequence conservation information with geometry-based cleft identification, that is freely available via a web-server. We have shown that SitesIdentify compares favourably to other functional site prediction tools in a comparison of seven methods on a non-redundant set of 237 enzymes with annotated active sites. Conclusion: SitesIdentify is able to produce comparable accuracy in predicting functional sites to its closest available counterpart, but in addition achieves improved accuracy for proteins with few characterised homologues. SitesIdentify is available via a webserver at http://www.manchester.ac.uk/bioinformatics/sitesidentify/ PMID:19922660

  8. Prediction of N-terminal protein sorting signals.

    PubMed

    Claros, M G; Brunak, S; von Heijne, G

    1997-06-01

    Recently, neural networks have been applied to a widening range of problems in molecular biology. An area particularly suited to neural-network methods is the identification of protein sorting signals and the prediction of their cleavage sites, as these functional units are encoded by local, linear sequences of amino acids rather than global 3D structures.

  9. Protease-inhibitor interaction predictions: Lessons on the complexity of protein-protein interactions.

    PubMed

    Fortelny, Nikolaus; Butler, Georgina S; Overall, Christopher Mark; Pavlidis, Paul

    2017-04-06

    Protein interactions shape proteome function and thus biology. Identification of protein interactions is a major goal in molecular biology, but biochemical methods, although improving, remain limited in coverage and accuracy. Whereas computational predictions can guide biochemical experiments, low validation rates of predictions remain a major limitation. Here, we investigated computational methods in the prediction of a specific type of interaction, the inhibitory interactions between proteases and their inhibitors. Proteases generate thousands of proteoforms that dynamically shape the functional state of proteomes. Despite the important regulatory role of proteases, knowledge of their inhibitors remains largely incomplete with the vast majority of proteases lacking an annotated inhibitor. To link inhibitors to their target proteases on a large scale, we applied computational methods to predict inhibitory interactions between proteases and their inhibitors based on complementary data including coexpression, phylogenetic similarity, structural information, co-annotation, and colocalization, and also surveyed general protein interaction networks for potential inhibitory interactions. In testing nine predicted interactions biochemically, we validated the inhibition of kallikrein 5 by serpin B12. Despite the use of a wide array of complementary data, we found a high false positive rate of computational predictions in biochemical follow-up. Based on a protease-specific definition of true negatives derived from the biochemical classification of proteases and inhibitors, we analyzed prediction accuracy of individual features. Thereby we identified feature-specific limitations, which also affected general protein interaction prediction methods. Interestingly, proteases were often not coexpressed with most of their functional inhibitors, contrary to what is commonly assumed and extrapolated predominantly from cell culture experiments. 
Predictions of inhibitory interactions

  10. Predicting protein-protein interactions from multimodal biological data sources via nonnegative matrix tri-factorization.

    PubMed

    Wang, Hua; Huang, Heng; Ding, Chris; Nie, Feiping

    2013-04-01

    Protein interactions are central to all the biological processes and structural scaffolds in living organisms, because they orchestrate a number of cellular processes such as metabolic pathways and immunological recognition. Several high-throughput methods, for example the yeast two-hybrid system and mass spectrometry, can help determine protein interactions, but they suffer from high false-positive rates. Moreover, many protein interactions predicted by one method are not supported by another. Therefore, computational methods are necessary and crucial to complete the interactome expeditiously. In this work, we formulate the problem of predicting protein interactions from a new mathematical perspective, sparse matrix completion, and propose a novel nonnegative matrix factorization (NMF)-based matrix completion approach to predict new protein interactions from existing protein interaction networks. Using manifold regularization, we further develop our method to integrate different biological data sources, such as protein sequences, gene expressions, protein structure information, etc. Extensive experimental results on four species, Saccharomyces cerevisiae, Drosophila melanogaster, Homo sapiens, and Caenorhabditis elegans, have shown that our new methods outperform related state-of-the-art protein interaction prediction methods.
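
    The matrix-completion idea can be illustrated with a rough sketch. It uses plain two-factor NMF with the standard multiplicative updates on a toy adjacency matrix, not the tri-factorization or manifold regularization described in the abstract; the function names, the rank, and the toy network are assumptions made for illustration only.

```python
import numpy as np


def nmf(A, rank, n_iter=500, eps=1e-9, seed=0):
    """Factorize a nonnegative matrix A ~ W @ H with multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative by construction.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


def score_missing_links(A, rank=2):
    """Rank unobserved pairs (i < j) by their reconstructed score, highest first."""
    W, H = nmf(A, rank)
    S = W @ H
    S = (S + S.T) / 2.0  # interactions are undirected, so symmetrize
    n = A.shape[0]
    candidates = [
        (float(S[i, j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if A[i, j] == 0
    ]
    return sorted(candidates, reverse=True)


# Toy network (hypothetical): two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

ranked = score_missing_links(A)
for score, i, j in ranked[:3]:
    print(f"candidate interaction {i}-{j}: score {score:.3f}")
```

    Pairs with high reconstructed scores but no observed edge are the candidate new interactions; in practice the scores would be validated against held-out interactions.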

  11. Improved hybrid optimization algorithm for 3D protein structure prediction.

    PubMed

    Zhou, Changjun; Hou, Caixia; Wei, Xiaopeng; Zhang, Qiang

    2014-07-01

    A new improved hybrid optimization algorithm, the PGATS algorithm, based on the toy off-lattice model, is presented for three-dimensional protein structure prediction problems. The algorithm combines the particle swarm optimization (PSO), genetic algorithm (GA), and tabu search (TS) algorithms. In addition, we adopt several improvement strategies: a stochastic disturbance factor is added to the particle swarm optimization to improve its search ability; the crossover and mutation operations of the genetic algorithm are changed to a kind of random linear method; and the tabu search algorithm is improved by appending a mutation operator. Through the combination of these strategies and algorithms, protein structure prediction (PSP) in a 3D off-lattice model is achieved. The PSP problem is NP-hard, but it can be treated as a global optimization problem with multiple extrema and multiple parameters. This is the theoretical principle of the hybrid optimization algorithm proposed in this paper. The algorithm combines local search and global search, which overcomes the shortcomings of any single algorithm and exploits the advantages of each. The algorithm is tested on the standard benchmark Fibonacci sequences and on real protein sequences. Experiments show that the proposed method outperforms single algorithms in the accuracy of the computed protein sequence energy values, and is thus an effective way to predict the structure of proteins.
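
    The Fibonacci sequences mentioned in this abstract are, in the toy off-lattice (AB) model literature, usually built by string concatenation over two residue types, A (hydrophobic) and B (hydrophilic). The sketch below assumes that common construction (S0 = "A", S1 = "B", S(k+1) = S(k-1) + S(k)); the abstract itself does not spell it out, and the function name is introduced here for illustration.

```python
def fibonacci_sequence(k: int) -> str:
    """Return the k-th Fibonacci AB sequence.

    Assumed recurrence: S0 = "A", S1 = "B", S(k+1) = S(k-1) + S(k),
    where "+" is string concatenation.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    prev, curr = "A", "B"
    if k == 0:
        return prev
    for _ in range(k - 1):
        prev, curr = curr, prev + curr
    return curr


for k in range(2, 7):
    seq = fibonacci_sequence(k)
    print(f"S{k} (length {len(seq):2d}): {seq}")
```

    Sequence lengths follow the Fibonacci numbers, so chains of length 13, 21, 34 and 55 are the usual benchmark sizes under this construction.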

  12. Boosting compound-protein interaction prediction by deep learning.

    PubMed

    Tian, Kai; Shao, Mingyu; Wang, Yang; Guan, Jihong; Zhou, Shuigeng

    2016-11-01

    The identification of interactions between compounds and proteins plays an important role in network pharmacology and drug discovery. However, experimentally identifying compound-protein interactions (CPIs) is generally expensive and time-consuming, so computational approaches have been introduced. Among these, machine-learning based methods have achieved considerable success. However, due to the nonlinear and imbalanced nature of biological data, many machine learning approaches have their own limitations. Recently, deep learning techniques have shown advantages over many state-of-the-art machine learning methods in some applications. In this study, we aim at improving the performance of CPI prediction based on deep learning, and propose a method called DL-CPI (the abbreviation of Deep Learning for Compound-Protein Interactions prediction), which employs a deep neural network (DNN) to effectively learn the representations of compound-protein pairs. Extensive experiments show that DL-CPI can learn useful features of compound-protein pairs by a layerwise abstraction, and thus achieves better prediction performance than existing methods on both balanced and imbalanced datasets.

  13. Three-dimensional protein structure prediction: Methods and computational strategies.

    PubMed

    Dorn, Márcio; E Silva, Mariel Barbachan; Buriol, Luciana S; Lamb, Luis C

    2014-10-12

    A long-standing problem in structural bioinformatics is to determine the three-dimensional (3-D) structure of a protein when only a sequence of amino acid residues is given. Many computational methodologies and algorithms have been proposed as a solution to the 3-D Protein Structure Prediction (3-D-PSP) problem. These methods can be divided into four main classes: (a) first principle methods without database information; (b) first principle methods with database information; (c) fold recognition and threading methods; and (d) comparative modeling methods and sequence alignment strategies. Deterministic computational techniques, optimization techniques, data mining and machine learning approaches are typically used in the construction of computational solutions for the PSP problem. Our main goal with this work is to review the methods and computational strategies that are currently used in 3-D protein prediction.

  14. ORION: a web server for protein fold recognition and structure prediction using evolutionary hybrid profiles

    PubMed Central

    Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G.; Gelly, Jean-Christophe

    2016-01-01

    Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation —with Protein Blocks—, which give an accurate description of the local protein structure. ORION has recently been improved, increasing by 5% the quality of its results. The ORION web server accepts a single protein sequence as input and searches homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the ‘Hard’ category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/. PMID:27319297

  15. Using support vector machine for improving protein-protein interaction prediction utilizing domain interactions

    SciTech Connect

    Singhal, Mudita; Shah, Anuj R.; Brown, Roslyn N.; Adkins, Joshua N.

    2010-10-02

    Understanding protein interactions is essential to gain insights into the biological processes at the whole cell level. The high-throughput experimental techniques for determining protein-protein interactions (PPI) are error prone and expensive with low overlap amongst them. Although several computational methods have been proposed for predicting protein interactions, there is definite room for improvement. Here we present DomainSVM, a predictive method for PPI that uses computationally inferred domain-domain interaction values in a Support Vector Machine framework to predict protein interactions. The DomainSVM method utilizes evidence of multiple interacting domains to predict a protein interaction. It outperforms existing methods of PPI prediction by achieving very high explanation ratios, precision, specificity, sensitivity and F-measure values in a 10-fold cross-validation study conducted on the positive and negative PPIs in yeast. A functional comparison study using GO annotations on the positive and the negative test sets is presented in addition to discussing novel PPI predictions in Salmonella Typhimurium.
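
    The 10-fold cross-validation protocol mentioned in this abstract partitions the labelled pairs into ten folds and uses each fold once as the test set. The following is a generic sketch of that splitting step in plain Python; the function name, seed, and sample count are assumptions, and nothing here reproduces the DomainSVM implementation.

```python
import random


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for f, fold in enumerate(folds) if f != i for idx in fold]
        yield train, test


# Hypothetical dataset of 25 labelled protein pairs.
for fold_no, (train, test) in enumerate(k_fold_indices(25, k=10), start=1):
    print(f"fold {fold_no:2d}: {len(train)} training pairs, {len(test)} test pairs")
```

    A classifier would be trained on each training split and scored on the matching test split, with precision, sensitivity, specificity and F-measure averaged over the ten folds.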

  16. Backbone building from quadrilaterals: a fast and accurate algorithm for protein backbone reconstruction from alpha carbon coordinates.

    PubMed

    Gront, Dominik; Kmiecik, Sebastian; Kolinski, Andrzej

    2007-07-15

    In this contribution, we present an algorithm for protein backbone reconstruction that combines very high computational efficiency with high accuracy. Reconstruction of the main chain atomic coordinates from the alpha carbon trace is a common task in protein modeling, including de novo structure prediction, comparative modeling, and processing experimental data. The method employed in this work follows the main idea of some earlier approaches to the problem. The details and careful design of the present approach are new and lead to an algorithm that outperforms all commonly used earlier applications. The BBQ (Backbone Building from Quadrilaterals) program has been extensively tested both on native structures and on near-native decoy models, and compared with other available methods. The results provide a comprehensive benchmark of existing tools and evaluate their applicability to large-scale modeling using a reduced representation of protein conformational space. The BBQ package is available for downloading from our website at http://biocomp.chem.uw.edu.pl/services/BBQ/. This webpage also provides a user manual that describes BBQ functions in detail.

  17. Nanoparticles-cell association predicted by protein corona fingerprints

    NASA Astrophysics Data System (ADS)

    Palchetti, S.; Digiacomo, L.; Pozzi, D.; Peruzzi, G.; Micarelli, E.; Mahmoudi, M.; Caracciolo, G.

    2016-06-01

    In a physiological environment (e.g., blood and interstitial fluids) nanoparticles (NPs) will bind proteins shaping a "protein corona" layer. The long-lived protein layer tightly bound to the NP surface is referred to as the hard corona (HC) and encodes information that controls NP bioactivity (e.g. cellular association, cellular signaling pathways, biodistribution, and toxicity). Decrypting this complex code has become a priority to predict the NP biological outcomes. Here, we use a library of 16 lipid NPs of varying size (Ø ~ 100-250 nm) and surface chemistry (unmodified and PEGylated) to investigate the relationships between NP physicochemical properties (nanoparticle size, aggregation state and surface charge), protein corona fingerprints (PCFs), and NP-cell association. We found that none of the NPs' physicochemical properties alone was able to account for association with the human cervical cancer cell line (HeLa). For the entire library of NPs, a total of 436 distinct serum proteins were detected. We developed a predictive-validation modeling approach that provides a means of assessing the relative significance of the identified corona proteins. Interestingly, a minor fraction of the HC, consisting of only 8 PCFs, was identified as the main promoter of NP association with HeLa cells. Remarkably, the identified PCFs have several receptors with a high level of expression on the plasma membrane of HeLa cells.

  18. A new mixed-mode model for interpreting and predicting protein elution during isoelectric chromatofocusing.

    PubMed

    Choy, Derek Y C; Creagh, A Louise; von Lieres, Eric; Haynes, Charles

    2014-05-01

    Experimental data are combined with classic theories describing electrolytes in solution and at surfaces to define the primary mechanisms influencing protein retention and elution during isoelectric chromatofocusing (ICF) of proteins and protein mixtures. Those fundamental findings are used to derive a new model to understand and predict elution times of proteins during ICF. The model uses a modified form of the steric mass action (SMA) isotherm to account for both ion exchange and isoelectric focusing contributions to protein partitioning. The dependence of partitioning on pH is accounted for through the characteristic charge parameter m of the SMA isotherm and the application of Gouy-Chapman theory to define the dependence of the equilibrium binding constant Kbi on both m and ionic strength. Finally, the effects of changes in matrix surface pH on protein retention are quantified through a Donnan equilibrium type model. By accounting for isoelectric focusing, ion binding and exchange, and surface pH contributions to protein retention and elution, the model is shown to accurately capture the dependence of protein elution times on column operating conditions.

  19. Sequence-Based Prediction of Type III Secreted Proteins

    PubMed Central

    Arnold, Roland; Brandmaier, Stefan; Kleine, Frederick; Tischler, Patrick; Heinz, Eva; Behrens, Sebastian; Niinikoski, Antti; Mewes, Hans-Werner; Horn, Matthias; Rattei, Thomas

    2009-01-01

    The type III secretion system (TTSS) is a key mechanism for host cell interaction used by a variety of bacterial pathogens and symbionts of plants and animals including humans. The TTSS represents a molecular syringe with which the bacteria deliver effector proteins directly into the host cell cytosol. Despite the importance of the TTSS for bacterial pathogenesis, recognition and targeting of type III secreted proteins has up until now been poorly understood. Several hypotheses are discussed, including an mRNA-based signal, a chaperon-mediated process, or an N-terminal signal peptide. In this study, we systematically analyzed the amino acid composition and secondary structure of N-termini of 100 experimentally verified effector proteins. Based on this, we developed a machine-learning approach for the prediction of TTSS effector proteins, taking into account N-terminal sequence features such as frequencies of amino acids, short peptides, or residues with certain physico-chemical properties. The resulting computational model revealed a strong type III secretion signal in the N-terminus that can be used to detect effectors with sensitivity of ∼71% and selectivity of ∼85%. This signal seems to be taxonomically universal and conserved among animal pathogens and plant symbionts, since we could successfully detect effector proteins if the respective group was excluded from training. The application of our prediction approach to 739 complete bacterial and archaeal genome sequences resulted in the identification of between 0% and 12% putative TTSS effector proteins. Comparison of effector proteins with orthologs that are not secreted by the TTSS showed no clear pattern of signal acquisition by fusion, suggesting convergent evolutionary processes shaping the type III secretion signal. The newly developed program EffectiveT3 (http://www.chlamydiaedb.org) is the first universal in silico prediction program for the identification of novel TTSS effectors. Our findings will

  20. Protein design by fusion: implications for protein structure prediction and evolution

    SciTech Connect

    Skorupka, Katarzyna; Han, Seong Kyu; Nam, Hyun-Jun; Kim, Sanguk; Faham, Salem

    2013-11-19

    Domain fusion is a useful tool in protein design. Here, the structure of a fusion of the heterodimeric flagella-assembly proteins FliS and FliC is reported. Although the ability of the fusion protein to maintain the structure of the heterodimer may be apparent, threading-based structural predictions do not properly fuse the heterodimer. Additional examples of naturally occurring heterodimers that are homologous to full-length proteins were identified. These examples highlight that the designed protein was engineered by the same tools as used in the natural evolution of proteins and that heterodimeric structures contain a wealth of information, currently unused, that can improve structural predictions.

  1. Modelling proteins' hidden conformations to predict antibiotic resistance

    NASA Astrophysics Data System (ADS)

    Hart, Kathryn M.; Ho, Chris M. W.; Dutta, Supratik; Gross, Michael L.; Bowman, Gregory R.

    2016-10-01

    TEM β-lactamase confers bacteria with resistance to many antibiotics and rapidly evolves activity against new drugs. However, functional changes are not easily explained by differences in crystal structures. We employ Markov state models to identify hidden conformations and explore their role in determining TEM's specificity. We integrate these models with existing drug-design tools to create a new technique, called Boltzmann docking, which better predicts TEM specificity by accounting for conformational heterogeneity. Using our MSMs, we identify hidden states whose populations correlate with activity against cefotaxime. To experimentally detect our predicted hidden states, we use rapid mass spectrometric footprinting and confirm our models' prediction that increased cefotaxime activity correlates with reduced Ω-loop flexibility. Finally, we design novel variants to stabilize the hidden cefotaximase states, and find their populations predict activity against cefotaxime in vitro and in vivo. Therefore, we expect this framework to have numerous applications in drug and protein design.
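    The idea of weighting docking results by conformational state populations can be sketched as follows. This is a minimal illustration only, assuming that each Markov state model state contributes one docking score and that the scores are combined as a population-weighted average; the function name and the example numbers are hypothetical and are not taken from the paper.

```python
def boltzmann_weighted_score(populations, scores):
    """Population-weighted average of per-state docking scores.

    populations: equilibrium populations of the conformational states
                 (assumed here to come from a Markov state model).
    scores:      one docking score per state, in the same order.
    Both inputs are illustrative assumptions, not the authors' data.
    """
    if len(populations) != len(scores):
        raise ValueError("need exactly one score per state")
    total = sum(populations)
    if total <= 0:
        raise ValueError("populations must sum to a positive value")
    # Normalise by the total so unnormalised populations are accepted too.
    return sum(p * s for p, s in zip(populations, scores)) / total


# Two equally populated states with docking scores of -8.0 and -6.0.
example = boltzmann_weighted_score([0.5, 0.5], [-8.0, -6.0])
```

    A single crystal structure corresponds to the special case of one state with population 1, so the weighted score reduces to an ordinary docking score.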

  2. Prediction and Annotation of Plant Protein Interaction Networks

    SciTech Connect

    McDermott, Jason E.; Wang, Jun; Yu, Jun; Wong, Gane Ka-Shu; Samudrala, Ram

    2009-02-01

    Large-scale experimental studies of interactions between components of biological systems have been performed for a variety of eukaryotic organisms. However, there is a dearth of such data for plants. Computational methods for prediction of relationships between proteins, primarily based on comparative genomics, provide a useful systems-level view of cellular functioning and can be used to extend information about other eukaryotes to plants. We have predicted networks for Arabidopsis thaliana, Oryza sativa indica and japonica and several plant pathogens using the Bioverse (http://bioverse.compbio.washington.edu) and show that they are similar to experimentally-derived interaction networks. Predicted interaction networks for plants can be used to provide novel functional annotations and predictions about plant phenotypes and aid in rational engineering of biosynthesis pathways.

  3. Prediction of HIV drug resistance from genotype with encoded three-dimensional protein structure

    PubMed Central

    2014-01-01

    Background: Drug resistance has become a severe challenge for treatment of HIV infections. Mutations accumulate in the HIV genome and make certain drugs ineffective. Prediction of resistance from genotype data is a valuable guide in choice of drugs for effective therapy. Results: In order to improve the computational prediction of resistance from genotype data we have developed a unified encoding of the protein sequence and three-dimensional protein structure of the drug target for classification and regression analysis. The method was tested on genotype-resistance data for mutants of HIV protease and reverse transcriptase. Our graph based sequence-structure approach gives high accuracy with a new sparse dictionary classification method, as well as support vector machine and artificial neural networks classifiers. Cross-validated regression analysis with the sparse dictionary gave excellent correlation between predicted and observed resistance. Conclusion: The approach of encoding the protein structure and sequence as a 210-dimensional vector, based on Delaunay triangulation, has promise as an accurate method for predicting resistance from sequence for drugs inhibiting HIV protease and reverse transcriptase. PMID:25081370
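    One way a 210-dimensional vector can arise is by counting unordered amino-acid pairs: 20 residue types give 20 x 21 / 2 = 210 distinct pairs. The sketch below assumes that reading of the abstract and assumes the residue-residue edges are supplied by the caller (for example, from a Delaunay triangulation of residue coordinates computed elsewhere). The names PAIR_INDEX and encode_pairs are hypothetical, and the counting scheme may differ from the authors' actual encoding.

```python
from itertools import combinations_with_replacement

# The 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Map each unordered residue pair (a, b) with a <= b to a vector position.
PAIR_INDEX = {
    pair: i
    for i, pair in enumerate(combinations_with_replacement(AMINO_ACIDS, 2))
}


def encode_pairs(sequence, edges):
    """Count residue-type pairs over the given edges.

    sequence: one-letter amino-acid string.
    edges:    iterable of (i, j) residue index pairs, assumed to come
              from a structure-derived graph such as a triangulation.
    Returns a list of 210 counts, one per unordered residue pair.
    """
    vector = [0] * len(PAIR_INDEX)
    for i, j in edges:
        a, b = sorted((sequence[i], sequence[j]))
        vector[PAIR_INDEX[(a, b)]] += 1
    return vector


# Toy example: three residues, all mutually connected.
toy_vector = encode_pairs("ACA", [(0, 1), (1, 2), (0, 2)])
```

    Because the vector length is fixed regardless of how many mutations a sequence carries, every mutant maps to an input of the same size, which is convenient for standard classifiers and regressors.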

  4. Tandem Repeats in Proteins: Prediction Algorithms and Biological Role

    PubMed Central

    Pellegrini, Marco

    2015-01-01

    Tandem repetitions in protein sequence and structure are a fascinating subject of research that has been a focus of study since the late 1990s. In this survey, we give an overview of the multi-faceted aspects of research on protein tandem repeats (PTR for short), including prediction algorithms, databases, early classification efforts, mechanisms of PTR formation and evolution, and synthetic PTR design. We also touch on the rather open issue of the relationship between PTR and flexibility (or disorder) in proteins. Detection of PTR either from protein sequence or structure data is challenging due to the inherent high (biological) signal-to-noise ratio that is a key feature of this problem. As early in silico analytic tools have been key enablers for starting this field of study, we expect that current and future algorithmic and statistical breakthroughs will have a high impact on the investigations of the biological role of PTR. PMID:26442257

  5. Prediction of buried helices in multispan alpha helical membrane proteins.

    PubMed

    Adamian, Larisa; Liang, Jie

    2006-04-01

    Analysis of a database of structures of membrane proteins shows that membrane proteins composed of 10 or more transmembrane (TM) helices often contain buried helices that are inaccessible to phospholipids. We introduce a method for identifying TM helices that are least phospholipid accessible and for prediction of fully buried TM helices in membrane proteins from sequence information alone. Our method is based on the calculation of residue lipophilicity and evolutionary conservation. When the number of buried helices in a membrane protein is known, our method achieves an accuracy of 78% and a Matthews correlation coefficient of 0.68. A server for this tool (RANTS) is available online at http://gila.bioengr.uic.edu/lab/.
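    For readers unfamiliar with the two metrics quoted above, the sketch below computes accuracy and the Matthews correlation coefficient from confusion-matrix counts using the standard definitions. The counts in the example are made up for illustration and are not the study's data; returning 0.0 when the denominator vanishes is a common convention adopted here as an assumption.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when any marginal total is zero (a convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: buried helices are the positive class.
example_acc = accuracy(tp=6, tn=3, fp=1, fn=2)
example_mcc = matthews_corrcoef(tp=6, tn=3, fp=1, fn=2)
```

    Unlike accuracy, the coefficient stays informative when one class is much rarer than the other, which is typical when only a few helices per protein are buried.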

  6. Structure-based prediction of host-pathogen protein interactions.

    PubMed

    Mariano, Rachelle; Wuchty, Stefan

    2017-03-16

    The discovery, validation, and characterization of protein-based interactions from different species are crucial for translational research regarding a variety of pathogens, ranging from the malaria parasite Plasmodium falciparum to HIV-1. Here, we review recent advances in the prediction of host-pathogen protein interfaces using structural information. In particular, we observe that current methods chiefly perform machine learning on sequence and domain information to produce large sets of candidate interactions that are further assessed and pruned to generate final, highly probable sets. Structure-based studies have also emphasized the electrostatic properties and evolutionary transformations of pathogenic interfaces, supplying crucial insight into antigenic determinants and the ways pathogens compete for host protein binding. Advancements in spectroscopic and crystallog