Sample records for structure prediction methods

  1. RNA secondary structure prediction with pseudoknots: Contribution of algorithm versus energy model.

    PubMed

    Jabbari, Hosna; Wark, Ian; Montemagno, Carlo

    2018-01-01

    RNA is a biopolymer with various applications inside the cell and in biotechnology. The structure of an RNA molecule mainly determines its function and is essential to guide nanostructure design. Since experimental structure determination is time-consuming and expensive, accurate computational prediction of RNA structure is of great importance. Prediction of RNA secondary structure is simpler than prediction of its tertiary structure and provides information about the tertiary structure; therefore, RNA secondary structure prediction has received attention in the past decades. Numerous methods with different folding approaches have been developed for RNA secondary structure prediction. While methods for prediction of RNA pseudoknot-free structure (structures with no crossing base pairs) have greatly improved in terms of their accuracy, methods for prediction of RNA pseudoknotted secondary structure (structures with crossing base pairs) still have room for improvement. A long-standing question for improving the prediction accuracy of RNA pseudoknotted secondary structure is whether to focus on the prediction algorithm or the underlying energy model, as there is a trade-off between the computational cost of the prediction algorithm and the generality of the method. The aim of this work is to argue that, when comparing different methods for RNA pseudoknotted structure prediction, the combination of algorithm and energy model should be considered, and a method should not be considered superior or inferior to others if they do not use the same scoring model. We demonstrate that while the folding approach is important in structure prediction, it is not the only important factor in the prediction accuracy of a given method, as the underlying energy model is also of great value. Therefore, we encourage researchers to pay particular attention when comparing methods with different energy models.
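
    The abstract distinguishes pseudoknot-free structures (no crossing base pairs) from pseudoknotted ones (crossing base pairs). As a minimal illustration of that definition only, the sketch below checks a list of base pairs for a crossing. The function name and the representation of a structure as 0-based index pairs are assumptions made for this example, not part of the cited work.

```python
def has_pseudoknot(pairs):
    """Return True if any two base pairs (i, j) and (k, l) cross, i.e. i < k < j < l.

    Assumes each position takes part in at most one base pair.
    """
    # Normalize each pair to (smaller index, larger index) and sort by opening index.
    ordered = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for a, (i, j) in enumerate(ordered):
        for k, l in ordered[a + 1:]:
            # The later pair opens inside (i, j) but closes outside it.
            if i < k < j < l:
                return True
    return False


nested = [(0, 9), (1, 8), (2, 7)]   # pseudoknot-free: pairs are nested
crossing = [(0, 5), (3, 8)]         # pseudoknotted: the two pairs cross
print(has_pseudoknot(nested), has_pseudoknot(crossing))
```

    The quadratic scan is chosen for readability; it is not meant to reflect how any of the listed prediction programs detect pseudoknots.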

  2. Thermodynamic heuristics with case-based reasoning: combined insights for RNA pseudoknot secondary structure.

    PubMed

    Al-Khatib, Ra'ed M; Rashid, Nur'Aini Abdul; Abdullah, Rosni

    2011-08-01

    The secondary structure of RNA pseudoknots has been extensively inferred and scrutinized by computational approaches. Experimental methods for determining RNA structure are time consuming and tedious; therefore, predictive computational approaches are required. Predicting the most accurate and energy-stable pseudoknot RNA secondary structure has been proven to be an NP-hard problem. In this paper, a new RNA folding approach, termed MSeeker, is presented; it includes KnotSeeker (a heuristic method) and Mfold (a thermodynamic algorithm). The global optimization of this thermodynamic heuristic approach was further enhanced by using a case-based reasoning technique as a local optimization method. MSeeker is a proposed algorithm for predicting RNA pseudoknot structure from individual sequences, especially long ones. This research demonstrates that MSeeker improves the sensitivity and specificity of existing RNA pseudoknot structure predictions. The performance and structural results from this proposed method were evaluated against seven other state-of-the-art pseudoknot prediction methods. The MSeeker method had better sensitivity than the DotKnot, FlexStem, HotKnots, pknotsRG, ILM, NUPACK and pknotsRE methods, with 79% of the predicted pseudoknot base-pairs being correct.
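
    The evaluation above is reported in terms of sensitivity and the fraction of predicted base pairs that are correct. A minimal sketch of how such base-pair scores are commonly computed follows. The function name and the toy pair lists are hypothetical, and the abstract does not state the exact formulas used for MSeeker, so this follows the usual convention (sensitivity = TP / reference pairs, positive predictive value = TP / predicted pairs).

```python
def base_pair_scores(predicted, reference):
    """Return (sensitivity, positive predictive value) for two base-pair lists."""
    # Treat (i, j) and (j, i) as the same pair.
    pred = {tuple(sorted(p)) for p in predicted}
    ref = {tuple(sorted(p)) for p in reference}
    true_positives = len(pred & ref)
    sensitivity = true_positives / len(ref) if ref else 0.0
    ppv = true_positives / len(pred) if pred else 0.0
    return sensitivity, ppv


# Hypothetical example: two of three predicted pairs appear in a four-pair reference.
predicted = [(0, 9), (1, 8), (2, 6)]
reference = [(0, 9), (1, 8), (2, 7), (3, 6)]
sensitivity, ppv = base_pair_scores(predicted, reference)
print(f"sensitivity={sensitivity:.2f}, ppv={ppv:.2f}")
```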

  3. Prediction of protein secondary structure content for the twilight zone sequences.

    PubMed

    Homaeian, Leila; Kurgan, Lukasz A; Ruan, Jishou; Cios, Krzysztof J; Chen, Ke

    2007-11-15

    Secondary protein structure carries information about local structural arrangements, which include three major conformations: alpha-helices, beta-strands, and coils. Significant majority of successful methods for prediction of the secondary structure is based on multiple sequence alignment. However, multiple alignment fails to provide accurate results when a sequence comes from the twilight zone, that is, it is characterized by low (<30%) homology. To this end, we propose a novel method for prediction of secondary structure content through comprehensive sequence representation, called PSSC-core. The method uses a multiple linear regression model and introduces a comprehensive feature-based sequence representation to predict amount of helices and strands for sequences from the twilight zone. The PSSC-core method was tested and compared with two other state-of-the-art prediction methods on a set of 2187 twilight zone sequences. The results indicate that our method provides better predictions for both helix and strand content. The PSSC-core is shown to provide statistically significantly better results when compared with the competing methods, reducing the prediction error by 5-7% for helix and 7-9% for strand content predictions. The proposed feature-based sequence representation uses a comprehensive set of physicochemical properties that are custom-designed for each of the helix and strand content predictions. It includes composition and composition moment vectors, frequency of tetra-peptides associated with helical and strand conformations, various property-based groups like exchange groups, chemical groups of the side chains and hydrophobic group, auto-correlations based on hydrophobicity, side-chain masses, hydropathy, and conformational patterns for beta-sheets. The PSSC-core method provides an alternative for predicting the secondary structure content that can be used to validate and constrain results of other structure prediction methods. 
At the same time, it also provides useful insight into design of successful protein sequence representations that can be used in developing new methods related to prediction of different aspects of the secondary protein structure. (c) 2007 Wiley-Liss, Inc.

  4. Prediction of beta-turns at over 80% accuracy based on an ensemble of predicted secondary structures and multiple alignments.

    PubMed

    Zheng, Ce; Kurgan, Lukasz

    2008-10-10

    beta-turn is a secondary protein structure type that plays significant role in protein folding, stability, and molecular recognition. To date, several methods for prediction of beta-turns from protein sequences were developed, but they are characterized by relatively poor prediction quality. The novelty of the proposed sequence-based beta-turn predictor stems from the usage of a window based information extracted from four predicted three-state secondary structures, which together with a selected set of position specific scoring matrix (PSSM) values serve as an input to the support vector machine (SVM) predictor. We show that (1) all four predicted secondary structures are useful; (2) the most useful information extracted from the predicted secondary structure includes the structure of the predicted residue, secondary structure content in a window around the predicted residue, and features that indicate whether the predicted residue is inside a secondary structure segment; (3) the PSSM values of Asn, Asp, Gly, Ile, Leu, Met, Pro, and Val were among the top ranked features, which corroborates with recent studies. The Asn, Asp, Gly, and Pro indicate potential beta-turns, while the remaining four amino acids are useful to predict non-beta-turns. Empirical evaluation using three nonredundant datasets shows favorable Q total, Q predicted and MCC values when compared with over a dozen of modern competing methods. Our method is the first to break the 80% Q total barrier and achieves Q total = 80.9%, MCC = 0.47, and Q predicted higher by over 6% when compared with the second best method. We use feature selection to reduce the dimensionality of the feature vector used as the input for the proposed prediction method. The applied feature set is smaller by 86, 62 and 37% when compared with the second and two third-best (with respect to MCC) competing methods, respectively. 
Experiments show that the proposed method constitutes an improvement over the competing prediction methods. The proposed prediction model can better discriminate between beta-turns and non-beta-turns due to obtaining lower numbers of false positive predictions. The prediction model and datasets are freely available at http://biomine.ece.ualberta.ca/BTNpred/BTNpred.html.
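
    The results above are reported as Q total, Q predicted and MCC. As a sketch under the standard definitions of these measures (the abstract itself does not spell them out), they can be computed from per-residue confusion counts as follows. The function name and the example counts are hypothetical.

```python
import math


def turn_metrics(tp, tn, fp, fn):
    """Return (Q_total %, Q_predicted %, MCC) from confusion counts.

    tp/fn: beta-turn residues predicted as turn / non-turn.
    tn/fp: non-turn residues predicted as non-turn / turn.
    """
    total = tp + tn + fp + fn
    q_total = 100.0 * (tp + tn) / total                     # overall accuracy
    q_predicted = 100.0 * tp / (tp + fp) if (tp + fp) else 0.0  # correct among predicted turns
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return q_total, q_predicted, mcc


# Hypothetical counts, not taken from the paper.
q_total, q_predicted, mcc = turn_metrics(tp=50, tn=30, fp=10, fn=10)
print(f"Q_total={q_total:.1f}% Q_predicted={q_predicted:.1f}% MCC={mcc:.3f}")
```

    Because beta-turn residues are typically the minority class, MCC is usually read alongside Q total rather than replaced by it.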

  5. Ensemble-based prediction of RNA secondary structures.

    PubMed

    Aghaeepour, Nima; Hoos, Holger H

    2013-04-24

    Accurate structure prediction methods play an important role for the understanding of RNA function. Energy-based, pseudoknot-free secondary structure prediction is one of the most widely used and versatile approaches, and improved methods for this task have received much attention over the past five years. Despite the impressive progress that has been achieved in this area, existing evaluations of the prediction accuracy achieved by various algorithms do not provide a comprehensive, statistically sound assessment. Furthermore, while there is increasing evidence that no prediction algorithm consistently outperforms all others, no work has been done to exploit the complementary strengths of multiple approaches. In this work, we present two contributions to the area of RNA secondary structure prediction. Firstly, we use state-of-the-art, resampling-based statistical methods together with a previously published and increasingly widely used dataset of high-quality RNA structures to conduct a comprehensive evaluation of existing RNA secondary structure prediction procedures. The results from this evaluation clarify the performance relationship between ten well-known existing energy-based pseudoknot-free RNA secondary structure prediction methods and clearly demonstrate the progress that has been achieved in recent years. Secondly, we introduce AveRNA, a generic and powerful method for combining a set of existing secondary structure prediction procedures into an ensemble-based method that achieves significantly higher prediction accuracies than obtained from any of its component procedures. Our new, ensemble-based method, AveRNA, improves the state of the art for energy-based, pseudoknot-free RNA secondary structure prediction by exploiting the complementary strengths of multiple existing prediction procedures, as demonstrated using a state-of-the-art statistical resampling approach. 
In addition, AveRNA allows an intuitive and effective control of the trade-off between false negative and false positive base pair predictions. Finally, AveRNA can make use of arbitrary sets of secondary structure prediction procedures and can therefore be used to leverage improvements in prediction accuracy offered by algorithms and energy models developed in the future. Our data, MATLAB software and a web-based version of AveRNA are publicly available at http://www.cs.ubc.ca/labs/beta/Software/AveRNA.
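
    To illustrate the general idea of combining several predictors and trading false negatives against false positives, the sketch below uses weighted voting over base pairs with a threshold. This is not the AveRNA algorithm, whose details the abstract does not give; the function name, weights and threshold are assumptions made for the example.

```python
from collections import defaultdict


def ensemble_pairs(predictions, weights, threshold=0.5):
    """Keep base pairs whose normalized weighted vote reaches `threshold`.

    predictions: one list of (i, j) base pairs per component predictor.
    weights: one non-negative weight per predictor (hypothetical values).
    Note: this sketch does not resolve conflicts where one base ends up in two pairs.
    """
    total_weight = sum(weights)
    score = defaultdict(float)
    for pairs, weight in zip(predictions, weights):
        # Count each pair at most once per predictor.
        for pair in {tuple(sorted(p)) for p in pairs}:
            score[pair] += weight / total_weight
    return sorted(pair for pair, s in score.items() if s >= threshold)


predictions = [
    [(0, 9), (1, 8)],
    [(0, 9), (2, 7)],
    [(0, 9), (1, 8)],
]
strict = ensemble_pairs(predictions, [1, 1, 1], threshold=0.9)   # fewer false positives
lenient = ensemble_pairs(predictions, [1, 1, 1], threshold=0.3)  # fewer false negatives
print(strict, lenient)
```

    Raising the threshold keeps only pairs that most predictors agree on, while lowering it admits pairs supported by a single predictor.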

  6. SCPRED: accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences.

    PubMed

    Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

    2008-05-01

    Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. 
The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods.

  7. SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences

    PubMed Central

    Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

    2008-01-01

    Background Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. Results SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. Conclusion The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. 
The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods. PMID:18452616

  8. Prediction of beta-turns at over 80% accuracy based on an ensemble of predicted secondary structures and multiple alignments

    PubMed Central

    Zheng, Ce; Kurgan, Lukasz

    2008-01-01

    Background β-turn is a secondary protein structure type that plays significant role in protein folding, stability, and molecular recognition. To date, several methods for prediction of β-turns from protein sequences were developed, but they are characterized by relatively poor prediction quality. The novelty of the proposed sequence-based β-turn predictor stems from the usage of a window based information extracted from four predicted three-state secondary structures, which together with a selected set of position specific scoring matrix (PSSM) values serve as an input to the support vector machine (SVM) predictor. Results We show that (1) all four predicted secondary structures are useful; (2) the most useful information extracted from the predicted secondary structure includes the structure of the predicted residue, secondary structure content in a window around the predicted residue, and features that indicate whether the predicted residue is inside a secondary structure segment; (3) the PSSM values of Asn, Asp, Gly, Ile, Leu, Met, Pro, and Val were among the top ranked features, which corroborates with recent studies. The Asn, Asp, Gly, and Pro indicate potential β-turns, while the remaining four amino acids are useful to predict non-β-turns. Empirical evaluation using three nonredundant datasets shows favorable Qtotal, Qpredicted and MCC values when compared with over a dozen of modern competing methods. Our method is the first to break the 80% Qtotal barrier and achieves Qtotal = 80.9%, MCC = 0.47, and Qpredicted higher by over 6% when compared with the second best method. We use feature selection to reduce the dimensionality of the feature vector used as the input for the proposed prediction method. The applied feature set is smaller by 86, 62 and 37% when compared with the second and two third-best (with respect to MCC) competing methods, respectively. 
Conclusion Experiments show that the proposed method constitutes an improvement over the competing prediction methods. The proposed prediction model can better discriminate between β-turns and non-β-turns due to obtaining lower numbers of false positive predictions. The prediction model and datasets are freely available at . PMID:18847492

  9. Automated prediction of protein function and detection of functional sites from structure.

    PubMed

    Pazos, Florencio; Sternberg, Michael J E

    2004-10-12

    Current structural genomics projects are yielding structures for proteins whose functions are unknown. Accordingly, there is a pressing requirement for computational methods for function prediction. Here we present PHUNCTIONER, an automatic method for structure-based function prediction using automatically extracted functional sites (residues associated to functions). The method relates proteins with the same function through structural alignments and extracts 3D profiles of conserved residues. Functional features to train the method are extracted from the Gene Ontology (GO) database. The method extracts these features from the entire GO hierarchy and hence is applicable across the whole range of function specificity. 3D profiles associated with 121 GO annotations were extracted. We tested the power of the method both for the prediction of function and for the extraction of functional sites. The success of function prediction by our method was compared with the standard homology-based method. In the zone of low sequence similarity (approximately 15%), our method assigns the correct GO annotation in 90% of the protein structures considered, approximately 20% higher than inheritance of function from the closest homologue.

  10. Secondary Structure Predictions for Long RNA Sequences Based on Inversion Excursions and MapReduce.

    PubMed

    Yehdego, Daniel T; Zhang, Boyu; Kodimala, Vikram K R; Johnson, Kyle L; Taufer, Michela; Leung, Ming-Ying

    2013-05-01

    Secondary structures of ribonucleic acid (RNA) molecules play important roles in many biological processes including gene expression and regulation. Experimental observations and computing limitations suggest that we can approach the secondary structure prediction problem for long RNA sequences by segmenting them into shorter chunks, predicting the secondary structures of each chunk individually using existing prediction programs, and then assembling the results to give the structure of the original sequence. The selection of cutting points is a crucial component of the segmenting step. Noting that stem-loops and pseudoknots always contain an inversion, i.e., a stretch of nucleotides followed closely by its inverse complementary sequence, we developed two cutting methods for segmenting long RNA sequences based on inversion excursions: the centered and optimized method. Each step of searching for inversions, chunking, and predictions can be performed in parallel. In this paper we use a MapReduce framework, i.e., Hadoop, to extensively explore meaningful inversion stem lengths and gap sizes for the segmentation and identify correlations between chunking methods and prediction accuracy. We show that for a set of long RNA sequences in the RFAM database, whose secondary structures are known to contain pseudoknots, our approach predicts secondary structures more accurately than methods that do not segment the sequence, when the latter predictions are possible computationally. We also show that, as sequences exceed certain lengths, some programs cannot computationally predict pseudoknots while our chunking methods can. Overall, our predicted structures still retain the accuracy level of the original prediction programs when compared with known experimental secondary structure.
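
    The abstract defines an inversion as a stretch of nucleotides followed closely by its inverse complementary sequence. A minimal sketch of a brute-force search for such inversions is shown below. It assumes Watson-Crick pairing only (no G-U wobble) and an RNA alphabet of A, C, G, U; the function names and parameters are hypothetical, and this is not the authors' inversion-excursion or MapReduce implementation.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(segment):
    """Return the inverse complementary sequence of an RNA segment."""
    return "".join(COMPLEMENT[base] for base in reversed(segment))


def find_inversions(seq, stem_len, max_gap):
    """Return (start, gap) for each stem of length `stem_len` whose reverse
    complement occurs at most `max_gap` nucleotides downstream of it."""
    hits = []
    n = len(seq)
    for start in range(n - 2 * stem_len + 1):
        target = reverse_complement(seq[start:start + stem_len])
        for gap in range(max_gap + 1):
            right = start + stem_len + gap
            if right + stem_len > n:
                break  # the right arm would run past the end of the sequence
            if seq[right:right + stem_len] == target:
                hits.append((start, gap))
    return hits


# GGG ... CCC with a 4-nucleotide gap forms one inversion starting at position 0.
hits = find_inversions("GGGAAAACCC", stem_len=3, max_gap=4)
print(hits)
```

    Each start position is examined independently of the others, which is consistent with the abstract's remark that the inversion search can be performed in parallel.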

  11. TRANSAT: a method for detecting the conserved helices of functional RNA structures, including transient, pseudo-knotted and alternative structures.

    PubMed

    Wiebe, Nicholas J P; Meyer, Irmtraud M

    2010-06-24

    The prediction of functional RNA structures has attracted increased interest, as it allows us to study the potential functional roles of many genes. RNA structure prediction methods, however, assume that there is a unique functional RNA structure and also do not predict functional features required for in vivo folding. In order to understand how functional RNA structures form in vivo, we require sophisticated experiments or reliable prediction methods. So far, there exist only a few, experimentally validated transient RNA structures. On the computational side, there exist several computer programs which aim to predict the co-transcriptional folding pathway in vivo, but these make a range of simplifying assumptions and do not capture all features known to influence RNA folding in vivo. We want to investigate if evolutionarily related RNA genes fold in a similar way in vivo. To this end, we have developed a new computational method, Transat, which detects conserved helices of high statistical significance. We introduce the method, present a comprehensive performance evaluation and show that Transat is able to predict the structural features of known reference structures including pseudo-knotted ones as well as those of known alternative structural configurations. Transat can also identify unstructured sub-sequences bound by other molecules and provides evidence for new helices which may define folding pathways, supporting the notion that homologous RNA sequence not only assume a similar reference RNA structure, but also fold similarly. Finally, we show that the structural features predicted by Transat differ from those assuming thermodynamic equilibrium. Unlike the existing methods for predicting folding pathways, our method works in a comparative way. 
This has the disadvantage of not being able to predict features as function of time, but has the considerable advantage of highlighting conserved features and of not requiring a detailed knowledge of the cellular environment.

  12. General overview on structure prediction of twilight-zone proteins.

    PubMed

    Khor, Bee Yin; Tye, Gee Jun; Lim, Theam Soon; Choong, Yee Siew

    2015-09-04

    Protein structure prediction from amino acid sequence has been one of the most challenging aspects of computational structural biology, despite significant progress in recent years shown by the critical assessment of protein structure prediction (CASP) experiments. When experimentally determined structures are unavailable, predicted structures may serve as starting points to study a protein. If the target protein contains a homologous region, a high-resolution (typically <1.5 Å) model can be built via comparative modelling. However, when confronted with low sequence similarity of the target protein (a so-called twilight-zone protein, whose sequence identity with available templates is less than 30%), protein structure prediction has to be initiated from scratch. Traditionally, twilight-zone proteins are predicted via threading or ab initio methods. Based on the current trend, a combination of different methods brings improved success in the prediction of twilight-zone proteins. In this mini review, the methods, progress and challenges in the prediction of twilight-zone proteins are discussed.

  13. A fast and robust iterative algorithm for prediction of RNA pseudoknotted secondary structures

    PubMed Central

    2014-01-01

    Background Improving accuracy and efficiency of computational methods that predict pseudoknotted RNA secondary structures is an ongoing challenge. Existing methods based on free energy minimization tend to be very slow and are limited in the types of pseudoknots that they can predict. Incorporating known structural information can improve prediction accuracy; however, there are not many methods for prediction of pseudoknotted structures that can incorporate structural information as input. There is even less understanding of the relative robustness of these methods with respect to partial information. Results We present a new method, Iterative HFold, for pseudoknotted RNA secondary structure prediction. Iterative HFold takes as input a pseudoknot-free structure, and produces a possibly pseudoknotted structure whose energy is at least as low as that of any (density-2) pseudoknotted structure containing the input structure. Iterative HFold leverages strengths of earlier methods, namely the fast running time of HFold, a method that is based on the hierarchical folding hypothesis, and the energy parameters of HotKnots V2.0. Our experimental evaluation on a large data set shows that Iterative HFold is robust with respect to partial information, with average accuracy on pseudoknotted structures steadily increasing from roughly 54% to 79% as the user provides up to 40% of the input structure. Iterative HFold is much faster than HotKnots V2.0, while having comparable accuracy. Iterative HFold also has significantly better accuracy than IPknot on our HK-PK and IP-pk168 data sets. Conclusions Iterative HFold is a robust method for prediction of pseudoknotted RNA secondary structures, whose accuracy with more than 5% information about true pseudoknot-free structures is better than that of IPknot, and with about 35% information about true pseudoknot-free structures compares well with that of HotKnots V2.0 while being significantly faster. 
Iterative HFold and all data used in this work are freely available at http://www.cs.ubc.ca/~hjabbari/software.php. PMID:24884954

  14. RNA-Puzzles: A CASP-like evaluation of RNA three-dimensional structure prediction

    PubMed Central

    Cruz, José Almeida; Blanchet, Marc-Frédérick; Boniecki, Michal; Bujnicki, Janusz M.; Chen, Shi-Jie; Cao, Song; Das, Rhiju; Ding, Feng; Dokholyan, Nikolay V.; Flores, Samuel Coulbourn; Huang, Lili; Lavender, Christopher A.; Lisi, Véronique; Major, François; Mikolajczak, Katarzyna; Patel, Dinshaw J.; Philips, Anna; Puton, Tomasz; Santalucia, John; Sijenyi, Fredrick; Hermann, Thomas; Rother, Kristian; Rother, Magdalena; Serganov, Alexander; Skorupski, Marcin; Soltysinski, Tomasz; Sripakdeevong, Parin; Tuszynska, Irina; Weeks, Kevin M.; Waldsich, Christina; Wildauer, Michael; Leontis, Neocles B.; Westhof, Eric

    2012-01-01

    We report the results of a first, collective, blind experiment in RNA three-dimensional (3D) structure prediction, encompassing three prediction puzzles. The goals are to assess the leading edge of RNA structure prediction techniques; compare existing methods and tools; and evaluate their relative strengths, weaknesses, and limitations in terms of sequence length and structural complexity. The results should give potential users insight into the suitability of available methods for different applications and facilitate efforts in the RNA structure prediction community in ongoing efforts to improve prediction tools. We also report the creation of an automated evaluation pipeline to facilitate the analysis of future RNA structure prediction exercises. PMID:22361291

  15. Predictive and Experimental Approaches for Elucidating Protein–Protein Interactions and Quaternary Structures

    PubMed Central

    Nealon, John Oliver; Philomina, Limcy Seby

    2017-01-01

    The elucidation of protein–protein interactions is vital for determining the function and action of quaternary protein structures. Here, we discuss the difficulty and importance of establishing protein quaternary structure and review in vitro and in silico methods for doing so. Determining the interacting partner proteins of predicted protein structures is very time-consuming when using in vitro methods; this can be somewhat alleviated by the use of predictive methods. However, developing reliably accurate predictive tools has proved to be difficult. We review the current state of the art in predictive protein interaction software and discuss the problem of scoring and therefore ranking predictions. Current community-based predictive exercises are discussed in relation to the growth of protein interaction prediction as an area within these exercises. We suggest a fusion of experimental and predictive methods that make use of sparse experimental data to determine higher resolution predicted protein interactions as being necessary to drive forward development. PMID:29206185

  16. Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign

    PubMed Central

    2007-01-01

    Background Joint alignment and secondary structure prediction of two RNA sequences can significantly improve the accuracy of the structural predictions. Methods addressing this problem, however, are forced to employ constraints that reduce computation by restricting the alignments and/or structures (i.e. folds) that are permissible. In this paper, a new methodology is presented for the purpose of establishing alignment constraints based on nucleotide alignment and insertion posterior probabilities. Using a hidden Markov model, posterior probabilities of alignment and insertion are computed for all possible pairings of nucleotide positions from the two sequences. These alignment and insertion posterior probabilities are additively combined to obtain probabilities of co-incidence for nucleotide position pairs. A suitable alignment constraint is obtained by thresholding the co-incidence probabilities. The constraint is integrated with Dynalign, a free energy minimization algorithm for joint alignment and secondary structure prediction. The resulting method is benchmarked against the previous version of Dynalign and against other programs for pairwise RNA structure prediction. Results The proposed technique eliminates manual parameter selection in Dynalign and provides significant computational time savings in comparison to prior constraints in Dynalign while simultaneously providing a small improvement in the structural prediction accuracy. Savings are also realized in memory. In experiments over a 5S RNA dataset with average sequence length of approximately 120 nucleotides, the method reduces computation by a factor of 2. The method performs favorably in comparison to other programs for pairwise RNA structure prediction, yielding better accuracy, on average, and requiring significantly fewer computational resources. 
Conclusion Probabilistic analysis can be utilized in order to automate the determination of alignment constraints for pairwise RNA structure prediction methods in a principled fashion. These constraints can reduce the computational and memory requirements of these methods while maintaining or improving their accuracy of structural prediction. This extends the practical reach of these methods to longer length sequences. The revised Dynalign code is freely available for download. PMID:17445273
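
    The constraint construction described in this abstract (add the alignment and insertion posteriors, then threshold the result) can be sketched roughly as follows. This is only an illustration, not the Dynalign implementation: the function name, the nested-list layout of the posteriors, the threshold value and the numbers are all assumptions made for the example.

```python
def coincidence_constraint(p_align, p_ins1, p_ins2, threshold):
    """Return a boolean mask of permitted (i, j) nucleotide position pairs.

    p_align[i][j] is the posterior that position i of sequence 1 is aligned
    to position j of sequence 2; p_ins1 and p_ins2 hold the posteriors of the
    two insertion states at the same position pair (assumed layout).
    """
    mask = []
    for row_a, row_1, row_2 in zip(p_align, p_ins1, p_ins2):
        # Co-incidence probability = additive combination of the three posteriors.
        mask.append([(a + b + c) >= threshold
                     for a, b, c in zip(row_a, row_1, row_2)])
    return mask


# Hypothetical posteriors for a 2 x 3 grid of position pairs.
p_align = [[0.90, 0.05, 0.00], [0.02, 0.70, 0.01]]
p_ins1 = [[0.05, 0.01, 0.00], [0.00, 0.10, 0.00]]
p_ins2 = [[0.03, 0.00, 0.00], [0.00, 0.15, 0.02]]

allowed = coincidence_constraint(p_align, p_ins1, p_ins2, threshold=0.05)
for row in allowed:
    print(row)
```

    Only position pairs marked True would remain available to the joint alignment and folding step, which is where the computational saving would come from.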

  17. TMDIM: an improved algorithm for the structure prediction of transmembrane domains of bitopic dimers.

    PubMed

    Cao, Han; Ng, Marcus C K; Jusoh, Siti Azma; Tai, Hio Kuan; Siu, Shirley W I

    2017-09-01

    α-Helical transmembrane proteins are the most important drug targets in rational drug development. However, solving the experimental structures of these proteins remains difficult; therefore, computational methods to accurately and efficiently predict the structures are in great demand. We present an improved structure prediction method, TMDIM, based on Park et al. (Proteins 57:577-585, 2004) for predicting bitopic transmembrane protein dimers. Three major algorithmic improvements are the introduction of packing type classification, multiple-condition decoy filtering, and cluster-based candidate selection. In a test of predicting nine known bitopic dimers, approximately 78% of our predictions achieved a successful fit (RMSD <2.0 Å) and 78% of the cases are better predicted than by the two other methods compared. Our method provides an alternative for modeling TM bitopic dimers of unknown structures for further computational studies. TMDIM is freely available on the web at https://cbbio.cis.umac.mo/TMDIM. The website is implemented in PHP, MySQL and Apache, with all major browsers supported.
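
    The success criterion quoted above (RMSD <2.0 Å) can be illustrated with a minimal sketch. It assumes the predicted and reference coordinates are already superimposed and list the same atoms in the same order; the abstract does not say how TMDIM aligns structures, and the function names and coordinates below are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def is_successful_fit(coords_a, coords_b, cutoff=2.0):
    """Apply the RMSD cutoff (in angstroms) used as the success criterion."""
    return rmsd(coords_a, coords_b) < cutoff


# Hypothetical two-atom example: every atom is displaced by 1 angstrom along z.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(predicted, reference), is_successful_fit(predicted, reference))
```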

  18. TMDIM: an improved algorithm for the structure prediction of transmembrane domains of bitopic dimers

    NASA Astrophysics Data System (ADS)

    Cao, Han; Ng, Marcus C. K.; Jusoh, Siti Azma; Tai, Hio Kuan; Siu, Shirley W. I.

    2017-09-01

    α-Helical transmembrane proteins are the most important drug targets in rational drug development. However, solving the experimental structures of these proteins remains difficult; therefore, computational methods to accurately and efficiently predict the structures are in great demand. We present an improved structure prediction method, TMDIM, based on Park et al. (Proteins 57:577-585, 2004) for predicting bitopic transmembrane protein dimers. Three major algorithmic improvements are the introduction of packing type classification, multiple-condition decoy filtering, and cluster-based candidate selection. In a test of predicting nine known bitopic dimers, approximately 78% of our predictions achieved a successful fit (RMSD <2.0 Å) and 78% of the cases are better predicted than by the two other methods compared. Our method provides an alternative for modeling TM bitopic dimers of unknown structures for further computational studies. TMDIM is freely available on the web at https://cbbio.cis.umac.mo/TMDIM. The website is implemented in PHP, MySQL and Apache, with all major browsers supported.

  19. GeneSilico protein structure prediction meta-server.

    PubMed

    Kurowski, Michal A; Bujnicki, Janusz M

    2003-07-01

    Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta.

  20. GeneSilico protein structure prediction meta-server

    PubMed Central

    Kurowski, Michal A.; Bujnicki, Janusz M.

    2003-01-01

    Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta. PMID:12824313

  1. RNA Secondary Structure Prediction by Using Discrete Mathematics: An Interdisciplinary Research Experience for Undergraduate Students

    ERIC Educational Resources Information Center

    Ellington, Roni; Wachira, James; Nkwanta, Asamoah

    2010-01-01

    The focus of this Research Experience for Undergraduates (REU) project was on RNA secondary structure prediction by using a lattice walk approach. The lattice walk approach is a combinatorial and computational biology method used to enumerate possible secondary structures and predict RNA secondary structure from RNA sequences. The method uses…

  2. CompaRNA: a server for continuous benchmarking of automated methods for RNA secondary structure prediction

    PubMed Central

    Puton, Tomasz; Kozlowski, Lukasz P.; Rother, Kristian M.; Bujnicki, Janusz M.

    2013-01-01

    We present a continuous benchmarking approach for the assessment of RNA secondary structure prediction methods implemented in the CompaRNA web server. As of 3 October 2012, the performance of 28 single-sequence and 13 comparative methods has been evaluated on RNA sequences/structures released weekly by the Protein Data Bank. We also provide a static benchmark generated on RNA 2D structures derived from the RNAstrand database. Benchmarks on both data sets offer insight into the relative performance of RNA secondary structure prediction methods on RNAs of different size and with respect to different types of structure. According to our tests, on the average, the most accurate predictions obtained by a comparative approach are generated by CentroidAlifold, MXScarna, RNAalifold and TurboFold. On the average, the most accurate predictions obtained by single-sequence analyses are generated by CentroidFold, ContextFold and IPknot. The best comparative methods typically outperform the best single-sequence methods if an alignment of homologous RNA sequences is available. This article presents the results of our benchmarks as of 3 October 2012, whereas the rankings presented online are continuously updated. We will gladly include new prediction methods and new measures of accuracy in the new editions of CompaRNA benchmarks. PMID:23435231

  3. Predicting the helix packing of globular proteins by self-correcting distance geometry.

    PubMed

    Mumenthaler, C; Braun, W

    1995-05-01

    A new self-correcting distance geometry method for predicting the three-dimensional structure of small globular proteins was assessed with a test set of 8 helical proteins. With the knowledge of the amino acid sequence and the helical segments, our completely automated method calculated the correct backbone topology of six proteins. The accuracy of the predicted structures ranged from 2.3 Å to 3.1 Å for the helical segments compared to the experimentally determined structures. For two proteins, the predicted constraints were not restrictive enough to yield a conclusive prediction. The method can be applied to all small globular proteins, provided the secondary structure is known from NMR analysis or can be predicted with high reliability.

  4. Construction of crystal structure prototype database: methods and applications.

    PubMed

    Su, Chuanxun; Lv, Jian; Li, Quan; Wang, Hui; Zhang, Lijun; Wang, Yanchao; Ma, Yanming

    2017-04-26

    Crystal structure prototype data have become a useful source of information for materials discovery in the fields of crystallography, chemistry, physics, and materials science. This work reports the development of a robust and efficient method for assessing the similarity of structures on the basis of their interatomic distances. Using this method, we proposed a simple and unambiguous definition of crystal structure prototype based on hierarchical clustering theory, and constructed the crystal structure prototype database (CSPD) by filtering the known crystallographic structures in a database. With a similar method, a program, the structure prototype analysis package (SPAP), was developed to remove similar structures in CALYPSO prediction results and extract predicted low energy structures for a separate theoretical structure database. A series of statistics describing the distribution of crystal structure prototypes in the CSPD was compiled to provide important insight for structure prediction and high-throughput calculations. Illustrative examples of the application of the proposed database are given, including the generation of initial structures for structure prediction and determination of the prototype structure in databases. These examples demonstrate the CSPD to be a generally applicable and useful tool for materials discovery.

  5. Construction of crystal structure prototype database: methods and applications

    NASA Astrophysics Data System (ADS)

    Su, Chuanxun; Lv, Jian; Li, Quan; Wang, Hui; Zhang, Lijun; Wang, Yanchao; Ma, Yanming

    2017-04-01

    Crystal structure prototype data have become a useful source of information for materials discovery in the fields of crystallography, chemistry, physics, and materials science. This work reports the development of a robust and efficient method for assessing the similarity of structures on the basis of their interatomic distances. Using this method, we proposed a simple and unambiguous definition of crystal structure prototype based on hierarchical clustering theory, and constructed the crystal structure prototype database (CSPD) by filtering the known crystallographic structures in a database. With a similar method, a program, the structure prototype analysis package (SPAP), was developed to remove similar structures in CALYPSO prediction results and extract predicted low energy structures for a separate theoretical structure database. A series of statistics describing the distribution of crystal structure prototypes in the CSPD was compiled to provide important insight for structure prediction and high-throughput calculations. Illustrative examples of the application of the proposed database are given, including the generation of initial structures for structure prediction and determination of the prototype structure in databases. These examples demonstrate the CSPD to be a generally applicable and useful tool for materials discovery.

  6. Critical Features of Fragment Libraries for Protein Structure Prediction

    PubMed Central

    dos Santos, Karina Baptista

    2017-01-01

    The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. In particular, we analyze the effect of using secondary structure prediction to guide fragment selection, of different fragment sizes, and of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than longer ones. Libraries composed of multiple fragment lengths generate even better structures, with longer fragments proving more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can be critical guidelines for the use of fragment libraries in protein structure prediction. PMID:28085928

  7. Critical Features of Fragment Libraries for Protein Structure Prediction.

    PubMed

    Trevizani, Raphael; Custódio, Fábio Lima; Dos Santos, Karina Baptista; Dardenne, Laurent Emmanuel

    2017-01-01

    The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. In particular, we analyze the effect of using secondary structure prediction to guide fragment selection, of different fragment sizes, and of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than longer ones. Libraries composed of multiple fragment lengths generate even better structures, with longer fragments proving more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can be critical guidelines for the use of fragment libraries in protein structure prediction.

  8. Computational modeling of membrane proteins

    PubMed Central

    Leman, Julia Koehler; Ulmschneider, Martin B.; Gray, Jeffrey J.

    2014-01-01

    The determination of membrane protein (MP) structures has always trailed that of soluble proteins due to difficulties in their overexpression, reconstitution into membrane mimetics, and subsequent structure determination. The percentage of MP structures in the protein databank (PDB) has been at a constant 1-2% for the last decade. In contrast, over half of all drugs target MPs, only highlighting how little we understand about drug-specific effects in the human body. To reduce this gap, researchers have attempted to predict structural features of MPs even before the first structure was experimentally elucidated. In this review, we present current computational methods to predict MP structure, starting with secondary structure prediction, prediction of trans-membrane spans, and topology. Even though these methods generate reliable predictions, challenges such as predicting kinks or precise beginnings and ends of secondary structure elements are still waiting to be addressed. We describe recent developments in the prediction of 3D structures of both α-helical MPs as well as β-barrels using comparative modeling techniques, de novo methods, and molecular dynamics (MD) simulations. The increase of MP structures has (1) facilitated comparative modeling due to availability of more and better templates, and (2) improved the statistics for knowledge-based scoring functions. Moreover, de novo methods have benefitted from the use of correlated mutations as restraints. Finally, we outline current advances that will likely shape the field in the forthcoming decade. PMID:25355688

  9. Improve the prediction of RNA-binding residues using structural neighbours.

    PubMed

    Li, Quan; Cao, Zanxia; Liu, Haiyan

    2010-03-01

    The interactions of RNA-binding proteins (RBPs) with RNA play key roles in managing some of the cell's basic functions. The identification and prediction of RNA binding sites is important for understanding the RNA-binding mechanism. Computational approaches are being developed to predict RNA-binding residues based on sequence- or structure-derived features. To achieve higher prediction accuracy, improvements on current prediction methods are necessary. We identified that the structural neighbors of RNA-binding and non-RNA-binding residues have different amino acid compositions. Combining this structure-derived feature with evolutionary (PSSM) and other structural information (secondary structure and solvent accessibility) significantly improves the predictions over existing methods. Using a multiple linear regression approach and 6-fold cross validation, our best model achieves an overall correct rate of 87.8% and an MCC of 0.47, with a specificity of 93.4%, and correctly predicts 52.4% of the RNA-binding residues for a dataset containing 107 non-homologous RNA-binding proteins. Compared with existing methods, including the amino acid compositions of structural neighbors leads to a clear improvement. A web server was developed for predicting RNA binding residues in a protein sequence (or structure), which is available at http://mcgill.3322.org/RNA/.
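
    The figures reported in this abstract (overall correct rate, specificity, fraction of binding residues recovered, and MCC) are standard confusion-matrix quantities. A minimal sketch of how they are computed is given below; the counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # fraction of true binding residues recovered
    specificity = tn / (tn + fp)   # fraction of non-binding residues rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


# Hypothetical residue counts, for illustration only.
metrics = binary_metrics(tp=50, fp=10, tn=90, fn=50)
print(metrics)
```

    Because binding residues are usually a minority class, MCC is often reported alongside accuracy: a predictor that labels every residue as non-binding can score a high accuracy while its MCC stays at zero.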

  10. Antibody-protein interactions: benchmark datasets and prediction tools evaluation

    PubMed Central

    Ponomarenko, Julia V; Bourne, Philip E

    2007-01-01

    Background: The ability to predict antibody binding sites (aka antigenic determinants or B-cell epitopes) for a given protein is a precursor to new vaccine design and diagnostics. Among the various methods of B-cell epitope identification, X-ray crystallography is one of the most reliable. Computational methods for B-cell epitope prediction exist that use these experimental data. As the number of structures of antibody-protein complexes grows, further interest in prediction methods using 3D structure is anticipated. This work aims to establish a benchmark for 3D structure-based epitope prediction methods. Results: Two B-cell epitope benchmark datasets inferred from the 3D structures of antibody-protein complexes were defined. The first is a dataset of 62 representative 3D structures of protein antigens with inferred structural epitopes. The second is a dataset of 82 structures of antibody-protein complexes containing different structural epitopes. Using these datasets, eight web-servers developed for antibody and protein binding sites prediction have been evaluated. No method exceeded 40% precision and 46% recall. The values of the area under the receiver operating characteristic curve for the evaluated methods were about 0.6 for ConSurf, DiscoTope, and PPI-PRED methods and above 0.65 but not exceeding 0.70 for protein-protein docking methods when the best of the top ten models for the bound docking were considered; the remaining methods performed close to random. The benchmark datasets are included as a supplement to this paper. Conclusion: It may be possible to improve epitope prediction methods through training on datasets which include only immune epitopes and through utilizing more features characterizing epitopes, for example, the evolutionary conservation score. Notwithstanding, overall poor performance may reflect the generality of antigenicity and hence the inability to decipher B-cell epitopes as an intrinsic feature of the protein. It is an open question as to whether ultimately discriminatory features can be found. PMID:17910770

  11. Large-scale model quality assessment for improving protein tertiary structure prediction.

    PubMed

    Cao, Renzhi; Bhattacharya, Debswapna; Adhikari, Badri; Li, Jilong; Cheng, Jianlin

    2015-06-15

    Sampling structural models and ranking them are the two major challenges of protein structure prediction. Traditional protein structure prediction methods generally use one or a few quality assessment (QA) methods to select the best-predicted models, which cannot consistently select relatively better models and rank a large number of models well. Here, we develop a novel large-scale model QA method in conjunction with model clustering to rank and select protein structural models. It applies an unprecedented 14 model QA methods to generate consensus model rankings, followed by model refinement based on model combination (i.e. averaging). Our experiment demonstrates that the large-scale model QA approach is more consistent and robust in selecting models of better quality than any individual QA method. Our method was blindly tested during the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) as the MULTICOM group. It was officially ranked third out of all 143 human and server predictors according to the total scores of the first models predicted for 78 CASP11 protein domains and second according to the total scores of the best of the five models predicted for these domains. MULTICOM's outstanding performance in the extremely competitive 2014 CASP11 experiment proves that our large-scale QA approach together with model clustering is a promising solution to one of the two major problems in protein structure modeling. The web server is available at: http://sysbio.rnet.missouri.edu/multicom_cluster/human/. © The Author 2015. Published by Oxford University Press.
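
    The abstract does not describe how the 14 QA scores are combined into a consensus ranking. As one plausible illustration only, the sketch below uses simple rank averaging, which is an assumption and not necessarily the MULTICOM procedure; the method names, model names and scores are hypothetical, and every QA method is assumed to score the same set of models with higher scores being better.

```python
def consensus_ranking(scores_by_method):
    """Order models by their mean rank across several QA methods.

    scores_by_method maps a QA method name to {model name: score}.
    """
    rank_sums = {}
    for scores in scores_by_method.values():
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, model in enumerate(ordered, start=1):
            rank_sums[model] = rank_sums.get(model, 0) + rank
    n_methods = len(scores_by_method)
    mean_ranks = sorted((total / n_methods, model)
                        for model, total in rank_sums.items())
    return [model for _, model in mean_ranks]


# Hypothetical scores from three QA methods for three candidate models.
scores = {
    "qa_a": {"m1": 0.8, "m2": 0.6, "m3": 0.4},
    "qa_b": {"m1": 0.5, "m2": 0.7, "m3": 0.2},
    "qa_c": {"m1": 0.9, "m2": 0.3, "m3": 0.6},
}
consensus = consensus_ranking(scores)
print(consensus)
```

    Averaging ranks rather than raw scores sidesteps the fact that different QA methods report scores on different scales.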

  12. Analysis of Free Modeling Predictions by RBO Aleph in CASP11

    PubMed Central

    Mabrouk, Mahmoud; Werner, Tim; Schneider, Michael; Putz, Ines; Brock, Oliver

    2015-01-01

    The CASP experiment is a biennial benchmark for assessing protein structure prediction methods. In CASP11, RBO Aleph ranked as one of the top-performing automated servers in the free modeling category. This category consists of targets for which structural templates are not easily retrievable. We analyze the performance of RBO Aleph and show that its success in CASP was a result of its ab initio structure prediction protocol. A detailed analysis of this protocol demonstrates that two components unique to our method greatly contributed to prediction quality: residue–residue contact prediction by EPC-map and contact–guided conformational space search by model-based search (MBS). Interestingly, our analysis also points to a possible fundamental problem in evaluating the performance of protein structure prediction methods: Improvements in components of the method do not necessarily lead to improvements of the entire method. This points to the fact that these components interact in ways that are poorly understood. This problem, if indeed true, represents a significant obstacle to community-wide progress. PMID:26492194

  13. PSSP-RFE: accurate prediction of protein structural class by recursive feature extraction from PSI-BLAST profile, physical-chemical property and functional annotations.

    PubMed

    Li, Liqi; Cui, Xiang; Yu, Sanjiu; Zhang, Yuan; Luo, Zhong; Yang, Hua; Zhou, Yue; Zheng, Xiaoqi

    2014-01-01

    Protein structure prediction is critical to functional annotation of the massively accumulated biological sequences, which prompts an imperative need for the development of high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction becomes an increasingly challenging task. For most homology-based approaches, the accuracies of protein structural class prediction are sufficiently high for high similarity datasets, but still far from satisfactory for low similarity datasets, i.e., below 40% in pairwise sequence similarity. Therefore, we present a novel method for accurate and reliable protein structural class prediction for both high and low similarity datasets. This method is based on Support Vector Machine (SVM) in conjunction with integrated features from position-specific score matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is also used to rank the integrated feature vectors through recursively removing the feature with the lowest ranking score. The definitive top features selected by SVM-RFE are input into the SVM engines to predict the structural class of a query protein. To validate our method, jackknife tests were applied to seven widely used benchmark datasets, reaching overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods in protein structural classification, especially for low similarity datasets.

  14. A sampling-based method for ranking protein structural models by integrating multiple scores and features.

    PubMed

    Shi, Xiaohu; Zhang, Jingfen; He, Zhiquan; Shang, Yi; Xu, Dong

    2011-09-01

    One of the major challenges in protein tertiary structure prediction is structure quality assessment. In many cases, protein structure prediction tools generate good structural models, but fail to select the best models from a huge number of candidates as the final output. In this study, we developed a sampling-based machine-learning method to rank protein structural models by integrating multiple scores and features. First, features such as predicted secondary structure, solvent accessibility and residue-residue contact information are integrated by two Radial Basis Function (RBF) models trained from different datasets. Then, the two RBF scores and five selected scoring functions developed by others, i.e., Opus-CA, Opus-PSP, DFIRE, RAPDF, and Cheng Score, are synthesized by a sampling method. Finally, another integrated RBF model ranks the structural models according to the features of the sampling distribution. We tested the proposed method by using two different datasets, including the CASP server prediction models of all CASP8 targets and a set of models generated by our in-house software MUFOLD. The test result shows that our method outperforms any individual scoring function in both best-model selection and overall correlation between the predicted ranking and the actual ranking of structural quality.

  15. Assessment of Protein Side-Chain Conformation Prediction Methods in Different Residue Environments

    PubMed Central

    Peterson, Lenna X.; Kang, Xuejiao; Kihara, Daisuke

    2016-01-01

    Computational prediction of side-chain conformation is an important component of protein structure prediction. Accurate side-chain prediction is crucial for practical applications of protein structure models that need atomic detailed resolution such as protein and ligand design. We evaluated the accuracy of eight side-chain prediction methods in reproducing the side-chain conformations of experimentally solved structures deposited to the Protein Data Bank. Prediction accuracy was evaluated for a total of four different structural environments (buried, surface, interface, and membrane-spanning) in three different protein types (monomeric, multimeric, and membrane). Overall, the highest accuracy was observed for buried residues in monomeric and multimeric proteins. Notably, side-chains at protein interfaces and membrane-spanning regions were better predicted than surface residues even though the methods did not all use multimeric and membrane proteins for training. Thus, we conclude that the current methods are as practically useful for modeling protein docking interfaces and membrane-spanning regions as for modeling monomers. PMID:24619909

  16. A novel method for structure-based prediction of ion channel conductance properties.

    PubMed Central

    Smart, O S; Breed, J; Smith, G R; Sansom, M S

    1997-01-01

    A rapid and easy-to-use method of predicting the conductance of an ion channel from its three-dimensional structure is presented. The method combines the pore dimensions of the channel as measured in the HOLE program with an Ohmic model of conductance. An empirically based correction factor is then applied. The method yielded good results for six experimental channel structures (none of which were included in the training set) with predictions accurate to within an average factor of 1.62 to the true values. The predictive r² was equal to 0.90, which is indicative of a good predictive ability. The procedure is used to validate model structures of alamethicin and phospholamban. Two genuine predictions for the conductance of channels with known structure but without reported conductances are given. A modification of the procedure that calculates the expected results for the effect of the addition of nonelectrolyte polymers on conductance is set out. Results for a cholera toxin B-subunit crystal structure agree well with the measured values. The difficulty in interpreting such studies is discussed, with the conclusion that measurements on channels of known structure are required. PMID:9138559
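
    An Ohmic model of this kind can be sketched by treating the pore as a stack of thin cylindrical slabs of electrolyte connected in series. The sketch below is written under stated assumptions and is not the HOLE code: the function name, the uniform radius profile, the bulk resistivity and the correction factor of 5 are hypothetical, and dividing by the correction factor is an assumed form of the empirical correction mentioned in the abstract.

```python
import math


def ohmic_conductance(radii_m, slab_thickness_m, resistivity_ohm_m):
    """Conductance (siemens) of a pore modelled as cylindrical slabs in series.

    Each slab of radius r and thickness dz contributes a resistance
    rho * dz / (pi * r**2); series resistances add, and G = 1 / R_total.
    """
    if not radii_m:
        raise ValueError("at least one pore radius is required")
    resistance = 0.0
    for r in radii_m:
        if r <= 0:
            raise ValueError("pore radius must be positive")
        resistance += resistivity_ohm_m * slab_thickness_m / (math.pi * r * r)
    return 1.0 / resistance


# Hypothetical pore: 5 slabs of 0.5 nm thickness, uniform radius of 0.5 nm,
# filled with electrolyte of assumed bulk resistivity 0.1 ohm*m.
radii = [0.5e-9] * 5
g_ohmic = ohmic_conductance(radii, slab_thickness_m=0.5e-9, resistivity_ohm_m=0.1)

# Assumed empirical correction: scale the Ohmic estimate down by a fitted factor.
correction_factor = 5.0  # hypothetical value, not taken from the paper
g_corrected = g_ohmic / correction_factor
print(g_ohmic, g_corrected)
```

    Because the slab resistances add in series, the narrowest stretch of the pore dominates the total, which is why an accurate radius profile matters more than the overall pore length.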

  17. Blind prediction of noncanonical RNA structure at atomic accuracy.

    PubMed

    Watkins, Andrew M; Geniesse, Caleb; Kladwang, Wipapat; Zakrevsky, Paul; Jaeger, Luc; Das, Rhiju

    2018-05-01

    Prediction of RNA structure from nucleotide sequence remains an unsolved grand challenge of biochemistry and requires distinct concepts from protein structure prediction. Despite extensive algorithmic development in recent years, modeling of noncanonical base pairs of new RNA structural motifs has not been achieved in blind challenges. We report a stepwise Monte Carlo (SWM) method with a unique add-and-delete move set that enables predictions of noncanonical base pairs of complex RNA structures. A benchmark of 82 diverse motifs establishes the method's general ability to recover noncanonical pairs ab initio, including multistrand motifs that have been refractory to prior approaches. In a blind challenge, SWM models predicted nucleotide-resolution chemical mapping and compensatory mutagenesis experiments for three in vitro selected tetraloop/receptors with previously unsolved structures (C7.2, C7.10, and R1). As a final test, SWM blindly and correctly predicted all noncanonical pairs of a Zika virus double pseudoknot during a recent community-wide RNA-Puzzle. Stepwise structure formation, as encoded in the SWM method, enables modeling of noncanonical RNA structure in a variety of previously intractable problems.

  18. Ligand Binding Site Detection by Local Structure Alignment and Its Performance Complementarity

    PubMed Central

    Lee, Hui Sun; Im, Wonpil

    2013-01-01

    Accurate determination of potential ligand binding sites (BS) is a key step for protein function characterization and structure-based drug design. Despite promising results of template-based BS prediction methods using global structure alignment (GSA), there is room to improve the performance by properly incorporating local structure alignment (LSA) because BS are local structures and often similar for proteins with dissimilar global folds. We present a template-based ligand BS prediction method using G-LoSA, our LSA tool. A large benchmark set validation shows that G-LoSA predicts drug-like ligands’ positions in single-chain protein targets more precisely than TM-align, a GSA-based method, while the overall success rate of TM-align is better. G-LoSA is particularly efficient for accurate detection of local structures conserved across proteins with diverse global topologies. Recognizing the performance complementarity of G-LoSA to TM-align and a non-template geometry-based method, fpocket, a robust consensus scoring method, CMCS-BSP (Complementary Methods and Consensus Scoring for ligand Binding Site Prediction), is developed and shows improvement in prediction accuracy. The G-LoSA source code is freely available at http://im.bioinformatics.ku.edu/GLoSA. PMID:23957286

  19. An Efficient Scheme for Crystal Structure Prediction Based on Structural Motifs

    DOE PAGES

    Zhu, Zizhong; Wu, Ping; Wu, Shunqing; ...

    2017-05-15

    An efficient scheme based on structural motifs is proposed for the crystal structure prediction of materials. The key advantage of the present method is twofold: first, the degrees of freedom of the system are greatly reduced, since each structural motif, regardless of its size, can always be described by a set of parameters (R, θ, φ) with five degrees of freedom; second, the motifs could always appear in the predicted structures when the energies of the structures are relatively low. Both features make the present scheme a very efficient method for predicting desired materials. The method has been applied to the case of LiFePO4, an important cathode material for lithium-ion batteries. Numerous new structures of LiFePO4 have been found, compared to those currently available, demonstrating the reliability of the present methodology and illustrating the promise of the concept of structural motifs.

  20. An Efficient Scheme for Crystal Structure Prediction Based on Structural Motifs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhu, Zizhong; Wu, Ping; Wu, Shunqing

    An efficient scheme based on structural motifs is proposed for the crystal structure prediction of materials. The key advantage of the present method is twofold: first, the degrees of freedom of the system are greatly reduced, since each structural motif, regardless of its size, can always be described by a set of parameters (R, θ, φ) with five degrees of freedom; second, the motifs could always appear in the predicted structures when the energies of the structures are relatively low. Both features make the present scheme a very efficient method for predicting desired materials. The method has been applied to the case of LiFePO4, an important cathode material for lithium-ion batteries. Numerous new structures of LiFePO4 have been found, compared to those currently available, demonstrating the reliability of the present methodology and illustrating the promise of the concept of structural motifs.

  1. Prediction of rodent carcinogenicity bioassays from molecular structure using inductive logic programming

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    King, R.D.; Srinivasan, A.

    1996-10-01

    The machine learning program Progol was applied to the problem of forming the structure-activity relationship (SAR) for a set of compounds tested for carcinogenicity in rodent bioassays by the U.S. National Toxicology Program (NTP). Progol is the first inductive logic programming (ILP) algorithm to use a fully relational method for describing chemical structure in SARs, based on using atoms and their bond connectivities. Progol is well suited to forming SARs for carcinogenicity as it is designed to produce easily understandable rules (structural alerts) for sets of noncongeneric compounds. The Progol SAR method was tested by prediction of a set of compounds that have been widely predicted by other SAR methods (the compounds used in the NTP's first round of carcinogenesis predictions). For these compounds no method (human or machine) was significantly more accurate than Progol. Progol was the most accurate method that did not use data from biological tests on rodents (however, the difference in accuracy is not significant). The Progol predictions were based solely on chemical structure and the results of tests for Salmonella mutagenicity. Using the full NTP database, the prediction accuracy of Progol was estimated to be 63% (±3%) using 5-fold cross validation. A set of structural alerts for carcinogenesis was automatically generated and the chemical rationale for them investigated; these structural alerts are statistically independent of the Salmonella mutagenicity. Carcinogenicity is predicted for the compounds used in the NTP's second round of carcinogenesis predictions. The results for prediction of carcinogenesis, taken together with the previous successful applications of predicting mutagenicity in nitroaromatic compounds, and inhibition of angiogenesis by suramin analogues, show that Progol has a role to play in understanding the SARs of cancer-related compounds. 29 refs., 2 figs., 4 tabs.

  2. New insights from cluster analysis methods for RNA secondary structure prediction

    PubMed Central

    Rogers, Emily; Heitsch, Christine

    2016-01-01

    A widening gap exists between the best practices for RNA secondary structure prediction developed by computational researchers and the methods used in practice by experimentalists. Minimum free energy (MFE) predictions, although broadly used, are outperformed by methods which sample from the Boltzmann distribution and data mine the results. In particular, moving beyond the single structure prediction paradigm yields substantial gains in accuracy. Furthermore, the largest improvements in accuracy and precision come from viewing secondary structures not at the base pair level but at lower granularity/higher abstraction. This suggests that random errors affecting precision and systematic ones affecting accuracy are both reduced by this “fuzzier” view of secondary structures. Thus experimentalists who are willing to adopt a more rigorous, multilayered approach to secondary structure prediction by iterating through these levels of granularity will be much better able to capture fundamental aspects of RNA base pairing. PMID:26971529

  3. Protein loop modeling using a new hybrid energy function and its application to modeling in inaccurate structural environments.

    PubMed

    Park, Hahnbeom; Lee, Gyu Rie; Heo, Lim; Seok, Chaok

    2014-01-01

    Protein loop modeling is a tool for predicting protein local structures of particular interest, providing opportunities for applications involving protein structure prediction and de novo protein design. Until recently, the majority of loop modeling methods have been developed and tested by reconstructing loops in frameworks of experimentally resolved structures. In many practical applications, however, the protein loops to be modeled are located in inaccurate structural environments. These include loops in model structures, low-resolution experimental structures, or experimental structures of different functional forms. Accordingly, discrepancies in the accuracy of the structural environment assumed in development of the method and that in practical applications present additional challenges to modern loop modeling methods. This study demonstrates a new strategy for employing a hybrid energy function combining physics-based and knowledge-based components to help tackle this challenge. The hybrid energy function is designed to combine the strengths of each energy component, simultaneously maintaining accurate loop structure prediction in a high-resolution framework structure and tolerating minor environmental errors in low-resolution structures. A loop modeling method based on global optimization of this new energy function is tested on loop targets situated in different levels of environmental errors, ranging from experimental structures to structures perturbed in backbone as well as side chains and template-based model structures. The new method performs comparably to force field-based approaches in loop reconstruction in crystal structures and better in loop prediction in inaccurate framework structures. This result suggests that higher-accuracy predictions would be possible for a broader range of applications. The web server for this method is available at http://galaxy.seoklab.org/loop with the PS2 option for the scoring function.

  4. RaptorX-Angle: real-value prediction of protein backbone dihedral angles through a hybrid method of clustering and deep learning.

    PubMed

    Gao, Yujuan; Wang, Sheng; Deng, Minghua; Xu, Jinbo

    2018-05-08

    Protein dihedral angles provide a detailed description of protein local conformation. Predicted dihedral angles can be used to narrow down the conformational space of the whole polypeptide chain significantly, thus aiding protein tertiary structure prediction. However, direct angle prediction from sequence alone is challenging. In this article, we present a novel method (named RaptorX-Angle) to predict real-valued angles by combining clustering and deep learning. Tested on a subset of PDB25 and the targets in the latest two Critical Assessment of protein Structure Prediction (CASP) experiments, our method outperforms the existing state-of-the-art method SPIDER2 in terms of Pearson Correlation Coefficient (PCC) and Mean Absolute Error (MAE). Our result also shows an approximately linear relationship between the real prediction errors and our estimated bounds. That is, the real prediction error can be well approximated by our estimated bounds. Our study provides an alternative and more accurate prediction of dihedral angles, which may facilitate protein structure prediction and functional study.
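
    Because dihedral angles are periodic, an error measure such as MAE is normally taken on the circle, so that a prediction of 179° against a true value of -179° counts as a 2° error rather than 358°. The sketch below illustrates that convention; the abstract does not state how RaptorX-Angle handles wrap-around, so the function name and the choice of degrees are assumptions made for the example.

```python
def angular_mae(predicted_deg, true_deg):
    """Mean absolute error between two lists of angles in degrees, measured on the circle."""
    if len(predicted_deg) != len(true_deg) or not predicted_deg:
        raise ValueError("need two non-empty lists of equal length")
    total = 0.0
    for p, t in zip(predicted_deg, true_deg):
        diff = abs(p - t) % 360.0           # raw difference folded into [0, 360)
        total += min(diff, 360.0 - diff)    # take the shorter way around the circle
    return total / len(predicted_deg)

print(angular_mae([179.0], [-179.0]))             # wrap-around case
print(angular_mae([10.0, -170.0], [20.0, 170.0]))
```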

  5. GalaxyGPCRloop: Template-Based and Ab Initio Structure Sampling of the Extracellular Loops of G-Protein-Coupled Receptors.

    PubMed

    Won, Jonghun; Lee, Gyu Rie; Park, Hahnbeom; Seok, Chaok

    2018-06-07

    The second extracellular loops (ECL2s) of G-protein-coupled receptors (GPCRs) are often involved in GPCR functions, and their structures have important implications in drug discovery. However, structure prediction of ECL2 is difficult because of its long length and the structural diversity among different GPCRs. In this study, a new ECL2 conformational sampling method involving both template-based and ab initio sampling was developed. Inspired by the observation of similar ECL2 structures of closely related GPCRs, a template-based sampling method employing loop structure templates selected from the structure database was developed. A new metric for evaluating similarity of the target loop to templates was introduced for template selection. An ab initio loop sampling method was also developed to treat cases without highly similar templates. The ab initio method is based on the previously developed fragment assembly and loop closure method. A new sampling component that takes advantage of secondary structure prediction was added. In addition, a conserved disulfide bridge restraining ECL2 conformation was predicted and analytically incorporated into sampling, reducing the effective dimension of the conformational search space. The sampling method was combined with an existing energy function for comparison with previously reported loop structure prediction methods, and the benchmark test demonstrated outstanding performance.

  6. Toward Fully in Silico Melting Point Prediction Using Molecular Simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Y; Maginn, EJ

    2013-03-01

    Melting point is one of the most fundamental and practically important properties of a compound. Molecular simulation methods have been developed for the accurate computation of melting points. However, all of these methods need an experimental crystal structure as input, which means that such calculations are not really predictive since the melting point can be measured easily in experiments once a crystal structure is known. On the other hand, crystal structure prediction (CSP) has become an active field and significant progress has been made, although challenges still exist. One of the main challenges is the existence of many crystal structures (polymorphs) that are very close in energy. Thermal effects and kinetic factors make the situation even more complicated, such that it is still not trivial to predict experimental crystal structures. In this work, we exploit the fact that free energy differences are often small between crystal structures. We show that accurate melting point predictions can be made by using a reasonable crystal structure from CSP as a starting point for a free energy-based melting point calculation. The key is that most crystal structures predicted by CSP have free energies that are close to that of the experimental structure. The proposed method was tested on two rigid molecules and the results suggest that a fully in silico melting point prediction method is possible.

  7. Prediction of beta-turns and beta-turn types by a novel bidirectional Elman-type recurrent neural network with multiple output layers (MOLEBRNN).

    PubMed

    Kirschner, Andreas; Frishman, Dmitrij

    2008-10-01

    Prediction of beta-turns from amino acid sequences has long been recognized as an important problem in structural bioinformatics due to their frequent occurrence as well as their structural and functional significance. Because various structural features of proteins are intercorrelated, secondary structure information has often been employed as an additional input for machine learning algorithms while predicting beta-turns. Here we present a novel bidirectional Elman-type recurrent neural network with multiple output layers (MOLEBRNN) capable of predicting multiple mutually dependent structural motifs and demonstrate its efficiency in recognizing three aspects of protein structure: beta-turns, beta-turn types, and secondary structure. The advantage of our method compared to other predictors is that it does not require any external input except for sequence profiles because interdependencies between different structural features are taken into account implicitly during the learning process. In a sevenfold cross-validation experiment on a standard test dataset our method exhibits a total prediction accuracy of 77.9% and a Matthews correlation coefficient of 0.45, the highest performance reported so far. It also outperforms other known methods in delineating individual turn types. We demonstrate how simultaneous prediction of multiple targets influences prediction performance on single targets. The MOLEBRNN presented here is a generic method applicable in a variety of research fields where multiple mutually dependent target classes need to be predicted. http://webclu.bio.wzw.tum.de/predator-web/.
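
    The two figures quoted above measure different things, which matters when the positive class (here, turn residues) is the minority. The following sketch computes accuracy and the Matthews correlation coefficient from a binary confusion matrix; the counts are invented for illustration and are not taken from the paper.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all residues that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when a marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical predictor that labels every residue "non-turn" on a 10%-turn dataset:
print(accuracy(tp=0, tn=90, fp=0, fn=10))           # high accuracy
print(matthews_corrcoef(tp=0, tn=90, fp=0, fn=10))  # but no correlation with the truth
```

    A trivial majority-class predictor scores well on accuracy but zero on the correlation coefficient, which is why both numbers are usually reported together.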

  8. Improving link prediction in complex networks by adaptively exploiting multiple structural features of networks

    NASA Astrophysics Data System (ADS)

    Ma, Chuang; Bao, Zhong-Kui; Zhang, Hai-Feng

    2017-10-01

    So far, many network-structure-based link prediction methods have been proposed. However, these methods only highlight one or two structural features of networks, and then use the methods to predict missing links in different networks. The performance of these existing methods is not satisfactory in all cases since each network has its unique underlying structural features. In this paper, by analyzing different real networks, we find that the structural features of different networks are remarkably different. In particular, even in the same network, their inner structural features are utterly different. Therefore, more structural features should be considered. However, owing to the remarkably different structural features, the contributions of different features are hard to specify in advance. Inspired by these facts, an adaptive fusion model regarding link prediction is proposed to incorporate multiple structural features. In the model, a logistic function combining multiple structural features is defined, then the weight of each feature in the logistic function is adaptively determined by exploiting the known structure information. Finally, we use the "learnt" logistic function to predict the connection probabilities of missing links. According to our experimental results, we find that the performance of our adaptive fusion model is better than many similarity indices.
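
    The idea of a logistic function whose feature weights are learnt from the known links can be sketched as below. This is an illustration under stated assumptions rather than the authors' model: the three features (common neighbours, Jaccard index, scaled preferential attachment), the toy six-node graph, and the use of plain gradient ascent on the log-likelihood are all choices made for the example.

```python
import math
from itertools import combinations

def pair_features(adj, u, v):
    """Bias term plus three illustrative similarity features for the node pair (u, v)."""
    nu, nv = adj[u] - {v}, adj[v] - {u}          # ignore the u-v edge itself
    common = len(nu & nv)
    union = len(nu | nv)
    jaccard = common / union if union else 0.0
    return [1.0, float(common), jaccard, len(nu) * len(nv) / 10.0]

def score(w, x):
    """Logistic function of the weighted feature sum: estimated connection probability."""
    return 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))

def log_loss(w, xs, ys):
    eps = 1e-12                                  # guards against log(0)
    return -sum(y * math.log(score(w, x) + eps) + (1 - y) * math.log(1 - score(w, x) + eps)
                for x, y in zip(xs, ys)) / len(xs)

def fit_weights(xs, ys, lr=0.5, epochs=500):
    """Full-batch gradient ascent on the mean log-likelihood."""
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = y - score(w, x)
            for i, xi in enumerate(x):
                grad[i] += err * xi / len(xs)
        w = [wi + lr * gi for wi, gi in zip(w, grad)]
    return w

# Toy network: two triangles joined by a bridge (hypothetical data).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {n: set() for n in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

pairs = list(combinations(range(6), 2))
xs = [pair_features(adj, u, v) for u, v in pairs]
ys = [1 if v in adj[u] else 0 for u, v in pairs]   # known structure supplies the labels
weights = fit_weights(xs, ys)
print([round(w, 2) for w in weights])
```

    The learnt weights differ from network to network, which is the point of the adaptive scheme: no single similarity index is assumed to be the right one in advance.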

  9. Functional classification of protein structures by local structure matching in graph representation.

    PubMed

    Mills, Caitlyn L; Garg, Rohan; Lee, Joslynn S; Tian, Liang; Suciu, Alexandru; Cooperman, Gene; Beuning, Penny J; Ondrechen, Mary Jo

    2018-03-31

    As a result of high-throughput protein structure initiatives, over 14,400 protein structures have been solved by structural genomics (SG) centers and participating research groups. While the totality of SG data represents a tremendous contribution to genomics and structural biology, reliable functional information for these proteins is generally lacking. Better functional predictions for SG proteins will add substantial value to the structural information already obtained. Our method described herein, Graph Representation of Active Sites for Prediction of Function (GRASP-Func), predicts quickly and accurately the biochemical function of proteins by representing residues at the predicted local active site as graphs rather than in Cartesian coordinates. We compare the GRASP-Func method to our previously reported method, structurally aligned local sites of activity (SALSA), using the ribulose phosphate binding barrel (RPBB), 6-hairpin glycosidase (6-HG), and Concanavalin A-like Lectins/Glucanase (CAL/G) superfamilies as test cases. In each of the superfamilies, SALSA and the much faster method GRASP-Func yield similar correct classification of previously characterized proteins, providing a validated benchmark for the new method. In addition, we analyzed SG proteins using our SALSA and GRASP-Func methods to predict function. Forty-one SG proteins in the RPBB superfamily, nine SG proteins in the 6-HG superfamily, and one SG protein in the CAL/G superfamily were successfully classified into one of the functional families in their respective superfamily by both methods. This improved, faster, validated computational method can yield more reliable predictions of function that can be used for a wide variety of applications by the community. © 2018 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.

  10. From molecule to solid: The prediction of organic crystal structures

    NASA Astrophysics Data System (ADS)

    Dzyabchenko, A. V.

    2008-10-01

    A method for predicting the structure of a molecular crystal based on the systematic search for a global potential energy minimum is considered. The method takes into account unequal occurrences of the structural classes of organic crystals and symmetry of the multidimensional configuration space. The programs of global minimization PMC, comparison of crystal structures CRYCOM, and approximation to the distributions of the electrostatic potentials of molecules FitMEP are presented as tools for numerically solving the problem. Examples of predicted structures substantiated experimentally and the author's experience of participating in international tests of crystal structure prediction organized by the Cambridge Crystallographic Data Center (Cambridge, UK) are considered.

  11. Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM.

    PubMed

    Liang, Yunyun; Liu, Sanyang; Zhang, Shengli

    2015-01-01

    Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class and it mainly uses protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM). Currently, prediction solely based on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted in this paper. Then a 700-dimensional (700D) feature vector is constructed and the dimension is decreased to 224D by using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on the 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves favorable and competitive performance. This will offer an important complement to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.

  12. First Principles Predictions of the Structure and Function of G-Protein-Coupled Receptors: Validation for Bovine Rhodopsin

    PubMed Central

    Trabanino, Rene J.; Hall, Spencer E.; Vaidehi, Nagarajan; Floriano, Wely B.; Kam, Victor W. T.; Goddard, William A.

    2004-01-01

    G-protein-coupled receptors (GPCRs) are involved in cell communication processes and in mediating such senses as vision, smell, taste, and pain. They constitute a prominent superfamily of drug targets, but an atomic-level structure is available for only one GPCR, bovine rhodopsin, making it difficult to use structure-based methods to design receptor-specific drugs. We have developed the MembStruk first principles computational method for predicting the three-dimensional structure of GPCRs. In this article we validate the MembStruk procedure by comparing its predictions with the high-resolution crystal structure of bovine rhodopsin. The crystal structure of bovine rhodopsin has the second extracellular (EC-II) loop closed over the transmembrane regions by making a disulfide linkage between Cys-110 and Cys-187, but we speculate that opening this loop may play a role in the activation process of the receptor through the cysteine linkage with helix 3. Consequently we predicted two structures for bovine rhodopsin from the primary sequence (with no input from the crystal structure): one with the EC-II loop closed as in the crystal structure, and the other with the EC-II loop open. The MembStruk-predicted structure of bovine rhodopsin with the closed EC-II loop deviates from the crystal by 2.84 Å coordinate root mean-square (CRMS) in the transmembrane region main-chain atoms. The predicted three-dimensional structures for other GPCRs can be validated only by predicting binding sites and energies for various ligands. For such predictions we developed the HierDock first principles computational method. We validate HierDock by predicting the binding site of 11-cis-retinal in the crystal structure of bovine rhodopsin. Scanning the whole protein without using any prior knowledge of the binding site, we find that the best scoring conformation in rhodopsin is 1.1 Å CRMS from the crystal structure for the ligand atoms. This predicted conformation has the carbonyl O only 2.82 Å from the N of Lys-296. Making this Schiff base bond and minimizing leads to a final conformation only 0.62 Å CRMS from the crystal structure. We also used HierDock to predict the binding site of 11-cis-retinal in the MembStruk-predicted structure of bovine rhodopsin (closed loop). Scanning the whole protein structure leads to a structure in which the carbonyl O is only 2.85 Å from the N of Lys-296. Making this Schiff base bond and minimizing leads to a final conformation only 2.92 Å CRMS from the crystal structure. The good agreement of the ab initio-predicted protein structures and ligand binding site with experiment validates the use of the MembStruk and HierDock first-principles methods. Since these methods are generic and applicable to any GPCR, they should be useful in predicting the structures of other GPCRs and the binding site of ligands to these proteins. PMID:15041637
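
    The CRMS values quoted above are coordinate root-mean-square deviations over matched atoms. A minimal sketch of that calculation follows; it assumes the two coordinate sets list the same atoms in the same order and have already been superimposed, since the abstract does not describe the fitting step, and the function name and sample coordinates are illustrative only.

```python
import math

def crms(coords_a, coords_b):
    """Coordinate RMS deviation (same length units as the input) between matched atoms.

    Assumes both structures are already in a common frame; no superposition is performed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Two hypothetical three-atom fragments, coordinates in Å.
predicted = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
crystal   = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
print(crms(predicted, crystal))
```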

  13. Data-Driven High-Throughput Prediction of the 3D Structure of Small Molecules: Review and Progress

    PubMed Central

    Andronico, Alessio; Randall, Arlo; Benz, Ryan W.; Baldi, Pierre

    2011-01-01

    Accurate prediction of the 3D structure of small molecules is essential in order to understand their physical, chemical, and biological properties including how they interact with other molecules. Here we survey the field of high-throughput methods for 3D structure prediction and set up new target specifications for the next generation of methods. We then introduce COSMOS, a novel data-driven prediction method that utilizes libraries of fragment and torsion angle parameters. We illustrate COSMOS using parameters extracted from the Cambridge Structural Database (CSD) by analyzing their distribution and then evaluating the system’s performance in terms of speed, coverage, and accuracy. Results show that COSMOS represents a significant improvement when compared to the state-of-the-art, particularly in terms of coverage of complex molecular structures, including metal-organics. COSMOS can predict structures for 96.4% of the molecules in the CSD [99.6% organic, 94.6% metal-organic] whereas the widely used commercial method CORINA predicts structures for 68.5% [98.5% organic, 51.6% metal-organic]. On the common subset of molecules predicted by both methods COSMOS makes predictions with an average speed per molecule of 0.15s [0.10s organic, 0.21s metal-organic], and an average RMSD of 1.57Å [1.26Å organic, 1.90Å metal-organic], and CORINA makes predictions with an average speed per molecule of 0.13s [0.18s organic, 0.08s metal-organic], and an average RMSD of 1.60Å [1.13Å organic, 2.11Å metal-organic]. COSMOS is available through the ChemDB chemoinformatics web portal at: http://cdb.ics.uci.edu/. PMID:21417267

  14. Free energy minimization to predict RNA secondary structures and computational RNA design.

    PubMed

    Churkin, Alexander; Weinbrand, Lina; Barash, Danny

    2015-01-01

    Determining the RNA secondary structure from sequence data by computational predictions is a long-standing problem. Its solution has been approached in two distinctive ways. If a multiple sequence alignment of a collection of homologous sequences is available, the comparative method uses phylogeny to determine conserved base pairs that are more likely to form as a result of billions of years of evolution than by chance. In the case of single sequences, recursive algorithms that compute free energy structures by using empirically derived energy parameters have been developed. This latter approach of RNA folding prediction by energy minimization is widely used to predict RNA secondary structure from sequence. For a significant number of RNA molecules, the secondary structure of the RNA molecule is indicative of its function and its computational prediction by minimizing its free energy is important for its functional analysis. A general method for free energy minimization to predict RNA secondary structures is dynamic programming, although other optimization methods have been developed as well along with empirically derived energy parameters. In this chapter, we introduce and illustrate by examples the approach of free energy minimization to predict RNA secondary structures.
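
    The dynamic-programming recursion behind single-sequence folding can be illustrated with a deliberately simplified stand-in. The sketch below maximizes the number of nested base pairs (the Nussinov scheme) instead of minimizing free energy with empirical parameters, so it shows only the shape of the recursion, not the energy model discussed in the chapter; the function name, the allowed-pair set, and the minimum hairpin loop size of three are assumptions made for the example.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (Nussinov-style dynamic programming)."""
    allowed = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]            # best[i][j]: optimum for seq[i..j]
    for span in range(min_loop + 1, n):           # shorter spans cannot hold a pair
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                # case 1: base j stays unpaired
            for k in range(i, j - min_loop):      # case 2: base j pairs with some k
                if (seq[k], seq[j]) in allowed:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]

print(max_base_pairs("GGGAAAUCC"))
```

    Energy-based folding programs keep the same subinterval structure but replace the pair count with loop free energies, which is where the empirically derived parameters enter.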

  15. Multiobjective evolutionary algorithm with many tables for purely ab initio protein structure prediction.

    PubMed

    Brasil, Christiane Regina Soares; Delbem, Alexandre Claudio Botazzo; da Silva, Fernando Luís Barroso

    2013-07-30

    This article focuses on the development of an approach for ab initio protein structure prediction (PSP) without using any earlier knowledge from similar protein structures, such as fragment-based statistics or inference of secondary structures. Such an approach is called purely ab initio prediction. The article shows that well-designed multiobjective evolutionary algorithms can predict relevant protein structures in a purely ab initio way. One challenge for purely ab initio PSP is the prediction of structures with β-sheets. To work with such proteins, this research has also developed procedures to efficiently estimate hydrogen bond and solvation contribution energies. Considering van der Waals, electrostatic, hydrogen bond, and solvation contribution energies, the PSP is a problem with four energetic terms to be minimized. Each interaction energy term can be considered an objective of an optimization method. Combinatorial problems with four objectives have been considered too complex for the available multiobjective optimization (MOO) methods. The proposed approach, called "Multiobjective evolutionary algorithms with many tables" (MEAMT), can efficiently deal with four objectives through the combination thereof, performing a more adequate sampling of the objective space. Therefore, this method can better map the promising regions in this space, predicting structures in a purely ab initio way. In other words, MEAMT is an efficient optimization method for MOO, which explores simultaneously the search space as well as the objective space. MEAMT can predict structures with one or two domains with RMSDs comparable to values obtained by recently developed ab initio methods (GAPFCG, I-PAES, and Quark) that use different levels of earlier knowledge. Copyright © 2013 Wiley Periodicals, Inc.
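
    Treating each energy term as a separate objective means candidate structures are compared as vectors rather than by a single summed score. The sketch below shows the generic Pareto-dominance test used throughout multiobjective optimization; it is not the MEAMT table scheme, whose details the abstract does not give, and the four-component energy vectors are invented for illustration.

```python
def dominates(a, b):
    """True if energy vector a is no worse than b in every term and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Candidates that no other candidate dominates (all objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (van der Waals, electrostatic, hydrogen bond, solvation) energies.
candidates = [
    (1.0, 5.0, 0.0, 0.0),
    (2.0, 2.0, 0.0, 0.0),
    (3.0, 3.0, 0.0, 0.0),   # worse than the second candidate in two terms, equal in the rest
    (5.0, 1.0, 0.0, 0.0),
]
print(pareto_front(candidates))
```

    With four objectives a large share of any population tends to be mutually non-dominated, which is one reason such problems are considered hard for standard multiobjective methods.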

  16. Analysis of free modeling predictions by RBO aleph in CASP11.

    PubMed

    Mabrouk, Mahmoud; Werner, Tim; Schneider, Michael; Putz, Ines; Brock, Oliver

    2016-09-01

    The CASP experiment is a biennial benchmark for assessing protein structure prediction methods. In CASP11, RBO Aleph ranked as one of the top-performing automated servers in the free modeling category. This category consists of targets for which structural templates are not easily retrievable. We analyze the performance of RBO Aleph and show that its success in CASP was a result of its ab initio structure prediction protocol. A detailed analysis of this protocol demonstrates that two components unique to our method greatly contributed to prediction quality: residue-residue contact prediction by EPC-map and contact-guided conformational space search by model-based search (MBS). Interestingly, our analysis also points to a possible fundamental problem in evaluating the performance of protein structure prediction methods: Improvements in components of the method do not necessarily lead to improvements of the entire method. This points to the fact that these components interact in ways that are poorly understood. This problem, if indeed true, represents a significant obstacle to community-wide progress. Proteins 2016; 84(Suppl 1):87-104. © 2015 Wiley Periodicals, Inc.

  17. The Prediction of Botulinum Toxin Structure Based on in Silico and in Vitro Analysis

    NASA Astrophysics Data System (ADS)

    Suzuki, Tomonori; Miyazaki, Satoru

    2011-01-01

    Many biological systems are mediated through protein-protein interactions. Knowledge of protein-protein complex structure is required for understanding the function. The determination of large and flexible protein-protein complex structures by experimental studies remains difficult, costly and time-consuming; therefore, computational prediction of protein structures by homology modeling and docking studies is a valuable method. In addition, MD simulation is also one of the most powerful methods, allowing one to see the real dynamics of proteins. Here, we predict the protein-protein complex structure of botulinum toxin to analyze its properties. These bioinformatics methods are useful to report the relation between the flexibility of backbone structure and the activity.

  18. In Silico Analysis for the Study of Botulinum Toxin Structure

    NASA Astrophysics Data System (ADS)

    Suzuki, Tomonori; Miyazaki, Satoru

    2010-01-01

    Protein-protein interactions play many important roles in biological function. Knowledge of protein-protein complex structure is required for understanding the function. The determination of protein-protein complex structure by experimental studies remains difficult; therefore, computational prediction of protein structures by structure modeling and docking studies is a valuable method. In addition, MD simulation is also one of the most popular methods for protein structure modeling and characterization. Here, we attempt to predict protein-protein complex structure and properties using several bioinformatic methods, and we focus on the botulinum toxin complex as the target structure.

  19. A simple extension to the CMASA method for the prediction of catalytic residues in the presence of single point mutations.

    PubMed

    Flores, David I; Sotelo-Mundo, Rogerio R; Brizuela, Carlos A

    2014-01-01

    The automatic identification of catalytic residues still remains an important challenge in structural bioinformatics. Sequence-based methods are good alternatives when the query shares a high percentage of identity with a well-annotated enzyme. However, when the homology is not apparent, which occurs with many structures from the structural genome initiative, structural information should be exploited. A local structural comparison is preferred to a global structural comparison when predicting functional residues. CMASA is a recently proposed method for predicting catalytic residues based on a local structure comparison. The method achieves high accuracy and a high value for the Matthews correlation coefficient. However, point substitutions or a lack of relevant data strongly affect the performance of the method. In the present study, we propose a simple extension to the CMASA method to overcome this difficulty. Extensive computational experiments are shown as proof of concept instances, as well as for a few real cases. The results show that the extension performs well when the catalytic site contains mutated residues or when some residues are missing. The proposed modification could correctly predict the catalytic residues of a mutant thymidylate synthase, 1EVF. It also successfully predicted the catalytic residues for 3HRC despite the lack of information for a relevant side chain atom in the PDB file.

  20. RNA-Puzzles Round II: assessment of RNA structure prediction programs applied to three large RNA structures

    PubMed Central

    Miao, Zhichao; Adamiak, Ryszard W.; Blanchet, Marc-Frédérick; Boniecki, Michal; Bujnicki, Janusz M.; Chen, Shi-Jie; Cheng, Clarence; Chojnowski, Grzegorz; Chou, Fang-Chieh; Cordero, Pablo; Cruz, José Almeida; Ferré-D'Amaré, Adrian R.; Das, Rhiju; Ding, Feng; Dokholyan, Nikolay V.; Dunin-Horkawicz, Stanislaw; Kladwang, Wipapat; Krokhotin, Andrey; Lach, Grzegorz; Magnus, Marcin; Major, François; Mann, Thomas H.; Masquida, Benoît; Matelska, Dorota; Meyer, Mélanie; Peselis, Alla; Popenda, Mariusz; Purzycka, Katarzyna J.; Serganov, Alexander; Stasiewicz, Juliusz; Szachniuk, Marta; Tandon, Arpit; Tian, Siqi; Wang, Jian; Xiao, Yi; Xu, Xiaojun; Zhang, Jinwei; Zhao, Peinan; Zok, Tomasz; Westhof, Eric

    2015-01-01

    This paper is a report of a second round of RNA-Puzzles, a collective and blind experiment in three-dimensional (3D) RNA structure prediction. Three puzzles, Puzzles 5, 6, and 10, represented sequences of three large RNA structures with limited or no homology with previously solved RNA molecules. A lariat-capping ribozyme, as well as riboswitches complexed to adenosylcobalamin and tRNA, were predicted by seven groups using RNAComposer, ModeRNA/SimRNA, Vfold, Rosetta, DMD, MC-Fold, 3dRNA, and AMBER refinement. Some groups derived models using data from state-of-the-art chemical-mapping methods (SHAPE, DMS, CMCT, and mutate-and-map). The comparisons between the predictions and the three subsequently released crystallographic structures, solved at diffraction resolutions of 2.5–3.2 Å, were carried out automatically using various sets of quality indicators. The comparisons clearly demonstrate the state of present-day de novo prediction abilities as well as the limitations of these state-of-the-art methods. All of the best prediction models have similar topologies to the native structures, which suggests that computational methods for RNA structure prediction can already provide useful structural information for biological problems. However, the prediction accuracy for non-Watson–Crick interactions, key to proper folding of RNAs, is low and some predicted models had high Clash Scores. These two difficulties point to some of the continuing bottlenecks in RNA structure prediction. All submitted models are available for download at http://ahsoka.u-strasbg.fr/rnapuzzles/. PMID:25883046

  1. De novo protein structure prediction by dynamic fragment assembly and conformational space annealing.

    PubMed

    Lee, Juyong; Lee, Jinhyuk; Sasaki, Takeshi N; Sasai, Masaki; Seok, Chaok; Lee, Jooyoung

    2011-08-01

    Ab initio protein structure prediction is a challenging problem that requires both an accurate energetic representation of a protein structure and an efficient conformational sampling method for successful protein modeling. In this article, we present an ab initio structure prediction method which combines a recently suggested novel way of fragment assembly, dynamic fragment assembly (DFA) and conformational space annealing (CSA) algorithm. In DFA, model structures are scored by continuous functions constructed based on short- and long-range structural restraint information from a fragment library. Here, DFA is represented by the full-atom model by CHARMM with the addition of the empirical potential of DFIRE. The relative contributions between various energy terms are optimized using linear programming. The conformational sampling was carried out with CSA algorithm, which can find low energy conformations more efficiently than simulated annealing used in the existing DFA study. The newly introduced DFA energy function and CSA sampling algorithm are implemented into CHARMM. Test results on 30 small single-domain proteins and 13 template-free modeling targets of the 8th Critical Assessment of protein Structure Prediction show that the current method provides comparable and complementary prediction results to existing top methods. Copyright © 2011 Wiley-Liss, Inc.

  2. Accurate prediction of RNA-binding protein residues with two discriminative structural descriptors.

    PubMed

    Sun, Meijian; Wang, Xia; Zou, Chuanxin; He, Zenghui; Liu, Wei; Li, Honglin

    2016-06-07

    RNA-binding proteins participate in many important biological processes concerning RNA-mediated gene regulation, and several computational methods have been recently developed to predict the protein-RNA interactions of RNA-binding proteins. Newly developed discriminative descriptors will help to improve the prediction accuracy of these prediction methods and provide further meaningful information for researchers. In this work, we designed two structural features (residue electrostatic surface potential and triplet interface propensity) and according to the statistical and structural analysis of protein-RNA complexes, the two features were powerful for identifying RNA-binding protein residues. Using these two features and other excellent structure- and sequence-based features, a random forest classifier was constructed to predict RNA-binding residues. The area under the receiver operating characteristic curve (AUC) of five-fold cross-validation for our method on training set RBP195 was 0.900, and when applied to the test set RBP68, the prediction accuracy (ACC) was 0.868, and the F-score was 0.631. The good prediction performance of our method revealed that the two newly designed descriptors could be discriminative for inferring protein residues interacting with RNAs. To facilitate the use of our method, a web-server called RNAProSite, which implements the proposed method, was constructed and is freely available at http://lilab.ecust.edu.cn/NABind .

  3. Shaping up the protein folding funnel by local interaction: lesson from a structure prediction study.

    PubMed

    Chikenji, George; Fujitsuka, Yoshimi; Takada, Shoji

    2006-02-28

    Predicting protein tertiary structure by folding-like simulations is one of the most stringent tests of how much we understand the principle of protein folding. Currently, the most successful method for folding-based structure prediction is the fragment assembly (FA) method. Here, we address why the FA method is so successful and its lesson for the folding problem. To do so, using the FA method, we designed a structure prediction test of "chimera proteins." In the chimera proteins, local structural preference is specific to the target sequences, whereas nonlocal interactions are only sequence-independent compaction forces. We find that these chimera proteins can find the native folds of the intact sequences with high probability indicating dominant roles of the local interactions. We further explore roles of local structural preference by exact calculation of the HP lattice model of proteins. From these results, we suggest principles of protein folding: For small proteins, compact structures that are fully compatible with local structural preference are few, one of which is the native fold. These local biases shape up the funnel-like energy landscape.

  4. Shaping up the protein folding funnel by local interaction: Lesson from a structure prediction study

    PubMed Central

    Chikenji, George; Fujitsuka, Yoshimi; Takada, Shoji

    2006-01-01

    Predicting protein tertiary structure by folding-like simulations is one of the most stringent tests of how much we understand the principle of protein folding. Currently, the most successful method for folding-based structure prediction is the fragment assembly (FA) method. Here, we address why the FA method is so successful and its lesson for the folding problem. To do so, using the FA method, we designed a structure prediction test of “chimera proteins.” In the chimera proteins, local structural preference is specific to the target sequences, whereas nonlocal interactions are only sequence-independent compaction forces. We find that these chimera proteins can find the native folds of the intact sequences with high probability indicating dominant roles of the local interactions. We further explore roles of local structural preference by exact calculation of the HP lattice model of proteins. From these results, we suggest principles of protein folding: For small proteins, compact structures that are fully compatible with local structural preference are few, one of which is the native fold. These local biases shape up the funnel-like energy landscape. PMID:16488978

  5. Blind test of physics-based prediction of protein structures.

    PubMed

    Shell, M Scott; Ozkan, S Banu; Voelz, Vincent; Wu, Guohong Albert; Dill, Ken A

    2009-02-01

    We report here a multiprotein blind test of a computer method to predict native protein structures based solely on an all-atom physics-based force field. We use the AMBER 96 potential function with an implicit (GB/SA) model of solvation, combined with replica-exchange molecular-dynamics simulations. Coarse conformational sampling is performed using the zipping and assembly method (ZAM), an approach that is designed to mimic the putative physical routes of protein folding. ZAM was applied to the folding of six proteins, from 76 to 112 monomers in length, in CASP7, a community-wide blind test of protein structure prediction. Because these predictions have about the same level of accuracy as typical bioinformatics methods, and do not utilize information from databases of known native structures, this work opens up the possibility of predicting the structures of membrane proteins, synthetic peptides, or other foldable polymers, for which there is little prior knowledge of native structures. This approach may also be useful for predicting physical protein folding routes, non-native conformations, and other physical properties from amino acid sequences.

  6. Blind Test of Physics-Based Prediction of Protein Structures

    PubMed Central

    Shell, M. Scott; Ozkan, S. Banu; Voelz, Vincent; Wu, Guohong Albert; Dill, Ken A.

    2009-01-01

    We report here a multiprotein blind test of a computer method to predict native protein structures based solely on an all-atom physics-based force field. We use the AMBER 96 potential function with an implicit (GB/SA) model of solvation, combined with replica-exchange molecular-dynamics simulations. Coarse conformational sampling is performed using the zipping and assembly method (ZAM), an approach that is designed to mimic the putative physical routes of protein folding. ZAM was applied to the folding of six proteins, from 76 to 112 monomers in length, in CASP7, a community-wide blind test of protein structure prediction. Because these predictions have about the same level of accuracy as typical bioinformatics methods, and do not utilize information from databases of known native structures, this work opens up the possibility of predicting the structures of membrane proteins, synthetic peptides, or other foldable polymers, for which there is little prior knowledge of native structures. This approach may also be useful for predicting physical protein folding routes, non-native conformations, and other physical properties from amino acid sequences. PMID:19186130

  7. Toward a structure determination method for biomineral-associated protein using combined solid- state NMR and computational structure prediction.

    PubMed

    Masica, David L; Ash, Jason T; Ndao, Moise; Drobny, Gary P; Gray, Jeffrey J

    2010-12-08

    Protein-biomineral interactions are paramount to materials production in biology, including the mineral phase of hard tissue. Unfortunately, the structure of biomineral-associated proteins cannot be determined by X-ray crystallography or solution nuclear magnetic resonance (NMR). Here we report a method for determining the structure of biomineral-associated proteins. The method combines solid-state NMR (ssNMR) and ssNMR-biased computational structure prediction. In addition, the algorithm is able to identify lattice geometries most compatible with ssNMR constraints, representing a quantitative, novel method for investigating crystal-face binding specificity. We use this method to determine most of the structure of human salivary statherin interacting with the mineral phase of tooth enamel. Computation and experiment converge on an ensemble of related structures and identify preferential binding at three crystal surfaces. The work represents a significant advance toward determining structure of biomineral-adsorbed protein using experimentally biased structure prediction. This method is generally applicable to proteins that can be chemically synthesized. Copyright © 2010 Elsevier Ltd. All rights reserved.

  8. Protein asparagine deamidation prediction based on structures with machine learning methods.

    PubMed

    Jia, Lei; Sun, Yaxiong

    2017-01-01

    Chemical stability is a major concern in the development of protein therapeutics due to its impact on both efficacy and safety. Protein "hotspots" are amino acid residues that are subject to various chemical modifications, including deamidation, isomerization, glycosylation, oxidation etc. A more accurate prediction method for potential hotspot residues would allow their elimination or reduction as early as possible in the drug discovery process. In this work, we focus on prediction models for asparagine (Asn) deamidation. The sequence-based prediction method simply flags the NG motif (asparagine followed by a glycine) as liable to deamidation. It still dominates the deamidation evaluation process in most pharmaceutical settings due to its convenience. However, the simple sequence-based method is less accurate and often leads to over-engineering of a protein. We introduce structure-based prediction models by mining available experimental and structural data of deamidated proteins. Our training set contains 194 Asn residues from 25 proteins that all have available high-resolution crystal structures. Experimentally measured deamidation half-life of Asn in penta-peptides as well as 3D structure-based properties, such as solvent exposure, crystallographic B-factors, local secondary structure and dihedral angles etc., were used to train prediction models with several machine learning algorithms. The prediction tools were cross-validated as well as tested with an external test data set. The random forest model had high enrichment in ranking deamidated residues higher than non-deamidated residues while effectively eliminating false positive predictions. It is possible that such quantitative protein structure-function relationship tools can also be applied to other protein hotspot predictions. In addition, we extensively discussed metrics being used to evaluate the performance of predicting unbalanced data sets such as the deamidation case.
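
    The sequence-based baseline mentioned in this abstract, flagging every asparagine that is immediately followed by a glycine, can be sketched in a few lines. This shows only that simple baseline, not the structure-based models of the paper; the function name and the example sequence are hypothetical.

```python
import re
from typing import List


def find_ng_motifs(sequence: str) -> List[int]:
    """Return 1-based positions of Asn (N) residues that are
    immediately followed by Gly (G) in a one-letter sequence."""
    # The lookahead keeps the match on the N alone, so each
    # reported position is that of the asparagine itself.
    return [m.start() + 1 for m in re.finditer(r"N(?=G)", sequence.upper())]


# Hypothetical sequence containing three NG motifs.
positions = find_ng_motifs("MNGKANGNNG")
```

    Every position returned this way is treated as a liability regardless of its structural context, which is the over-prediction the abstract attributes to the sequence-only method.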

  9. An object programming based environment for protein secondary structure prediction.

    PubMed

    Giacomini, M; Ruggiero, C; Sacile, R

    1996-01-01

    The most frequently used methods for protein secondary structure prediction are empirical statistical methods and rule based methods. A consensus system based on object-oriented programming is presented, which integrates the two approaches with the aim of improving the prediction quality. This system uses an object-oriented knowledge representation based on the concepts of conformation, residue and protein, where the conformation class is the basis, the residue class derives from it and the protein class derives from the residue class. The system has been tested with satisfactory results on several proteins of the Brookhaven Protein Data Bank. Its results have been compared with the results of the most widely used prediction methods, and they show a higher prediction capability and greater stability. Moreover, the system itself provides an index of the reliability of its current prediction. This system can also be regarded as a basis structure for programs of this kind.

  10. CONFOLD2: improved contact-driven ab initio protein structure modeling.

    PubMed

    Adhikari, Badri; Cheng, Jianlin

    2018-01-25

    Contact-guided protein structure prediction methods are becoming more and more successful because of the latest advances in residue-residue contact prediction. To support contact-driven structure prediction, effective tools that can quickly build tertiary structural models of good quality from predicted contacts need to be developed. We develop an improved contact-driven protein modelling method, CONFOLD2, and study how it may be effectively used for ab initio protein structure prediction with predicted contacts as input. It builds models using various subsets of input contacts to explore the fold space under the guidance of a soft square energy function, and then clusters the models to obtain the top five models. CONFOLD2 obtains an average reconstruction accuracy of 0.57 TM-score for the 150 proteins in the PSICOV contact prediction dataset. When benchmarked on the CASP11 contacts predicted using CONSIP2 and CASP12 contacts predicted using Raptor-X, CONFOLD2 achieves a mean TM-score of 0.41 on both datasets. CONFOLD2 can quickly generate the top five structural models for a protein sequence when its secondary structure and contact predictions are at hand. The source code of CONFOLD2 is publicly available at https://github.com/multicom-toolbox/CONFOLD2/.
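
    The TM-score values quoted in this abstract come from a length-normalized similarity measure. The sketch below evaluates the standard TM-score sum for one fixed superposition, given the distances between aligned residue pairs; the published score additionally maximizes over superpositions, which is omitted here. It is not taken from CONFOLD2, and the function names are hypothetical.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8."""
    if target_length <= 15:
        raise ValueError("this sketch does not handle chains of 15 residues or fewer")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0.0:
        raise ValueError("d0 is not positive for this chain length")
    return d0


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score sum for one superposition.

    distances: distance in angstroms between each aligned residue pair.
    target_length: number of residues in the target (native) structure.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# A perfect model of a 100-residue target scores 1.0; if only half of
# the residues are aligned (all perfectly), the score drops to 0.5.
perfect = tm_score([0.0] * 100, 100)
half_aligned = tm_score([0.0] * 50, 100)
```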

  11. Revisiting the blind tests in crystal structure prediction: accurate energy ranking of molecular crystals.

    PubMed

    Asmadi, Aldi; Neumann, Marcus A; Kendrick, John; Girard, Pascale; Perrin, Marc-Antoine; Leusen, Frank J J

    2009-12-24

    In the 2007 blind test of crystal structure prediction hosted by the Cambridge Crystallographic Data Centre (CCDC), a hybrid DFT/MM method correctly ranked each of the four experimental structures as having the lowest lattice energy of all the crystal structures predicted for each molecule. The work presented here further validates this hybrid method by optimizing the crystal structures (experimental and submitted) of the first three CCDC blind tests held in 1999, 2001, and 2004. Except for the crystal structures of compound IX, all structures were reminimized and ranked according to their lattice energies. The hybrid method computes the lattice energy of a crystal structure as the sum of the DFT total energy and a van der Waals (dispersion) energy correction. Considering all four blind tests, the crystal structure with the lowest lattice energy corresponds to the experimentally observed structure for 12 out of 14 molecules. Moreover, good geometrical agreement is observed between the structures determined by the hybrid method and those measured experimentally. In comparison with the correct submissions made by the blind test participants, all hybrid optimized crystal structures (apart from compound II) have the smallest calculated root mean squared deviations from the experimentally observed structures. It is predicted that a new polymorph of compound V exists under pressure.

  12. Protein 8-class secondary structure prediction using conditional neural fields.

    PubMed

    Wang, Zhiyong; Zhao, Feng; Peng, Jian; Xu, Jinbo

    2011-10-01

    Compared with the protein 3-class secondary structure (SS) prediction, the 8-class prediction gains less attention and is also much more challenging, especially for proteins with few sequence homologs. This paper presents a new probabilistic method for 8-class SS prediction using conditional neural fields (CNFs), a recently invented probabilistic graphical model. This CNF method not only models the complex relationship between sequence features and SS, but also exploits the interdependency among SS types of adjacent residues. In addition to sequence profiles, our method also makes use of non-evolutionary information for SS prediction. Tested on the CB513 and RS126 data sets, our method achieves Q8 accuracy of 64.9 and 64.7%, respectively, which are much better than the SSpro8 web server (51.0 and 48.0%, respectively). Our method can also be used to predict other structure properties (e.g. solvent accessibility) of a protein or the SS of RNA. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
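
    The Q8 accuracy reported in this abstract is the percentage of residues whose predicted 8-class label matches the observed one. A minimal sketch of that calculation follows; the function name and the two label strings are hypothetical, and the labels are treated as opaque characters.

```python
def q_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted secondary structure
    class matches the observed class (Q8 when 8 classes are used)."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical 10-residue example with two mismatches.
q8 = q_accuracy("HHHHEEEETT", "HHHGEEEBTT")
```

    The same function gives the 3-class accuracy when both strings use a 3-letter alphabet, which is why finer 8-class labels tend to lower the score: a prediction of H for an observed G counts as an error in Q8.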

  13. Massive integration of diverse protein quality assessment methods to improve template based modeling in CASP11

    PubMed Central

    Cao, Renzhi; Bhattacharya, Debswapna; Adhikari, Badri; Li, Jilong; Cheng, Jianlin

    2015-01-01

    Model evaluation and selection is an important step and a big challenge in template-based protein structure prediction. Individual model quality assessment methods designed for recognizing some specific properties of protein structures often fail to consistently select good models from a model pool because of their limitations. Therefore, combining multiple complementary quality assessment methods is useful for improving model ranking and consequently tertiary structure prediction. Here, we report the performance and analysis of our human tertiary structure predictor (MULTICOM) based on the massive integration of 14 diverse complementary quality assessment methods that was successfully benchmarked in the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11). The predictions of MULTICOM for 39 template-based domains were rigorously assessed by six scoring metrics covering global topology of Cα trace, local all-atom fitness, side chain quality, and physical reasonableness of the model. The results show that the massive integration of complementary, diverse single-model and multi-model quality assessment methods can effectively leverage the strength of single-model methods in distinguishing quality variation among similar good models and the advantage of multi-model quality assessment methods of identifying reasonable average-quality models. The overall excellent performance of the MULTICOM predictor demonstrates that integrating a large number of model quality assessment methods in conjunction with model clustering is a useful approach to improve the accuracy, diversity, and consequently robustness of template-based protein structure prediction. PMID:26369671

  14. Predicting Ligand Binding Sites on Protein Surfaces by 3-Dimensional Probability Density Distributions of Interacting Atoms

    PubMed Central

    Jian, Jhih-Wei; Elumalai, Pavadai; Pitti, Thejkiran; Wu, Chih Yuan; Tsai, Keng-Chang; Chang, Jeng-Yih; Peng, Hung-Pin; Yang, An-Suei

    2016-01-01

    Predicting ligand binding sites (LBSs) on protein structures, which are obtained either from experimental or computational methods, is a useful first step in functional annotation or structure-based drug design for the protein structures. In this work, the structure-based machine learning algorithm ISMBLab-LIG was developed to predict LBSs on protein surfaces with input attributes derived from the three-dimensional probability density maps of interacting atoms, which were reconstructed on the query protein surfaces and were relatively insensitive to local conformational variations of the tentative ligand binding sites. The prediction accuracy of the ISMBLab-LIG predictors is comparable to that of the best LBS predictors benchmarked on several well-established testing datasets. More importantly, the ISMBLab-LIG algorithm has substantial tolerance to the prediction uncertainties of computationally derived protein structure models. As such, the method is particularly useful for predicting LBSs not only on experimental protein structures without known LBS templates in the database but also on computationally predicted model protein structures with structural uncertainties in the tentative ligand binding sites. PMID:27513851

  15. PARTS: Probabilistic Alignment for RNA joinT Secondary structure prediction

    PubMed Central

    Harmanci, Arif Ozgun; Sharma, Gaurav; Mathews, David H.

    2008-01-01

    A novel method is presented for joint prediction of alignment and common secondary structures of two RNA sequences. The joint consideration of common secondary structures and alignment is accomplished by structural alignment over a search space defined by the newly introduced motif called matched helical regions. The matched helical region formulation generalizes previously employed constraints for structural alignment and thereby better accommodates the structural variability within RNA families. A probabilistic model based on pseudo free energies obtained from precomputed base pairing and alignment probabilities is utilized for scoring structural alignments. Maximum a posteriori (MAP) common secondary structures, sequence alignment and joint posterior probabilities of base pairing are obtained from the model via a dynamic programming algorithm called PARTS. The advantage of the more general structural alignment of PARTS is seen in secondary structure predictions for the RNase P family. For this family, the PARTS MAP predictions of secondary structures and alignment perform significantly better than prior methods that utilize a more restrictive structural alignment model. For the tRNA and 5S rRNA families, the richer structural alignment model of PARTS does not offer a benefit and the method therefore performs comparably with existing alternatives. For all RNA families studied, the posterior probability estimates obtained from PARTS offer an improvement over posterior probability estimates from a single sequence prediction. When considering the base pairings predicted over a threshold value of confidence, the combination of sensitivity and positive predictive value is superior for PARTS than for the single sequence prediction. PARTS source code is available for download under the GNU public license at http://rna.urmc.rochester.edu. PMID:18304945
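
    The evaluation described at the end of this abstract, keeping only base pairs predicted above a confidence threshold and then measuring sensitivity and positive predictive value (PPV), can be sketched as follows. This is not PARTS code; the function name, the pair probabilities, and the reference structure are hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]


def sensitivity_ppv(pair_probs: Dict[Pair, float],
                    reference: Set[Pair],
                    threshold: float) -> Tuple[float, float]:
    """Score base pairs predicted with probability >= threshold.

    sensitivity = correct predicted pairs / pairs in the reference
    PPV         = correct predicted pairs / all predicted pairs
    """
    predicted = {pair for pair, prob in pair_probs.items() if prob >= threshold}
    true_positives = len(predicted & reference)
    sensitivity = true_positives / len(reference) if reference else 0.0
    ppv = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Hypothetical posterior base-pairing probabilities and known structure.
probs = {(1, 10): 0.95, (2, 9): 0.80, (5, 6): 0.60, (3, 8): 0.30}
known = {(1, 10), (2, 9), (3, 8), (4, 7)}
sens, ppv = sensitivity_ppv(probs, known, threshold=0.5)
```

    Raising the threshold usually trades sensitivity for PPV, which is why the abstract compares the two measures in combination.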

  16. Revealing how network structure affects accuracy of link prediction

    NASA Astrophysics Data System (ADS)

    Yang, Jin-Xuan; Zhang, Xiao-Dong

    2017-08-01

    Link prediction plays an important role in network reconstruction and network evolution. The network structure affects the accuracy of link prediction, which is an interesting problem. In this paper we use common neighbors and the Gini coefficient to reveal the relation between them, which can provide a good reference for the choice of a suitable link prediction algorithm according to the network structure. Moreover, the statistical analysis reveals correlation between the common neighbors index, Gini coefficient index and other indices to describe the network structure, such as Laplacian eigenvalues, clustering coefficient, degree heterogeneity, and assortativity of network. Furthermore, a new method to predict missing links is proposed. The experimental results show that the proposed algorithm yields better prediction accuracy and robustness to the network structure than existing currently used methods for a variety of real-world networks.
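
    The common neighbors index used in this abstract scores a candidate link by the number of neighbors the two nodes share. The sketch below computes that index and ranks the unconnected node pairs of a small graph; it does not reproduce the new method proposed in the paper, and the function names and the example graph are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def common_neighbors(adj: Dict[str, Set[str]], u: str, v: str) -> int:
    """Number of nodes adjacent to both u and v."""
    return len(adj[u] & adj[v])


def rank_missing_links(adj: Dict[str, Set[str]]) -> List[Tuple[Tuple[str, str], int]]:
    """Score every unconnected node pair, highest score first."""
    scored = [((u, v), common_neighbors(adj, u, v))
              for u, v in combinations(sorted(adj), 2)
              if v not in adj[u]]
    # Sort by descending score, breaking ties by node names.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical undirected graph with edges a-b, a-c, b-d, c-d, d-e.
graph = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
ranking = rank_missing_links(graph)
```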

  17. Dynameomics: data-driven methods and models for utilizing large-scale protein structure repositories for improving fragment-based loop prediction.

    PubMed

    Rysavy, Steven J; Beck, David A C; Daggett, Valerie

    2014-11-01

    Protein function is intimately linked to protein structure and dynamics yet experimentally determined structures frequently omit regions within a protein due to indeterminate data, which is often due to protein dynamics. We propose that atomistic molecular dynamics simulations provide a diverse sampling of biologically relevant structures for these missing segments (and beyond) to improve structural modeling and structure prediction. Here we make use of the Dynameomics data warehouse, which contains simulations of representatives of essentially all known protein folds. We developed novel computational methods to efficiently identify, rank and retrieve small peptide structures, or fragments, from this database. We also created a novel data model to analyze and compare large repositories of structural data, such as contained within the Protein Data Bank and the Dynameomics data warehouse. Our evaluation compares these structural repositories for improving loop predictions and analyzes the utility of our methods and models. Using a standard set of loop structures, containing 510 loops, 30 for each loop length from 4 to 20 residues, we find that the inclusion of Dynameomics structures in fragment-based methods improves the quality of the loop predictions without being dependent on sequence homology. Depending on loop length, ∼25-75% of the best predictions came from the Dynameomics set, resulting in lower main chain root-mean-square deviations for all fragment lengths using the combined fragment library. We also provide specific cases where Dynameomics fragments provide better predictions for NMR loop structures than fragments from crystal structures. Online access to these fragment libraries is available at http://www.dynameomics.org/fragments. © 2014 The Protein Society.

  18. Dynameomics: Data-driven methods and models for utilizing large-scale protein structure repositories for improving fragment-based loop prediction

    PubMed Central

    Rysavy, Steven J; Beck, David AC; Daggett, Valerie

    2014-01-01

    Protein function is intimately linked to protein structure and dynamics, yet experimentally determined structures frequently omit regions within a protein due to indeterminate data, which is often due to protein dynamics. We propose that atomistic molecular dynamics simulations provide a diverse sampling of biologically relevant structures for these missing segments (and beyond) to improve structural modeling and structure prediction. Here we make use of the Dynameomics data warehouse, which contains simulations of representatives of essentially all known protein folds. We developed novel computational methods to efficiently identify, rank and retrieve small peptide structures, or fragments, from this database. We also created a novel data model to analyze and compare large repositories of structural data, such as contained within the Protein Data Bank and the Dynameomics data warehouse. Our evaluation compares these structural repositories for improving loop predictions and analyzes the utility of our methods and models. Using a standard set of loop structures, containing 510 loops, 30 for each loop length from 4 to 20 residues, we find that the inclusion of Dynameomics structures in fragment-based methods improves the quality of the loop predictions without being dependent on sequence homology. Depending on loop length, ∼25–75% of the best predictions came from the Dynameomics set, resulting in lower main chain root-mean-square deviations for all fragment lengths using the combined fragment library. We also provide specific cases where Dynameomics fragments provide better predictions for NMR loop structures than fragments from crystal structures. Online access to these fragment libraries is available at http://www.dynameomics.org/fragments. PMID:25142412

  19. Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences.

    PubMed

    Mizianty, Marcin J; Kurgan, Lukasz

    2009-12-13

    Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes including conservation of residues and arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozen modern competing predictors show that MODAS provides the best overall accuracy that ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reduction when compared against the best performing competing method on the two largest datasets. The proposed predictor provides accurate predictions at 58% accuracy for the membrane protein class, which is not considered by the majority of existing methods, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes. The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/.

  20. Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences

    PubMed Central

    2009-01-01

    Background: Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. Results: The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes including conservation of residues and arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozen modern competing predictors show that MODAS provides the best overall accuracy that ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reduction when compared against the best performing competing method on the two largest datasets. The proposed predictor provides accurate predictions at 58% accuracy for the membrane protein class, which is not considered by the majority of existing methods, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes. Conclusions: The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/. PMID:20003388

  1. Building a Better Fragment Library for De Novo Protein Structure Prediction

    PubMed Central

    de Oliveira, Saulo H. P.; Shi, Jiye; Deane, Charlotte M.

    2015-01-01

    Fragment-based approaches are the current standard for de novo protein structure prediction. These approaches rely on accurate and reliable fragment libraries to generate good structural models. In this work, we describe a novel method for structure fragment library generation and its application in fragment-based de novo protein structure prediction. The importance of correct testing procedures in assessing the quality of fragment libraries is demonstrated. In particular, homologs of the target must be excluded from the libraries to correctly simulate a de novo protein structure prediction scenario, something which surprisingly is not always done. We demonstrate that fragments presenting different predominant predicted secondary structures should be treated differently during the fragment library generation step and that exhaustive and random search strategies should both be used. This information was used to develop a novel method, Flib. On a validation set of 41 structurally diverse proteins, Flib libraries present both higher precision and higher coverage than two of the state-of-the-art methods, NNMake and HHFrag. Flib also achieves better precision and coverage on the set of 275 protein domains used in the two previous experiments of the Critical Assessment of Structure Prediction (CASP9 and CASP10). We compared Flib libraries against NNMake libraries in a structure prediction context. Of the 13 cases in which a correct answer was generated, Flib models were more accurate than NNMake models for 10. Flib is available for download at: http://www.stats.ox.ac.uk/research/proteins/resources. PMID:25901595

  2. A Method for WD40 Repeat Detection and Secondary Structure Prediction

    PubMed Central

    Wang, Yang; Jiang, Fan; Zhuo, Zhu; Wu, Xian-Hui; Wu, Yun-Dong

    2013-01-01

    WD40-repeat proteins (WD40s), as one of the largest protein families in eukaryotes, play vital roles in assembling protein-protein/DNA/RNA complexes. WD40s fold into similar β-propeller structures despite diversified sequences. A program, WDSP (WD40 repeat protein Structure Predictor), has been developed to accurately identify WD40 repeats and predict their secondary structures. The method is designed specifically for WD40 proteins by incorporating both local residue information and non-local family-specific structural features. It overcomes the problem of highly diversified protein sequences and variable loops. In addition, WDSP achieves a better prediction in identifying multiple WD40-domain proteins by taking the global combination of repeats into consideration. In secondary structure prediction, the average Q3 accuracy of WDSP in a jack-knife test reaches 93.7%. A disease-related protein, LRRK2, was used as a representative example to demonstrate the structure prediction. PMID:23776530
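
    The Q3 accuracy quoted above is the percentage of residues whose predicted three-state secondary structure label matches the observed one. A minimal sketch follows, assuming the common helix/strand/coil encoding with the letters H, E and C; the function name `q3_accuracy` is hypothetical and the code is not taken from WDSP.

```python
def q3_accuracy(predicted, observed):
    """Percentage of positions where the predicted three-state label
    (assumed here to be 'H', 'E' or 'C') equals the observed label."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("label strings must be non-empty and equal in length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Seven of the eight residues agree in this made-up example.
print(q3_accuracy("HHHEECCC", "HHHEECCH"))  # 87.5
```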

  3. Building blocks for automated elucidation of metabolites: machine learning methods for NMR prediction.

    PubMed

    Kuhn, Stefan; Egert, Björn; Neumann, Steffen; Steinbeck, Christoph

    2008-09-25

    Current efforts in Metabolomics, such as the Human Metabolome Project, collect structures of biological metabolites as well as data for their characterisation, such as spectra for identification of substances and measurements of their concentration. Still, only a fraction of existing metabolites and their spectral fingerprints are known. Computer-Assisted Structure Elucidation (CASE) of biological metabolites will be an important tool to leverage this lack of knowledge. Indispensable for CASE are modules to predict spectra for hypothetical structures. This paper evaluates different statistical and machine learning methods to perform predictions of proton NMR spectra based on data from our open database NMRShiftDB. A mean absolute error of 0.18 ppm was achieved for the prediction of proton NMR shifts ranging from 0 to 11 ppm. Random forest, J48 decision tree and support vector machines achieved similar overall errors. HOSE codes being a notably simple method achieved a comparatively good result of 0.17 ppm mean absolute error. NMR prediction methods applied in the course of this work delivered precise predictions which can serve as a building block for Computer-Assisted Structure Elucidation for biological metabolites.
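
    The mean absolute error reported above averages the absolute difference between predicted and measured chemical shifts. The sketch below illustrates the metric only; the shift values are invented for the example and the function name `mean_absolute_error` is hypothetical.

```python
def mean_absolute_error(predicted, measured):
    """Average of |predicted - measured| over paired chemical shifts (ppm)."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("shift lists must be non-empty and equal in length")
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(measured)


# Made-up proton shifts in ppm, for illustration only.
predicted_shifts = [7.26, 3.40, 1.20]
measured_shifts = [7.10, 3.55, 1.25]
print(round(mean_absolute_error(predicted_shifts, measured_shifts), 2))
```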

  4. Automated 3D structure composition for large RNAs

    PubMed Central

    Popenda, Mariusz; Szachniuk, Marta; Antczak, Maciej; Purzycka, Katarzyna J.; Lukasiak, Piotr; Bartol, Natalia; Blazewicz, Jacek; Adamiak, Ryszard W.

    2012-01-01

    Understanding the numerous functions that RNAs play in living cells depends critically on knowledge of their three-dimensional structure. Due to the difficulties in experimentally assessing structures of large RNAs, there is currently great demand for new high-resolution structure prediction methods. We present the novel method for the fully automated prediction of RNA 3D structures from a user-defined secondary structure. The concept is founded on the machine translation system. The translation engine operates on the RNA FRABASE database tailored to the dictionary relating the RNA secondary structure and tertiary structure elements. The translation algorithm is very fast. Initial 3D structure is composed in a range of seconds on a single processor. The method assures the prediction of large RNA 3D structures of high quality. Our approach needs neither structural templates nor RNA sequence alignment, required for comparative methods. This enables the building of unresolved yet native and artificial RNA structures. The method is implemented in a publicly available, user-friendly server RNAComposer. It works in an interactive mode and a batch mode. The batch mode is designed for large-scale modelling and accepts atomic distance restraints. Presently, the server is set to build RNA structures of up to 500 residues. PMID:22539264

  5. Experimental validation of finite element and boundary element methods for predicting structural vibration and radiated noise

    NASA Technical Reports Server (NTRS)

    Seybert, A. F.; Wu, T. W.; Wu, X. F.

    1994-01-01

    This research report is presented in three parts. In the first part, acoustical analyses were performed on modes of vibration of the housing of a transmission of a gear test rig developed by NASA. The modes of vibration of the transmission housing were measured using experimental modal analysis. The boundary element method (BEM) was used to calculate the sound pressure and sound intensity on the surface of the housing and the radiation efficiency of each mode. The radiation efficiency of each of the transmission housing modes was then compared to theoretical results for a finite baffled plate. In the second part, analytical and experimental validation of methods to predict structural vibration and radiated noise are presented. A rectangular box excited by a mechanical shaker was used as a vibrating structure. Combined finite element method (FEM) and boundary element method (BEM) models of the apparatus were used to predict the noise level radiated from the box. The FEM was used to predict the vibration, while the BEM was used to predict the sound intensity and total radiated sound power using surface vibration as the input data. Vibration predicted by the FEM model was validated by experimental modal analysis; noise predicted by the BEM was validated by measurements of sound intensity. Three types of results are presented for the total radiated sound power: sound power predicted by the BEM model using vibration data measured on the surface of the box; sound power predicted by the FEM/BEM model; and sound power measured by an acoustic intensity scan. In the third part, the structure used in part two was modified. A rib was attached to the top plate of the structure. The FEM and BEM were then used to predict structural vibration and radiated noise respectively. The predicted vibration and radiated noise were then validated through experimentation.

  6. Analysis of simple 2-D and 3-D metal structures subjected to fragment impact

    NASA Technical Reports Server (NTRS)

    Witmer, E. A.; Stagliano, T. R.; Spilker, R. L.; Rodal, J. J. A.

    1977-01-01

    Theoretical methods were developed for predicting the large-deflection elastic-plastic transient structural responses of metal containment or deflector (C/D) structures to cope with rotor burst fragment impact attack. For two-dimensional C/D structures, both finite element and finite difference analysis methods were employed to analyze structural response produced by either prescribed transient loads or fragment impact. For the latter category, two time-wise step-by-step analysis procedures were devised to predict the structural responses resulting from a succession of fragment impacts: the collision force method (CFM), which utilizes an approximate prediction of the force applied to the attacked structure during fragment impact, and the collision imparted velocity method (CIVM), in which the impact-induced velocity increment acquired by a region of the impacted structure near the impact point is computed. The merits and limitations of these approaches are discussed. For the analysis of 3-D responses of C/D structures, only the CIVM approach was investigated.

  7. Structure Prediction of the Second Extracellular Loop in G-Protein-Coupled Receptors

    PubMed Central

    Kmiecik, Sebastian; Jamroz, Michal; Kolinski, Michal

    2014-01-01

    G-protein-coupled receptors (GPCRs) play key roles in living organisms. Therefore, it is important to determine their functional structures. The second extracellular loop (ECL2) is a functionally important region of GPCRs, which poses a significant challenge for computational structure prediction methods. In this work, we evaluated CABS, a well-established protein modeling tool, for predicting ECL2 structure in 13 GPCRs. The ECL2s (with between 13 and 34 residues) are predicted in an environment of other extracellular loops being fully flexible and the transmembrane domain fixed in its x-ray conformation. The modeling procedure used theoretical predictions of ECL2 secondary structure and experimental constraints on disulfide bridges. Our approach yielded ensembles of low-energy conformers and the most populated conformers that contained models close to the available x-ray structures. The level of similarity between the predicted models and x-ray structures is comparable to that of other state-of-the-art computational methods. Our results extend other studies by including newly crystallized GPCRs. PMID:24896119

  8. Link Prediction in Evolving Networks Based on Popularity of Nodes.

    PubMed

    Wang, Tong; He, Xing-Sheng; Zhou, Ming-Yang; Fu, Zhong-Qian

    2017-08-02

    Link prediction aims to uncover the underlying relationship behind networks, which could be utilized to predict missing edges or identify spurious edges. The key issue of link prediction is to estimate the likelihood of potential links in networks. Most classical static-structure based methods ignore the temporal aspects of networks; because they cannot account for time-varying features, such approaches perform poorly in evolving networks. In this paper, we propose a hypothesis that the ability of each node to attract links depends not only on its structural importance, but also on its current popularity (activeness), since active nodes are much more likely to attract future links. Then a novel approach named popularity based structural perturbation method (PBSPM) and its fast algorithm are proposed to characterize the likelihood of an edge from both existing connectivity structure and current popularity of its two endpoints. Experiments on six evolving networks show that the proposed methods outperform state-of-the-art methods in accuracy and robustness. Besides, visual results and statistical analysis reveal that the proposed methods are inclined to predict future edges between active nodes, rather than edges between inactive nodes.

  9. Validation of Molecular Dynamics Simulations for Prediction of Three-Dimensional Structures of Small Proteins.

    PubMed

    Kato, Koichi; Nakayoshi, Tomoki; Fukuyoshi, Shuichi; Kurimoto, Eiji; Oda, Akifumi

    2017-10-12

    Although various higher-order protein structure prediction methods have been developed, almost all of them were developed based on the three-dimensional (3D) structure information of known proteins. Here we predicted short protein structures by molecular dynamics (MD) simulations in which only Newton's equations of motion were used and 3D structural information of known proteins was not required. To evaluate the ability of MD simulation to predict protein structures, we simulated seven short test proteins (10-46 residues) starting from the denatured state and compared their predicted and experimental structures. The predicted structure for Trp-cage (20 residues) was close to the experimental structure after a 200-ns MD simulation. For proteins shorter or longer than Trp-cage, root-mean-square deviation values were larger than those for Trp-cage. However, secondary structures could be reproduced by MD simulations for proteins with 10-34 residues. Simulations by replica exchange MD were performed, but the results were similar to those from normal MD simulations. These results suggest that normal MD simulations can roughly predict short protein structures and that 200-ns simulations are frequently sufficient for estimating the secondary structures of proteins (approximately 20 residues). Structural prediction methods using only fundamental physical laws are useful for investigating non-natural proteins, such as primitive proteins and artificial proteins for peptide-based drug delivery systems.

  10. Modularity of Protein Folds as a Tool for Template-Free Modeling of Structures.

    PubMed

    Vallat, Brinda; Madrid-Aliste, Carlos; Fiser, Andras

    2015-08-01

    Predicting the three-dimensional structure of proteins from their amino acid sequences remains a challenging problem in molecular biology. While the current structural coverage of proteins is almost exclusively provided by template-based techniques, the modeling of the rest of the protein sequences increasingly requires template-free methods. However, template-free modeling methods are much less reliable and are usually applicable only to smaller proteins, leaving much room for improvement. We present here a novel computational method that uses a library of supersecondary structure fragments, known as Smotifs, to model protein structures. The library of Smotifs has saturated over time, providing a theoretical foundation for efficient modeling. The method relies on weak sequence signals from remotely related protein structures to create a library of Smotif fragments specific to the target protein sequence. This Smotif library is exploited in a fragment assembly protocol to sample decoys, which are assessed by a composite scoring function. Since the Smotif fragments are larger than the ones used in other fragment-based methods, the proposed modeling algorithm, SmotifTF, can employ exhaustive sampling during decoy assembly. SmotifTF successfully predicts the overall fold of the target proteins in about 50% of the test cases and performs competitively when compared to other state-of-the-art prediction methods, especially when sequence signal to remote homologs is diminishing. Smotif-based modeling is complementary to current prediction methods and provides a promising direction in addressing the structure prediction problem, especially when targeting larger proteins for modeling.

  11. Prediction of Protein Structure by Template-Based Modeling Combined with the UNRES Force Field.

    PubMed

    Krupa, Paweł; Mozolewska, Magdalena A; Joo, Keehyoung; Lee, Jooyoung; Czaplewski, Cezary; Liwo, Adam

    2015-06-22

    A new approach to the prediction of protein structures that uses distance and backbone virtual-bond dihedral angle restraints derived from template-based models and simulations with the united residue (UNRES) force field is proposed. The approach combines the accuracy and reliability of template-based methods for the segments of the target sequence with high similarity to those having known structures with the ability of UNRES to pack the domains correctly. Multiplexed replica-exchange molecular dynamics with restraints derived from template-based models of a given target, in which each restraint is weighted according to the accuracy of the prediction of the corresponding section of the molecule, is used to search the conformational space, and the weighted histogram analysis method and cluster analysis are applied to determine the families of the most probable conformations, from which candidate predictions are selected. To test the capability of the method to recover template-based models from restraints, five single-domain proteins with structures that have been well-predicted by template-based methods were used; it was found that the resulting structures were of the same quality as the best of the original models. To assess whether the new approach can improve template-based predictions with incorrectly predicted domain packing, four such targets were selected from the CASP10 targets; for three of them the new approach resulted in significantly better predictions compared with the original template-based models. The new approach can be used to predict the structures of proteins for which good templates can be found for sections of the sequence or an overall good template can be found for the entire sequence but the prediction quality is remarkably weaker in putative domain-linker regions.

  12. All-atom 3D structure prediction of transmembrane β-barrel proteins from sequences.

    PubMed

    Hayat, Sikander; Sander, Chris; Marks, Debora S; Elofsson, Arne

    2015-04-28

    Transmembrane β-barrels (TMBs) carry out major functions in substrate transport and protein biogenesis but experimental determination of their 3D structure is challenging. Encouraged by successful de novo 3D structure prediction of globular and α-helical membrane proteins from sequence alignments alone, we developed an approach to predict the 3D structure of TMBs. The approach combines the maximum-entropy evolutionary coupling method for predicting residue contacts (EVfold) with a machine-learning approach (boctopus2) for predicting β-strands in the barrel. In a blinded test for 19 TMB proteins of known structure that have a sufficient number of diverse homologous sequences available, this combined method (EVfold_bb) predicts hydrogen-bonded residue pairs between adjacent β-strands at an accuracy of ∼70%. This accuracy is sufficient for the generation of all-atom 3D models. In the transmembrane barrel region, the average 3D structure accuracy [template-modeling (TM) score] of top-ranked models is 0.54 (ranging from 0.36 to 0.85), with a higher (44%) number of residue pairs in correct strand-strand registration than in earlier methods (18%). Although the nonbarrel regions are predicted less accurately overall, the evolutionary couplings identify some highly constrained loop residues and, for FecA protein, the barrel including the structure of a plug domain can be accurately modeled (TM score = 0.68). Lower prediction accuracy tends to be associated with insufficient sequence information and we therefore expect increasing numbers of β-barrel families to become accessible to accurate 3D structure prediction as the number of available sequences increases.
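
    For readers unfamiliar with the template-modeling (TM) score used above, the standard definition is reproduced here for reference; it is not stated in the abstract itself. In the notation assumed below, L_target is the length of the target protein, L_aligned the number of aligned residue pairs, and d_i the distance between the i-th aligned pair after superposition.

```latex
\mathrm{TM\text{-}score}
  = \max\left[ \frac{1}{L_{\mathrm{target}}}
      \sum_{i=1}^{L_{\mathrm{aligned}}}
      \frac{1}{1 + \left( d_i / d_0(L_{\mathrm{target}}) \right)^{2}} \right],
\qquad
d_0(L_{\mathrm{target}}) = 1.24\,\sqrt[3]{L_{\mathrm{target}} - 15} - 1.8
```

    The maximum is taken over superpositions of the model onto the reference structure. Scores lie in (0, 1], and values above roughly 0.5 are commonly read as indicating the same overall fold, which puts the reported average of 0.54 in context.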

  13. Validation of finite element and boundary element methods for predicting structural vibration and radiated noise

    NASA Technical Reports Server (NTRS)

    Seybert, A. F.; Wu, X. F.; Oswald, Fred B.

    1992-01-01

    Analytical and experimental validation of methods to predict structural vibration and radiated noise are presented. A rectangular box excited by a mechanical shaker was used as a vibrating structure. Combined finite element method (FEM) and boundary element method (BEM) models of the apparatus were used to predict the noise radiated from the box. The FEM was used to predict the vibration, and the surface vibration was used as input to the BEM to predict the sound intensity and sound power. Vibration predicted by the FEM model was validated by experimental modal analysis. Noise predicted by the BEM was validated by sound intensity measurements. Three types of results are presented for the total radiated sound power: (1) sound power predicted by the BEM modeling using vibration data measured on the surface of the box; (2) sound power predicted by the FEM/BEM model; and (3) sound power measured by a sound intensity scan. The sound power predicted from the BEM model using measured vibration data yields an excellent prediction of radiated noise. The sound power predicted by the combined FEM/BEM model also gives a good prediction of radiated noise except for a shift of the natural frequencies that are due to limitations in the FEM model.

  14. On the importance of cotranscriptional RNA structure formation

    PubMed Central

    Lai, Daniel; Proctor, Jeff R.; Meyer, Irmtraud M.

    2013-01-01

    The expression of genes, both coding and noncoding, can be significantly influenced by RNA structural features of their corresponding transcripts. There is by now mounting experimental and some theoretical evidence that structure formation in vivo starts during transcription and that this cotranscriptional folding determines the functional RNA structural features that are being formed. Several decades of research in bioinformatics have resulted in a wide range of computational methods for predicting RNA secondary structures. Almost all state-of-the-art methods in terms of prediction accuracy, however, completely ignore the process of structure formation and focus exclusively on the final RNA structure. This review hopes to bridge this gap. We summarize the existing evidence for cotranscriptional folding and then review the different, currently used strategies for RNA secondary-structure prediction. Finally, we propose a range of ideas on how state-of-the-art methods could be potentially improved by explicitly capturing the process of cotranscriptional structure formation. PMID:24131802

  15. Predicting β-turns and their types using predicted backbone dihedral angles and secondary structures

    PubMed Central

    2010-01-01

    Background β-turns are secondary structure elements usually classified as coil. Their prediction is important, because of their role in protein folding and their frequent occurrence in protein chains. Results We have developed a novel method that predicts β-turns and their types using information from multiple sequence alignments, predicted secondary structures and, for the first time, predicted dihedral angles. Our method uses support vector machines, a supervised classification technique, and is trained and tested on three established datasets of 426, 547 and 823 protein chains. We achieve a Matthews correlation coefficient of up to 0.49, when predicting the location of β-turns, the highest reported value to date. Moreover, the additional dihedral information improves the prediction of β-turn types I, II, IV, VIII and "non-specific", achieving correlation coefficients up to 0.39, 0.33, 0.27, 0.14 and 0.38, respectively. Our results are more accurate than other methods. Conclusions We have created an accurate predictor of β-turns and their types. Our method, called DEBT, is available online at http://comp.chem.nottingham.ac.uk/debt/. PMID:20673368

  16. Predicting beta-turns and their types using predicted backbone dihedral angles and secondary structures.

    PubMed

    Kountouris, Petros; Hirst, Jonathan D

    2010-07-31

    Beta-turns are secondary structure elements usually classified as coil. Their prediction is important, because of their role in protein folding and their frequent occurrence in protein chains. We have developed a novel method that predicts beta-turns and their types using information from multiple sequence alignments, predicted secondary structures and, for the first time, predicted dihedral angles. Our method uses support vector machines, a supervised classification technique, and is trained and tested on three established datasets of 426, 547 and 823 protein chains. We achieve a Matthews correlation coefficient of up to 0.49, when predicting the location of beta-turns, the highest reported value to date. Moreover, the additional dihedral information improves the prediction of beta-turn types I, II, IV, VIII and "non-specific", achieving correlation coefficients up to 0.39, 0.33, 0.27, 0.14 and 0.38, respectively. Our results are more accurate than other methods. We have created an accurate predictor of beta-turns and their types. Our method, called DEBT, is available online at http://comp.chem.nottingham.ac.uk/debt/.
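
    The Matthews correlation coefficient (MCC) reported above summarizes a binary confusion matrix in a single value between -1 and 1. A minimal sketch of the standard formula follows; the function name `matthews_corrcoef_from_counts` and the example counts are hypothetical, and returning 0.0 for an undefined denominator is a convention assumed here rather than something taken from the abstract.

```python
import math


def matthews_corrcoef_from_counts(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero (assumed convention).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Made-up counts of residues classified as turn / non-turn.
print(round(matthews_corrcoef_from_counts(tp=6, tn=3, fp=1, fn=2), 3))
```

    Unlike plain accuracy, the MCC stays informative when one class (here, non-turn residues) dominates the data, which is presumably why it is the headline metric for β-turn location.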

  17. A novel knowledge-based potential for RNA 3D structure evaluation

    NASA Astrophysics Data System (ADS)

    Yang, Yi; Gu, Qi; Zhang, Ben-Gong; Shi, Ya-Zhou; Shao, Zhi-Gang

    2018-03-01

    Ribonucleic acids (RNAs) play a vital role in biology, and knowledge of their three-dimensional (3D) structure is required to understand their biological functions. Recently structural prediction methods have been developed to address this issue, but a series of RNA 3D structures are generally predicted by most existing methods. Therefore, the evaluation of the predicted structures is generally indispensable. Although several methods have been proposed to assess RNA 3D structures, the existing methods are not precise enough. In this work, a new all-atom knowledge-based potential is developed for more accurately evaluating RNA 3D structures. The potential not only includes local and nonlocal interactions but also fully considers the specificity of each RNA by introducing a retraining mechanism. Based on extensive test sets generated from independent methods, the proposed potential correctly distinguished the native state and ranked near-native conformations to effectively select the best. Furthermore, the proposed potential precisely captured RNA structural features such as base-stacking and base-pairing. Comparisons with existing potential methods show that the proposed potential is very reliable and accurate in RNA 3D structure evaluation. Project supported by the National Science Foundation of China (Grants Nos. 11605125, 11105054, 11274124, and 11401448).

  18. Sixty-five years of the long march in protein secondary structure prediction: the final stretch?

    PubMed Central

    Yang, Yuedong; Gao, Jianzhao; Wang, Jihua; Heffernan, Rhys; Hanson, Jack; Paliwal, Kuldip; Zhou, Yaoqi

    2018-01-01

    Protein secondary structure prediction began in 1951 when Pauling and Corey predicted helical and sheet conformations for the protein polypeptide backbone even before the first protein structure was determined. Sixty-five years later, powerful new methods breathe new life into this field. The highest three-state accuracy without relying on structure templates is now at 82–84%, a number unthinkable just a few years ago. These improvements came from increasingly large databases of protein sequences and structures for training, the use of template secondary structure information and more powerful deep learning techniques. As we approach the theoretical limit of three-state prediction (88–90%), alternatives to secondary structure prediction (prediction of backbone torsion angles and Cα-atom-based angles and torsion angles) not only have more room for further improvement but also allow direct prediction of three-dimensional fragment structures with constantly improved accuracy. About 20% of all 40-residue fragments in a database of 1199 non-redundant proteins have <6 Å root-mean-squared distance from the native conformations by SPIDER2. More powerful deep learning methods with improved capability of capturing long-range interactions begin to emerge as the next generation of techniques for secondary structure prediction. The time has come to finish off the final stretch of the long march towards protein secondary structure prediction. PMID:28040746
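
    Three-state accuracy (often written Q3) is the fraction of residues whose predicted state (helix H, strand E or coil C) matches the observed state. A minimal sketch follows; the function name and the two toy strings are illustrative assumptions and do not come from the review.

```python
def q3_accuracy(observed, predicted):
    """Fraction of positions where the predicted three-state label equals the observed one."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings must have the same length")
    if not observed:
        raise ValueError("empty input")
    matches = sum(1 for o, p in zip(observed, predicted) if o == p)
    return matches / len(observed)


# Hypothetical 17-residue example: H = helix, E = strand, C = coil.
observed = "CCHHHHHHCCEEEEECC"
predicted = "CCHHHHHCCCEEEECCC"
q3 = q3_accuracy(observed, predicted)
print(f"Q3 = {q3:.3f}")
```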

  19. Massive integration of diverse protein quality assessment methods to improve template based modeling in CASP11.

    PubMed

    Cao, Renzhi; Bhattacharya, Debswapna; Adhikari, Badri; Li, Jilong; Cheng, Jianlin

    2016-09-01

    Model evaluation and selection is an important step and a big challenge in template-based protein structure prediction. Individual model quality assessment methods designed for recognizing some specific properties of protein structures often fail to consistently select good models from a model pool because of their limitations. Therefore, combining multiple complementary quality assessment methods is useful for improving model ranking and consequently tertiary structure prediction. Here, we report the performance and analysis of our human tertiary structure predictor (MULTICOM) based on the massive integration of 14 diverse complementary quality assessment methods that was successfully benchmarked in the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11). The predictions of MULTICOM for 39 template-based domains were rigorously assessed by six scoring metrics covering global topology of Cα trace, local all-atom fitness, side chain quality, and physical reasonableness of the model. The results show that the massive integration of complementary, diverse single-model and multi-model quality assessment methods can effectively leverage the strength of single-model methods in distinguishing quality variation among similar good models and the advantage of multi-model quality assessment methods of identifying reasonable average-quality models. The overall excellent performance of the MULTICOM predictor demonstrates that integrating a large number of model quality assessment methods in conjunction with model clustering is a useful approach to improve the accuracy, diversity, and consequently robustness of template-based protein structure prediction. Proteins 2016; 84(Suppl 1):247-259. © 2015 Wiley Periodicals, Inc.

  20. A cross docking pipeline for improving pose prediction and virtual screening performance

    NASA Astrophysics Data System (ADS)

    Kumar, Ashutosh; Zhang, Kam Y. J.

    2018-01-01

    Pose prediction and virtual screening performance of a molecular docking method depend on the choice of protein structures used for docking. Multiple structures for a target protein are often used to take into account the receptor flexibility and problems associated with a single receptor structure. However, the use of multiple receptor structures is computationally expensive when docking a large library of small molecules. Here, we propose a new cross-docking pipeline suitable to dock a large library of molecules while taking advantage of multiple target protein structures. Our method involves the selection of a suitable receptor for each ligand in a screening library utilizing ligand 3D shape similarity with crystallographic ligands. We have prospectively evaluated our method in D3R Grand Challenge 2 and demonstrated that our cross-docking pipeline can achieve similar or better performance than using either single or multiple-receptor structures. Moreover, our method displayed not only decent pose prediction performance but also better virtual screening performance over several other methods.

  1. Designing and benchmarking the MULTICOM protein structure prediction system

    PubMed Central

    2013-01-01

    Background: Predicting protein structure from sequence is one of the most significant and challenging problems in bioinformatics. Numerous bioinformatics techniques and tools have been developed to tackle almost every aspect of protein structure prediction ranging from structural feature prediction, template identification and query-template alignment to structure sampling, model quality assessment, and model refinement. How to synergistically select, integrate and improve the strengths of the complementary techniques at each prediction stage and build a high-performance system is becoming a critical issue for constructing a successful, competitive protein structure predictor. Results: Over the past several years, we have constructed a standalone protein structure prediction system MULTICOM that combines multiple sources of information and complementary methods at all five stages of the protein structure prediction process including template identification, template combination, model generation, model assessment, and model refinement. The system was blindly tested during the ninth Critical Assessment of Techniques for Protein Structure Prediction (CASP9) in 2010 and yielded very good performance. In addition to studying the overall performance on the CASP9 benchmark, we thoroughly investigated the performance and contributions of each component at each stage of prediction. Conclusions: Our comprehensive and comparative study not only provides useful and practical insights about how to select, improve, and integrate complementary methods to build a cutting-edge protein structure prediction system but also identifies a few new sources of information that may help improve the design of a protein structure prediction system. Several components used in the MULTICOM system are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/. PMID:23442819

  2. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction.

    PubMed

    Cui, Xuefeng; Lu, Zhiwu; Wang, Sheng; Jing-Yan Wang, Jim; Gao, Xin

    2016-06-15

    Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information. We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence-structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration. We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM-HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods. Our program is freely available for download from http://sfb.kaust.edu.sa/Pages/Software.aspx . Contact: xin.gao@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  3. Complete fold annotation of the human proteome using a novel structural feature space.

    PubMed

    Middleton, Sarah A; Illuminati, Joseph; Kim, Junhyong

    2017-04-13

    Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families.

  4. Complete fold annotation of the human proteome using a novel structural feature space

    PubMed Central

    Middleton, Sarah A.; Illuminati, Joseph; Kim, Junhyong

    2017-01-01

    Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families. PMID:28406174

  5. A high-throughput exploration of magnetic materials by using structure predicting methods

    NASA Astrophysics Data System (ADS)

    Arapan, S.; Nieves, P.; Cuesta-López, S.

    2018-02-01

    We study the capability of a structure predicting method based on a genetic/evolutionary algorithm for a high-throughput exploration of magnetic materials. We use the USPEX and VASP codes to predict stable and generate low-energy meta-stable structures for a set of representative magnetic structures comprising intermetallic alloys, oxides, interstitial compounds, and systems containing rare-earth elements, and for both ferromagnetic and antiferromagnetic ordering. We have modified the interface between USPEX and VASP codes to improve the performance of structural optimization as well as to perform calculations in a high-throughput manner. We show that exploring the structure phase space with a structure predicting technique reveals large sets of low-energy metastable structures, which not only improve currently existing databases, but also may provide understanding and solutions to stabilize and synthesize magnetic materials suitable for permanent magnet applications.

  6. Structure and Stability of Molecular Crystals with Many-Body Dispersion-Inclusive Density Functional Tight Binding.

    PubMed

    Mortazavi, Majid; Brandenburg, Jan Gerit; Maurer, Reinhard J; Tkatchenko, Alexandre

    2018-01-18

    Accurate prediction of structure and stability of molecular crystals is crucial in materials science and requires reliable modeling of long-range dispersion interactions. Semiempirical electronic structure methods are computationally more efficient than their ab initio counterparts, allowing structure sampling with significant speedups. We combine the Tkatchenko-Scheffler van der Waals method (TS) and the many-body dispersion method (MBD) with third-order density functional tight-binding (DFTB3) via a charge population-based method. We find an overall good performance for the X23 benchmark database of molecular crystals, despite an underestimation of crystal volume that can be traced to the DFTB parametrization. We achieve accurate lattice energy predictions with DFT+MBD energetics on top of vdW-inclusive DFTB3 structures, resulting in a speedup of up to 3000 times compared with a full DFT treatment. This suggests that vdW-inclusive DFTB3 can serve as a viable structural prescreening tool in crystal structure prediction.

  7. Prediction of pi-turns in proteins using PSI-BLAST profiles and secondary structure information.

    PubMed

    Wang, Yan; Xue, Zhi-Dong; Shi, Xiao-Hong; Xu, Jin

    2006-09-01

    Due to the structural and functional importance of tight turns, some methods have been proposed to predict gamma-turns, beta-turns, and alpha-turns in proteins. In the past, studies of pi-turns were made, but not a single prediction approach has been developed so far. It will be useful to develop a method for identifying pi-turns in a protein sequence. In this paper, the support vector machine (SVM) method has been introduced to predict pi-turns from the amino acid sequence. The training and testing of this approach is performed with a newly collected data set of 640 non-homologous protein chains containing 1931 pi-turns. Different sequence encoding schemes have been explored in order to investigate their effects on the prediction performance. With multiple sequence alignment and predicted secondary structure, the final SVM model yields a Matthews correlation coefficient (MCC) of 0.556 by a 7-fold cross-validation. A web server implementing the prediction method is available at the following URL: http://210.42.106.80/piturn/.
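
    The 7-fold cross-validation mentioned above partitions the data into seven disjoint folds, each used once for testing while the remaining six are used for training. A minimal sketch, assuming folds are formed by randomly shuffling chain indices (the abstract does not say how the 640 chains were assigned to folds); `k_fold_indices` and the seed are hypothetical.

```python
import random


def k_fold_indices(n_items, k=7, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Strided slices give k disjoint folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 640 is the number of protein chains quoted in the abstract.
fold_sizes = [len(test) for _, test in k_fold_indices(640, k=7)]
print(fold_sizes)
```

    Splitting by chain rather than by residue keeps all residues of one protein in the same fold, so that a chain never contributes to both training and testing in the same round.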

  8. Protein–DNA Interactions: The Story so Far and a New Method for Prediction

    DOE PAGES

    Jones, Susan; Thornton, Janet M.

    2003-01-01

    This review describes methods for the prediction of DNA binding function, and specifically summarizes a new method using 3D structural templates. The new method features the HTH motif that is found in approximately one-third of DNA-binding protein families. A library of 3D structural templates of HTH motifs was derived from proteins in the PDB. Templates were scanned against complete protein structures and the optimal superposition of a template on a structure calculated. Significance thresholds in terms of a minimum root mean squared deviation (rmsd) of an optimal superposition, and a minimum motif accessible surface area (ASA), have been calculated. In this way, it is possible to scan the template library against proteins of unknown function to make predictions about DNA-binding functionality.

  9. Hybrid experimental/analytical models of structural dynamics - Creation and use for predictions

    NASA Technical Reports Server (NTRS)

    Balmes, Etienne

    1993-01-01

    An original complete methodology for the construction of predictive models of damped structural vibrations is introduced. A consistent definition of normal and complex modes is given which leads to an original method to accurately identify non-proportionally damped normal mode models. A new method to create predictive hybrid experimental/analytical models of damped structures is introduced, and the ability of hybrid models to predict the response to system configuration changes is discussed. Finally a critical review of the overall methodology is made by application to the case of the MIT/SERC interferometer testbed.

  10. Conformational Transitions upon Ligand Binding: Holo-Structure Prediction from Apo Conformations

    PubMed Central

    Seeliger, Daniel; de Groot, Bert L.

    2010-01-01

    Biological function of proteins is frequently associated with the formation of complexes with small-molecule ligands. Experimental structure determination of such complexes at atomic resolution, however, can be time-consuming and costly. Computational methods for structure prediction of protein/ligand complexes, particularly docking, are as yet restricted by their limited consideration of receptor flexibility, rendering them not applicable for predicting protein/ligand complexes if large conformational changes of the receptor upon ligand binding are involved. Accurate receptor models in the ligand-bound state (holo structures), however, are a prerequisite for successful structure-based drug design. Hence, if only an unbound (apo) structure is available distinct from the ligand-bound conformation, structure-based drug design is severely limited. We present a method to predict the structure of protein/ligand complexes based solely on the apo structure, the ligand and the radius of gyration of the holo structure. The method is applied to ten cases in which proteins undergo structural rearrangements of up to 7.1 Å backbone RMSD upon ligand binding. In all cases, receptor models within 1.6 Å backbone RMSD to the target were predicted and close-to-native ligand binding poses were obtained for 8 of 10 cases in the top-ranked complex models. A protocol is presented that is expected to enable structure modeling of protein/ligand complexes and structure-based drug design for cases where crystal structures of ligand-bound conformations are not available. PMID:20066034
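
    Two quantities used in the abstract above, backbone RMSD and radius of gyration, have simple closed forms. The sketch below assumes the two conformations are already superimposed and list the same atoms in the same order (no optimal superposition is performed), and it uses an unweighted radius of gyration rather than a mass-weighted one. The function names and coordinates are illustrative assumptions, not the authors' code.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z) tuples."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def radius_of_gyration(coords):
    """Unweighted radius of gyration: RMS distance of the atoms from their centroid."""
    if not coords:
        raise ValueError("coordinate list must be non-empty")
    n = len(coords)
    cx = sum(x for x, _, _ in coords) / n
    cy = sum(y for _, y, _ in coords) / n
    cz = sum(z for _, _, z in coords) / n
    squared = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords)
    return math.sqrt(squared / n)


# Toy coordinates in angstroms (hypothetical, for illustration only).
apo = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
holo = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
deviation = rmsd(apo, holo)
rg_apo = radius_of_gyration(apo)
print(deviation, rg_apo)
```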

  11. Grain growth prediction based on data assimilation by implementing 4DVar on multi-phase-field model

    NASA Astrophysics Data System (ADS)

    Ito, Shin-ichi; Nagao, Hiromichi; Kasuya, Tadashi; Inoue, Junya

    2017-12-01

    We propose a method to predict grain growth based on data assimilation by using a four-dimensional variational method (4DVar). When implemented on a multi-phase-field model, the proposed method allows us to calculate the predicted grain structures and uncertainties in them that depend on the quality and quantity of the observational data. We confirm through numerical tests involving synthetic data that the proposed method correctly reproduces the true phase-field assumed in advance. Furthermore, it successfully quantifies uncertainties in the predicted grain structures, where such uncertainty quantifications provide valuable information to optimize the experimental design.

  12. Mining the protein data bank with CReF to predict approximate 3-D structures of polypeptides.

    PubMed

    Dorn, Márcio; de Souza, Osmar Norberto

    2010-01-01

    In this paper we describe CReF, a Central Residue Fragment-based method to predict approximate 3-D structures of polypeptides by mining the Protein Data Bank (PDB). The approximate predicted structures are good enough to be used as starting conformations in refinement procedures employing state-of-the-art molecular mechanics methods such as molecular dynamics simulations. CReF is very fast and we illustrate its efficacy in three case studies of polypeptides whose sizes vary from 34 to 70 amino acids. As indicated by the RMSD values, our initial results show that the predicted structures adopt the expected fold, similar to the experimental ones.

  13. Structure prediction of the second extracellular loop in G-protein-coupled receptors.

    PubMed

    Kmiecik, Sebastian; Jamroz, Michal; Kolinski, Michal

    2014-06-03

    G-protein-coupled receptors (GPCRs) play key roles in living organisms. Therefore, it is important to determine their functional structures. The second extracellular loop (ECL2) is a functionally important region of GPCRs, which poses a significant challenge for computational structure prediction methods. In this work, we evaluated CABS, a well-established protein modeling tool, for predicting ECL2 structure in 13 GPCRs. The ECL2s (with between 13 and 34 residues) are predicted in an environment of other extracellular loops being fully flexible and the transmembrane domain fixed in its x-ray conformation. The modeling procedure used theoretical predictions of ECL2 secondary structure and experimental constraints on disulfide bridges. Our approach yielded ensembles of low-energy conformers and the most populated conformers that contained models close to the available x-ray structures. The level of similarity between the predicted models and x-ray structures is comparable to that of other state-of-the-art computational methods. Our results extend other studies by including newly crystallized GPCRs. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.

  14. Comparative analysis of machine learning methods in ligand-based virtual screening of large compound libraries.

    PubMed

    Ma, Xiao H; Jia, Jia; Zhu, Feng; Xue, Ying; Li, Ze R; Chen, Yu Z

    2009-05-01

    Machine learning methods have been explored as ligand-based virtual screening tools for facilitating drug lead discovery. These methods predict compounds of specific pharmacodynamic, pharmacokinetic or toxicological properties based on their structure-derived structural and physicochemical properties. Increasing attention has been directed at these methods because of their capability in predicting compounds of diverse structures and complex structure-activity relationships without requiring the knowledge of target 3D structure. This article reviews current progress in using machine learning methods for virtual screening of pharmacodynamically active compounds from large compound libraries, and analyzes and compares the reported performances of machine learning tools with those of structure-based and other ligand-based (such as pharmacophore and clustering) virtual screening methods. The feasibility of improving the performance of machine learning methods in screening large libraries is discussed.

  15. RRCRank: a fusion method using rank strategy for residue-residue contact prediction.

    PubMed

    Jing, Xiaoyang; Dong, Qiwen; Lu, Ruqian

    2017-09-02

    In structural biology, protein residue-residue contacts play a crucial role in protein structure prediction. Some researchers have found that the predicted residue-residue contacts could effectively constrain the conformational search space, which is significant for de novo protein structure prediction. In the last few decades, researchers have developed various methods to predict residue-residue contacts; in recent years, fusion methods in particular have achieved significant performance. In this work, a novel fusion method based on a rank strategy is proposed to predict contacts. Unlike the traditional regression or classification strategies, the contact prediction task is regarded as a ranking task. Two kinds of features are first extracted from correlated-mutation methods and ensemble machine-learning classifiers, and the proposed method then uses a learning-to-rank algorithm to predict the contact probability of each residue pair. First, we perform two benchmark tests for the proposed fusion method (RRCRank) on the CASP11 and CASP12 datasets respectively. The test results show that the RRCRank method outperforms other well-developed methods, especially for medium and short range contacts. Second, in order to verify the superiority of the ranking strategy, we predict contacts by using the traditional regression and classification strategies based on the same features as the ranking strategy. Compared with these two traditional strategies, the proposed ranking strategy shows better performance for three contact types, in particular for long range contacts. Third, the proposed RRCRank has been compared with several state-of-the-art methods in CASP11 and CASP12. The results show that RRCRank could achieve comparable prediction precisions and is better than three methods in most assessment metrics.
    The learning-to-rank algorithm is introduced to develop a novel rank-based method for the residue-residue contact prediction of proteins, which achieves state-of-the-art performance based on the extensive assessment.
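
    Ranked contact predictions such as those described above are commonly scored by the precision of the top L/k predictions (L = sequence length) within a sequence-separation range. The abstract does not state which metric was used, so the following is only an illustrative sketch: the function name, the default divisor of 5 and the long-range threshold of 24 residues are assumptions, and the scores are toy values.

```python
def top_precision(scored_pairs, true_contacts, seq_len, divisor=5, min_sep=24):
    """Precision of the top seq_len // divisor predicted pairs with |i - j| >= min_sep.

    scored_pairs: iterable of (i, j, score) tuples; higher score = more confident.
    true_contacts: set of (i, j) tuples with i < j.
    """
    candidates = [(i, j, s) for i, j, s in scored_pairs if abs(i - j) >= min_sep]
    candidates.sort(key=lambda item: item[2], reverse=True)
    top = candidates[: max(1, seq_len // divisor)]
    if not top:
        return 0.0
    hits = sum(1 for i, j, _ in top if (min(i, j), max(i, j)) in true_contacts)
    return hits / len(top)


# Toy 10-residue example (hypothetical scores); min_sep lowered to 3 to suit its size.
scored = [(0, 5, 0.9), (1, 8, 0.8), (2, 9, 0.4), (0, 1, 0.99)]
native = {(0, 5), (2, 9)}
precision = top_precision(scored, native, seq_len=10, divisor=5, min_sep=3)
print(precision)
```

    Because only the order of the scores matters for this metric, it fits a method that treats contact prediction as a ranking problem rather than as regression or classification.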

  16. Progressive Failure Analysis Methodology for Laminated Composite Structures

    NASA Technical Reports Server (NTRS)

    Sleight, David W.

    1999-01-01

    A progressive failure analysis method has been developed for predicting the failure of laminated composite structures under geometrically nonlinear deformations. The progressive failure analysis uses C^1 shell elements based on classical lamination theory to calculate the in-plane stresses. Several failure criteria, including the maximum strain criterion, Hashin's criterion, and Christensen's criterion, are used to predict the failure mechanisms and several options are available to degrade the material properties after failures. The progressive failure analysis method is implemented in the COMET finite element analysis code and can predict the damage and response of laminated composite structures from initial loading to final failure. The different failure criteria and material degradation methods are compared and assessed by performing analyses of several laminated composite structures. Results from the progressive failure method indicate good correlation with the existing test data except in structural applications where interlaminar stresses are important which may cause failure mechanisms such as debonding or delaminations.

  17. Structural features based genome-wide characterization and prediction of nucleosome organization

    PubMed Central

    2012-01-01

    Background: Nucleosome distribution along chromatin dictates genomic DNA accessibility and thus profoundly influences gene expression. However, the underlying mechanism of nucleosome formation remains elusive. Here, taking a structural perspective, we systematically explored nucleosome formation potential of genomic sequences and the effect on chromatin organization and gene expression in S. cerevisiae. Results: We analyzed twelve structural features related to flexibility, curvature and energy of DNA sequences. The results showed that some structural features such as DNA denaturation, DNA-bending stiffness, stacking energy, Z-DNA, propeller twist and free energy, were highly correlated with in vitro and in vivo nucleosome occupancy. Specifically, they can be classified into two classes, one positively and the other negatively correlated with nucleosome occupancy. These two kinds of structural features facilitated nucleosome binding in centromere regions and repressed nucleosome formation in the promoter regions of protein-coding genes to mediate transcriptional regulation. Based on these analyses, we integrated all twelve structural features in a model to predict nucleosome occupancy in vivo more accurately than the existing methods that mainly depend on sequence compositional features. Furthermore, we developed a novel approach, named DLaNe, that located nucleosomes by detecting peaks of structural profiles, and built a meta predictor to integrate information from different structural features. As a comparison, we also constructed a hidden Markov model (HMM) to locate nucleosomes based on the profiles of these structural features. The result showed that the meta DLaNe and HMM-based method performed better than the existing methods, demonstrating the power of these structural features in predicting nucleosome positions.
    Conclusions: Our analysis revealed that DNA structures significantly contribute to nucleosome organization and influence chromatin structure and gene expression regulation. The results indicated that our proposed methods are effective in predicting nucleosome occupancy and positions and that these structural features are highly predictive of nucleosome organization. The implementation of our DLaNe method based on structural features is available online. PMID:22449207

  18. PSS-3D1D: an improved 3D1D profile method of protein fold recognition for the annotation of twilight zone sequences.

    PubMed

    Ganesan, K; Parthasarathy, S

    2011-12-01

    Annotation of any newly determined protein sequence depends on the pairwise sequence identity with known sequences. However, for the twilight zone sequences which have only 15-25% identity, the pair-wise comparison methods are inadequate and the annotation becomes a challenging task. Such sequences can be annotated by using methods that recognize their fold. Bowie et al. described a 3D1D profile method in which the amino acid sequences that fold into a known 3D structure are identified by their compatibility to that known 3D structure. We have improved the above method by using the predicted secondary structure information and employ it for fold recognition from the twilight zone sequences. In our Protein Secondary Structure 3D1D (PSS-3D1D) method, a score (w) for the predicted secondary structure of the query sequence is included in finding the compatibility of the query sequence to the known fold 3D structures. In the benchmarks, the PSS-3D1D method shows a maximum of 21% improvement in predicting correctly the α + β class of folds from the sequences with twilight zone level of identity, when compared with the 3D1D profile method. Hence, the PSS-3D1D method could offer more clues than the 3D1D method for the annotation of twilight zone sequences. The web based PSS-3D1D method is freely available in the PredictFold server at http://bioinfo.bdu.ac.in/servers/ .

  19. Support vector machine prediction of enzyme function with conjoint triad feature and hierarchical context.

    PubMed

    Wang, Yong-Cui; Wang, Yong; Yang, Zhi-Xia; Deng, Nai-Yang

    2011-06-20

    Enzymes are known as the largest class of proteins and their functions are usually annotated by the Enzyme Commission (EC), which uses a hierarchical structure, i.e., four numbers separated by periods, to classify the function of enzymes. Automatically categorizing enzymes into the EC hierarchy is crucial to understanding their specific molecular mechanisms. In this paper, we introduce two key improvements in predicting enzyme function within the machine learning framework. The first is to introduce efficient sequence encoding methods for representing given proteins. The second is to develop a structure-based prediction method with low computational complexity. In particular, we propose to use the conjoint triad feature (CTF) to represent the given protein sequences by considering not only the composition of amino acids but also the neighbor relationships in the sequence. Then we develop a support vector machine (SVM)-based method, named SVMHL (SVM for hierarchy labels), to output enzyme function by fully considering the hierarchical structure of EC. The experimental results show that our SVMHL with the CTF outperforms SVMHL with the amino acid composition (AAC) feature both in predictive accuracy and Matthews correlation coefficient (MCC). In addition, SVMHL with the CTF obtains accuracy and MCC ranging from 81% to 98% and 0.82 to 0.98, respectively, when predicting the first three EC digits on a low-homologous enzyme dataset. We further demonstrate that our method outperforms methods which do not take account of the hierarchical relationship among enzyme categories and alternative methods which incorporate prior knowledge about inter-class relationships. Our structure-based prediction model, SVMHL with the CTF, reduces the computational complexity and outperforms the alternative approaches in enzyme function prediction. Therefore our new method will be a useful tool for the enzyme function prediction community.
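
As a rough illustration of the conjoint triad encoding mentioned in this abstract, the sketch below maps each residue to one of seven classes and counts every run of three consecutive classes, giving a 343-dimensional vector. The seven-class grouping and the min-max normalisation are assumptions taken from the way the feature is commonly described elsewhere; the abstract itself does not specify them, and the function name `conjoint_triad` is hypothetical.

```python
from itertools import product

# Seven residue classes commonly used with the conjoint triad feature.
# ASSUMPTION: this grouping is not given in the abstract.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
RESIDUE_TO_CLASS = {aa: i for i, group in enumerate(CLASSES) for aa in group}

# All 7 * 7 * 7 = 343 ordered class triads, each with a fixed vector index.
TRIADS = list(product(range(len(CLASSES)), repeat=3))
TRIAD_INDEX = {triad: i for i, triad in enumerate(TRIADS)}


def conjoint_triad(sequence):
    """Return a 343-element list of min-max-normalised triad counts."""
    counts = [0] * len(TRIADS)
    classes = [RESIDUE_TO_CLASS.get(aa) for aa in sequence.upper()]
    # Slide a window of three consecutive residues along the sequence.
    for triad in zip(classes, classes[1:], classes[2:]):
        if None in triad:  # skip windows containing non-standard residues
            continue
        counts[TRIAD_INDEX[triad]] += 1
    lo, hi = min(counts), max(counts)
    if hi == lo:  # empty or featureless input: avoid division by zero
        return [0.0] * len(counts)
    return [(c - lo) / (hi - lo) for c in counts]
```

A vector of this kind could then be passed to any standard SVM implementation; how SVMHL handles the EC hierarchy is not described in enough detail here to sketch.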

  20. Extracting physicochemical features to predict protein secondary structure.

    PubMed

    Huang, Yin-Fu; Chen, Shu-Ying

    2013-01-01

    We propose a protein secondary structure prediction method based on position-specific scoring matrix (PSSM) profiles and four physicochemical features including conformation parameters, net charges, hydrophobicity, and side chain mass. First, the SVM with the optimal window size and the optimal parameters of the kernel function is found. Then, we train the SVM using the PSSM profiles generated from PSI-BLAST and the physicochemical features extracted from the CB513 data set. Finally, we use the filter to refine the predicted results from the trained SVM. For all the performance measures of our method, Q3 reaches 79.52, SOV94 reaches 86.10, and SOV99 reaches 74.60; all the measures are higher than those of the SVMpsi method and the SVMfreq method. This validates that considering these physicochemical features in predicting protein secondary structure would exhibit better performances.
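
For readers unfamiliar with the Q3 measure quoted above, a minimal sketch follows. It assumes the usual definition, namely the percentage of residues whose three-state label (for example H, E, or C) is predicted correctly; the function name `q3` and the string encoding of states are illustrative choices, not taken from the paper.

```python
def q3(predicted, observed):
    """Percentage of residues whose secondary-structure state matches.

    Both arguments are equal-length strings of per-residue state labels,
    e.g. 'H' (helix), 'E' (strand), 'C' (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Seven of eight residues agree, so Q3 is 87.5.
print(q3("HHHEECCC", "HHHEEECC"))
```

The SOV94 and SOV99 scores are segment-overlap measures with more involved definitions and are not sketched here.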

  1. Extracting Physicochemical Features to Predict Protein Secondary Structure

    PubMed Central

    Chen, Shu-Ying

    2013-01-01

    We propose a protein secondary structure prediction method based on position-specific scoring matrix (PSSM) profiles and four physicochemical features including conformation parameters, net charges, hydrophobicity, and side chain mass. First, the SVM with the optimal window size and the optimal parameters of the kernel function is found. Then, we train the SVM using the PSSM profiles generated from PSI-BLAST and the physicochemical features extracted from the CB513 data set. Finally, we use the filter to refine the predicted results from the trained SVM. For all the performance measures of our method, Q3 reaches 79.52, SOV94 reaches 86.10, and SOV99 reaches 74.60; all the measures are higher than those of the SVMpsi method and the SVMfreq method. This validates that considering these physicochemical features in predicting protein secondary structure would exhibit better performances. PMID:23766688

  2. Ab initio RNA folding by discrete molecular dynamics: From structure prediction to folding mechanisms

    PubMed Central

    Ding, Feng; Sharma, Shantanu; Chalasani, Poornima; Demidov, Vadim V.; Broude, Natalia E.; Dokholyan, Nikolay V.

    2008-01-01

    RNA molecules with novel functions have revived interest in the accurate prediction of RNA three-dimensional (3D) structure and folding dynamics. However, existing methods are inefficient in automated 3D structure prediction. Here, we report a robust computational approach for rapid folding of RNA molecules. We develop a simplified RNA model for discrete molecular dynamics (DMD) simulations, incorporating base-pairing and base-stacking interactions. We demonstrate correct folding of 150 structurally diverse RNA sequences. The majority of DMD-predicted 3D structures have <4 Å deviations from experimental structures. The secondary structures corresponding to the predicted 3D structures consist of 94% native base-pair interactions. Folding thermodynamics and kinetics of tRNAPhe, pseudoknots, and mRNA fragments in DMD simulations are in agreement with previous experimental findings. Folding of RNA molecules features transient, non-native conformations, suggesting non-hierarchical RNA folding. Our method allows rapid conformational sampling of RNA folding, with computational time increasing linearly with RNA length. We envision this approach as a promising tool for RNA structural and functional analyses. PMID:18456842

  3. Predicting nucleic acid binding interfaces from structural models of proteins

    PubMed Central

    Dror, Iris; Shazman, Shula; Mukherjee, Srayanta; Zhang, Yang; Glaser, Fabian; Mandel-Gutfreund, Yael

    2011-01-01

    The function of DNA- and RNA-binding proteins can be inferred from the characterization and accurate prediction of their binding interfaces. However, the main pitfall of various structure-based methods for predicting nucleic acid binding function is that they are all limited to a relatively small number of proteins for which high-resolution three-dimensional structures are available. In this study, we developed a pipeline for extracting functional electrostatic patches from surfaces of protein structural models, obtained using the I-TASSER protein structure predictor. The largest positive patches are extracted from the protein surface using the patchfinder algorithm. We show that functional electrostatic patches extracted from an ensemble of structural models highly overlap the patches extracted from high-resolution structures. Furthermore, by testing our pipeline on a set of 55 known nucleic acid binding proteins for which I-TASSER produces high-quality models, we show that the method accurately identifies the nucleic acid binding interface on structural models of proteins. Employing a combined patch approach, we show that patches extracted from an ensemble of models better predict the real nucleic acid binding interfaces compared to patches extracted from independent models. Overall, these results suggest that combining information from a collection of low-resolution structural models could be a valuable approach for functional annotation. We suggest that our method will be further applicable for predicting other functional surfaces of proteins with unknown structure. PMID:22086767

  4. LBSizeCleav: improved support vector machine (SVM)-based prediction of Dicer cleavage sites using loop/bulge length.

    PubMed

    Bao, Yu; Hayashida, Morihiro; Akutsu, Tatsuya

    2016-11-25

    Dicer is necessary for the process of mature microRNA (miRNA) formation because the Dicer enzyme cleaves pre-miRNA correctly to generate miRNA with correct seed regions. Nonetheless, the mechanism underlying the selection of a Dicer cleavage site is still not fully understood. To date, several studies have been conducted to solve this problem, for example, a recent discovery indicates that the loop/bulge structure plays a central role in the selection of Dicer cleavage sites. In accordance with this breakthrough, a support vector machine (SVM)-based method called PHDCleav was developed to predict Dicer cleavage sites which outperforms other methods based on random forest and naive Bayes. PHDCleav, however, tests only whether a position in the shift window belongs to a loop/bulge structure. In this paper, we used the length of loop/bulge structures (in addition to their presence or absence) to develop an improved method, LBSizeCleav, for predicting Dicer cleavage sites. To evaluate our method, we used 810 empirically validated sequences of human pre-miRNAs and performed fivefold cross-validation. In both 5p and 3p arms of pre-miRNAs, LBSizeCleav showed greater prediction accuracy than PHDCleav did. This result suggests that the length of loop/bulge structures is useful for prediction of Dicer cleavage sites. We developed a novel algorithm for feature space mapping based on the length of a loop/bulge for predicting Dicer cleavage sites. The better performance of our method indicates the usefulness of the length of loop/bulge structures for such predictions.

  5. Relative Packing Groups in Template-Based Structure Prediction: Cooperative Effects of True Positive Constraints

    PubMed Central

    Day, Ryan; Qu, Xiaotao; Swanson, Rosemarie; Bohannan, Zach; Bliss, Robert

    2011-01-01

    Most current template-based structure prediction methods concentrate on finding the correct backbone conformation and then packing sidechains within that backbone. Our packing-based method derives distance constraints from conserved relative packing groups (RPGs). In our refinement approach, the RPGs provide a level of resolution that restrains global topology while allowing conformational sampling. In this study, we test our template-based structure prediction method using 51 prediction units from CASP7 experiments. RPG-based constraints are able to substantially improve approximately two-thirds of starting templates. Upon deeper investigation, we find that true positive spatial constraints, especially those non-local in sequence, derived from the RPGs were important to building nearer native models. Surprisingly, the fraction of incorrect or false positive constraints does not strongly influence the quality of the final candidate. This result indicates that our RPG-based true positive constraints sample the self-consistent, cooperative interactions of the native structure. The lack of such reinforcing cooperativity explains the weaker effect of false positive constraints. Generally, these findings are encouraging indications that RPGs will improve template-based structure prediction. PMID:21210729

  6. Benchmark data sets for structure-based computational target prediction.

    PubMed

    Schomburg, Karen T; Rarey, Matthias

    2014-08-25

    Structure-based computational target prediction methods identify potential targets for a bioactive compound. Methods based on protein-ligand docking so far face many challenges, the greatest of which is probably the ranking of true targets in a large data set of protein structures. Currently, no standard data sets for evaluation exist, rendering comparison and demonstration of improvements of methods cumbersome. Therefore, we propose two data sets and evaluation strategies for a meaningful evaluation of new target prediction methods, i.e., a small data set consisting of three target classes for detailed proof-of-concept and selectivity studies and a large data set consisting of 7992 protein structures and 72 drug-like ligands allowing statistical evaluation with performance metrics on a drug-like chemical space. Both data sets are built from openly available resources, and any information needed to perform the described experiments is reported. We describe the composition of the data sets, the setup of screening experiments, and the evaluation strategy. Performance metrics capable of measuring the early recognition of enrichments, such as AUC, BEDROC, and NSLR, are proposed. We apply a sequence-based target prediction method to the large data set to analyze its content of nontrivial evaluation cases. The proposed data sets are used for method evaluation of our new inverse screening method iRAISE. The small data set reveals the method's capability and limitations to selectively distinguish between rather similar protein structures. The large data set simulates real target identification scenarios. iRAISE achieves in 55% excellent or good enrichment a median AUC of 0.67 and RMSDs below 2.0 Å for 74% and was able to predict the first true target in 59 out of 72 cases in the top 2% of the protein data set of about 8000 structures.
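
To make the AUC metric named in this abstract concrete, the sketch below computes the area under the ROC curve for a ranked list of candidate targets using the pairwise (Mann-Whitney) formulation. This is a generic illustration, not the authors' evaluation code; the function name `roc_auc` and the convention that a higher score means a more likely target are assumptions.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: one number per candidate, higher meaning "more likely a target".
    labels: truthy for true targets, falsy for decoys.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one true target and one decoy")
    # Count the fraction of (target, decoy) pairs ranked correctly;
    # ties contribute one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Three of the four (target, decoy) pairs are ordered correctly.
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

The quadratic pairwise loop is adequate for illustration; for data sets on the scale of 8000 structures, a rank-based computation would be the more economical choice.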

  7. Designing and evaluating the MULTICOM protein local and global model quality prediction methods in the CASP10 experiment

    PubMed Central

    2014-01-01

    Background: Protein model quality assessment is an essential component of generating and using protein structural models. During the Tenth Critical Assessment of Techniques for Protein Structure Prediction (CASP10), we developed and tested four automated methods (MULTICOM-REFINE, MULTICOM-CLUSTER, MULTICOM-NOVEL, and MULTICOM-CONSTRUCT) that predicted both local and global quality of protein structural models. Results: MULTICOM-REFINE was a clustering approach that used the average pairwise structural similarity between models to measure the global quality and the average Euclidean distance between a model and several top ranked models to measure the local quality. MULTICOM-CLUSTER and MULTICOM-NOVEL were two new support vector machine-based methods of predicting both the local and global quality of a single protein model. MULTICOM-CONSTRUCT was a new weighted pairwise model comparison (clustering) method that used the weighted average similarity between models in a pool to measure the global model quality. Our experiments showed that the pairwise model assessment methods worked better when a large portion of models in the pool were of good quality, whereas single-model quality assessment methods performed better on some hard targets when only a small portion of models in the pool were of reasonable quality. Conclusions: Since digging out a few good models from a large pool of low-quality models is a major challenge in protein structure prediction, single model quality assessment methods appear to be poised to make important contributions to protein structure modeling. The other interesting finding was that single-model quality assessment scores could be used to weight the models by the consensus pairwise model comparison method to improve its accuracy. PMID:24731387
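
The pairwise (consensus) idea described for MULTICOM-REFINE can be sketched as follows: each model's global quality is estimated as its average similarity to every other model in the pool. This is a minimal illustration under stated assumptions, not the authors' implementation. The similarity matrix is taken as given (the abstract does not say which structural similarity score is used), and the function name `consensus_scores` is hypothetical.

```python
def consensus_scores(similarity):
    """Average pairwise similarity of each model to all other models.

    similarity: square, symmetric matrix (list of lists) where
    similarity[i][j] is a structural similarity score between models
    i and j, higher meaning more alike. The diagonal is ignored.
    """
    n = len(similarity)
    if n < 2:
        raise ValueError("need at least two models in the pool")
    return [
        sum(similarity[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


# Hypothetical three-model pool: model 1 agrees best with the others.
pool = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
scores = consensus_scores(pool)
best = max(range(len(scores)), key=scores.__getitem__)
```

This also shows why such methods depend on the pool: if most models are poor but similar to one another, the consensus score rewards the wrong models, which matches the abstract's observation about hard targets.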

  8. Designing and evaluating the MULTICOM protein local and global model quality prediction methods in the CASP10 experiment.

    PubMed

    Cao, Renzhi; Wang, Zheng; Cheng, Jianlin

    2014-04-15

    Protein model quality assessment is an essential component of generating and using protein structural models. During the Tenth Critical Assessment of Techniques for Protein Structure Prediction (CASP10), we developed and tested four automated methods (MULTICOM-REFINE, MULTICOM-CLUSTER, MULTICOM-NOVEL, and MULTICOM-CONSTRUCT) that predicted both local and global quality of protein structural models. MULTICOM-REFINE was a clustering approach that used the average pairwise structural similarity between models to measure the global quality and the average Euclidean distance between a model and several top ranked models to measure the local quality. MULTICOM-CLUSTER and MULTICOM-NOVEL were two new support vector machine-based methods of predicting both the local and global quality of a single protein model. MULTICOM-CONSTRUCT was a new weighted pairwise model comparison (clustering) method that used the weighted average similarity between models in a pool to measure the global model quality. Our experiments showed that the pairwise model assessment methods worked better when a large portion of models in the pool were of good quality, whereas single-model quality assessment methods performed better on some hard targets when only a small portion of models in the pool were of reasonable quality. Since digging out a few good models from a large pool of low-quality models is a major challenge in protein structure prediction, single model quality assessment methods appear to be poised to make important contributions to protein structure modeling. The other interesting finding was that single-model quality assessment scores could be used to weight the models by the consensus pairwise model comparison method to improve its accuracy.

  9. Construction of ontology augmented networks for protein complex prediction.

    PubMed

    Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian

    2013-01-01

    Protein complexes are of great importance in understanding the principles of cellular organization and function. The increase in available protein-protein interaction data, gene ontology and other resources make it possible to develop computational methods for protein complex prediction. Most existing methods focus mainly on the topological structure of protein-protein interaction networks, and largely ignore the gene ontology annotation information. In this article, we constructed ontology augmented networks with protein-protein interaction data and gene ontology, which effectively unified the topological structure of protein-protein interaction networks and the similarity of gene ontology annotations into unified distance measures. After constructing ontology augmented networks, a novel method (clustering based on ontology augmented networks) was proposed to predict protein complexes, which was capable of taking into account the topological structure of the protein-protein interaction network, as well as the similarity of gene ontology annotations. Our method was applied to two different yeast protein-protein interaction datasets and predicted many well-known complexes. The experimental results showed that (i) ontology augmented networks and the unified distance measure can effectively combine the structure closeness and gene ontology annotation similarity; (ii) our method is valuable in predicting protein complexes and has higher F1 and accuracy compared to other competing methods.

  10. Light-frame wall and floor systems : analysis and performance

    Treesearch

    G. Sherwood; R. C. Moody

    1989-01-01

    This report describes methods of predicting the performance of light-frame wood structures with emphasis on floor and wall systems. Methods of predicting structural performance, fire safety, and environmental concerns including thermal, moisture, and acoustic performance are addressed in the three major sections.

  11. Analysis of Physicochemical and Structural Properties Determining HIV-1 Coreceptor Usage

    PubMed Central

    Bozek, Katarzyna; Lengauer, Thomas; Sierra, Saleta; Kaiser, Rolf; Domingues, Francisco S.

    2013-01-01

    The relationship of HIV tropism with disease progression and the recent development of CCR5-blocking drugs underscore the importance of monitoring virus coreceptor usage. As an alternative to costly phenotypic assays, computational methods aim at predicting virus tropism based on the sequence and structure of the V3 loop of the virus gp120 protein. Here we present a numerical descriptor of the V3 loop encoding its physicochemical and structural properties. The descriptor allows for structure-based prediction of HIV tropism and identification of properties of the V3 loop that are crucial for coreceptor usage. Use of the proposed descriptor for prediction results in a statistically significant improvement over the prediction based solely on V3 sequence with 3 percentage points improvement in AUC and 7 percentage points in sensitivity at the specificity of the 11/25 rule (95%). We additionally assessed the predictive power of the new method on clinically derived ‘bulk’ sequence data and obtained a statistically significant improvement in AUC of 3 percentage points over sequence-based prediction. Furthermore, we demonstrated the capacity of our method to predict therapy outcome by applying it to 53 samples from patients undergoing Maraviroc therapy. The analysis of structural features of the loop informative of tropism indicates the importance of two loop regions and their physicochemical properties. The regions are located on opposite strands of the loop stem and the respective features are predominantly charge-, hydrophobicity- and structure-related. These regions are in close proximity in the bound conformation of the loop potentially forming a site determinant for the coreceptor binding. The method is available via server under http://structure.bioinf.mpi-inf.mpg.de/. PMID:23555214
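
The 11/25 rule used as the baseline in this abstract is usually stated as: predict CXCR4 (X4) usage when V3 position 11 or 25 carries a positively charged residue (arginine or lysine), and CCR5 (R5) usage otherwise. The sketch below encodes that common formulation. It assumes the input V3 sequence is already aligned so that string indices 10 and 24 correspond to positions 11 and 25; the function name `rule_11_25` is hypothetical, and the abstract itself does not define the rule.

```python
def rule_11_25(v3_aligned):
    """Classify coreceptor usage from an aligned V3 loop sequence.

    ASSUMPTION: v3_aligned is aligned to the reference V3 numbering,
    so index 10 is position 11 and index 24 is position 25.
    Returns 'X4' if either position is R or K, otherwise 'R5'.
    """
    if len(v3_aligned) < 25:
        raise ValueError("aligned V3 sequence must cover position 25")
    basic = {"R", "K"}
    pos11 = v3_aligned[10].upper()
    pos25 = v3_aligned[24].upper()
    return "X4" if pos11 in basic or pos25 in basic else "R5"
```

The descriptor-based method in the abstract is compared against this rule at the rule's own specificity, which is why such a simple baseline is still worth stating precisely.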

  12. Text Mining Improves Prediction of Protein Functional Sites

    PubMed Central

    Cohn, Judith D.; Ravikumar, Komandur E.

    2012-01-01

    We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions. PMID:22393388

  13. A method of predicting the energy-absorption capability of composite subfloor beams

    NASA Technical Reports Server (NTRS)

    Farley, Gary L.

    1987-01-01

    A simple method of predicting the energy-absorption capability of composite subfloor beam structures was developed. The method is based upon the weighted sum of the energy-absorption capability of constituent elements of a subfloor beam. An empirical database of energy-absorption results from circular and square cross section tube specimens was used in the prediction capability. The procedure is applicable to a wide range of subfloor beam structures. The procedure was demonstrated on three subfloor beam concepts. Agreement between test and prediction was within seven percent for all three cases.

  14. A protein block based fold recognition method for the annotation of twilight zone sequences.

    PubMed

    Suresh, V; Ganesan, K; Parthasarathy, S

    2013-03-01

    The description of the protein backbone was recently improved with a group of structural fragments called Structural Alphabets instead of the regular three-state (Helix, Sheet and Coil) secondary structure description. Protein Blocks is one of the Structural Alphabets used to describe each and every region of the protein backbone including the coil. According to de Brevern (2000), Protein Blocks has 16 structural fragments, each 5 residues in length. Protein Blocks fragments are highly informative among the available Structural Alphabets and have been used for many applications. Here, we present a protein fold recognition method based on Protein Blocks for the annotation of twilight zone sequences. In our method, we align the predicted Protein Blocks of a query amino acid sequence with a library of assigned Protein Blocks of 953 known folds using local pairwise alignment. The alignment results with z-value ≥ 2.5 and P-value ≤ 0.08 are predicted as possible folds. Our method is able to recognize the possible folds for nearly 35.5% of the twilight zone sequences with their predicted Protein Block sequence obtained by pb_prediction, which is available at the Protein Block Export server.

  15. Accelerated Test Method for Corrosion Protective Coatings Project

    NASA Technical Reports Server (NTRS)

    Falker, John; Zeitlin, Nancy; Calle, Luz

    2015-01-01

    This project seeks to develop a new accelerated corrosion test method that predicts the long-term corrosion protection performance of spaceport structure coatings as accurately and reliably as current long-term atmospheric exposure tests. This new accelerated test method will shorten the time needed to evaluate the corrosion protection performance of coatings for NASA's critical ground support structures. Lifetime prediction for spaceport structure coatings has a 5-year qualification cycle using atmospheric exposure. Current accelerated corrosion tests often provide false positives and negatives for coating performance, do not correlate to atmospheric corrosion exposure results, and do not correlate with atmospheric exposure timescales for lifetime prediction.

  16. The IntFOLD server: an integrated web resource for protein fold recognition, 3D model quality assessment, intrinsic disorder prediction, domain prediction and ligand binding site prediction.

    PubMed

    Roche, Daniel B; Buenavista, Maria T; Tetchner, Stuart J; McGuffin, Liam J

    2011-07-01

    The IntFOLD server is a novel independent server that integrates several cutting edge methods for the prediction of structure and function from sequence. Our guiding principles behind the server development were as follows: (i) to provide a simple unified resource that makes our prediction software accessible to all and (ii) to produce integrated output for predictions that can be easily interpreted. The output for predictions is presented as a simple table that summarizes all results graphically via plots and annotated 3D models. The raw machine readable data files for each set of predictions are also provided for developers, which comply with the Critical Assessment of Methods for Protein Structure Prediction (CASP) data standards. The server comprises an integrated suite of five novel methods: nFOLD4, for tertiary structure prediction; ModFOLD 3.0, for model quality assessment; DISOclust 2.0, for disorder prediction; DomFOLD 2.0 for domain prediction; and FunFOLD 1.0, for ligand binding site prediction. Predictions from the IntFOLD server were found to be competitive in several categories in the recent CASP9 experiment. The IntFOLD server is available at the following web site: http://www.reading.ac.uk/bioinf/IntFOLD/.

  17. Transmembrane helix prediction: a comparative evaluation and analysis.

    PubMed

    Cuthbertson, Jonathan M; Doyle, Declan A; Sansom, Mark S P

    2005-06-01

    The prediction of transmembrane (TM) helices plays an important role in the study of membrane proteins, given the relatively small number (approximately 0.5% of the PDB) of high-resolution structures for such proteins. We used two datasets (one redundant and one non-redundant) of high-resolution structures of membrane proteins to evaluate and analyse TM helix prediction. The redundant (non-redundant) dataset contains structures of 434 (268) TM helices, from 112 (73) polypeptide chains. Of the 434 helices in the dataset, 20 may be classified as 'half-TM' as they are too short to span a lipid bilayer. We compared 13 TM helix prediction methods, evaluating each method using per segment, per residue and termini scores. Four methods consistently performed well: SPLIT4, TMHMM2, HMMTOP2 and TMAP. However, even the best methods were in error by, on average, about two turns of helix at the TM helix termini. The best and worst case predictions for individual proteins were analysed. In particular, the performance of the various methods and of a consensus prediction method were compared for a number of proteins (e.g. SecY, ClC, KvAP) containing half-TM helices. The difficulties of predicting half-TM helices suggest that current prediction methods successfully embody the two-stage model of membrane protein folding, but do not accommodate a third stage in which, e.g., short helices and re-entrant loops fold within a bundle of stable TM helices.

  18. Reduced Fragment Diversity for Alpha and Alpha-Beta Protein Structure Prediction using Rosetta.

    PubMed

    Abbass, Jad; Nebel, Jean-Christophe

    2017-01-01

    Protein structure prediction is considered a main challenge in computational biology. The biennial international competition, Critical Assessment of protein Structure Prediction (CASP), has shown in its eleventh experiment that free modelling target predictions are still beyond reliable accuracy; therefore, much effort should be made to improve ab initio methods. Arguably, Rosetta is considered the most competitive method when it comes to targets with no homologues. Relying on fragments of length 9 and 3 from known structures, Rosetta creates putative structures by assembling candidate fragments. Generally, the structure with the lowest energy score, also known as the first model, is chosen to be the "predicted one". A thorough study has been conducted on the role and diversity of 3-mers involved in Rosetta's model "refinement" phase. Usage of the standard number of 3-mers, i.e. 200, has been shown to degrade alpha and alpha-beta protein conformations initially achieved by assembling 9-mers. Therefore, a new prediction pipeline is proposed for Rosetta where the "refinement" phase is customised according to a target's structural class prediction. Over 8% improvement in terms of first model structure accuracy is reported for alpha and alpha-beta classes when decreasing the number of 3-mers.

  19. Large-scale structure prediction by improved contact predictions and model quality assessment.

    PubMed

    Michel, Mirco; Menéndez Hurtado, David; Uziela, Karolis; Elofsson, Arne

    2017-07-15

    Accurate contact predictions can be used for predicting the structure of proteins. Until recently these methods were limited to very big protein families, decreasing their utility. However, recent progress by combining direct coupling analysis with machine learning methods has made it possible to predict accurate contact maps for smaller families. To what extent these predictions can be used to produce accurate models of the families is not known. We present the PconsFold2 pipeline that uses contact predictions from PconsC3, the CONFOLD folding algorithm and model quality estimations to predict the structure of a protein. We show that the model quality estimation significantly increases the number of models that reliably can be identified. Finally, we apply PconsFold2 to 6379 Pfam families of unknown structure and find that PconsFold2 can, with an estimated 90% specificity, predict the structure of up to 558 Pfam families of unknown structure. Out of these, 415 have not been reported before. Datasets as well as models of all the 558 Pfam families are available at http://c3.pcons.net/. All programs used here are freely available.

  20. Automatic prediction of protein domains from sequence information using a hybrid learning system.

    PubMed

    Nagarajan, Niranjan; Yona, Golan

    2004-06-12

    We describe a novel method for detecting the domain structure of a protein from sequence information alone. The method is based on analyzing multiple sequence alignments that are derived from a database search. Multiple measures are defined to quantify the domain information content of each position along the sequence and are combined into a single predictor using a neural network. The output is further smoothed and post-processed using a probabilistic model to predict the most likely transition positions between domains. The method was assessed using the domain definitions in SCOP and CATH for proteins of known structure and was compared with several other existing methods. Our method performs well both in terms of accuracy and sensitivity. It improves significantly over the best methods available, even some of the semi-manual ones, while being fully automatic. Our method can also be used to suggest and verify domain partitions based on structural data. A few examples of predicted domain definitions and alternative partitions, as suggested by our method, are also discussed. An online domain-prediction server is available at http://biozon.org/tools/domains/

  1. Crystal-structure prediction via the Floppy-Box Monte Carlo algorithm: Method and application to hard (non)convex particles

    NASA Astrophysics Data System (ADS)

    de Graaf, Joost; Filion, Laura; Marechal, Matthieu; van Roij, René; Dijkstra, Marjolein

    2012-12-01

    In this paper, we describe the way to set up the floppy-box Monte Carlo (FBMC) method [L. Filion, M. Marechal, B. van Oorschot, D. Pelt, F. Smallenburg, and M. Dijkstra, Phys. Rev. Lett. 103, 188302 (2009), 10.1103/PhysRevLett.103.188302] to predict crystal-structure candidates for colloidal particles. The algorithm is explained in detail to ensure that it can be straightforwardly implemented on the basis of this text. The handling of hard-particle interactions in the FBMC algorithm is given special attention, as (soft) short-range and semi-long-range interactions can be treated in an analogous way. We also discuss two types of algorithms for checking for overlaps between polyhedra, the method of separating axes and a triangular-tessellation based technique. These can be combined with the FBMC method to enable crystal-structure prediction for systems composed of highly shape-anisotropic particles. Moreover, we present the results for the dense crystal structures predicted using the FBMC method for 159 (non)convex faceted particles, on which the findings in [J. de Graaf, R. van Roij, and M. Dijkstra, Phys. Rev. Lett. 107, 155501 (2011), 10.1103/PhysRevLett.107.155501] were based. Finally, we comment on the process of crystal-structure prediction itself and the choices that can be made in these simulations.

  2. Accurate Prediction of Contact Numbers for Multi-Spanning Helical Membrane Proteins

    PubMed Central

    Li, Bian; Mendenhall, Jeffrey; Nguyen, Elizabeth Dong; Weiner, Brian E.; Fischer, Axel W.; Meiler, Jens

    2017-01-01

    Prediction of the three-dimensional (3D) structures of proteins by computational methods is acknowledged as an unsolved problem. Accurate prediction of important structural characteristics such as contact number is expected to accelerate the otherwise slow progress being made in the prediction of 3D structure of proteins. Here, we present a dropout neural network-based method, TMH-Expo, for predicting the contact number of transmembrane helix (TMH) residues from sequence. Neuronal dropout is a strategy where certain neurons of the network are excluded from back-propagation to prevent co-adaptation of hidden-layer neurons. By using neuronal dropout, overfitting was significantly reduced and performance was noticeably improved. For multi-spanning helical membrane proteins, TMH-Expo achieved a remarkable Pearson correlation coefficient of 0.69 between predicted and experimental values and a mean absolute error of only 1.68. In addition, among those membrane protein–membrane protein interface residues, 76.8% were correctly predicted. Mapping of predicted contact numbers onto structures indicates that contact numbers predicted by TMH-Expo reflect the exposure patterns of TMHs and reveal membrane protein–membrane protein interfaces, reinforcing the potential of predicted contact numbers to be used as restraints for 3D structure prediction and protein–protein docking. TMH-Expo can be accessed via a Web server at www.meilerlab.org. PMID:26804342

  3. Post processing of protein-compound docking for fragment-based drug discovery (FBDD): in-silico structure-based drug screening and ligand-binding pose prediction.

    PubMed

    Fukunishi, Yoshifumi

    2010-01-01

    For fragment-based drug development, both hit (active) compound prediction and docking-pose (protein-ligand complex structure) prediction of the hit compound are important, since chemical modification (fragment linking, fragment evolution) subsequent to the hit discovery must be performed based on the protein-ligand complex structure. However, the naïve protein-compound docking calculation shows poor accuracy in terms of docking-pose prediction. Thus, post-processing of the protein-compound docking is necessary. Recently, several methods for the post-processing of protein-compound docking have been proposed. In FBDD, the compounds are smaller than those for conventional drug screening. This makes it difficult to perform the protein-compound docking calculation. A method to avoid this problem has been reported. Protein-ligand binding free energy estimation is useful to reduce the procedures involved in the chemical modification of the hit fragment. Several prediction methods have been proposed for high-accuracy estimation of protein-ligand binding free energy. This paper summarizes the various computational methods proposed for docking-pose prediction and their usefulness in FBDD.

  4. Crystal structure prediction supported by incomplete experimental data

    NASA Astrophysics Data System (ADS)

    Tsujimoto, Naoto; Adachi, Daiki; Akashi, Ryosuke; Todo, Synge; Tsuneyuki, Shinji

    2018-05-01

    We propose an efficient theoretical scheme for structure prediction on the basis of the idea of combining methods, which optimize theoretical calculation and experimental data simultaneously. In this scheme, we formulate a cost function based on a weighted sum of interatomic potential energies and a penalty function which is defined with partial experimental data totally insufficient for conventional structure analysis. In particular, we define the cost function using "crystallinity" formulated with only peak positions within the small range of the x-ray-diffraction pattern. We apply this method to well-known polymorphs of SiO2 and C with up to 108 atoms in the simulation cell and show that it reproduces the correct structures efficiently with very limited information of diffraction peaks. This scheme opens a new avenue for determining and predicting structures that are difficult to determine by conventional methods.
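
    The weighted-sum cost function described above can be illustrated with a toy sketch. The abstract does not give the functional form of "crystallinity", so the definition below (the fraction of simulated diffraction intensity that falls near an observed peak position) is an assumption made only for illustration, and the names toy_crystallinity, combined_cost, tol and weight are hypothetical.

```python
def toy_crystallinity(sim_angles, sim_intensities, obs_peaks, tol=0.1):
    """Fraction of simulated intensity lying within `tol` of an observed peak.

    ASSUMPTION: stand-in for the paper's "crystallinity"; the real
    definition is not given in the abstract.
    """
    total = sum(sim_intensities)
    if total <= 0:
        return 0.0
    matched = 0.0
    for angle, intensity in zip(sim_angles, sim_intensities):
        # Distance from this simulated peak to the nearest observed peak.
        nearest = min(abs(angle - peak) for peak in obs_peaks)
        if nearest <= tol:
            matched += intensity
    return matched / total


def combined_cost(potential_energy, crystallinity, weight):
    """Weighted sum of the interatomic energy and an experimental penalty.

    The penalty (1 - crystallinity) vanishes when every simulated peak
    matches an observed one. `weight` is a hypothetical tuning parameter.
    """
    return potential_energy + weight * (1.0 - crystallinity)


# Toy data: three simulated peaks, two observed peak positions.
c = toy_crystallinity([20.0, 30.0, 40.0], [1.0, 1.0, 2.0], [20.05, 40.0])
cost = combined_cost(-10.0, c, weight=4.0)
```

    A structure search would minimize combined_cost instead of the energy alone, so candidates that disagree with the few known peak positions are penalized even when their energy is low.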

  5. Predicting beta-turns in proteins using support vector machines with fractional polynomials

    PubMed Central

    2013-01-01

    Background: β-turns are a secondary structure type that plays an essential role in molecular recognition, protein folding, and stability. They are found to be the most common type of non-repetitive structure, since 25% of amino acids in protein structures are situated on them. Their prediction is considered to be one of the crucial problems in bioinformatics and molecular biology, and can provide valuable insights and inputs for fold recognition and drug design. Results: We propose an approach that combines support vector machines (SVMs) and logistic regression (LR) in a hybrid prediction method, which we call H-SVM-LR, to predict β-turns in proteins. Fractional polynomials are used for LR modeling. We utilize position specific scoring matrices (PSSMs) and predicted secondary structure (PSS) as features. Our simulation studies show that H-SVM-LR achieves Qtotal of 82.87%, 82.84%, and 82.32% on the BT426, BT547, and BT823 datasets respectively. These values are the highest among β-turn prediction methods that are based on PSSMs and secondary structure information. H-SVM-LR also achieves favorable performance in predicting β-turns as measured by the Matthews correlation coefficient (MCC) on these datasets. Furthermore, H-SVM-LR shows good performance when considering shape strings as additional features. Conclusions: In this paper, we present a comprehensive approach for β-turn prediction. Experiments show that our proposed approach achieves better performance compared to other competing prediction methods. PMID:24565438

  6. Predicting beta-turns in proteins using support vector machines with fractional polynomials.

    PubMed

    Elbashir, Murtada; Wang, Jianxin; Wu, Fang-Xiang; Wang, Lusheng

    2013-11-07

    β-turns are a secondary structure type that plays an essential role in molecular recognition, protein folding, and stability. They are found to be the most common type of non-repetitive structure, since 25% of amino acids in protein structures are situated on them. Their prediction is considered to be one of the crucial problems in bioinformatics and molecular biology, and can provide valuable insights and inputs for fold recognition and drug design. We propose an approach that combines support vector machines (SVMs) and logistic regression (LR) in a hybrid prediction method, which we call H-SVM-LR, to predict β-turns in proteins. Fractional polynomials are used for LR modeling. We utilize position specific scoring matrices (PSSMs) and predicted secondary structure (PSS) as features. Our simulation studies show that H-SVM-LR achieves Qtotal of 82.87%, 82.84%, and 82.32% on the BT426, BT547, and BT823 datasets respectively. These values are the highest among β-turn prediction methods that are based on PSSMs and secondary structure information. H-SVM-LR also achieves favorable performance in predicting β-turns as measured by the Matthews correlation coefficient (MCC) on these datasets. Furthermore, H-SVM-LR shows good performance when considering shape strings as additional features. In this paper, we present a comprehensive approach for β-turn prediction. Experiments show that our proposed approach achieves better performance compared to other competing prediction methods.
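
    For readers unfamiliar with the two reported measures, the sketch below computes Qtotal (the percentage of correctly classified residues) and the Matthews correlation coefficient from per-residue binary labels. It uses the standard textbook definitions. The function names and the toy labels are illustrative and do not come from the paper.

```python
import math


def confusion_counts(true_labels, predicted_labels):
    """Count TP, TN, FP, FN for binary labels (1 = β-turn, 0 = non-turn)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def q_total(tp, tn, fp, fn):
    """Percentage of residues whose class was predicted correctly."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 if it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example with eight residues (made-up labels).
truth = [1, 1, 0, 0, 0, 1, 0, 0]
pred = [1, 0, 0, 0, 1, 1, 0, 0]
counts = confusion_counts(truth, pred)
q = q_total(*counts)
m = mcc(*counts)
```

    Because β-turn residues are a minority class, MCC is usually reported alongside Qtotal: a predictor that labels everything "non-turn" can score a high Qtotal while its MCC stays at zero.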

  7. A Method to Predict the Structure and Stability of RNA/RNA Complexes.

    PubMed

    Xu, Xiaojun; Chen, Shi-Jie

    2016-01-01

    RNA/RNA interactions are essential for genomic RNA dimerization and regulation of gene expression. Intermolecular loop-loop base pairing is a widespread and functionally important tertiary structure motif in RNA machinery. However, computational prediction of intermolecular loop-loop base pairing is challenged by the entropy and free energy calculation due to the conformational constraint and the intermolecular interactions. In this chapter, we describe a recently developed statistical mechanics-based method for the prediction of RNA/RNA complex structures and stabilities. The method is based on the virtual bond RNA folding model (Vfold). The main emphasis in the method is placed on the evaluation of the entropy and free energy for the loops, especially tertiary kissing loops. The method also uses recursive partition function calculations and two-step screening algorithm for large, complicated structures of RNA/RNA complexes. As case studies, we use the HIV-1 Mal dimer and the siRNA/HIV-1 mutant (T4) to illustrate the method.

  8. Lessons learned in induced fit docking and metadynamics in the Drug Design Data Resource Grand Challenge 2

    NASA Astrophysics Data System (ADS)

    Baumgartner, Matthew P.; Evans, David A.

    2018-01-01

    Two of the major ongoing challenges in computational drug discovery are predicting the binding pose and affinity of a compound to a protein. The Drug Design Data Resource Grand Challenge 2 was developed to address these problems and to drive development of new methods. The challenge provided the 2D structures of compounds for which the organizers held blinded data in the form of 35 X-ray crystal structures and 102 binding affinity measurements and challenged participants to predict the binding pose and affinity of the compounds. We tested a number of pose prediction methods as part of the challenge; we found that docking methods that incorporate protein flexibility (Induced Fit Docking) outperformed methods that treated the protein as rigid. We also found that using binding pose metadynamics, a molecular dynamics based method, to score docked poses provided the best predictions of our methods with an average RMSD of 2.01 Å. We tested both structure-based (e.g. docking) and ligand-based methods (e.g. QSAR) in the affinity prediction portion of the competition. We found that our structure-based methods based on docking with Smina (Spearman ρ = 0.614) performed slightly better than our ligand-based methods (ρ = 0.543), and had equivalent performance with the other top methods in the competition. Despite the overall good performance of our methods in comparison to other participants in the challenge, there exists significant room for improvement especially in cases such as these where protein flexibility plays such a large role.

  9. Predicting disulfide connectivity from protein sequence using multiple sequence feature vectors and secondary structure.

    PubMed

    Song, Jiangning; Yuan, Zheng; Tan, Hao; Huber, Thomas; Burrage, Kevin

    2007-12-01

    Disulfide bonds are primary covalent crosslinks between two cysteine residues in proteins that play critical roles in stabilizing the protein structures and are commonly found in extracytoplasmatic or secreted proteins. In protein folding prediction, the localization of disulfide bonds can greatly reduce the search in conformational space. Therefore, there is a great need to develop computational methods capable of accurately predicting disulfide connectivity patterns in proteins that could have potentially important applications. We have developed a novel method to predict disulfide connectivity patterns from protein primary sequence, using a support vector regression (SVR) approach based on multiple sequence feature vectors and predicted secondary structure by the PSIPRED program. The results indicate that our method could achieve a prediction accuracy of 74.4% and 77.9%, respectively, when averaged on proteins with two to five disulfide bridges using 4-fold cross-validation, measured on the protein and cysteine pair on a well-defined non-homologous dataset. We assessed the effects of different sequence encoding schemes on the prediction performance of disulfide connectivity. It has been shown that the sequence encoding scheme based on multiple sequence feature vectors coupled with predicted secondary structure can significantly improve the prediction accuracy, thus enabling our method to outperform most of other currently available predictors. Our work provides a complementary approach to the current algorithms that should be useful in computationally assigning disulfide connectivity patterns and helps in the annotation of protein sequences generated by large-scale whole-genome projects. The prediction web server and Supplementary Material are accessible at http://foo.maths.uq.edu.au/~huber/disulfide

  10. Crystal Structure Predictions Using Adaptive Genetic Algorithm and Motif Search methods

    NASA Astrophysics Data System (ADS)

    Ho, K. M.; Wang, C. Z.; Zhao, X.; Wu, S.; Lyu, X.; Zhu, Z.; Nguyen, M. C.; Umemoto, K.; Wentzcovitch, R. M. M.

    2017-12-01

    Material informatics is a new initiative which has attracted a lot of attention in recent scientific research. The basic strategy is to construct comprehensive data sets and use machine learning to solve a wide variety of problems in material design and discovery. In pursuit of this goal, a key element is the quality and completeness of the databases used. Recent advance in the development of crystal structure prediction algorithms has made it a complementary and more efficient approach to explore the structure/phase space in materials using computers. In this talk, we discuss the importance of the structural motifs and motif-networks in crystal structure predictions. Correspondingly, powerful methods are developed to improve the sampling of the low-energy structure landscape.

  11. Fast computational methods for predicting protein structure from primary amino acid sequence

    DOEpatents

    Agarwal, Pratul Kumar [Knoxville, TN]

    2011-07-19

    The present invention provides a method utilizing primary amino acid sequence of a protein, energy minimization, molecular dynamics and protein vibrational modes to predict three-dimensional structure of a protein. The present invention also determines possible intermediates in the protein folding pathway. The present invention has important applications to the design of novel drugs as well as protein engineering. The present invention predicts the three-dimensional structure of a protein independent of size of the protein, overcoming a significant limitation in the prior art.

  12. DBH Prediction Using Allometry Described by Bivariate Copula Distribution

    NASA Astrophysics Data System (ADS)

    Xu, Q.; Hou, Z.; Li, B.; Greenberg, J. A.

    2017-12-01

    Forest biomass mapping based on single tree detection from the airborne laser scanning (ALS) usually depends on an allometric equation that relates diameter at breast height (DBH) with per-tree aboveground biomass. The incapability of the ALS technology in directly measuring DBH leads to the need to predict DBH with other ALS-measured tree-level structural parameters. A copula-based method is proposed in the study to predict DBH with the ALS-measured tree height and crown diameter using a dataset measured in the Lassen National Forest in California. Instead of exploring an explicit mathematical equation that explains the underlying relationship between DBH and other structural parameters, the copula-based prediction method utilizes the dependency between cumulative distributions of these variables, and solves the DBH based on an assumption that for a single tree, the cumulative probability of each structural parameter is identical. Results show that compared with the bench-marking least-square linear regression and the k-MSN imputation, the copula-based method obtains better accuracy in the DBH for the Lassen National Forest. To assess the generalization of the proposed method, prediction uncertainty is quantified using bootstrapping techniques that examine the variability of the RMSE of the predicted DBH. We find that the copula distribution is reliable in describing the allometric relationship between tree-level structural parameters, and it contributes to the reduction of prediction uncertainty.
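
    The equal-cumulative-probability assumption stated above can be sketched as quantile matching. This is a deliberately reduced illustration: it uses a single predictor (tree height) and empirical distributions, whereas the study uses a bivariate copula with height and crown diameter. The function name predict_dbh and all numbers are hypothetical.

```python
from bisect import bisect_right


def predict_dbh(height, train_heights, train_dbhs):
    """Predict DBH as the DBH quantile matching the height's quantile.

    ASSUMPTION: a tree sits at the same cumulative probability in the
    height distribution and in the DBH distribution (one-predictor
    simplification of the copula-based idea in the abstract).
    """
    heights = sorted(train_heights)
    dbhs = sorted(train_dbhs)
    n, m = len(heights), len(dbhs)

    # Empirical CDF of height: rank / n, where rank counts the training
    # trees that are no taller than the query tree.
    rank = bisect_right(heights, height)

    # Inverse empirical CDF of DBH at p = rank / n, i.e. index
    # ceil(p * m) - 1, computed in integer arithmetic.
    idx = -(-rank * m // n) - 1
    idx = max(0, min(m - 1, idx))  # clamp queries outside the training range
    return dbhs[idx]


# Made-up training sample: heights in metres, DBH in centimetres.
train_h = [10, 15, 20, 25, 30]
train_d = [12, 18, 25, 33, 40]
dbh_mid = predict_dbh(20, train_h, train_d)
```

    No regression equation is fitted here: the prediction depends only on the two marginal distributions, which is what distinguishes this idea from the least-squares baseline mentioned in the abstract.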

  13. Predicting helix orientation for coiled-coil dimers

    PubMed Central

    Apgar, James R.; Gutwin, Karl N.; Keating, Amy E.

    2008-01-01

    The alpha-helical coiled coil is a structurally simple protein oligomerization or interaction motif consisting of two or more alpha helices twisted into a supercoiled bundle. Coiled coils can differ in their stoichiometry, helix orientation and axial alignment. Because of the near degeneracy of many of these variants, coiled coils pose a challenge to fold recognition methods for structure prediction. Whereas distinctions between some protein folds can be discriminated on the basis of hydrophobic/polar patterning or secondary structure propensities, the sequence differences that encode important details of coiled-coil structure can be subtle. This is emblematic of a larger problem in the field of protein structure and interaction prediction: that of establishing specificity between closely similar structures. We tested the behavior of different computational models on the problem of recognizing the correct orientation - parallel vs. antiparallel - of pairs of alpha helices that can form a dimeric coiled coil. For each of 131 examples of known structure, we constructed a large number of both parallel and antiparallel structural models and used these to assess the ability of five energy functions to recognize the correct fold. We also developed and tested three sequence-based approaches that make use of varying degrees of implicit structural information. The best structural methods performed similarly to the best sequence methods, correctly categorizing ∼81% of dimers. Steric compatibility with the fold was important for some coiled coils we investigated. For many examples, the correct orientation was determined by smaller energy differences between parallel and antiparallel structures distributed over many residues and energy components. Prediction methods that used structure but incorporated varying approximations and assumptions showed quite different behaviors when used to investigate energetic contributions to orientation preference. Sequence-based methods were sensitive to the choice of residue-pair interactions scored. PMID:18506779

  14. Structural features that predict real-value fluctuations of globular proteins.

    PubMed

    Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke

    2012-05-01

    It is crucial to consider dynamics for understanding the biological function of proteins. We used a large number of molecular dynamics (MD) trajectories of nonhomologous proteins as references and examined static structural features of proteins that are most relevant to fluctuations. We examined correlation of individual structural features with fluctuations and further investigated effective combinations of features for predicting the real value of residue fluctuations using the support vector regression (SVR). It was found that some structural features have higher correlation than crystallographic B-factors with fluctuations observed in MD trajectories. Moreover, SVR that uses combinations of static structural features showed accurate prediction of fluctuations with an average Pearson's correlation coefficient of 0.669 and a root mean square error of 1.04 Å. This correlation coefficient is higher than the one observed in predictions by the Gaussian network model (GNM). An advantage of the developed method over the GNMs is that the former predicts the real value of fluctuation. The results help improve our understanding of relationships between protein structure and fluctuation. Furthermore, the developed method provides a convenient, practical way to predict fluctuations of proteins using easily computed static structural features of proteins. Copyright © 2012 Wiley Periodicals, Inc.

  15. Structural features that predict real-value fluctuations of globular proteins

    PubMed Central

    Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke

    2012-01-01

    It is crucial to consider dynamics for understanding the biological function of proteins. We used a large number of molecular dynamics trajectories of non-homologous proteins as references and examined static structural features of proteins that are most relevant to fluctuations. We examined correlation of individual structural features with fluctuations and further investigated effective combinations of features for predicting the real-value of residue fluctuations using the support vector regression. It was found that some structural features have higher correlation than crystallographic B-factors with fluctuations observed in molecular dynamics trajectories. Moreover, support vector regression that uses combinations of static structural features showed accurate prediction of fluctuations with an average Pearson’s correlation coefficient of 0.669 and a root mean square error of 1.04 Å. This correlation coefficient is higher than the one observed for the prediction by the Gaussian network model. An advantage of the developed method over the Gaussian network models is that the former predicts the real-value of fluctuation. The results help improve our understanding of relationships between protein structure and fluctuation. Furthermore, the developed method provides a convenient, practical way to predict fluctuations of proteins using easily computed static structural features of proteins. PMID:22328193

  16. Recent developments in structural proteomics for protein structure determination.

    PubMed

    Liu, Hsuan-Liang; Hsu, Jyh-Ping

    2005-05-01

    The major challenges in structural proteomics include identifying all the proteins on the genome-wide scale, determining their structure-function relationships, and outlining the precise three-dimensional structures of the proteins. Protein structures are typically determined by experimental approaches such as X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. However, the knowledge of three-dimensional space by these techniques is still limited. Thus, computational methods such as comparative and de novo approaches and molecular dynamic simulations are intensively used as alternative tools to predict the three-dimensional structures and dynamic behavior of proteins. This review summarizes recent developments in structural proteomics for protein structure determination; including instrumental methods such as X-ray crystallography and NMR spectroscopy, and computational methods such as comparative and de novo structure prediction and molecular dynamics simulations.

  17. Predicting nucleic acid binding interfaces from structural models of proteins.

    PubMed

    Dror, Iris; Shazman, Shula; Mukherjee, Srayanta; Zhang, Yang; Glaser, Fabian; Mandel-Gutfreund, Yael

    2012-02-01

    The function of DNA- and RNA-binding proteins can be inferred from the characterization and accurate prediction of their binding interfaces. However, the main pitfall of various structure-based methods for predicting nucleic acid binding function is that they are all limited to a relatively small number of proteins for which high-resolution three-dimensional structures are available. In this study, we developed a pipeline for extracting functional electrostatic patches from surfaces of protein structural models, obtained using the I-TASSER protein structure predictor. The largest positive patches are extracted from the protein surface using the patchfinder algorithm. We show that functional electrostatic patches extracted from an ensemble of structural models highly overlap the patches extracted from high-resolution structures. Furthermore, by testing our pipeline on a set of 55 known nucleic acid binding proteins for which I-TASSER produces high-quality models, we show that the method accurately identifies the nucleic acid binding interface on structural models of proteins. Employing a combined patch approach, we show that patches extracted from an ensemble of models better predict the real nucleic acid binding interfaces compared with patches extracted from independent models. Overall, these results suggest that combining information from a collection of low-resolution structural models could be a valuable approach for functional annotation. We suggest that our method will be further applicable for predicting other functional surfaces of proteins with unknown structure. Copyright © 2011 Wiley Periodicals, Inc.

  18. Predicting residue-wise contact orders in proteins by support vector regression.

    PubMed

    Song, Jiangning; Burrage, Kevin

    2006-10-03

    The residue-wise contact order (RWCO) describes the sequence separations between the residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable information for reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance with the CC to 0.57 and an RMSE of 0.79. In addition, combining the predicted secondary structure by PSIPRED was found to significantly improve the prediction performance and could yield the best prediction accuracy with a CC of 0.60 and RMSE of 0.78, which provided at least comparable performance compared with the other existing methods. The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.
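
    To make the RWCO definition at the start of this abstract concrete, the sketch below computes a per-residue value from 3D coordinates as the summed sequence separation to all contacting residues. The abstract does not state the contact cutoff, the minimum sequence separation, or the normalization, so the values used here (8 Å, three residues, division by chain length) are assumptions, and the function name is hypothetical.

```python
import math


def residue_wise_contact_order(coords, cutoff=8.0, min_separation=3):
    """Return one RWCO-like value per residue.

    coords: list of (x, y, z) tuples, one representative atom per residue.
    ASSUMPTIONS: `cutoff` (in Å), `min_separation`, and normalization by
    chain length are illustrative choices not given in the abstract.
    """
    n = len(coords)
    values = []
    for i in range(n):
        total = 0
        for j in range(n):
            separation = abs(i - j)
            if separation < min_separation:
                continue  # skip self and near-sequence neighbours
            if math.dist(coords[i], coords[j]) <= cutoff:
                total += separation  # contacting residue: add |i - j|
        values.append(total / n)
    return values


# Toy five-residue hairpin with 3.8 Å spacing (made-up coordinates).
hairpin = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0),
    (3.8, 3.8, 0.0),
]
rwco = residue_wise_contact_order(hairpin)
```

    In this toy chain the last residue folds back onto the first two, so it receives the largest value, while the residues in the turn have no long-range contacts and score zero.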

  19. A scoring function based on solvation thermodynamics for protein structure prediction

    PubMed Central

    Du, Shiqiao; Harano, Yuichi; Kinoshita, Masahiro; Sakurai, Minoru

    2012-01-01

    We predict protein structure using our recently developed free energy function for describing protein stability, which is focused on solvation thermodynamics. The function is combined with the current most reliable sampling methods, i.e., fragment assembly (FA) and comparative modeling (CM). The prediction is tested using 11 small proteins for which high-resolution crystal structures are available. For 8 of these proteins, sequence similarities are found in the database, and the prediction is performed with CM. Fairly accurate models with average Cα root mean square deviation (RMSD) ∼ 2.0 Å are successfully obtained for all cases. For the rest of the target proteins, we perform the prediction following FA protocols. For 2 cases, we obtain predicted models with an RMSD ∼ 3.0 Å as the best-scored structures. For the other case, the RMSD remains larger than 7 Å. For all the 11 target proteins, our scoring function identifies the experimentally determined native structure as the best structure. Starting from the predicted structure, replica exchange molecular dynamics is performed to further refine the structures. However, we are unable to improve its RMSD toward the experimental structure. The exhaustive sampling by coarse-grained normal mode analysis around the native structures reveals that our function has a linear correlation with RMSDs < 3.0 Å. These results suggest that the function is quite reliable for the protein structure prediction while the sampling method remains one of the major limiting factors in it. The aspects through which the methodology could further be improved are discussed. PMID:27493529

  20. Firefly Algorithm for Structural Search.

    PubMed

    Avendaño-Franco, Guillermo; Romero, Aldo H

    2016-07-12

    The problem of computational structure prediction of materials is approached using the firefly (FF) algorithm. Starting from the chemical composition and optionally using prior knowledge of similar structures, the FF method is able to predict not only known stable structures but also a variety of novel competitive metastable structures. This article focuses on the strengths and limitations of the algorithm as a multimodal global searcher. The algorithm has been implemented in software package PyChemia ( https://github.com/MaterialsDiscovery/PyChemia ), an open source python library for materials analysis. We present applications of the method to van der Waals clusters and crystal structures. The FF method is shown to be competitive when compared to other population-based global searchers.
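
    For orientation, the generic textbook form of the firefly update can be sketched as below. This is not the PyChemia implementation: the objective, the bounds, and the parameter names `alpha`, `beta0` and `gamma` with their default values are illustrative assumptions, and a real structural search would replace the toy objective with an energy evaluation.

```python
import math
import random

def firefly_minimize(objective, dim, bounds, n_fireflies=15, n_iter=100,
                     alpha=0.2, beta0=1.0, gamma=0.01, seed=0):
    """Generic firefly search; lower objective value means a brighter firefly."""
    rng = random.Random(seed)
    lo, hi = bounds
    swarm = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_fireflies)]
    cost = [objective(x) for x in swarm]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:
                    # Attraction decays with the squared distance to firefly j.
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    # Move i toward j, add a small random step, clip to bounds.
                    swarm[i] = [
                        min(hi, max(lo, a + beta * (b - a) + alpha * (rng.random() - 0.5)))
                        for a, b in zip(swarm[i], swarm[j])
                    ]
                    cost[i] = objective(swarm[i])
        alpha *= 0.97  # slowly reduce the random step size
    best = min(range(n_fireflies), key=lambda k: cost[k])
    return swarm[best], cost[best]

def sphere(x):
    """Toy objective with a single minimum at the origin."""
    return sum(v * v for v in x)

best_x, best_cost = firefly_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
```

    Because the brightest firefly never moves toward a dimmer one, the best cost in the swarm cannot get worse from one iteration to the next.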

  1. Computational Methods in Drug Discovery

    PubMed Central

    Sliwoski, Gregory; Kothiwale, Sandeepkumar; Meiler, Jens

    2014-01-01

    Computer-aided drug discovery/design methods have played a major role in the development of therapeutically important small molecules for over three decades. These methods are broadly classified as either structure-based or ligand-based methods. Structure-based methods are in principle analogous to high-throughput screening in that both target and ligand structure information is imperative. Structure-based approaches include ligand docking, pharmacophore, and ligand design methods. The article discusses theory behind the most important methods and recent successful applications. Ligand-based methods use only ligand information for predicting activity depending on its similarity/dissimilarity to previously known active ligands. We review widely used ligand-based methods such as ligand-based pharmacophores, molecular descriptors, and quantitative structure-activity relationships. In addition, important tools such as target/ligand data bases, homology modeling, ligand fingerprint methods, etc., necessary for successful implementation of various computer-aided drug discovery/design methods in a drug discovery campaign are discussed. Finally, computational methods for toxicity prediction and optimization for favorable physiologic properties are discussed with successful examples from literature. PMID:24381236

  2. Protein model quality assessment prediction by combining fragment comparisons and a consensus Cα contact potential

    PubMed Central

    Zhou, Hongyi; Skolnick, Jeffrey

    2009-01-01

    In this work, we develop a fully automated method for the quality assessment prediction of protein structural models generated by structure prediction approaches such as fold recognition servers, or ab initio methods. The approach is based on fragment comparisons and a consensus Cα contact potential derived from the set of models to be assessed and was tested on CASP7 server models. The average Pearson linear correlation coefficient between predicted quality and model GDT-score per target is 0.83 for the 98 targets which is better than those of other quality assessment methods that participated in CASP7. Our method also outperforms the other methods by about 3% as assessed by the total GDT-score of the selected top models. PMID:18004783

  3. Prediction of β-turns in proteins from multiple alignment using neural network

    PubMed Central

    Kaur, Harpreet; Raghava, Gajendra Pal Singh

    2003-01-01

    A neural network-based method has been developed for the prediction of β-turns in proteins by using multiple sequence alignment. Two feed-forward back-propagation networks with a single hidden layer are used where the first-sequence structure network is trained with the multiple sequence alignment in the form of PSI-BLAST–generated position-specific scoring matrices. The initial predictions from the first network and PSIPRED-predicted secondary structure are used as input to the second structure-structure network to refine the predictions obtained from the first net. A significant improvement in prediction accuracy has been achieved by using evolutionary information contained in the multiple sequence alignment. The final network yields an overall prediction accuracy of 75.5% when tested by sevenfold cross-validation on a set of 426 nonhomologous protein chains. The corresponding Qpred, Qobs, and Matthews correlation coefficient values are 49.8%, 72.3%, and 0.43, respectively, and are the best among all the previously published β-turn prediction methods. The Web server BetaTPred2 (http://www.imtech.res.in/raghava/betatpred2/) has been developed based on this approach. PMID:12592033
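
    The four measures quoted above can be derived from a per-residue confusion matrix. The sketch below assumes the definitions commonly used in the β-turn literature (Qpred as the fraction of predicted turns that are correct, Qobs as the fraction of observed turns that are predicted); the function name and the 0/1 label encoding are hypothetical.

```python
import math

def turn_metrics(predicted, observed):
    """Return (accuracy, Qpred, Qobs, MCC) for binary labels, 1 = turn residue."""
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p == 1 and o == 1)
    tn = sum(1 for p, o in pairs if p == 0 and o == 0)
    fp = sum(1 for p, o in pairs if p == 1 and o == 0)
    fn = sum(1 for p, o in pairs if p == 0 and o == 1)
    accuracy = (tp + tn) / len(pairs)
    q_pred = tp / (tp + fp) if tp + fp else 0.0  # correct among predicted turns
    q_obs = tp / (tp + fn) if tp + fn else 0.0   # found among observed turns
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, q_pred, q_obs, mcc
```

    Turn residues are a minority class, so accuracy alone can look good for a predictor that rarely predicts a turn; the Matthews correlation coefficient accounts for both over- and under-prediction.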

  4. Planning, creating and documenting a NASTRAN finite element model of a modern helicopter

    NASA Technical Reports Server (NTRS)

    Gabal, R.; Reed, D.; Ricks, R.; Kesack, W.

    1985-01-01

    Mathematical models based on the finite element method of structural analysis as embodied in the NASTRAN computer code are widely used by the helicopter industry to calculate static internal loads and vibration of airframe structure. The internal loads are routinely used for sizing structural members. The vibration predictions are not yet relied on during design. NASA's Langley Research Center sponsored a program to conduct an application of the finite element method with emphasis on predicting structural vibration. The Army/Boeing CH-47D helicopter was used as the modeling subject. The objective was to engender the needed trust in vibration predictions using these models and establish a body of modeling guides which would enable confident future prediction of airframe vibration as part of the regular design process.

  5. RNA-SSPT: RNA Secondary Structure Prediction Tools.

    PubMed

    Ahmad, Freed; Mahboob, Shahid; Gulzar, Tahsin; Din, Salah U; Hanif, Tanzeela; Ahmad, Hifza; Afzal, Muhammad

    2013-01-01

    The prediction of RNA structure is useful for understanding evolution for both in silico and in vitro studies. Physical methods like NMR studies to predict RNA secondary structure are expensive and difficult. Computational RNA secondary structure prediction is easier. Comparative sequence analysis provides the best solution. But secondary structure prediction of a single RNA sequence is challenging. RNA-SSPT is a tool that computationally predicts secondary structure of a single RNA sequence. Most of the RNA secondary structure prediction tools do not allow pseudoknots in the structure or are unable to locate them. The Nussinov dynamic programming algorithm has been implemented in RNA-SSPT. The current study shows that only the energetically most favorable secondary structure is required, and a modification of the algorithm is also available that produces base pairs to lower the total free energy of the secondary structure. For visualization of RNA secondary structure, NAVIEW, written in C, is used and was modified in C# to meet the tool's requirements. RNA-SSPT is built in C# using .NET 2.0 in Microsoft Visual Studio 2005 Professional edition. The accuracy of RNA-SSPT is tested in terms of sensitivity and positive predictive value. It is a tool which serves both secondary structure prediction and secondary structure visualization purposes.
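
    The Nussinov algorithm named in this abstract maximizes the number of nested base pairs by dynamic programming. The following is a textbook-style sketch, not the RNA-SSPT code: the allowed pairs (Watson-Crick plus G-U wobble) and the minimum hairpin loop length `min_loop` are assumptions, and no energy model is involved.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def can_pair(a, b):
    return (a, b) in PAIRS

def nussinov(seq, min_loop=3):
    """Return (max number of nested pairs, dot-bracket string) for seq."""
    n = len(seq)
    if n == 0:
        return 0, ""
    score = [[0] * n for _ in range(n)]
    # Fill by increasing span; shorter spans cannot hold a pair and stay 0.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(score[i + 1][j], score[i][j - 1])
            if can_pair(seq[i], seq[j]):
                best = max(best, score[i + 1][j - 1] + 1)
            for k in range(i + 1, j):  # bifurcation into two substructures
                best = max(best, score[i][k] + score[k + 1][j])
            score[i][j] = best
    # Traceback to recover one optimal set of pairs.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if score[i][j] == score[i + 1][j]:
            stack.append((i + 1, j))
        elif score[i][j] == score[i][j - 1]:
            stack.append((i, j - 1))
        elif can_pair(seq[i], seq[j]) and score[i][j] == score[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if score[i][j] == score[i][k] + score[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return score[0][n - 1], "".join(structure)

print(nussinov("GGGAAAUCC"))  # (3, '(((...)))')
```

    The recursion only considers nested pairs, which is why tools built on it cannot represent pseudoknots, as the abstract notes.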

  6. RNA-SSPT: RNA Secondary Structure Prediction Tools

    PubMed Central

    Ahmad, Freed; Mahboob, Shahid; Gulzar, Tahsin; din, Salah U; Hanif, Tanzeela; Ahmad, Hifza; Afzal, Muhammad

    2013-01-01

    The prediction of RNA structure is useful for understanding evolution for both in silico and in vitro studies. Physical methods like NMR studies to predict RNA secondary structure are expensive and difficult. Computational RNA secondary structure prediction is easier. Comparative sequence analysis provides the best solution. But secondary structure prediction of a single RNA sequence is challenging. RNA-SSPT is a tool that computationally predicts secondary structure of a single RNA sequence. Most of the RNA secondary structure prediction tools do not allow pseudoknots in the structure or are unable to locate them. The Nussinov dynamic programming algorithm has been implemented in RNA-SSPT. The current study shows that only the energetically most favorable secondary structure is required, and a modification of the algorithm is also available that produces base pairs to lower the total free energy of the secondary structure. For visualization of RNA secondary structure, NAVIEW, written in C, is used and was modified in C# to meet the tool's requirements. RNA-SSPT is built in C# using .NET 2.0 in Microsoft Visual Studio 2005 Professional edition. The accuracy of RNA-SSPT is tested in terms of sensitivity and positive predictive value. It is a tool which serves both secondary structure prediction and secondary structure visualization purposes. PMID:24250115

  7. Predicting Welding Distortion in a Panel Structure with Longitudinal Stiffeners Using Inherent Deformations Obtained by Inverse Analysis Method

    PubMed Central

    Liang, Wei; Murakawa, Hidekazu

    2014-01-01

    Welding-induced deformation not only negatively affects dimension accuracy but also degrades the performance of product. If welding deformation can be accurately predicted beforehand, the predictions will be helpful for finding effective methods to improve manufacturing accuracy. Till now, there are two kinds of finite element method (FEM) which can be used to simulate welding deformation. One is the thermal elastic plastic FEM and the other is elastic FEM based on inherent strain theory. The former only can be used to calculate welding deformation for small or medium scale welded structures due to the limitation of computing speed. On the other hand, the latter is an effective method to estimate the total welding distortion for large and complex welded structures even though it neglects the detailed welding process. When the elastic FEM is used to calculate the welding-induced deformation for a large structure, the inherent deformations in each typical joint should be obtained beforehand. In this paper, a new method based on inverse analysis was proposed to obtain the inherent deformations for weld joints. Through introducing the inherent deformations obtained by the proposed method into the elastic FEM based on inherent strain theory, we predicted the welding deformation of a panel structure with two longitudinal stiffeners. In addition, experiments were carried out to verify the simulation results. PMID:25276856

  8. Predicting welding distortion in a panel structure with longitudinal stiffeners using inherent deformations obtained by inverse analysis method.

    PubMed

    Liang, Wei; Murakawa, Hidekazu

    2014-01-01

    Welding-induced deformation not only negatively affects dimension accuracy but also degrades the performance of product. If welding deformation can be accurately predicted beforehand, the predictions will be helpful for finding effective methods to improve manufacturing accuracy. Till now, there are two kinds of finite element method (FEM) which can be used to simulate welding deformation. One is the thermal elastic plastic FEM and the other is elastic FEM based on inherent strain theory. The former only can be used to calculate welding deformation for small or medium scale welded structures due to the limitation of computing speed. On the other hand, the latter is an effective method to estimate the total welding distortion for large and complex welded structures even though it neglects the detailed welding process. When the elastic FEM is used to calculate the welding-induced deformation for a large structure, the inherent deformations in each typical joint should be obtained beforehand. In this paper, a new method based on inverse analysis was proposed to obtain the inherent deformations for weld joints. Through introducing the inherent deformations obtained by the proposed method into the elastic FEM based on inherent strain theory, we predicted the welding deformation of a panel structure with two longitudinal stiffeners. In addition, experiments were carried out to verify the simulation results.

  9. Tertiary structure prediction and identification of druggable pocket in the cancer biomarker – Osteopontin-c

    PubMed Central

    2014-01-01

    Background: Osteopontin (Eta, secreted sialoprotein 1, opn) is secreted from different cell types including cancer cells. Three splice variant forms, namely osteopontin-a, osteopontin-b and osteopontin-c, have been identified. The most striking feature is that osteopontin-c is found to be elevated in almost all types of cancer cells. This was the key reason to consider it for sequence analysis and structure prediction, which provide ample opportunities for prognostic, therapeutic and preventive cancer research. Methods: The osteopontin-c gene sequence was determined from a breast cancer sample and was translated to protein sequence. It was then analyzed using various software and web tools for binding pockets, docking and druggability analysis. Due to the lack of homologous templates, the tertiary structure was predicted using the ab initio method server I-TASSER and was evaluated after refinement using web tools. The refined structure was compared with the known bone sialoprotein electron microscopic structure and docked with CD44 for binding analysis, and binding pockets were identified for drug design. Results: A signal sequence of about sixteen amino acid residues was identified using signal sequence prediction servers. Due to the absence of known structures of similar proteins, the three-dimensional structure of osteopontin-c was predicted using the I-TASSER server. The predicted structure was refined with the help of the SUMMA server and was validated using the SAVES server. Molecular dynamics analysis was carried out using GROMACS software. The final model was built and was used for docking with CD44. Druggable pockets were identified using pocket energies. Conclusions: The tertiary structure of osteopontin-c was predicted successfully using the ab initio method, and the predictions showed that osteopontin-c is of a fibrous nature comparable to fibronectin. Docking studies showed significant similarities of the QSAET motif in the interaction of CD44 and osteopontins between the normal and splice variant forms of osteopontins, and binding pocket analyses revealed several pockets, which paved the way to the identification of a druggable pocket. PMID:24401206

  10. A Data Driven Model for Predicting RNA-Protein Interactions based on Gradient Boosting Machine.

    PubMed

    Jain, Dharm Skandh; Gupte, Sanket Rajan; Aduri, Raviprasad

    2018-06-22

    RNA protein interactions (RPI) play a pivotal role in the regulation of various biological processes. Experimental validation of RPI has been time-consuming, paving the way for computational prediction methods. The major limiting factor of these methods has been the accuracy and confidence of the predictions, and our in-house experiments show that they fail to accurately predict RPI involving short RNA sequences such as TERRA RNA. Here, we present a data-driven model for RPI prediction using a gradient boosting classifier. Amino acids and nucleotides are classified based on the high-resolution structural data of RNA protein complexes. The minimum structural unit consisting of five residues is used as the descriptor. Comparative analysis of existing methods shows the consistently higher performance of our method irrespective of the length of RNA present in the RPI. The method has been successfully applied to map RPI networks involving both long noncoding RNA as well as TERRA RNA. The method is also shown to successfully predict RNA and protein hubs present in RPI networks of four different organisms. The robustness of this method will provide a way for predicting RPI networks of yet unknown interactions for both long noncoding RNA and microRNA.

  11. Probabilistic Structural Analysis Methods (PSAM) for select space propulsion system structural components

    NASA Technical Reports Server (NTRS)

    Cruse, T. A.

    1987-01-01

    The objective is the development of several modular structural analysis packages capable of predicting the probabilistic response distribution for key structural variables such as maximum stress, natural frequencies, transient response, etc. The structural analysis packages are to include stochastic modeling of loads, material properties, geometry (tolerances), and boundary conditions. The solution is to be in terms of the cumulative probability of exceedance distribution (CDF) and confidence bounds. Two methods of probability modeling are to be included as well as three types of structural models - probabilistic finite-element method (PFEM); probabilistic approximate analysis methods (PAAM); and probabilistic boundary element methods (PBEM). The purpose in doing probabilistic structural analysis is to provide the designer with a more realistic ability to assess the importance of uncertainty in the response of a high performance structure. Probabilistic Structural Analysis Method (PSAM) tools will estimate structural safety and reliability, while providing the engineer with information on the confidence that should be given to the predicted behavior. Perhaps most critically, the PSAM results will directly provide information on the sensitivity of the design response to those variables which are seen to be uncertain.
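
    The idea of a probabilistic response distribution can be illustrated with a plain Monte Carlo sketch. It is not one of the PSAM methods (PFEM, PAAM or PBEM): the response model (axial stress as load divided by area), the normal distributions and every numerical value are hypothetical and chosen only to show how an exceedance probability is estimated from sampled inputs.

```python
import random

def simulate_stress(n_samples, seed=0):
    """Sample axial stress = load / area with uncertain load and geometry."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        load = rng.gauss(10_000.0, 1_000.0)  # applied load in N (assumed)
        area = rng.gauss(1.0e-4, 5.0e-6)     # cross-section in m^2 (assumed tolerance)
        samples.append(load / area)
    return samples

def exceedance_probability(samples, threshold):
    """Empirical probability that the response exceeds the threshold."""
    return sum(1 for s in samples if s > threshold) / len(samples)

stresses = simulate_stress(5000)
p_exceed = exceedance_probability(stresses, 1.2e8)  # chance stress exceeds 120 MPa
```

    Evaluating `exceedance_probability` over a range of thresholds traces out the exceedance curve; repeating the run with one input held fixed gives a rough indication of the sensitivity to that variable.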

  12. Probabilistic Structural Analysis Methods for select space propulsion system structural components (PSAM)

    NASA Technical Reports Server (NTRS)

    Cruse, T. A.; Burnside, O. H.; Wu, Y.-T.; Polch, E. Z.; Dias, J. B.

    1988-01-01

    The objective is the development of several modular structural analysis packages capable of predicting the probabilistic response distribution for key structural variables such as maximum stress, natural frequencies, transient response, etc. The structural analysis packages are to include stochastic modeling of loads, material properties, geometry (tolerances), and boundary conditions. The solution is to be in terms of the cumulative probability of exceedance distribution (CDF) and confidence bounds. Two methods of probability modeling are to be included as well as three types of structural models - probabilistic finite-element method (PFEM); probabilistic approximate analysis methods (PAAM); and probabilistic boundary element methods (PBEM). The purpose in doing probabilistic structural analysis is to provide the designer with a more realistic ability to assess the importance of uncertainty in the response of a high performance structure. Probabilistic Structural Analysis Method (PSAM) tools will estimate structural safety and reliability, while providing the engineer with information on the confidence that should be given to the predicted behavior. Perhaps most critically, the PSAM results will directly provide information on the sensitivity of the design response to those variables which are seen to be uncertain.

  13. Towards Long-Range RNA Structure Prediction in Eukaryotic Genes.

    PubMed

    Pervouchine, Dmitri D

    2018-06-15

    The ability to form an intramolecular structure plays a fundamental role in eukaryotic RNA biogenesis. Proximate regions in the primary transcripts fold into a local secondary structure, which is then hierarchically assembled into a tertiary structure that is stabilized by RNA-binding proteins and long-range intramolecular base pairings. While the local RNA structure can be predicted reasonably well for short sequences, long-range structure at the scale of eukaryotic genes remains problematic from the computational standpoint. The aim of this review is to list functional examples of long-range RNA structures, to summarize current comparative methods of structure prediction, and to highlight their advances and limitations in the context of long-range RNA structures. Most comparative methods implement the “first-align-then-fold” principle, i.e., they operate on multiple sequence alignments, while functional RNA structures often reside in non-conserved parts of the primary transcripts. The opposite “first-fold-then-align” approach is currently explored to a much lesser extent. Developing novel methods in both directions will improve the performance of comparative RNA structure analysis and help discover novel long-range structures, their higher-order organization, and RNA-RNA interactions across the transcriptome.

  14. Predicting domain-domain interaction based on domain profiles with feature selection and support vector machines

    PubMed Central

    2010-01-01

    Background: Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles. Results: In this paper, we propose a computational method to predict DDI using support vector machines (SVMs), based on domains represented as interaction profile hidden Markov models (ipHMM) where interacting residues in domains are explicitly modeled according to the three dimensional structural information available at the Protein Data Bank (PDB). Features about the domains are extracted first as the Fisher scores derived from the ipHMM and then selected using singular value decomposition (SVD). Domain pairs are represented by concatenating their selected feature vectors, and classified by a support vector machine trained on these feature vectors. The method is tested by leave-one-out cross validation experiments with a set of interacting protein pairs adopted from the 3DID database. The prediction accuracy has shown significant improvement as compared to InterPreTS (Interaction Prediction through Tertiary Structure), an existing method for PPI prediction that also uses the sequences and complexes of known 3D structure. Conclusions: We show that domain-domain interaction prediction can be significantly enhanced by exploiting information inherent in the domain profiles via feature selection based on Fisher scores, singular value decomposition and supervised learning based on support vector machines. Datasets and source code are freely available on the web at http://liao.cis.udel.edu/pub/svdsvm. Implemented in Matlab and supported on Linux and MS Windows. PMID:21034480

  15. Bi-objective integer programming for RNA secondary structure prediction with pseudoknots.

    PubMed

    Legendre, Audrey; Angel, Eric; Tahi, Fariza

    2018-01-15

    RNA structure prediction is an important field in bioinformatics, and numerous methods and tools have been proposed. Pseudoknots are specific motifs of RNA secondary structures that are difficult to predict. Almost all existing methods are based on a single model and return one solution, often missing the real structure. An alternative approach would be to combine different models and return a (small) set of solutions, maximizing its quality and diversity in order to increase the probability that it contains the real structure. We propose here an original method for predicting RNA secondary structures with pseudoknots, based on integer programming. We developed a generic bi-objective integer programming algorithm that returns optimal and sub-optimal solutions optimizing two models simultaneously. This algorithm was then applied to the combination of two known models of RNA secondary structure prediction, namely MEA and MFE. The resulting tool, called BiokoP, is compared with the other methods in the literature. The results show that the best solution (structure with the highest F1-score) is, in most cases, given by BiokoP. Moreover, the results of BiokoP are homogeneous, regardless of the pseudoknot type or the presence or absence of pseudoknots. Indeed, the F1-scores are always higher than 70% for any number of solutions returned. The results obtained by BiokoP show that combining the MEA and the MFE models, as well as returning several optimal and several sub-optimal solutions, improves the prediction of secondary structures. One perspective of our work is to combine better mono-criterion models, in particular to combine a model based on the comparative approach with the MEA and the MFE models. This will lead us to develop a new multi-objective algorithm to combine more than two models. BiokoP is available on the EvryRNA platform: https://EvryRNA.ibisc.univ-evry.fr .
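
    What it means to optimize two models at once can be illustrated by Pareto dominance filtering over a list of candidate structures. The sketch below is not the integer programming algorithm of BiokoP: it assumes that each candidate already carries two scores where higher is better (a free energy would have to be negated first), and the candidate names and values are hypothetical.

```python
def pareto_front(candidates):
    """Return names of candidates not dominated under two maximized scores.

    candidates: list of (name, score_a, score_b) tuples.
    """
    front = []
    for name, a, b in candidates:
        # A candidate is dominated if another is at least as good on both
        # scores and strictly better on at least one of them.
        dominated = any(
            a2 >= a and b2 >= b and (a2 > a or b2 > b)
            for _, a2, b2 in candidates
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical candidate structures with two model scores each.
structures = [
    ("s1", 0.90, 0.20),
    ("s2", 0.70, 0.70),
    ("s3", 0.60, 0.60),
    ("s4", 0.20, 0.95),
]
print(pareto_front(structures))  # ['s1', 's2', 's4']
```

    Returning the whole non-dominated set rather than a single winner mirrors the motivation stated in the abstract: a small, diverse set of solutions is more likely to contain the real structure.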

  16. Theoretical prediction of welding distortion in large and complex structures

    NASA Astrophysics Data System (ADS)

    Deng, De-An

    2010-06-01

    Welding technology is widely used to assemble large thin plate structures such as ships, automobiles, and passenger trains because of its high productivity. However, it is impossible to avoid welding-induced distortion during the assembly process. Welding distortion not only reduces the fabrication accuracy of a weldment, but also decreases the productivity due to correction work. If welding distortion can be predicted using a practical method beforehand, the prediction will be useful for taking appropriate measures to control the dimensional accuracy to an acceptable limit. In this study, a two-step computational approach, which is a combination of a thermoelastic-plastic finite element method (FEM) and an elastic finite element with consideration for large deformation, is developed to estimate welding distortion for large and complex welded structures. Welding distortions in several representative large complex structures, which are often used in shipbuilding, are simulated using the proposed method. By comparing the predictions and the measurements, the effectiveness of the two-step computational approach is verified.

  17. Progressive damage, fracture predictions and post mortem correlations for fiber composites

    NASA Technical Reports Server (NTRS)

    1985-01-01

    Lewis Research Center is involved in the development of computational mechanics methods for predicting the structural behavior and response of composite structures. In conjunction with the analytical methods development, experimental programs including post failure examination are conducted to study various factors affecting composite fracture such as laminate thickness effects, ply configuration, and notch sensitivity. Results indicate that the analytical capabilities incorporated in the CODSTRAN computer code are effective in predicting the progressive damage and fracture of composite structures. In addition, the results being generated are establishing a data base which will aid in the characterization of composite fracture.

  18. Uncertainty aggregation and reduction in structure-material performance prediction

    NASA Astrophysics Data System (ADS)

    Hu, Zhen; Mahadevan, Sankaran; Ao, Dan

    2018-02-01

    An uncertainty aggregation and reduction framework is presented for structure-material performance prediction. Different types of uncertainty sources, structural analysis model, and material performance prediction model are connected through a Bayesian network for systematic uncertainty aggregation analysis. To reduce the uncertainty in the computational structure-material performance prediction model, Bayesian updating using experimental observation data is investigated based on the Bayesian network. It is observed that the Bayesian updating results will have large error if the model cannot accurately represent the actual physics, and that this error will be propagated to the predicted performance distribution. To address this issue, this paper proposes a novel uncertainty reduction method by integrating Bayesian calibration with model validation adaptively. The observation domain of the quantity of interest is first discretized into multiple segments. An adaptive algorithm is then developed to perform model validation and Bayesian updating over these observation segments sequentially. Only information from observation segments where the model prediction is highly reliable is used for Bayesian updating; this is found to increase the effectiveness and efficiency of uncertainty reduction. A composite rotorcraft hub component fatigue life prediction model, which combines a finite element structural analysis model and a material damage model, is used to demonstrate the proposed method.

  19. RNA 3D Structure Modeling by Combination of Template-Based Method ModeRNA, Template-Free Folding with SimRNA, and Refinement with QRNAS.

    PubMed

    Piatkowski, Pawel; Kasprzak, Joanna M; Kumar, Deepak; Magnus, Marcin; Chojnowski, Grzegorz; Bujnicki, Janusz M

    2016-01-01

    RNA encompasses an essential part of all known forms of life. The functions of many RNA molecules are dependent on their ability to form complex three-dimensional (3D) structures. However, experimental determination of RNA 3D structures is laborious and challenging, and therefore, the majority of known RNAs remain structurally uncharacterized. To address this problem, computational structure prediction methods were developed that either utilize information derived from known structures of other RNA molecules (by way of template-based modeling) or attempt to simulate the physical process of RNA structure formation (by way of template-free modeling). All computational methods suffer from various limitations that make theoretical models less reliable than high-resolution experimentally determined structures. This chapter provides a protocol for computational modeling of RNA 3D structure that overcomes major limitations by combining two complementary approaches: template-based modeling that is capable of predicting global architectures based on similarity to other molecules but often fails to predict local unique features, and template-free modeling that can predict the local folding, but is limited to modeling the structure of relatively small molecules. Here, we combine the use of a template-based method ModeRNA with a template-free method SimRNA. ModeRNA requires a sequence alignment of the target RNA sequence to be modeled with a template of the known structure; it generates a model that predicts the structure of a conserved core and provides a starting point for modeling of variable regions. SimRNA can be used to fold small RNAs (<80 nt) without any additional structural information, and to refold parts of models for larger RNAs that have a correctly modeled core. ModeRNA can be either downloaded, compiled and run locally or run through a web interface at http://genesilico.pl/modernaserver/ . SimRNA is currently available to download for local use as a precompiled software package at http://genesilico.pl/software/stand-alone/simrna and as a web server at http://genesilico.pl/SimRNAweb . For model optimization we use QRNAS, available at http://genesilico.pl/qrnas .

  20. Information-theoretic indices usage for the prediction and calculation of octanol-water partition coefficient.

    PubMed

    Persona, Marek; Kutarov, Vladimir V; Kats, Boris M; Persona, Andrzej; Marczewska, Barbara

    2007-01-01

    The paper describes the new prediction method of octanol-water partition coefficient, which is based on molecular graph theory. The results obtained using the new method are well correlated with experimental values. These results were compared with the ones obtained by use of ten other structure correlated methods. The comparison shows that graph theory can be very useful in structure correlation research.

  1. SVM-PB-Pred: SVM based protein block prediction method using sequence profiles and secondary structures.

    PubMed

    Suresh, V; Parthasarathy, S

    2014-01-01

    We developed a support vector machine (SVM) based web server, SVM-PB-Pred, to predict the Protein Blocks for any given amino acid sequence. The input features of SVM-PB-Pred include (i) sequence profiles (PSSM) and (ii) actual secondary structures (SS) from the DSSP method or predicted secondary structures from the NPS@ and GOR4 methods. Three combined input feature sets, PSSM+SS(DSSP), PSSM+SS(NPS@) and PSSM+SS(GOR4), were used to train and test the SVM models. Similarly, four datasets, RS90, DB433, LI1264 and SP1577, were used to develop the SVM models. The four resulting SVM models were tested using three benchmarking tests, namely (i) self consistency, (ii) seven-fold cross validation and (iii) independent case test. The maximum prediction accuracy of ~70% was observed in the self consistency test for the SVM models of both the LI1264 and SP1577 datasets, where PSSM+SS(DSSP) input features were used for testing. The prediction accuracies were reduced to ~53% for PSSM+SS(NPS@) and ~43% for PSSM+SS(GOR4) in the independent case test, for the SVM models of the same two datasets. Using our method, it is possible to predict the protein block letters for any query protein sequence with ~53% accuracy, when the SP1577 dataset and predicted secondary structure from the NPS@ server are used. The SVM-PB-Pred server can be freely accessed through http://bioinfo.bdu.ac.in/~svmpbpred.

  2. RNAstructure: software for RNA secondary structure prediction and analysis.

    PubMed

    Reuter, Jessica S; Mathews, David H

    2010-03-15

    To understand an RNA sequence's mechanism of action, the structure must be known. Furthermore, target RNA structure is an important consideration in the design of small interfering RNAs and antisense DNA oligonucleotides. RNA secondary structure prediction, using thermodynamics, can be used to develop hypotheses about the structure of an RNA sequence. RNAstructure is a software package for RNA secondary structure prediction and analysis. It uses thermodynamics and utilizes the most recent set of nearest neighbor parameters from the Turner group. It includes methods for secondary structure prediction (using several algorithms), prediction of base pair probabilities, bimolecular structure prediction, and prediction of a structure common to two sequences. This contribution describes new extensions to the package, including a library of C++ classes for incorporation into other programs, a user-friendly graphical user interface written in JAVA, and new Unix-style text interfaces. The original graphical user interface for Microsoft Windows is still maintained. The extensions to RNAstructure serve to make RNA secondary structure prediction user-friendly. The package is available for download from the Mathews lab homepage at http://rna.urmc.rochester.edu/RNAstructure.html.

  3. In Silico Prediction of Chemical Toxicity for Drug Design Using Machine Learning Methods and Structural Alerts

    NASA Astrophysics Data System (ADS)

    Yang, Hongbin; Sun, Lixia; Li, Weihua; Liu, Guixia; Tang, Yun

    2018-02-01

    For a drug, safety is always the most important issue, encompassing a variety of toxicities and adverse drug effects that should be evaluated in preclinical and clinical trial phases. This review article first briefly introduces the computational methods used to predict chemical toxicity for drug design, including machine learning methods and structural alerts. Machine learning methods have been widely applied in qualitative classification and quantitative regression studies, while structural alerts can be regarded as a complementary tool for lead optimization. The article emphasizes recent progress in predictive models built for various toxicities. Available databases and web servers are also listed. Though the methods and models are very helpful for drug design, challenges and limitations remain to be addressed for drug safety assessment in the future.

  4. In Silico Prediction of Chemical Toxicity for Drug Design Using Machine Learning Methods and Structural Alerts

    PubMed Central

    Yang, Hongbin; Sun, Lixia; Li, Weihua; Liu, Guixia; Tang, Yun

    2018-01-01

    During drug development, safety is always the most important issue, encompassing a variety of toxicities and adverse drug effects that should be evaluated in preclinical and clinical trial phases. This review article first briefly introduces the computational methods used to predict chemical toxicity for drug design, including machine learning methods and structural alerts. Machine learning methods have been widely applied in qualitative classification and quantitative regression studies, while structural alerts can be regarded as a complementary tool for lead optimization. The article emphasizes recent progress in predictive models built for various toxicities. Available databases and web servers are also listed. Though the methods and models are very helpful for drug design, challenges and limitations remain to be addressed for drug safety assessment in the future. PMID:29515993

  5. Intermolecular shielding contributions studied by modeling the 13C chemical-shift tensors of organic single crystals with plane waves

    PubMed Central

    Johnston, Jessica C.; Iuliucci, Robbie J.; Facelli, Julio C.; Fitzgerald, George; Mueller, Karl T.

    2009-01-01

    In order to predict accurately the chemical shift of NMR-active nuclei in solid phase systems, magnetic shielding calculations must be capable of considering the complete lattice structure. Here we assess the accuracy of the density functional theory gauge-including projector augmented wave method, which uses pseudopotentials to approximate the nodal structure of the core electrons, to determine the magnetic properties of crystals by predicting the full chemical-shift tensors of all 13C nuclides in 14 organic single crystals from which experimental tensors have previously been reported. Plane-wave methods use periodic boundary conditions to incorporate the lattice structure, providing a substantial improvement for modeling the chemical shifts in hydrogen-bonded systems. Principal tensor components can now be predicted to an accuracy that approaches the typical experimental uncertainty. Moreover, methods that include the full solid-phase structure enable geometry optimizations to be performed on the input structures prior to calculation of the shielding. Improvement after optimization is noted here even when neutron diffraction data are used for determining the initial structures. After geometry optimization, the isotropic shift can be predicted to within 1 ppm. PMID:19831448

  6. Practical theories for service life prediction of critical aerospace structural components

    NASA Technical Reports Server (NTRS)

    Ko, William L.; Monaghan, Richard C.; Jackson, Raymond H.

    1992-01-01

    A new second-order theory was developed for predicting the service lives of aerospace structural components. The predictions based on this new theory were compared with those based on the Ko first-order theory and the classical theory of service life predictions. The new theory gives very accurate service life predictions. An equivalent constant-amplitude stress cycle method was proposed for representing the random load spectrum for crack growth calculations. This method predicts the most conservative service life. The proposed use of minimum detectable crack size, instead of proof load established crack size as an initial crack size for crack growth calculations, could give a more realistic service life.

  7. Combining Structural Modeling with Ensemble Machine Learning to Accurately Predict Protein Fold Stability and Binding Affinity Effects upon Mutation

    PubMed Central

    Garcia Lopez, Sebastian; Kim, Philip M.

    2014-01-01

    Advances in sequencing have led to a rapid accumulation of mutations, some of which are associated with diseases. However, to draw mechanistic conclusions, a biochemical understanding of these mutations is necessary. For coding mutations, accurate prediction of significant changes in either the stability of proteins or their affinity to their binding partners is required. Traditional methods have used semi-empirical force fields, while newer methods employ machine learning of sequence and structural features. Here, we show how combining both of these approaches leads to a marked boost in accuracy. We introduce ELASPIC, a novel ensemble machine learning approach that is able to predict stability effects upon mutation in both domain cores and domain-domain interfaces. We combine semi-empirical energy terms, sequence conservation, and a wide variety of molecular details with a Stochastic Gradient Boosting of Decision Trees (SGB-DT) algorithm. The accuracy of our predictions surpasses existing methods by a considerable margin, achieving correlation coefficients of 0.77 for stability, and 0.75 for affinity predictions. Notably, we integrated homology modeling to enable proteome-wide prediction and show that accurate prediction on modeled structures is possible. Lastly, ELASPIC showed significant differences between various types of disease-associated mutations, as well as between disease and common neutral mutations. Unlike pure sequence-based prediction methods that try to predict phenotypic effects of mutations, our predictions unravel the molecular details governing protein instability, and help us better understand the molecular causes of diseases. PMID:25243403

  8. Cloud prediction of protein structure and function with PredictProtein for Debian.

    PubMed

    Kaján, László; Yachdav, Guy; Vicedo, Esmeralda; Steinegger, Martin; Mirdita, Milot; Angermüller, Christof; Böhm, Ariane; Domke, Simon; Ertl, Julia; Mertes, Christian; Reisinger, Eva; Staniewski, Cedric; Rost, Burkhard

    2013-01-01

    We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome.

  9. Cloud Prediction of Protein Structure and Function with PredictProtein for Debian

    PubMed Central

    Kaján, László; Yachdav, Guy; Vicedo, Esmeralda; Steinegger, Martin; Mirdita, Milot; Angermüller, Christof; Böhm, Ariane; Domke, Simon; Ertl, Julia; Mertes, Christian; Reisinger, Eva; Rost, Burkhard

    2013-01-01

    We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome. PMID:23971032

  10. Binding pose and affinity prediction in the 2016 D3R Grand Challenge 2 using the Wilma-SIE method

    NASA Astrophysics Data System (ADS)

    Hogues, Hervé; Sulea, Traian; Gaudreault, Francis; Corbeil, Christopher R.; Purisima, Enrico O.

    2018-01-01

    The Farnesoid X receptor (FXR) exhibits significant backbone movement in response to the binding of various ligands and can be a challenge for pose prediction algorithms. As part of the D3R Grand Challenge 2, we tested Wilma-SIE, a rigid-protein docking method, on a set of 36 FXR ligands for which the crystal structures had originally been blinded. These ligands covered several classes of compounds. To overcome the rigid protein limitations of the method, we used an ensemble of publicly available structures for FXR from the PDB. The use of the ensemble allowed Wilma-SIE to predict poses with average and median RMSDs of 2.3 and 1.4 Å, respectively. It was quite clear, however, that had we used a single structure for the receptor the success rate would have been much lower. The most successful predictions were obtained on chemical classes for which one or more crystal structures of the receptor bound to a molecule of the same class was available. In the absence of a crystal structure for the class, observing a consensus binding mode for the ligands of the class using one or more receptor structures of other classes seemed to be indicative of a reasonable pose prediction. Affinity prediction proved to be more challenging with generally poor correlation with experimental IC50s (Kendall tau 0.3). Even when the 36 crystal structures were used the accuracy of the predicted affinities was not appreciably improved. A possible cause of difficulty is the internal energy strain arising from conformational differences in the receptor across complexes, which may need to be properly estimated and incorporated into the SIE scoring function.

  11. United3D: a protein model quality assessment program that uses two consensus based methods.

    PubMed

    Terashi, Genki; Oosawa, Makoto; Nakamura, Yuuki; Kanou, Kazuhiko; Takeda-Shitaka, Mayuko

    2012-01-01

    In protein structure prediction, such as template-based modeling and free modeling (ab initio modeling), the step that assesses the quality of protein models is very important. We have developed a model quality assessment (QA) program United3D that uses an optimized clustering method and a simple Cα atom contact-based potential. United3D automatically estimates quality scores (Qscore) of predicted protein models that are highly correlated with the actual quality (GDT_TS). The performance of United3D was tested in the ninth Critical Assessment of protein Structure Prediction (CASP9) experiment. In CASP9, United3D showed the lowest average loss of GDT_TS (5.3) among the QA methods that participated. This result indicates that United3D was the best of the QA methods tested in CASP9 at identifying high quality models among those predicted by CASP9 servers on 116 targets. United3D also produced high average Pearson correlation coefficients (0.93) and acceptable Kendall rank correlation coefficients (0.68) between the Qscore and GDT_TS. This performance was competitive with the other top ranked QA methods that were tested in CASP9. These results indicate that United3D is a useful tool for selecting high quality models from many candidate model structures provided by various modeling methods. United3D will improve the accuracy of protein structure prediction.
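
    The two evaluation measures named in this abstract can be illustrated with a minimal sketch. It assumes that the "loss of GDT_TS" for one target is the GDT_TS of the best model in the pool minus the GDT_TS of the model ranked first by the predicted score; the function names and inputs are hypothetical and are not taken from United3D.

```python
import math


def gdt_loss(qscores, gdt_ts):
    """Loss for one target: best available GDT_TS minus the GDT_TS of the
    model that the QA method ranks first (assumed definition)."""
    top = max(range(len(qscores)), key=lambda i: qscores[i])
    return max(gdt_ts) - gdt_ts[top]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)
```

    Averaging `gdt_loss` over all targets would give a figure comparable in kind to the average loss reported above, provided the same definition is used.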

  12. An O(n^5) algorithm for MFE prediction of kissing hairpins and 4-chains in nucleic acids.

    PubMed

    Chen, Ho-Lin; Condon, Anne; Jabbari, Hosna

    2009-06-01

    Efficient methods for prediction of minimum free energy (MFE) nucleic acid secondary structures are widely used, both to better understand the structure and function of biological RNAs and to design novel nano-structures. Here, we present a new algorithm for MFE secondary structure prediction, which significantly expands the class of structures that can be handled in O(n^5) time. Our algorithm can handle H-type pseudoknotted structures, kissing hairpins, and chains of four overlapping stems, as well as nested substructures of these types.
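
    Pseudoknotted structures are those containing crossing base pairs. The sketch below only shows how crossing pairs can be detected in a list of base pairs; it is not the O(n^5) prediction algorithm of this record, and the function name is hypothetical.

```python
def crossing_pairs(pairs):
    """Return all pairs of base pairs (i, j), (k, l) with i < k < j < l.

    A structure with no such crossing is pseudoknot-free (fully nested).
    """
    ordered = sorted((min(i, j), max(i, j)) for i, j in pairs)
    crossings = []
    for a in range(len(ordered)):
        i, j = ordered[a]
        for b in range(a + 1, len(ordered)):
            k, l = ordered[b]
            if i < k < j < l:
                crossings.append((ordered[a], ordered[b]))
    return crossings
```

    For example, two nested pairs such as (0, 9) and (1, 8) produce no crossings, whereas the two stems of an H-type pseudoknot, such as (0, 5) and (3, 8), cross.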

  13. Safe Life Propulsion Design Technologies (3rd Generation Propulsion Research and Technology)

    NASA Technical Reports Server (NTRS)

    Ellis, Rod

    2000-01-01

    The tasks outlined in this viewgraph presentation on safe life propulsion design technologies (third generation propulsion research and technology) include the following: (1) Ceramic matrix composite (CMC) life prediction methods; (2) Life prediction methods for ultra high temperature polymer matrix composites for reusable launch vehicle (RLV) airframe and engine application; (3) Enabling design and life prediction technology for cost effective large-scale utilization of MMCs and innovative metallic material concepts; (4) Probabilistic analysis methods for brittle materials and structures; (5) Damage assessment in CMC propulsion components using nondestructive characterization techniques; and (6) High temperature structural seals for RLV applications.

  14. A fragmentation and reassembly method for ab initio phasing.

    PubMed

    Shrestha, Rojan; Zhang, Kam Y J

    2015-02-01

    Ab initio phasing with de novo models has become a viable approach for structural solution from protein crystallographic diffraction data. This approach takes advantage of the known protein sequence information, predicts de novo models and uses them for structure determination by molecular replacement. However, even the current state-of-the-art de novo modelling method has a limit as to the accuracy of the model predicted, which is sometimes insufficient to be used as a template for successful molecular replacement. A fragment-assembly phasing method has been developed that starts from an ensemble of low-accuracy de novo models, disassembles them into fragments, places them independently in the crystallographic unit cell by molecular replacement and then reassembles them into a whole structure that can provide sufficient phase information to enable complete structure determination by automated model building. Tests on ten protein targets showed that the method could solve structures for eight of these targets, although the predicted de novo models cannot be used as templates for successful molecular replacement since the best model for each target is on average more than 4.0 Å away from the native structure. The method has extended the applicability of the ab initio phasing by de novo models approach. The method can be used to solve structures when the best de novo models are still of low accuracy.

  15. SimRNA: a coarse-grained method for RNA folding simulations and 3D structure prediction.

    PubMed

    Boniecki, Michal J; Lach, Grzegorz; Dawson, Wayne K; Tomala, Konrad; Lukasz, Pawel; Soltysinski, Tomasz; Rother, Kristian M; Bujnicki, Janusz M

    2016-04-20

    RNA molecules play fundamental roles in cellular processes. Their function and interactions with other biomolecules are dependent on the ability to form complex three-dimensional (3D) structures. However, experimental determination of RNA 3D structures is laborious and challenging, and therefore, the majority of known RNAs remain structurally uncharacterized. Here, we present SimRNA: a new method for computational RNA 3D structure prediction, which uses a coarse-grained representation, relies on the Monte Carlo method for sampling the conformational space, and employs a statistical potential to approximate the energy and identify conformations that correspond to biologically relevant structures. SimRNA can fold RNA molecules using only sequence information, and, on established test sequences, it recapitulates secondary structure with high accuracy, including correct prediction of pseudoknots. For modeling of complex 3D structures, it can use additional restraints, derived from experimental or computational analyses, including information about secondary structure and/or long-range contacts. SimRNA also can be used to analyze conformational landscapes and identify potential alternative structures. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. MQAPRank: improved global protein model quality assessment by learning-to-rank.

    PubMed

    Jing, Xiaoyang; Dong, Qiwen

    2017-05-25

    Protein structure prediction has made considerable progress during the last few decades, and a growing number of models can be predicted for a given sequence. Consequently, assessing the quality of predicted protein models is one of the key components of successful protein structure prediction. Over the past years, a number of methods have been developed to address this issue, which can be roughly divided into three categories: single methods, quasi-single methods and clustering (or consensus) methods. Although these methods have achieved much success at different levels, accurate protein model quality assessment is still an open problem. Here, we present MQAPRank, a global protein model quality assessment program based on learning-to-rank. MQAPRank first sorts the decoy models using a single method based on a learning-to-rank algorithm to indicate their relative qualities for the target protein. It then takes the first five models as references and predicts the qualities of the other models using the average GDT_TS scores between the reference models and the other models. Benchmarked on the CASP11 and 3DRobot datasets, MQAPRank achieved better performance than other leading protein model quality assessment methods. Recently, MQAPRank participated in CASP12 under the group name FDUBio and achieved state-of-the-art performance. MQAPRank provides a convenient and powerful tool for protein model quality assessment and is useful for protein structure prediction.
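
    The reference-based second step described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the MQAPRank code: `initial_scores` stands in for the output of the learning-to-rank step, `pairwise_gdt` is a hypothetical precomputed matrix of GDT_TS values between models, and a reference model is assumed to be excluded from its own average.

```python
def consensus_scores(initial_scores, pairwise_gdt, n_refs=5):
    """Score each model by its mean GDT_TS to the top-ranked reference models.

    initial_scores: one score per model from a first-pass ranking (higher is better).
    pairwise_gdt:   square matrix, pairwise_gdt[a][b] = GDT_TS between models a and b.
    Returns (reference indices, list of consensus scores).
    """
    n_models = len(initial_scores)
    order = sorted(range(n_models), key=lambda i: initial_scores[i], reverse=True)
    refs = order[:n_refs]
    scores = []
    for m in range(n_models):
        # Assumption: a reference is not compared with itself.
        others = [r for r in refs if r != m]
        if others:
            scores.append(sum(pairwise_gdt[m][r] for r in others) / len(others))
        else:
            scores.append(0.0)
    return refs, scores
```

    The abstract fixes the number of references at five; `n_refs` is exposed here only so that small examples can be run.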

  17. SeqRate: sequence-based protein folding type classification and rates prediction

    PubMed Central

    2010-01-01

    Background: Protein folding rate is an important property of a protein. Predicting protein folding rate is useful for understanding protein folding process and guiding protein design. Most previous methods of predicting protein folding rate require the tertiary structure of a protein as an input. And most methods do not distinguish the different kinetic nature (two-state folding or multi-state folding) of the proteins. Here we developed a method, SeqRate, to predict both protein folding kinetic type (two-state versus multi-state) and real-value folding rate using sequence length, amino acid composition, contact order, contact number, and secondary structure information predicted from only protein sequence with support vector machines. Results: We systematically studied the contributions of individual features to folding rate prediction. On a standard benchmark dataset, the accuracy of folding kinetic type classification is 80%. The Pearson correlation coefficient and the mean absolute difference between predicted and experimental folding rates (sec^-1) in the base-10 logarithmic scale are 0.81 and 0.79 for two-state protein folders, and 0.80 and 0.68 for three-state protein folders. SeqRate is the first sequence-based method for protein folding type classification and its accuracy of fold rate prediction is improved over previous sequence-based methods. Its performance can be further enhanced with additional information, such as structure-based geometric contacts, as inputs. Conclusions: Both the web server and software of predicting folding rate are publicly available at http://casp.rnet.missouri.edu/fold_rate/index.html. PMID:20438647

  18. Prediction of protein structural classes by recurrence quantification analysis based on chaos game representation.

    PubMed

    Yang, Jian-Yi; Peng, Zhen-Ling; Yu, Zu-Guo; Zhang, Rui-Jie; Anh, Vo; Wang, Desheng

    2009-04-21

    In this paper, we aim to predict protein structural classes (alpha, beta, alpha+beta, or alpha/beta) for low-homology data sets. Two widely used data sets were employed, 1189 (containing 1092 proteins) and 25PDB (containing 1673 proteins), with sequence homology of 40% and 25%, respectively. We propose to decompose the chaos game representation of proteins into two kinds of time series. Then, a novel and powerful nonlinear analysis technique, recurrence quantification analysis (RQA), is applied to analyze these time series. For a given protein sequence, a total of 16 characteristic parameters can be calculated with RQA, which are treated as the feature representation of the protein sequence. Based on this feature representation, the structural class of each protein is predicted with Fisher's linear discriminant algorithm. The jackknife test is used to test and compare our method with other existing methods. The overall accuracies with the step-by-step procedure are 65.8% and 64.2% for the 1189 and 25PDB data sets, respectively. With the widely used one-against-others procedure, we compare our method with five other existing methods; the overall accuracies of our method are 6.3% and 4.1% higher for the two data sets, respectively. Furthermore, only 16 parameters are used in our method, fewer than in other methods. This suggests that the current method may play a complementary role to the existing methods and is promising for the prediction of protein structural classes.

  19. Computational analysis of conserved RNA secondary structure in transcriptomes and genomes.

    PubMed

    Eddy, Sean R

    2014-01-01

    Transcriptomics experiments and computational predictions both enable systematic discovery of new functional RNAs. However, many putative noncoding transcripts arise instead from artifacts and biological noise, and current computational prediction methods have high false positive rates. I discuss prospects for improving computational methods for analyzing and identifying functional RNAs, with a focus on detecting signatures of conserved RNA secondary structure. An interesting new front is the application of chemical and enzymatic experiments that probe RNA structure on a transcriptome-wide scale. I review several proposed approaches for incorporating structure probing data into the computational prediction of RNA secondary structure. Using probabilistic inference formalisms, I show how all these approaches can be unified in a well-principled framework, which in turn allows RNA probing data to be easily integrated into a wide range of analyses that depend on RNA secondary structure inference. Such analyses include homology search and genome-wide detection of new structural RNAs.

  20. Robust prediction of consensus secondary structures using averaged base pairing probability matrices.

    PubMed

    Kiryu, Hisanori; Kin, Taishin; Asai, Kiyoshi

    2007-02-15

    Recent transcriptomic studies have revealed the existence of a considerable number of non-protein-coding RNA transcripts in higher eukaryotic cells. To investigate the functional roles of these transcripts, it is of great interest to find conserved secondary structures from multiple alignments on a genomic scale. Since multiple alignments are often created using alignment programs that neglect the special conservation patterns of RNA secondary structures for computational efficiency, alignment failures can cause potential risks of overlooking conserved stem structures. We investigated the dependence of the accuracy of secondary structure prediction on the quality of alignments. We compared three algorithms that maximize the expected accuracy of secondary structures as well as other frequently used algorithms. We found that one of our algorithms, called McCaskill-MEA, was more robust against alignment failures than others. The McCaskill-MEA method first computes the base pairing probability matrices for all the sequences in the alignment and then obtains the base pairing probability matrix of the alignment by averaging over these matrices. The consensus secondary structure is predicted from this matrix such that the expected accuracy of the prediction is maximized. We show that the McCaskill-MEA method performs better than other methods, particularly when the alignment quality is low and when the alignment consists of many sequences. Our model has a parameter that controls the sensitivity and specificity of predictions. We discussed the uses of that parameter for multi-step screening procedures to search for conserved secondary structures and for assigning confidence values to the predicted base pairs. The C++ source code that implements the McCaskill-MEA algorithm and the test dataset used in this paper are available at http://www.ncrna.org/papers/McCaskillMEA/. Supplementary data are available at Bioinformatics online.
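
    The two stages described in this abstract (averaging base pairing probability matrices, then choosing the structure with maximal expected accuracy) can be sketched in simplified form. The sketch makes several assumptions that are not from the paper: the sequences are ungapped and of equal length, the per-sequence matrices are supplied as input rather than computed by a McCaskill partition function, and the objective is a commonly used one (unpaired probabilities plus `2 * gamma` times pair probabilities), with `gamma` and `min_loop` as hypothetical parameter names. It is not the authors' C++ implementation.

```python
def average_bpp(matrices):
    """Element-wise average of equally sized base pairing probability matrices."""
    n = len(matrices[0])
    count = len(matrices)
    return [[sum(m[i][j] for m in matrices) / count for j in range(n)]
            for i in range(n)]


def mea_structure(p, gamma=1.0, min_loop=3):
    """Nested structure maximizing sum(q_i, unpaired) + 2*gamma*sum(p_ij, paired).

    p[i][j] (i < j) is the pairing probability; q_i = 1 - sum_j p_ij.
    Larger gamma favors predicting more base pairs.
    """
    n = len(p)
    q = [1.0 - sum(p[min(i, j)][max(i, j)] for j in range(n) if j != i)
         for i in range(n)]
    best = [[0.0] * n for _ in range(n)]     # best[i][j]: optimum for i..j
    choice = [[None] * n for _ in range(n)]  # decision taken at position i
    for span in range(n):
        for i in range(n - span):
            j = i + span
            # Option 1: position i is unpaired.
            score = q[i] + (best[i + 1][j] if i + 1 <= j else 0.0)
            pick = None
            # Option 2: position i pairs with some k, leaving a loop of
            # at least min_loop unpaired positions between them.
            for k in range(i + min_loop + 1, j + 1):
                inner = best[i + 1][k - 1] if i + 1 <= k - 1 else 0.0
                outer = best[k + 1][j] if k + 1 <= j else 0.0
                s = 2.0 * gamma * p[i][k] + inner + outer
                if s > score:
                    score, pick = s, k
            best[i][j] = score
            choice[i][j] = pick
    # Trace back the decisions to recover the base pairs.
    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i > j:
            continue
        k = choice[i][j]
        if k is None:
            stack.append((i + 1, j))
        else:
            pairs.append((i, k))
            stack.append((i + 1, k - 1))
            stack.append((k + 1, j))
    return sorted(pairs)
```

    In this sketch `gamma` plays a role similar to the sensitivity/specificity parameter mentioned in the abstract: lowering it yields fewer, more confident pairs, although the paper's exact parameterization may differ.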

  1. A sparse autoencoder-based deep neural network for protein solvent accessibility and contact number prediction.

    PubMed

    Deng, Lei; Fan, Chao; Zeng, Zhiwen

    2017-12-28

    Direct prediction of the three-dimensional (3D) structures of proteins from one-dimensional (1D) sequences is a challenging problem. Significant structural characteristics such as solvent accessibility and contact number are essential for deriving restraints in modeling protein folding and protein 3D structure. Thus, accurately predicting these features is a critical step for 3D protein structure building. In this study, we present DeepSacon, a computational method that can effectively predict protein solvent accessibility and contact number by using a deep neural network built on a stacked autoencoder and a dropout method. The results demonstrate that DeepSacon achieves a significant improvement in prediction quality compared with state-of-the-art methods. We obtain 0.70 three-state accuracy for solvent accessibility, and 0.33 15-state accuracy and 0.74 Pearson Correlation Coefficient (PCC) for the contact number, on the 5729 monomeric soluble globular protein dataset. We also evaluate the performance on the CASP11 benchmark dataset, where DeepSacon achieves 0.68 three-state accuracy and 0.69 PCC for solvent accessibility and contact number, respectively. We have shown that DeepSacon can reliably predict solvent accessibility and contact number with a stacked sparse autoencoder and a dropout approach.

  2. Modified Displacement Transfer Functions for Deformed Shape Predictions of Slender Curved Structures with Varying Curvatures

    NASA Technical Reports Server (NTRS)

    Ko, William L.; Fleischer, Van Tran

    2014-01-01

    To eliminate the need to use finite-element modeling for structure shape predictions, a new method was invented. This method uses the Displacement Transfer Functions to transform the measured surface strains into deflections for mapping out overall structural deformed shapes. The Displacement Transfer Functions are expressed in terms of rectilinearly distributed surface strains, and contain no material properties. This report applies the patented method to the shape predictions of non-symmetrically loaded slender curved structures with different curvatures up to a full circle. Because measured surface strains were not available, finite-element analysis had to be used to analytically generate the surface strains. Previously formulated straight-beam Displacement Transfer Functions were modified by introducing curvature-effect correction terms. Through single-point or dual-point collocations with finite-element-generated deflection curves, functional forms of the curvature-effect correction terms were empirically established. The resulting modified Displacement Transfer Functions can then provide quite accurate shape predictions. Also, the uniform straight-beam Displacement Transfer Function was applied to the shape predictions of a section-cut of a generic capsule (GC) outer curved sandwich wall. The resulting GC shape predictions are quite accurate in partial regions where the radius of curvature does not change sharply.

  3. Knowledge-based prediction of protein backbone conformation using a structural alphabet.

    PubMed

    Vetrivel, Iyanar; Mahajan, Swapnil; Tyagi, Manoj; Hoffmann, Lionel; Sanejouand, Yves-Henri; Srinivasan, Narayanaswamy; de Brevern, Alexandre G; Cadet, Frédéric; Offmann, Bernard

    2017-01-01

    Libraries of structural prototypes that abstract protein local structures are known as structural alphabets and have proven to be very useful in various aspects of protein structure analyses and predictions. One such library, Protein Blocks, is composed of 16 standard 5-residue-long structural prototypes. This form of analyzing proteins involves drafting its structure as a string of Protein Blocks. Predicting the local structure of a protein in terms of protein blocks is the general objective of this work. A new approach, PB-kPRED, is proposed towards this aim. It involves (i) organizing the structural knowledge in the form of a database of pentapeptide fragments extracted from all protein structures in the PDB and (ii) applying a knowledge-based algorithm that does not rely on any secondary structure predictions and/or sequence alignment profiles, to scan this database and predict most probable backbone conformations for the protein local structures. PB-kPRED does, however, preferentially use structural information from homologues when it is available. The predictions were evaluated rigorously on 15,544 query proteins representing a non-redundant subset of the PDB filtered at 30% sequence identity cut-off. We have shown that the kPRED method was able to achieve mean accuracies ranging from 40.8% to 66.3% depending on the availability of homologues. The impact of the different strategies for scanning the database on the prediction was evaluated and is discussed. Our results highlight the usefulness of the method in the context of proteins without any known structural homologues. A scoring function that gives a good estimate of the accuracy of prediction was further developed. This score estimates very well the accuracy of the algorithm (R² of 0.82). An online version of the tool is provided freely for non-commercial usage at http://www.bo-protscience.fr/kpred/.

  4. Sensitivity of ab Initio vs Empirical Methods in Computing Structural Effects on NMR Chemical Shifts for the Example of Peptides.

    PubMed

    Sumowski, Chris Vanessa; Hanni, Matti; Schweizer, Sabine; Ochsenfeld, Christian

    2014-01-14

    The structural sensitivity of NMR chemical shifts as computed by quantum chemical methods is compared to a variety of empirical approaches for the example of a prototypical peptide, the 38-residue kaliotoxin KTX comprising 573 atoms. Despite the simplicity of empirical chemical shift prediction programs, the agreement with experimental results is rather good, underlining their usefulness. However, we show in our present work that they are highly insensitive to structural changes, which renders their use for validating predicted structures questionable. In contrast, quantum chemical methods show the expected high sensitivity to structural and electronic changes. This appears to be independent of the quantum chemical approach or the inclusion of solvent effects. For the latter, explicit solvent simulations with increasing number of snapshots were performed for two conformers of an eight amino acid sequence. In conclusion, the empirical approaches neither provide the expected magnitude nor the patterns of NMR chemical shifts determined by the clearly more costly ab initio methods upon structural changes. This restricts the use of empirical prediction programs in studies where peptide and protein structures are utilized for the NMR chemical shift evaluation such as in NMR refinement processes, structural model verifications, or calculations of NMR nuclear spin relaxation rates.

  5. Durability predictions of adhesively bonded composite structures using accelerated characterization methods

    NASA Technical Reports Server (NTRS)

    Brinson, H. F.

    1985-01-01

    The utilization of adhesive bonding for composite structures is briefly assessed. The need for a method to determine damage initiation and propagation for such joints is outlined. Methods currently in use to analyze both adhesive joints and fiber reinforced plastics are mentioned, and it is indicated that all methods require the input of the mechanical properties of the polymeric adhesive and composite matrix material. The mechanical properties of polymers are indicated to be viscoelastic and sensitive to environmental effects. A method to analytically characterize environmentally dependent linear and nonlinear viscoelastic properties is given. It is indicated that the methodology can be used to extrapolate short term data to long term design lifetimes. That is, the method can be used for long term durability predictions. Experimental results for neat adhesive resins, polymers used as composite matrices and unidirectional composite laminates are given. The data are fitted well with the analytical durability methodology. Finally, suggestions are outlined for the development of an analytical methodology for the durability predictions of adhesively bonded composite structures.

  6. PIGSPro: prediction of immunoGlobulin structures v2.

    PubMed

    Lepore, Rosalba; Olimpieri, Pier P; Messih, Mario A; Tramontano, Anna

    2017-07-03

    PIGSpro is a significant upgrade of the popular PIGS server for the prediction of the structure of immunoglobulins. The software has been completely rewritten in Python following a similar pipeline as in the original method, but including, at various steps, relevant modifications found to improve its prediction accuracy, as demonstrated here. The steps of the pipeline include the selection of the appropriate framework for predicting the conserved regions of the molecule by homology; the target template alignment for this portion of the molecule; the selection of the main chain conformation of the hypervariable loops according to the canonical structure model; the prediction of the third loop of the heavy chain (H3), for which complete canonical structures are not available; and the packing of the light and heavy chain if derived from different templates. Each of these steps has been improved by including updated methods developed over the years. Last but not least, the user interface has been completely redesigned and an automatic monthly update of the underlying database has been implemented. The method is available as a web server at http://biocomputing.it/pigspro. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  7. Identification of family-specific residue packing motifs and their use for structure-based protein function prediction: I. Method development.

    PubMed

    Bandyopadhyay, Deepak; Huan, Jun; Prins, Jan; Snoeyink, Jack; Wang, Wei; Tropsha, Alexander

    2009-11-01

    Protein function prediction is one of the central problems in computational biology. We present a novel automated protein structure-based function prediction method using libraries of local residue packing patterns that are common to most proteins in a known functional family. Critical to this approach is the representation of a protein structure as a graph where residue vertices (residue name used as a vertex label) are connected by geometrical proximity edges. The approach employs two steps. First, it uses a fast subgraph mining algorithm to find all occurrences of family-specific labeled subgraphs for all well characterized protein structural and functional families. Second, it queries a new structure for occurrences of a set of motifs characteristic of a known family, using a graph index to speed up Ullman's subgraph isomorphism algorithm. The confidence of function inference from structure depends on the number of family-specific motifs found in the query structure compared with their distribution in a large non-redundant database of proteins. This method can assign a new structure to a specific functional family in cases where sequence alignments, sequence patterns, structural superposition and active site templates fail to provide accurate annotation.
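
    The graph representation described in this abstract, with residue-labelled vertices joined by geometric proximity edges, can be sketched as follows. The distance cutoff of 8.0 Å, the use of one representative coordinate per residue, and the function name are assumptions for illustration; the paper's own proximity definition may differ.

```python
from itertools import combinations
from math import dist


def contact_graph(residues, cutoff=8.0):
    """Build a labelled proximity graph from (residue_name, (x, y, z)) pairs.

    Returns (labels, edges): labels[i] is the residue name of vertex i, and
    edges is a set of (i, j) index pairs with i < j whose representative
    coordinates lie within `cutoff` of each other (cutoff value is assumed).
    """
    labels = [name for name, _ in residues]
    edges = set()
    for (i, (_, coord_i)), (j, (_, coord_j)) in combinations(enumerate(residues), 2):
        if dist(coord_i, coord_j) <= cutoff:
            edges.add((i, j))
    return labels, edges


# Hypothetical three-residue fragment; coordinates are made up.
fragment = [
    ("ALA", (0.0, 0.0, 0.0)),
    ("GLY", (3.0, 0.0, 0.0)),
    ("SER", (20.0, 0.0, 0.0)),
]
vertex_labels, proximity_edges = contact_graph(fragment)
```

    A subgraph mining or subgraph isomorphism step, as the abstract describes, would then operate on the labelled vertices and edges produced here.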

  8. Quantitative structure-retention relationship models for the prediction of the reversed-phase HPLC gradient retention based on the heuristic method and support vector machine.

    PubMed

    Du, Hongying; Wang, Jie; Yao, Xiaojun; Hu, Zhide

    2009-01-01

    The heuristic method (HM) and support vector machine (SVM) were used to construct quantitative structure-retention relationship models from a series of compounds to predict the gradient retention times of reversed-phase high-performance liquid chromatography (HPLC) in three different columns. The aims of this investigation were to predict the retention times of multifarious compounds, to find the main properties of the three columns, and to indicate the theory of separation procedures. In our method, we correlated the retention times of many structurally diverse analytes in three columns (Symmetry C18, Chromolith, and SG-MIX) with their representative molecular descriptors, calculated from the molecular structures alone. HM was used to select the most important molecular descriptors and build linear regression models. Furthermore, non-linear regression models were built using the SVM method; the performance of the SVM models was better than that of the HM models, and the prediction results were in good agreement with the experimental values. This paper could give some insights into the factors that were likely to govern the gradient retention process of the three investigated HPLC columns, which could theoretically guide practical experiments.

  9. Proteins and Their Interacting Partners: An Introduction to Protein-Ligand Binding Site Prediction Methods.

    PubMed

    Roche, Daniel Barry; Brackenridge, Danielle Allison; McGuffin, Liam James

    2015-12-15

    Elucidating the biological and biochemical roles of proteins, and subsequently determining their interacting partners, can be difficult and time consuming using in vitro and/or in vivo methods, and consequently the majority of newly sequenced proteins will have unknown structures and functions. However, in silico methods for predicting protein-ligand binding sites and protein biochemical functions offer an alternative practical solution. The characterisation of protein-ligand binding sites is essential for investigating new functional roles, which can impact the major biological research spheres of health, food, and energy security. In this review we discuss the role in silico methods play in 3D modelling of protein-ligand binding sites, along with their role in predicting biochemical functionality. In addition, we describe in detail some of the key alternative in silico prediction approaches that are available, as well as discussing the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and the Continuous Automated Model EvaluatiOn (CAMEO) projects, and their impact on developments in the field. Furthermore, we discuss the importance of protein function prediction methods for tackling 21st century problems.

  10. Structure prediction of polyglutamine disease proteins: comparison of methods

    PubMed Central

    2014-01-01

    Background: The expansion of polyglutamine (poly-Q) repeats in several unrelated proteins is associated with at least ten neurodegenerative diseases. The length of the poly-Q regions plays an important role in the progression of the diseases. The number of glutamines (Q) is inversely related to the onset age of these polyglutamine diseases, and the expansion of poly-Q repeats has been associated with protein misfolding. However, very little is known about the structural changes induced by the expansion of the repeats. Computational methods can provide an alternative to determine the structure of these poly-Q proteins, but it is important to evaluate their performance before large scale prediction work is done. Results: In this paper, two popular protein structure prediction programs, I-TASSER and Rosetta, have been used to predict the structure of the N-terminal fragment of a protein associated with Huntington's disease with 17 glutamines. Results show that both programs have the ability to find the native structures, but I-TASSER performs better for the overall task. Conclusions: Both I-TASSER and Rosetta can be used for structure prediction of proteins with poly-Q repeats. Knowledge of poly-Q structure may significantly contribute to development of therapeutic strategies for poly-Q diseases. PMID:25080018

  11. Accurate secondary structure prediction and fold recognition for circular dichroism spectroscopy

    PubMed Central

    Micsonai, András; Wien, Frank; Kernya, Linda; Lee, Young-Ho; Goto, Yuji; Réfrégiers, Matthieu; Kardos, József

    2015-01-01

    Circular dichroism (CD) spectroscopy is a widely used technique for the study of protein structure. Numerous algorithms have been developed for the estimation of the secondary structure composition from the CD spectra. These methods often fail to provide acceptable results on α/β-mixed or β-structure–rich proteins. The problem arises from the spectral diversity of β-structures, which has hitherto been considered as an intrinsic limitation of the technique. The predictions are less reliable for proteins of unusual β-structures such as membrane proteins, protein aggregates, and amyloid fibrils. Here, we show that the parallel/antiparallel orientation and the twisting of the β-sheets account for the observed spectral diversity. We have developed a method called β-structure selection (BeStSel) for the secondary structure estimation that takes into account the twist of β-structures. This method can reliably distinguish parallel and antiparallel β-sheets and accurately estimates the secondary structure for a broad range of proteins. Moreover, the secondary structure components applied by the method are characteristic to the protein fold, and thus the fold can be predicted to the level of topology in the CATH classification from a single CD spectrum. By constructing a web server, we offer a general tool for a quick and reliable structure analysis using conventional CD or synchrotron radiation CD (SRCD) spectroscopy for the protein science research community. The method is especially useful when X-ray or NMR techniques fail. Using BeStSel on data collected by SRCD spectroscopy, we investigated the structure of amyloid fibrils of various disease-related proteins and peptides. PMID:26038575

  12. ProTSAV: A protein tertiary structure analysis and validation server.

    PubMed

    Singh, Ankita; Kaushik, Rahul; Mishra, Avinash; Shanker, Asheesh; Jayaram, B

    2016-01-01

    Quality assessment of predicted model structures of proteins is as important as the protein tertiary structure prediction. A highly efficient quality assessment of predicted model structures directs further research on function. Here we present a new server ProTSAV, capable of evaluating predicted model structures based on some popular online servers and standalone tools. ProTSAV furnishes the user with a single quality score in case of individual protein structure along with a graphical representation and ranking in case of multiple protein structure assessment. The server is validated on ~64,446 protein structures including experimental structures from RCSB and predicted model structures for CASP targets and from public decoy sets. ProTSAV succeeds in predicting quality of protein structures with a specificity of 100% and a sensitivity of 98% on experimentally solved structures and achieves a specificity of 88% and a sensitivity of 91% on predicted protein structures of CASP11 targets under 2 Å. The server overcomes the limitations of any single server/method and is seen to be robust in helping in quality assessment. ProTSAV is freely available at http://www.scfbio-iitd.res.in/software/proteomics/protsav.jsp. Copyright © 2015 Elsevier B.V. All rights reserved.
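
    The sensitivity and specificity figures quoted in this abstract follow the standard confusion-matrix definitions, which can be sketched as below. The function name and the toy data are hypothetical; how ProTSAV decides that a structure counts as "good" is not modelled here.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for two equal-length sequences of booleans.

    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    """
    if len(predicted) != len(actual):
        raise ValueError("inputs must have equal length")
    tp = sum(p and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical calls: True means "structure judged to be of good quality".
calls = [True, True, False, False]
truth = [True, False, False, True]
sens, spec = sensitivity_specificity(calls, truth)
```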

  13. ANOPP2 User's Manual: Version 1.2

    NASA Technical Reports Server (NTRS)

    Lopes, L. V.; Burley, C. L.

    2016-01-01

    This manual documents the Aircraft NOise Prediction Program 2 (ANOPP2). ANOPP2 is a toolkit that includes a framework, noise prediction methods, and peripheral software to aid a user in predicting and understanding aircraft noise. This manual includes an explanation of the overall design and structure of ANOPP2, including a brief introduction to aircraft noise prediction and the ANOPP2 background, philosophy, and architecture. The concept of nested acoustic data surfaces and its application to a mixed-fidelity noise prediction are presented. The structure and usage of ANOPP2, which includes the communication between the user, the ANOPP2 framework, and noise prediction methods, are presented for two scenarios: wind-tunnel and flight. These scenarios serve to provide the user with guidance and documentation references for performing a noise prediction using ANOPP2.

  14. Multiple-Instance Regression with Structured Data

    NASA Technical Reports Server (NTRS)

    Wagstaff, Kiri L.; Lane, Terran; Roper, Alex

    2008-01-01

    We present a multiple-instance regression algorithm that models internal bag structure to identify the items most relevant to the bag labels. Multiple-instance regression (MIR) operates on a set of bags with real-valued labels, each containing a set of unlabeled items, in which the relevance of each item to its bag label is unknown. The goal is to predict the labels of new bags from their contents. Unlike previous MIR methods, MI-ClusterRegress can operate on bags that are structured in that they contain items drawn from a number of distinct (but unknown) distributions. MI-ClusterRegress simultaneously learns a model of the bag's internal structure, the relevance of each item, and a regression model that accurately predicts labels for new bags. We evaluated this approach on the challenging MIR problem of crop yield prediction from remote sensing data. MI-ClusterRegress provided predictions that were more accurate than those obtained with non-multiple-instance approaches or MIR methods that do not model the bag structure.

  15. Prediction of cancer cell sensitivity to natural products based on genomic and chemical properties.

    PubMed

    Yue, Zhenyu; Zhang, Wenna; Lu, Yongming; Yang, Qiaoyue; Ding, Qiuying; Xia, Junfeng; Chen, Yan

    2015-01-01

    Natural products play a significant role in cancer chemotherapy. They are likely to provide many lead structures, which can be used as templates for the construction of novel drugs with enhanced antitumor activity. Traditional research approaches studied structure-activity relationship of natural products and obtained key structural properties, such as chemical bond or group, with the purpose of ascertaining their effect on a single cell line or a single tissue type. Here, for the first time, we develop a machine learning method to comprehensively predict natural products responses against a panel of cancer cell lines based on both the gene expression and the chemical properties of natural products. The results on two datasets, training set and independent test set, show that this proposed method yields significantly better prediction accuracy. In addition, we also demonstrate the predictive power of our proposed method by modeling the cancer cell sensitivity to two natural products, Curcumin and Resveratrol, which indicate that our method can effectively predict the response of cancer cell lines to these two natural products. Taken together, the method will facilitate the identification of natural products as cancer therapies and the development of precision medicine by linking the features of patient genomes to natural product sensitivity.

  16. Building proteins from C alpha coordinates using the dihedral probability grid Monte Carlo method.

    PubMed Central

    Mathiowetz, A. M.; Goddard, W. A.

    1995-01-01

    Dihedral probability grid Monte Carlo (DPG-MC) is a general-purpose method of conformational sampling that can be applied to many problems in peptide and protein modeling. Here we present the DPG-MC method and apply it to predicting complete protein structures from C alpha coordinates. This is useful in such endeavors as homology modeling, protein structure prediction from lattice simulations, or fitting protein structures to X-ray crystallographic data. It also serves as an example of how DPG-MC can be applied to systems with geometric constraints. The conformational propensities for individual residues are used to guide conformational searches as the protein is built from the amino-terminus to the carboxyl-terminus. Results for a number of proteins show that both the backbone and side chain can be accurately modeled using DPG-MC. Backbone atoms are generally predicted with RMS errors of about 0.5 A (compared to X-ray crystal structure coordinates) and all atoms are predicted to an RMS error of 1.7 A or better. PMID:7549885
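
    The RMS errors reported in this abstract compare predicted atom positions with crystal-structure coordinates. A minimal sketch of that measure is given below, assuming the two coordinate sets list the same atoms in the same order and are already superimposed; no optimal superposition is performed, and the function name is hypothetical.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) tuples.

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return sqrt(squared / len(coords_a))


# Hypothetical two-atom example: one atom displaced by 2 units along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
deviation = rmsd(model, reference)
```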

  17. Biological and functional relevance of CASP predictions.

    PubMed

    Liu, Tianyun; Ish-Shalom, Shirbi; Torng, Wen; Lafita, Aleix; Bock, Christian; Mort, Matthew; Cooper, David N; Bliven, Spencer; Capitani, Guido; Mooney, Sean D; Altman, Russ B

    2018-03-01

    Our goal is to answer the question: compared with experimental structures, how useful are predicted models for functional annotation? We assessed the functional utility of predicted models by comparing the performances of a suite of methods for functional characterization on the predictions and the experimental structures. We identified 28 sites in 25 protein targets to perform functional assessment. These 28 sites included nine sites with known ligand binding (holo-sites), nine sites that are expected or suggested by experimental authors for small molecule binding (apo-sites), and ten sites containing important motifs, loops, or key residues with important disease-associated mutations. We evaluated the utility of the predictions by comparing their microenvironments to the experimental structures. Overall structural quality correlates with functional utility. However, the best-ranked predictions (global) may not have the best functional quality (local). Our assessment provides an ability to discriminate between predictions with high structural quality. When assessing ligand-binding sites, most prediction methods have higher performance on apo-sites than holo-sites. Some servers show consistently high performance for certain types of functional sites. Finally, many functional sites are associated with protein-protein interaction. We also analyzed biologically relevant features from the protein assemblies of two targets where the active site spanned the protein-protein interface. For the assembly targets, we find that the features in the models are mainly determined by the choice of template. © 2017 The Authors Proteins: Structure, Function and Bioinformatics Published by Wiley Periodicals, Inc.

  18. A Prediction Method of Binding Free Energy of Protein and Ligand

    NASA Astrophysics Data System (ADS)

    Yang, Kun; Wang, Xicheng

    2010-05-01

    Predicting the binding free energy is an important problem in biomolecular simulation. Such prediction would be of great benefit in understanding protein functions, and may be useful for computational prediction of ligand binding strengths, e.g., in discovering pharmaceutical drugs. Free energy perturbation (FEP)/thermodynamic integration (TI) is a classical method to explicitly predict free energy. However, this method needs plenty of time to collect data, and it is typically applied to relatively simple systems and small changes of molecular structure. Another method for estimating ligand binding affinities is the linear interaction energy (LIE) method. This method employs averages of interaction potential energy terms from molecular dynamics simulations or other thermal conformational sampling techniques. Incorporation of systematic deviations from electrostatic linear response, derived from free energy perturbation studies, into the absolute binding free energy expression significantly enhances the accuracy of the approach. However, it is also time-consuming. In this paper, a new prediction method based on steered molecular dynamics (SMD) with direction optimization is developed to compute binding free energy. Jarzynski's equality is used to derive the potential of mean force (PMF), or free energy. The results for two numerical examples are presented, showing that the method has good accuracy and efficiency. The novel method can also simulate the whole binding process and give some important structural information for the development of new drugs.
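
    Jarzynski's equality relates the free-energy difference to an exponential average of the work done over repeated nonequilibrium pulling runs: ΔF = -kT ln ⟨exp(-W/kT)⟩. A minimal sketch of this estimator follows. The function name, the work values and the kT value (about 0.596 kcal/mol near 300 K) are assumptions for illustration; the direction optimization described in the abstract is not represented.

```python
from math import exp, log


def jarzynski_free_energy(works, kT):
    """Estimate a free-energy difference from nonequilibrium work values.

    Implements dF = -kT * ln(mean(exp(-W / kT))), using a shifted exponent
    (log-sum-exp) so that large work values do not underflow.
    """
    if not works:
        raise ValueError("need at least one work value")
    if kT <= 0:
        raise ValueError("kT must be positive")
    exponents = [-w / kT for w in works]
    shift = max(exponents)
    mean_shifted = sum(exp(e - shift) for e in exponents) / len(exponents)
    return -kT * (shift + log(mean_shifted))


# Hypothetical work values (kcal/mol) from four steered pulling runs.
pulling_works = [6.2, 5.1, 7.4, 5.6]
estimate = jarzynski_free_energy(pulling_works, kT=0.596)
```

    Because the exponential average is dominated by the lowest work values, the estimate never exceeds the mean work, and it converges slowly when the work distribution is broad.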

  19. NASA/FAA general aviation crash dynamics program

    NASA Technical Reports Server (NTRS)

    Thomson, R. G.; Hayduk, R. J.; Carden, H. D.

    1981-01-01

    The program involves controlled full scale crash testing, nonlinear structural analyses to predict large deflection elastoplastic response, and load attenuating concepts for use in improved seat and subfloor structure. Both analytical and experimental methods are used to develop expertise in these areas. Analyses include simplified procedures for estimating energy dissipating capabilities and comprehensive computerized procedures for predicting airframe response. These analyses are developed to provide designers with methods for predicting accelerations, loads, and displacements on collapsing structure. Tests on typical full scale aircraft and on full and subscale structural components are performed to verify the analyses and to demonstrate load attenuating concepts. A special apparatus was built to test emergency locator transmitters when attached to representative aircraft structure. The apparatus is shown to provide a good simulation of the longitudinal crash pulse observed in full scale aircraft crash tests.

  20. Biological and functional relevance of CASP predictions

    PubMed Central

    Liu, Tianyun; Ish‐Shalom, Shirbi; Torng, Wen; Lafita, Aleix; Bock, Christian; Mort, Matthew; Cooper, David N; Bliven, Spencer; Capitani, Guido; Mooney, Sean D.

    2017-01-01

    Our goal is to answer the question: compared with experimental structures, how useful are predicted models for functional annotation? We assessed the functional utility of predicted models by comparing the performances of a suite of methods for functional characterization on the predictions and the experimental structures. We identified 28 sites in 25 protein targets to perform functional assessment. These 28 sites included nine sites with known ligand binding (holo‐sites), nine sites that are expected or suggested by experimental authors for small molecule binding (apo‐sites), and ten sites containing important motifs, loops, or key residues with important disease‐associated mutations. We evaluated the utility of the predictions by comparing their microenvironments to the experimental structures. Overall structural quality correlates with functional utility. However, the best‐ranked predictions (global) may not have the best functional quality (local). Our assessment provides an ability to discriminate between predictions with high structural quality. When assessing ligand‐binding sites, most prediction methods have higher performance on apo‐sites than holo‐sites. Some servers show consistently high performance for certain types of functional sites. Finally, many functional sites are associated with protein‐protein interaction. We also analyzed biologically relevant features from the protein assemblies of two targets where the active site spanned the protein‐protein interface. For the assembly targets, we find that the features in the models are mainly determined by the choice of template. PMID:28975675

  1. Performance of protein-structure predictions with the physics-based UNRES force field in CASP11.

    PubMed

    Krupa, Paweł; Mozolewska, Magdalena A; Wiśniewska, Marta; Yin, Yanping; He, Yi; Sieradzan, Adam K; Ganzynkowicz, Robert; Lipska, Agnieszka G; Karczyńska, Agnieszka; Ślusarz, Magdalena; Ślusarz, Rafał; Giełdoń, Artur; Czaplewski, Cezary; Jagieła, Dawid; Zaborowski, Bartłomiej; Scheraga, Harold A; Liwo, Adam

    2016-11-01

    Participating as the Cornell-Gdansk group, we have used our physics-based coarse-grained UNited RESidue (UNRES) force field to predict protein structure in the 11th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP11). Our methodology involved extensive multiplexed replica exchange simulations of the target proteins with a recently improved UNRES force field to provide better reproductions of the local structures of polypeptide chains. All simulations were started from fully extended polypeptide chains, and no external information was included in the simulation process except for weak restraints on secondary structure to enable us to finish each prediction within the allowed 3-week time window. Because of simplified UNRES representation of polypeptide chains, use of enhanced sampling methods, code optimization and parallelization and sufficient computational resources, we were able to treat, for the first time, all 55 human prediction targets with sizes from 44 to 595 amino acid residues, the average size being 251 residues. Complete structures of six single-domain proteins were predicted accurately, with the highest accuracy being attained for the T0769, for which the CαRMSD was 3.8 Å for 97 residues of the experimental structure. Correct structures were also predicted for 13 domains of multi-domain proteins with accuracy comparable to that of the best template-based modeling methods. With further improvements of the UNRES force field that are now underway, our physics-based coarse-grained approach to protein-structure prediction will eventually reach global prediction capacity and, consequently, reliability in simulating protein structure and dynamics that are important in biochemical processes. Freely available on the web at http://www.unres.pl/ CONTACT: has5@cornell.edu. © The Author 2016. Published by Oxford University Press. All rights reserved. 

  2. BetaTPred: prediction of beta-TURNS in a protein using statistical algorithms.

    PubMed

    Kaur, Harpreet; Raghava, G P S

    2002-03-01

    Beta-turns play an important role from a structural and functional point of view. Beta-turns are the most common type of non-repetitive structures in proteins and comprise, on average, 25% of the residues. In the past, numerous methods have been developed to predict beta-turns in a protein. Most of these prediction methods are based on statistical approaches. In order to utilize the full potential of these methods, there is a need to develop a web server. This paper describes a web server called BetaTPred, developed for predicting beta-TURNS in a protein from its amino acid sequence. BetaTPred allows the user to predict turns in a protein using existing statistical algorithms. It also allows prediction of different types of beta-TURNS, e.g. type I, I', II, II', VI, VIII and non-specific. This server assists the users in predicting the consensus beta-TURNS in a protein. The server is accessible from http://imtech.res.in/raghava/betatpred/

  3. A novel Multi-Agent Ada-Boost algorithm for predicting protein structural class with the information of protein secondary structure.

    PubMed

    Fan, Ming; Zheng, Bin; Li, Lihua

    2015-10-01

    Knowledge of the structural class of a given protein is important for understanding its folding patterns. Although much effort has been made, predicting protein structural class solely from protein sequences remains a challenging problem. The feature extraction and classification of proteins are the main problems in prediction. In this research, we extended our earlier work regarding these two aspects. In protein feature extraction, we proposed a scheme by calculating the word frequency and word position from sequences of amino acid, reduced amino acid, and secondary structure. For an accurate classification of the structural class of protein, we developed a novel Multi-Agent Ada-Boost (MA-Ada) method by integrating the features of Multi-Agent system into Ada-Boost algorithm. Extensive experiments were conducted to test and compare the proposed method using four low-homology benchmark datasets. The results showed classification accuracies of 88.5%, 96.0%, 88.4%, and 85.5%, respectively, which are much better than those of the existing methods. The source code and dataset are available on request.
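
    The word-frequency part of the feature extraction mentioned in this abstract can be illustrated with overlapping k-letter words counted along a sequence. This is only a sketch under assumptions: the word length of 2, the toy secondary-structure string (H for helix, E for strand) and the function name are hypothetical, and the word-position features and the MA-Ada classifier are not covered.

```python
from collections import Counter


def word_frequencies(sequence, k=2):
    """Relative frequency of every overlapping k-letter word in a sequence."""
    total = len(sequence) - k + 1
    if k < 1 or total <= 0:
        raise ValueError("sequence must be at least k letters long, with k >= 1")
    counts = Counter(sequence[i:i + k] for i in range(total))
    return {word: count / total for word, count in counts.items()}


# Hypothetical secondary-structure string.
features = word_frequencies("HHEHH", k=2)
```

    The same function can be applied to an amino acid sequence or a reduced-alphabet sequence, since it treats the input as a plain string.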

  4. Purely Structural Protein Scoring Functions Using Support Vector Machine and Ensemble Learning.

    PubMed

    Mirzaei, Shokoufeh; Sidi, Tomer; Keasar, Chen; Crivelli, Silvia

    2016-08-24

    The function of a protein is determined by its structure, which creates a need for efficient methods of protein structure determination to advance scientific and medical research. Because current experimental structure determination methods carry a high price tag, computational predictions are highly desirable. Given a protein sequence, computational methods produce numerous 3D structures known as decoys. However, selecting the best-quality decoys is challenging because end users can handle only a few of them. Therefore, scoring functions are central to decoy selection. They combine measurable features into a single-number indicator of decoy quality. Unfortunately, current scoring functions do not consistently select the best decoys. Machine learning techniques offer great potential to improve decoy scoring. This paper presents two machine-learning-based scoring functions to predict the quality of protein structures, i.e., the similarity between the predicted structure and the experimental one without knowing the latter. We use different metrics to compare these scoring functions against three state-of-the-art scores. This is a first attempt at comparing different scoring functions using the same non-redundant dataset for training and testing and the same features. The results show that adding informative features may be more significant than the method used.

  5. Prediction Model for the Carbonation of Post-Repair Materials in Carbonated RC Structures

    PubMed Central

    Lee, Hyung-Min; Lee, Han-Seung; Singh, Jitendra Kumar

    2017-01-01

    Concrete carbonation damages the passive film that surrounds reinforcement bars, resulting in their exposure to corrosion. Studies on the prediction of concrete carbonation are thus of great significance. The repair of pre-built reinforced concrete (RC) structures by methods such as remodeling was recently introduced. While many studies have been conducted on the progress of carbonation in newly constructed buildings and RC structures fitted with new repair materials, the prediction of post-repair carbonation has not been considered. In the present study, accelerated carbonation was carried out to investigate RC structures following surface layer repair, in order to determine the carbonation depth. To validate the obtained results, a second experiment was performed under the same conditions to determine the carbonation depth by the Finite Difference Method (FDM) and Finite Element Method (FEM). The accelerated carbonation experiment and the FDM and FEM analyses produced very similar results, thus confirming that the carbonation depth in an RC structure after surface layer repair can be predicted with accuracy. The specimen repaired using inhibiting surface coating (ISC) had the highest carbonation penetration of 19.81, while this value was the lowest for the corrosion inhibiting mortar (IM) with 13.39 mm. In addition, the carbonation depth predicted by using the carbonation prediction formula after repair indicated that the analytical and experimental values are almost identical if the initial concentration of Ca(OH)2 is assumed to be 52%. PMID:28772852

  6. Improvement of Predictive Ability by Uniform Coverage of the Target Genetic Space

    PubMed Central

    Bustos-Korts, Daniela; Malosetti, Marcos; Chapman, Scott; Biddulph, Ben; van Eeuwijk, Fred

    2016-01-01

    Genome-enabled prediction provides breeders with the means to increase the number of genotypes that can be evaluated for selection. One of the major challenges in genome-enabled prediction is how to construct a training set of genotypes from a calibration set that represents the target population of genotypes, where the calibration set is composed of a training and validation set. A random sampling protocol of genotypes from the calibration set will lead to low quality coverage of the total genetic space by the training set when the calibration set contains population structure. As a consequence, predictive ability will be affected negatively, because some parts of the genotypic diversity in the target population will be under-represented in the training set, whereas other parts will be over-represented. Therefore, we propose a training set construction method that uniformly samples the genetic space spanned by the target population of genotypes, thereby increasing predictive ability. To evaluate our method, we constructed training sets alongside the identification of corresponding genomic prediction models for four genotype panels that differed in the amount of population structure they contained (maize Flint, maize Dent, wheat, and rice). Training sets were constructed using uniform sampling, stratified-uniform sampling, stratified sampling and random sampling. We compared these methods with a method that maximizes the generalized coefficient of determination (CD). Several training set sizes were considered. We investigated four genomic prediction models: multi-locus QTL models, GBLUP models, combinations of QTL and GBLUPs, and Reproducing Kernel Hilbert Space (RKHS) models. For the maize and wheat panels, construction of the training set under uniform sampling led to a larger predictive ability than under stratified and random sampling. The results of our methods were similar to those of the CD method. For the rice panel, all training set construction methods led to similar predictive ability, a reflection of the very strong population structure in this panel. PMID:27672112
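
    The idea of covering a genetic space evenly, rather than sampling genotypes at random, can be sketched with greedy farthest-point (maximin) selection. This is a swapped-in stand-in technique chosen for brevity; the abstract does not describe the authors' uniform sampling procedure, and the function name `farthest_point_sample` and the use of 2D coordinates (e.g. principal-component scores) are assumptions.

```python
import math


def farthest_point_sample(points, n_select, start=0):
    """Greedily pick indices so each new point is as far as possible from those already picked.

    points: list of coordinate tuples (hypothetically, genotypes in a reduced genetic space).
    """
    if n_select <= 0 or not points:
        return []
    selected = [start]
    # Distance from every point to its nearest selected point so far.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(selected) < min(n_select, len(points)):
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        selected.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return selected


if __name__ == "__main__":
    # Two tight clusters plus one isolated genotype: the sample spans all three regions.
    genotypes = [(0, 0), (0.1, 0), (0.2, 0), (10, 0), (10.1, 0), (5, 0)]
    print(farthest_point_sample(genotypes, 3))
```

    With strongly clustered data, random sampling would tend to draw most training genotypes from the largest cluster, whereas this selection takes one representative from each region first.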

  7. A high-throughput approach to profile RNA structure.

    PubMed

    Delli Ponti, Riccardo; Marti, Stefanie; Armaos, Alexandros; Tartaglia, Gian Gaetano

    2017-03-17

    Here we introduce the Computational Recognition of Secondary Structure (CROSS) method to calculate the structural profile of an RNA sequence (single- or double-stranded state) at single-nucleotide resolution and without sequence length restrictions. We trained CROSS using data from high-throughput experiments such as Selective 2′-Hydroxyl Acylation analyzed by Primer Extension (SHAPE; Mouse and HIV transcriptomes) and Parallel Analysis of RNA Structure (PARS; Human and Yeast transcriptomes) as well as high-quality NMR/X-ray structures (PDB database). The algorithm uses primary structure information alone to predict experimental structural profiles with >80% accuracy, showing high performance on large RNAs such as Xist (17 900 nucleotides; Area Under the ROC Curve AUC of 0.75 on dimethyl sulfate (DMS) experiments). We integrated CROSS in thermodynamics-based methods to predict secondary structure and observed an increase in their predictive power by up to 30%. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. Performance of combined fragmentation and retention prediction for the identification of organic micropollutants by LC-HRMS.

    PubMed

    Hu, Meng; Müller, Erik; Schymanski, Emma L; Ruttkies, Christoph; Schulze, Tobias; Brack, Werner; Krauss, Martin

    2018-03-01

    In nontarget screening, structure elucidation of small molecules from high resolution mass spectrometry (HRMS) data is challenging, particularly the selection of the most likely candidate structure among the many retrieved from compound databases. Several fragmentation and retention prediction methods have been developed to improve this candidate selection. In order to evaluate their performance, we compared two in silico fragmenters (MetFrag and CFM-ID) and two retention time prediction models (based on the chromatographic hydrophobicity index (CHI) and on log D). A set of 78 known organic micropollutants was analyzed by liquid chromatography coupled to an LTQ Orbitrap HRMS with electrospray ionization (ESI) in positive and negative mode using two fragmentation techniques with different collision energies. Both fragmenters (MetFrag and CFM-ID) performed well for most compounds, on average ranking the correct candidate structure within the top 25% and 22 to 37% for ESI+ and ESI- mode, respectively. The rank of the correct candidate structure slightly improved when MetFrag and CFM-ID were combined. For unknown compounds detected in both ESI+ and ESI-, positive mode mass spectra were generally better for further structure elucidation. Both retention prediction models performed reasonably well for more hydrophobic compounds but not for early eluting hydrophilic substances. The log D prediction showed a better accuracy than the CHI model. Although the two fragmentation prediction methods are more diagnostic and sensitive for candidate selection, the inclusion of retention prediction by calculating a consensus score with optimized weighting can improve the ranking of correct candidates as compared to the individual methods. Graphical abstract: Consensus workflow for combining fragmentation and retention prediction in LC-HRMS-based micropollutant identification.
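
    A consensus score with weighting, as described at the end of this abstract, can be sketched as a weighted average of per-method scores used to rank candidate structures. The function name `consensus_rank`, the score keys, the assumption that every score is already scaled to [0, 1], and the example weights are all hypothetical; the abstract does not give the authors' formula or their optimized weights.

```python
def consensus_rank(candidates, weights):
    """Rank candidate names by a weighted average of their per-method scores.

    candidates: dict mapping candidate name -> dict of method name -> score in [0, 1].
    weights:    dict mapping method name -> non-negative weight (illustrative values only).
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    def score(method_scores):
        # A method missing for a candidate contributes a score of 0.
        return sum(w * method_scores.get(m, 0.0) for m, w in weights.items()) / total

    return sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)


if __name__ == "__main__":
    candidates = {
        "candidate_A": {"fragmentation": 0.9, "retention": 0.2},
        "candidate_B": {"fragmentation": 0.8, "retention": 0.9},
    }
    # Fragmentation alone prefers A; adding retention evidence moves B to the top.
    print(consensus_rank(candidates, {"fragmentation": 1.0, "retention": 0.0}))
    print(consensus_rank(candidates, {"fragmentation": 0.7, "retention": 0.3}))
```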

  9. Computational Approaches for Revealing the Structure of Membrane Transporters: Case Study on Bilitranslocase.

    PubMed

    Venko, Katja; Roy Choudhury, A; Novič, Marjana

    2017-01-01

    The structural and functional details of transmembrane proteins are vastly underexplored, mostly due to experimental difficulties regarding their solubility and stability. Currently, the majority of transmembrane protein structures are still unknown, and this presents a huge experimental and computational challenge. Nowadays, thanks to X-ray crystallography or NMR spectroscopy, over 3000 structures of membrane proteins have been solved, among them only a few hundred unique ones. Due to the vast biological and pharmaceutical interest in the elucidation of the structure and the functional mechanisms of transmembrane proteins, several computational methods have been developed to overcome the experimental gap. If combined with experimental data, the computational information enables rapid, low-cost and successful predictions of the molecular structure of unsolved proteins. The reliability of the predictions depends on the availability and accuracy of experimental data associated with structural information. In this review, the following methods are proposed for in silico structure elucidation: sequence-dependent predictions of transmembrane regions, predictions of transmembrane helix-helix interactions, helix arrangements in membrane models, and testing their stability with molecular dynamics simulations. We also demonstrate the usage of the computational methods listed above by proposing a model for the molecular structure of the transmembrane protein bilitranslocase. Bilitranslocase is a bilirubin membrane transporter, which shares similar tissue distribution and functional properties with some of the members of the Organic Anion Transporter family and is the only member classified in the Bilirubin Transporter Family. Given its unique properties, bilitranslocase is a potentially interesting drug target.

  10. Combining Physicochemical and Evolutionary Information for Protein Contact Prediction

    PubMed Central

    Schneider, Michael; Brock, Oliver

    2014-01-01

    We introduce a novel contact prediction method that achieves high prediction accuracy by combining evolutionary and physicochemical information about native contacts. We obtain evolutionary information from multiple-sequence alignments and physicochemical information from predicted ab initio protein structures. These structures represent low-energy states in an energy landscape and thus capture the physicochemical information encoded in the energy function. Such low-energy structures are likely to contain native contacts, even if their overall fold is not native. To differentiate native from non-native contacts in those structures, we develop a graph-based representation of the structural context of contacts. We then use this representation to train a support vector machine classifier to identify the most likely native contacts in otherwise non-native structures. The resulting contact predictions are highly accurate. As a result of combining two sources of information (evolutionary and physicochemical), we maintain prediction accuracy even when only a few sequence homologs are present. We show that the predicted contacts help to improve ab initio structure prediction. A web service is available at http://compbio.robotics.tu-berlin.de/epc-map/. PMID:25338092

  11. Physics-based protein-structure prediction using a hierarchical protocol based on the UNRES force field: assessment in two blind tests.

    PubMed

    Ołdziej, S; Czaplewski, C; Liwo, A; Chinchio, M; Nanias, M; Vila, J A; Khalili, M; Arnautova, Y A; Jagielska, A; Makowski, M; Schafroth, H D; Kaźmierkiewicz, R; Ripoll, D R; Pillardy, J; Saunders, J A; Kang, Y K; Gibson, K D; Scheraga, H A

    2005-05-24

    Recent improvements in the protein-structure prediction method developed in our laboratory, based on the thermodynamic hypothesis, are described. The conformational space is searched extensively at the united-residue level by using our physics-based UNRES energy function and the conformational space annealing method of global optimization. The lowest-energy coarse-grained structures are then converted to an all-atom representation and energy-minimized with the ECEPP/3 force field. The procedure was assessed in two recent blind tests of protein-structure prediction. During the first blind test, we predicted large fragments of alpha and alpha+beta proteins [60-70 residues with C(alpha) rms deviation (rmsd) <6 Å]. However, for alpha+beta proteins, significant topological errors occurred despite low rmsd values. In the second exercise, we predicted whole structures of five proteins (two alpha and three alpha+beta, with sizes of 53-235 residues) with remarkably good accuracy. In particular, for the genomic target TM0487 (a 102-residue alpha+beta protein from Thermotoga maritima), we predicted the complete, topologically correct structure with 7.3-Å C(alpha) rmsd. So far this protein is the largest alpha+beta protein predicted based solely on the amino acid sequence and a physics-based potential-energy function and search procedure. For target T0198, a phosphate transport system regulator PhoU from T. maritima (a 235-residue mainly alpha-helical protein), we predicted the topology of the whole six-helix bundle correctly within 8 Å rmsd, except the 32 C-terminal residues, most of which form a beta-hairpin. These and other examples described in this work demonstrate significant progress in physics-based protein-structure prediction.

  12. Predicting β-Turns in Protein Using Kernel Logistic Regression

    PubMed Central

    Elbashir, Murtada Khalafallah; Sheng, Yu; Wang, Jianxin; Wu, FangXiang; Li, Min

    2013-01-01

    A β-turn is a secondary protein structure type that plays a significant role in protein configuration and function. On average 25% of amino acids in protein structures are located in β-turns. It is very important to develop an accurate and efficient method for β-turn prediction. Most of the current successful β-turn prediction methods use support vector machines (SVMs) or neural networks (NNs). Kernel logistic regression (KLR) is a powerful classification technique that has been applied successfully in many classification problems. However, it is rarely used in β-turn classification, mainly because it is computationally expensive. In this paper, we used KLR to obtain sparse β-turns prediction in short evolution time. Secondary structure information and position-specific scoring matrices (PSSMs) are utilized as input features. We achieved a Q total of 80.7% and an MCC of 50% on the BT426 dataset. These results show that the KLR method with the right algorithm can yield performance equivalent to or even better than NNs and SVMs in β-turn prediction. In addition, KLR yields a probabilistic outcome and has a well-defined extension to the multiclass case. PMID:23509793
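
    The two figures reported in this abstract, Q total and MCC, can both be computed from a binary confusion matrix. The sketch below assumes the usual definitions (Q total as the percentage of residues classified correctly, MCC as the Matthews correlation coefficient); the counts in the example are made up for illustration and are not results from the paper.

```python
import math


def q_total(tp, tn, fp, fn):
    """Percentage of residues classified correctly (turn and non-turn together)."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Illustrative counts only: 50 true turns found, 30 true non-turns found,
    # 10 false alarms, 10 missed turns.
    print(q_total(50, 30, 10, 10))
    print(mcc(50, 30, 10, 10))
```

    Because β-turn residues are a minority class, Q total alone can look high for a predictor that rarely predicts turns, which is why MCC is usually reported alongside it.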

  13. Predicting β-turns in protein using kernel logistic regression.

    PubMed

    Elbashir, Murtada Khalafallah; Sheng, Yu; Wang, Jianxin; Wu, Fangxiang; Li, Min

    2013-01-01

    A β-turn is a secondary protein structure type that plays a significant role in protein configuration and function. On average 25% of amino acids in protein structures are located in β-turns. It is very important to develop an accurate and efficient method for β-turn prediction. Most of the current successful β-turn prediction methods use support vector machines (SVMs) or neural networks (NNs). Kernel logistic regression (KLR) is a powerful classification technique that has been applied successfully in many classification problems. However, it is rarely used in β-turn classification, mainly because it is computationally expensive. In this paper, we used KLR to obtain sparse β-turns prediction in short evolution time. Secondary structure information and position-specific scoring matrices (PSSMs) are utilized as input features. We achieved a Q total of 80.7% and an MCC of 50% on the BT426 dataset. These results show that the KLR method with the right algorithm can yield performance equivalent to or even better than NNs and SVMs in β-turn prediction. In addition, KLR yields a probabilistic outcome and has a well-defined extension to the multiclass case.

  14. Methodology for Software Reliability Prediction. Volume 2.

    DTIC Science & Technology

    1987-11-01

    The overall acquisition program shall include the resources, schedule, management, structure, and controls necessary to ensure that specified AD...Independent Verification/Validation - Programming Team Structure - Educational Level of Team Members - Experience Level of Team Members * Methods Used...Prediction or Estimation Parameter Supported: Software - Characteristics 3. Objectives: Structured programming studies and Government Ur.’.. procurement

  15. Report on the sixth blind test of organic crystal structure prediction methods

    PubMed Central

    Reilly, Anthony M.; Cooper, Richard I.; Adjiman, Claire S.; Bhattacharya, Saswata; Boese, A. Daniel; Brandenburg, Jan Gerit; Bygrave, Peter J.; Bylsma, Rita; Campbell, Josh E.; Car, Roberto; Case, David H.; Chadha, Renu; Cole, Jason C.; Cosburn, Katherine; Cuppen, Herma M.; Curtis, Farren; Day, Graeme M.; DiStasio Jr, Robert A.; Dzyabchenko, Alexander; van Eijck, Bouke P.; Elking, Dennis M.; van den Ende, Joost A.; Facelli, Julio C.; Ferraro, Marta B.; Fusti-Molnar, Laszlo; Gatsiou, Christina-Anna; Gee, Thomas S.; de Gelder, René; Ghiringhelli, Luca M.; Goto, Hitoshi; Grimme, Stefan; Guo, Rui; Hofmann, Detlef W. M.; Hoja, Johannes; Hylton, Rebecca K.; Iuzzolino, Luca; Jankiewicz, Wojciech; de Jong, Daniël T.; Kendrick, John; de Klerk, Niek J. J.; Ko, Hsin-Yu; Kuleshova, Liudmila N.; Li, Xiayue; Lohani, Sanjaya; Leusen, Frank J. J.; Lund, Albert M.; Lv, Jian; Ma, Yanming; Marom, Noa; Masunov, Artëm E.; McCabe, Patrick; McMahon, David P.; Meekes, Hugo; Metz, Michael P.; Misquitta, Alston J.; Mohamed, Sharmarke; Monserrat, Bartomeu; Needs, Richard J.; Neumann, Marcus A.; Nyman, Jonas; Obata, Shigeaki; Oberhofer, Harald; Oganov, Artem R.; Orendt, Anita M.; Pagola, Gabriel I.; Pantelides, Constantinos C.; Pickard, Chris J.; Podeszwa, Rafal; Price, Louise S.; Price, Sarah L.; Pulido, Angeles; Read, Murray G.; Reuter, Karsten; Schneider, Elia; Schober, Christoph; Shields, Gregory P.; Singh, Pawanpreet; Sugden, Isaac J.; Szalewicz, Krzysztof; Taylor, Christopher R.; Tkatchenko, Alexandre; Tuckerman, Mark E.; Vacarro, Francesca; Vasileiadis, Manolis; Vazquez-Mayagoitia, Alvaro; Vogt, Leslie; Wang, Yanchao; Watson, Rona E.; de Wijs, Gilles A.; Yang, Jack; Zhu, Qiang; Groom, Colin R.

    2016-01-01

    The sixth blind test of organic crystal structure prediction (CSP) methods has been held, with five target systems: a small nearly rigid molecule, a polymorphic former drug candidate, a chloride salt hydrate, a co-crystal and a bulky flexible molecule. This blind test has seen substantial growth in the number of participants, with the broad range of prediction methods giving a unique insight into the state of the art in the field. Significant progress has been seen in treating flexible molecules, usage of hierarchical approaches to ranking structures, the application of density-functional approximations, and the establishment of new workflows and ‘best practices’ for performing CSP calculations. All of the targets, apart from a single potentially disordered Z′ = 2 polymorph of the drug candidate, were predicted by at least one submission. Despite many remaining challenges, it is clear that CSP methods are becoming more applicable to a wider range of real systems, including salts, hydrates and larger flexible molecules. The results also highlight the potential for CSP calculations to complement and augment experimental studies of organic solid forms. PMID:27484368

  16. Prediction of Human Phenotype Ontology terms by means of hierarchical ensemble methods.

    PubMed

    Notaro, Marco; Schubach, Max; Robinson, Peter N; Valentini, Giorgio

    2017-10-12

    The prediction of human gene-abnormal phenotype associations is a fundamental step toward the discovery of novel genes associated with human disorders, especially when no genes are known to be associated with a specific disease. In this context the Human Phenotype Ontology (HPO) provides a standard categorization of the abnormalities associated with human diseases. While the problem of the prediction of gene-disease associations has been widely investigated, the related problem of gene-phenotypic feature (i.e., HPO term) associations has been largely overlooked, even if for most human genes no HPO term associations are known and despite the increasing application of the HPO to relevant medical problems. Moreover most of the methods proposed in literature are not able to capture the hierarchical relationships between HPO terms, thus resulting in inconsistent and relatively inaccurate predictions. We present two hierarchical ensemble methods that we formally prove to provide biologically consistent predictions according to the hierarchical structure of the HPO. The modular structure of the proposed methods, that consists in a "flat" learning first step and a hierarchical combination of the predictions in the second step, allows the predictions of virtually any flat learning method to be enhanced. The experimental results show that hierarchical ensemble methods are able to predict novel associations between genes and abnormal phenotypes with results that are competitive with state-of-the-art algorithms and with a significant reduction of the computational complexity. Hierarchical ensembles are efficient computational methods that guarantee biologically meaningful predictions that obey the true path rule, and can be used as a tool to improve and make consistent the HPO terms predictions starting from virtually any flat learning method. The implementation of the proposed methods is available as an R package from the CRAN repository.
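
    The true path rule mentioned in this abstract requires that a gene annotated to a term is also annotated to all of that term's ancestors, so a consistent score for a term should never exceed the score of any of its parents. One simple top-down way to enforce this on flat scores is sketched below; it is an illustration under that assumption, not necessarily either of the two methods the authors propose, and the function name `enforce_true_path` and the toy term names are hypothetical.

```python
def enforce_true_path(scores, parents):
    """Cap each term's score by the corrected scores of all its parents.

    scores:  dict term -> flat classifier score for one gene.
    parents: dict term -> list of parent terms (must describe a DAG, no cycles).
    """
    corrected = {}

    def visit(term):
        # Memoized recursion: parents are corrected before their children.
        if term not in corrected:
            parent_scores = [visit(p) for p in parents.get(term, [])]
            corrected[term] = min([scores[term]] + parent_scores)
        return corrected[term]

    for term in scores:
        visit(term)
    return corrected


if __name__ == "__main__":
    flat = {"root": 0.9, "a": 0.4, "b": 0.8, "c": 0.7}
    dag = {"a": ["root"], "b": ["root"], "c": ["a", "b"]}
    # Term "c" scored 0.7 but its parent "a" scored only 0.4, so "c" is capped at 0.4.
    print(enforce_true_path(flat, dag))
```

    Because the correction only post-processes scores, it can be applied to the output of any flat learner, which matches the modular two-step structure the abstract describes.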

  17. Complete fold annotation of the human proteome using a novel structural feature space

    DOE PAGES

    Middleton, Sarah A.; Illuminati, Joseph; Kim, Junhyong

    2017-04-13

    Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Finally, our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families.

  18. Mathematical methods for protein science

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hart, W.; Istrail, S.; Atkins, J.

    1997-12-31

    Understanding the structure and function of proteins is a fundamental endeavor in molecular biology. Currently, over 100,000 protein sequences have been determined by experimental methods. The three dimensional structure of the protein determines its function, but there are currently less than 4,000 structures known to atomic resolution. Accordingly, techniques to predict protein structure from sequence have an important role in aiding the understanding of the Genome and the effects of mutations in genetic disease. The authors describe current efforts at Sandia to better understand the structure of proteins through rigorous mathematical analyses of simple lattice models. The efforts have focused on two aspects of protein science: mathematical structure prediction, and inverse protein folding.

  19. NASA Subsonic Rotary Wing Project - Structures and Materials Discipline

    NASA Technical Reports Server (NTRS)

    Halbig, Michael C.; Johnson, Susan M.

    2008-01-01

    The Structures & Materials Discipline within the NASA Subsonic Rotary Wing Project is focused on developing rotorcraft technologies. The technologies being developed are within the task areas of: 5.1.1 Life Prediction Methods for Engine Structures & Components; 5.1.2 Erosion Resistant Coatings for Improved Turbine Blade Life; 5.2.1 Crashworthiness; 5.2.2 Methods for Prediction of Fatigue Damage & Self Healing; 5.3.1 Propulsion High Temperature Materials; 5.3.2 Lightweight Structures and Noise Integration. The presentation will discuss rotorcraft specific technical challenges and needs as well as details of the work being conducted in the six task areas.

  20. Methods for evaluating the predictive accuracy of structural dynamic models

    NASA Technical Reports Server (NTRS)

    Hasselman, Timothy K.; Chrostowski, Jon D.

    1991-01-01

    Modeling uncertainty is defined in terms of the difference between predicted and measured eigenvalues and eigenvectors. Data compiled from 22 sets of analysis/test results was used to create statistical databases for large truss-type space structures and both pretest and posttest models of conventional satellite-type space structures. Modeling uncertainty is propagated through the model to produce intervals of uncertainty on frequency response functions, both amplitude and phase. This methodology was used successfully to evaluate the predictive accuracy of several structures, including the NASA CSI Evolutionary Structure tested at Langley Research Center. Test measurements for this structure were within + one-sigma intervals of predicted accuracy for the most part, demonstrating the validity of the methodology and computer code.

  1. Clathrate Structure Determination by Combining Crystal Structure Prediction with Computational and Experimental 129Xe NMR Spectroscopy

    PubMed Central

    Selent, Marcin; Nyman, Jonas; Roukala, Juho; Ilczyszyn, Marek; Oilunkaniemi, Raija; Bygrave, Peter J.; Laitinen, Risto; Jokisaari, Jukka

    2017-01-01

    An approach is presented for the structure determination of clathrates using NMR spectroscopy of enclathrated xenon to select from a set of predicted crystal structures. Crystal structure prediction methods have been used to generate an ensemble of putative structures of o‐ and m‐fluorophenol, whose previously unknown clathrate structures have been studied by 129Xe NMR spectroscopy. The high sensitivity of the 129Xe chemical shift tensor to the chemical environment and shape of the crystalline cavity makes it ideal as a probe for porous materials. The experimental powder NMR spectra can be used to directly confirm or reject hypothetical crystal structures generated by computational prediction, whose chemical shift tensors have been simulated using density functional theory. For each fluorophenol isomer one predicted crystal structure was found, whose measured and computed chemical shift tensors agree within experimental and computational error margins and these are thus proposed as the true fluorophenol xenon clathrate structures. PMID:28111848

  2. A Micromechanics-Based Method for Multiscale Fatigue Prediction

    NASA Astrophysics Data System (ADS)

    Moore, John Allan

    An estimated 80% of all structural failures are due to mechanical fatigue, often resulting in catastrophic, dangerous and costly failure events. However, an accurate model to predict fatigue remains an elusive goal. One of the major challenges is that fatigue is intrinsically a multiscale process, which is dependent on a structure's geometric design as well as its material's microscale morphology. The following work begins with a microscale study of fatigue nucleation around non-metallic inclusions. Based on this analysis, a novel multiscale method for fatigue predictions is developed. This method simulates macroscale geometries explicitly while concurrently calculating the simplified response of microscale inclusions, thus providing adequate detail on multiple scales for accurate fatigue life predictions. The methods herein provide insight into the multiscale nature of fatigue, while also developing a tool to aid in geometric design and material optimization for fatigue critical devices such as biomedical stents and artificial heart valves.

  3. Binding ligand prediction for proteins using partial matching of local surface patches.

    PubMed

    Sael, Lee; Kihara, Daisuke

    2010-01-01

    Functional elucidation of uncharacterized protein structures is an important task in bioinformatics. We report our new approach for structure-based function prediction which captures local surface features of ligand binding pockets. The function of proteins, specifically their binding ligands, can be predicted by finding similar local surface regions of known proteins. To enable partial comparison of binding sites in proteins, a weighted bipartite matching algorithm is used to match pairs of surface patches. The surface patches are encoded with the 3D Zernike descriptors. Unlike existing methods, which compare global characteristics of the protein fold or the global pocket shape, the local surface patch method can find functional similarity between non-homologous proteins and binding pockets for flexible ligand molecules. The proposed method improves prediction results over the global pocket shape-based method previously developed by our group.

  4. Binding Ligand Prediction for Proteins Using Partial Matching of Local Surface Patches

    PubMed Central

    Sael, Lee; Kihara, Daisuke

    2010-01-01

    Functional elucidation of uncharacterized protein structures is an important task in bioinformatics. We report our new approach for structure-based function prediction which captures local surface features of ligand binding pockets. The function of proteins, specifically their binding ligands, can be predicted by finding similar local surface regions of known proteins. To enable partial comparison of binding sites in proteins, a weighted bipartite matching algorithm is used to match pairs of surface patches. The surface patches are encoded with the 3D Zernike descriptors. Unlike existing methods, which compare global characteristics of the protein fold or the global pocket shape, the local surface patch method can find functional similarity between non-homologous proteins and binding pockets for flexible ligand molecules. The proposed method improves prediction results over the global pocket shape-based method previously developed by our group. PMID:21614188

  5. PRISM-EM: template interface-based modelling of multi-protein complexes guided by cryo-electron microscopy density maps.

    PubMed

    Kuzu, Guray; Keskin, Ozlem; Nussinov, Ruth; Gursoy, Attila

    2016-10-01

    The structures of protein assemblies are important for elucidating cellular processes at the molecular level. Three-dimensional electron microscopy (3DEM) is a powerful method to identify the structures of assemblies, especially those that are challenging to study by crystallography. Here, a new approach, PRISM-EM, is reported to computationally generate plausible structural models using a procedure that combines crystallographic structures and density maps obtained from 3DEM. The predictions are validated against seven available structurally different crystallographic complexes. The models display mean deviations in the backbone of <5 Å. PRISM-EM was further tested on different benchmark sets; the accuracy was evaluated with respect to the structure of the complex, and the correlation with EM density maps and interface predictions were evaluated and compared with those obtained using other methods. PRISM-EM was then used to predict the structure of the ternary complex of the HIV-1 envelope glycoprotein trimer, the ligand CD4 and the neutralizing protein m36.

  6. Analysis of deep learning methods for blind protein contact prediction in CASP12.

    PubMed

    Wang, Sheng; Sun, Siqi; Xu, Jinbo

    2018-03-01

    Here we present the results of protein contact prediction achieved in CASP12 by our RaptorX-Contact server, which is an early implementation of our deep learning method for contact prediction. On a set of 38 free-modeling target domains with a median family size of around 58 effective sequences, our server obtained an average top L/5 long- and medium-range contact accuracy of 47% and 44%, respectively (L = length). A complete implementation has an average accuracy of 59% and 57%, respectively. Our deep learning method formulates contact prediction as a pixel-level image labeling problem and simultaneously predicts all residue pairs of a protein using a combination of two deep residual neural networks, taking as input the residue conservation information, predicted secondary structure and solvent accessibility, contact potential, and coevolution information. Our approach differs from existing methods mainly in (1) formulating contact prediction as a pixel-level image labeling problem instead of an image-level classification problem; (2) simultaneously predicting all contacts of an individual protein to make effective use of contact occurrence patterns; and (3) integrating both one-dimensional and two-dimensional deep convolutional neural networks to effectively learn complex sequence-structure relationship including high-order residue correlation. This paper discusses the RaptorX-Contact pipeline, both contact prediction and contact-based folding results, and finally the strength and weakness of our method. © 2017 Wiley Periodicals, Inc.

  7. Computational modeling of RNA 3D structures, with the aid of experimental restraints

    PubMed Central

    Magnus, Marcin; Matelska, Dorota; Łach, Grzegorz; Chojnowski, Grzegorz; Boniecki, Michal J; Purta, Elzbieta; Dawson, Wayne; Dunin-Horkawicz, Stanislaw; Bujnicki, Janusz M

    2014-01-01

    In addition to mRNAs whose primary function is transmission of genetic information from DNA to proteins, numerous other classes of RNA molecules exist, which are involved in a variety of functions, such as catalyzing biochemical reactions or performing regulatory roles. In analogy to proteins, the function of RNAs depends on their structure and dynamics, which are largely determined by the ribonucleotide sequence. Experimental determination of high-resolution RNA structures is both laborious and difficult, and therefore, the majority of known RNAs remain structurally uncharacterized. To address this problem, computational structure prediction methods were developed that simulate either the physical process of RNA structure formation (“Greek science” approach) or utilize information derived from known structures of other RNA molecules (“Babylonian science” approach). All computational methods suffer from various limitations that make them generally unreliable for structure prediction of long RNA sequences. However, in many cases, the limitations of computational and experimental methods can be overcome by combining these two complementary approaches with each other. In this work, we review computational approaches for RNA structure prediction, with emphasis on implementations (particular programs) that can utilize restraints derived from experimental analyses. We also list experimental approaches, whose results can be relatively easily used by computational methods. Finally, we describe case studies where computational and experimental analyses were successfully combined to determine RNA structures that would remain out of reach for each of these approaches applied separately. PMID:24785264

  8. Computational predictions of zinc oxide hollow structures

    NASA Astrophysics Data System (ADS)

    Tuoc, Vu Ngoc; Huan, Tran Doan; Thao, Nguyen Thi

    2018-03-01

    Nanoporous materials are emerging as potential candidates for a wide range of technological applications in environment, electronic, and optoelectronics, to name just a few. Within this active research area, experimental works are predominant while theoretical/computational prediction and study of these materials face some intrinsic challenges, one of them is how to predict porous structures. We propose a computationally and technically feasible approach for predicting zinc oxide structures with hollows at the nano scale. The designed zinc oxide hollow structures are studied with computations using the density functional tight binding and conventional density functional theory methods, revealing a variety of promising mechanical and electronic properties, which can potentially find future realistic applications.

  9. Improving transmembrane protein consensus topology prediction using inter-helical interaction.

    PubMed

    Wang, Han; Zhang, Chao; Shi, Xiaohu; Zhang, Li; Zhou, You

    2012-11-01

    Alpha helix transmembrane proteins (αTMPs) represent roughly 30% of all open reading frames (ORFs) in a typical genome and are involved in many critical biological processes. Due to their special physicochemical properties, they are hard to crystallize and high-resolution structures are difficult to obtain experimentally; thus, sequence-based topology prediction is highly desirable for the study of transmembrane proteins (TMPs), both in structure prediction and function prediction. Various model-based topology prediction methods have been developed, but the accuracy of those individual predictors remains poor due to the limitations of the methods or the features they use. Thus, the consensus topology prediction method becomes practical for high accuracy applications by combining the advances of the individual predictors. Here, based on the observation that inter-helical interactions are commonly found within the transmembrane helices (TMHs) and strongly indicate the existence of them, we present a novel consensus topology prediction method for αTMPs, CNTOP, which incorporates four top leading individual topology predictors, and further improves the prediction accuracy by using the predicted inter-helical interactions. The method achieved 87% prediction accuracy based on a benchmark dataset and 78% accuracy based on a non-redundant dataset which is composed of polytopic αTMPs. Our method achieves higher topology accuracy than any other individual or consensus predictor; at the same time, the TMHs are more accurately predicted in their length and locations, with both the false positives (FPs) and the false negatives (FNs) decreasing dramatically. CNTOP is available at: http://ccst.jlu.edu.cn/JCSB/cntop/CNTOP.html. Copyright © 2012 Elsevier B.V. All rights reserved.

  10. Tertiary structure-based analysis of microRNA–target interactions

    PubMed Central

    Gan, Hin Hark; Gunsalus, Kristin C.

    2013-01-01

    Current computational analysis of microRNA interactions is based largely on primary and secondary structure analysis. Computationally efficient tertiary structure-based methods are needed to enable more realistic modeling of the molecular interactions underlying miRNA-mediated translational repression. We incorporate algorithms for predicting duplex RNA structures, ionic strength effects, duplex entropy and free energy, and docking of duplex–Argonaute protein complexes into a pipeline to model and predict miRNA–target duplex binding energies. To ensure modeling accuracy and computational efficiency, we use an all-atom description of RNA and a continuum description of ionic interactions using the Poisson–Boltzmann equation. Our method predicts the conformations of two constructs of Caenorhabditis elegans let-7 miRNA–target duplexes to an accuracy of ∼3.8 Å root mean square distance of their NMR structures. We also show that the computed duplex formation enthalpies, entropies, and free energies for eight miRNA–target duplexes agree with titration calorimetry data. Analysis of duplex–Argonaute docking shows that structural distortions arising from single-base-pair mismatches in the seed region influence the activity of the complex by destabilizing both duplex hybridization and its association with Argonaute. Collectively, these results demonstrate that tertiary structure-based modeling of miRNA interactions can reveal structural mechanisms not accessible with current secondary structure-based methods. PMID:23417009

  11. Prediction of redox-sensitive cysteines using sequential distance and other sequence-based features.

    PubMed

    Sun, Ming-An; Zhang, Qing; Wang, Yejun; Ge, Wei; Guo, Dianjing

    2016-08-24

    Reactive oxygen species can modify the structure and function of proteins and may also act as important signaling molecules in various cellular processes. Cysteine thiol groups of proteins are particularly susceptible to oxidation. Meanwhile, their reversible oxidation plays critical roles in redox regulation and signaling. Recently, several computational tools have been developed for predicting redox-sensitive cysteines; however, those methods either only focus on catalytic redox-sensitive cysteines in thiol oxidoreductases, or heavily depend on protein structural data, and thus cannot be widely used. In this study, we analyzed various sequence-based features potentially related to cysteine redox-sensitivity, and identified three types of features for efficient computational prediction of redox-sensitive cysteines. These features are: sequential distance to the nearby cysteines, PSSM profile and predicted secondary structure of flanking residues. After further feature selection using SVM-RFE, we developed Redox-Sensitive Cysteine Predictor (RSCP), an SVM-based classifier for redox-sensitive cysteine prediction using primary sequence only. Using 10-fold cross-validation on the RSC758 dataset, the accuracy, sensitivity, specificity, MCC and AUC were estimated as 0.679, 0.602, 0.756, 0.362 and 0.727, respectively. When evaluated using 10-fold cross-validation with the BALOSCTdb dataset, which has structure information, the model achieved performance comparable to the current structure-based method. Further validation using an independent dataset indicates it is robust and of relatively better accuracy for predicting redox-sensitive cysteines from non-enzyme proteins. In this study, we developed a sequence-based classifier for predicting redox-sensitive cysteines. The major advantage of this method is that it does not rely on protein structure data, which ensures more extensive application compared to other current implementations. Accurate prediction of redox-sensitive cysteines not only enhances our understanding about the redox sensitivity of cysteine, it may also complement the proteomics approach and facilitate further experimental investigation of important redox-sensitive cysteines.
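
    For readers comparing the figures reported in this abstract, the following sketch shows how accuracy, sensitivity, specificity and the Matthews correlation coefficient (MCC) follow from a binary confusion matrix. The counts in the example are made up for illustration and are not the RSC758 results; the function name `binary_metrics` is hypothetical. AUC is not covered, since it needs ranked scores rather than counts.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # MCC denominator; a zero value means a whole row or column is empty.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # true negative rate
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    # Illustrative counts only.
    for name, value in binary_metrics(tp=6, fp=2, tn=8, fn=4).items():
        print(f"{name}: {value:.3f}")
```

    MCC is the most conservative of the four on imbalanced data because it uses all four cells of the confusion matrix, which is one reason it is often reported alongside accuracy.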

  12. Utilizing knowledge base of amino acids structural neighborhoods to predict protein-protein interaction sites.

    PubMed

    Jelínek, Jan; Škoda, Petr; Hoksza, David

    2017-12-06

    Protein-protein interactions (PPI) play a key role in an investigation of various biochemical processes, and their identification is thus of great importance. Although computational prediction of which amino acids take part in a PPI has been an active field of research for some time, the quality of in-silico methods is still far from perfect. We have developed a novel prediction method called INSPiRE which benefits from a knowledge base built from data available in Protein Data Bank. All proteins involved in PPIs were converted into labeled graphs with nodes corresponding to amino acids and edges to pairs of neighboring amino acids. A structural neighborhood of each node was then encoded into a bit string and stored in the knowledge base. When predicting PPIs, INSPiRE labels amino acids of unknown proteins as interface or non-interface based on how often their structural neighborhood appears as interface or non-interface in the knowledge base. We evaluated INSPiRE's behavior with respect to different types and sizes of the structural neighborhood. Furthermore, we examined the suitability of several different features for labeling the nodes. Our evaluations showed that INSPiRE clearly outperforms existing methods with respect to Matthews correlation coefficient. In this paper we introduce a new knowledge-based method for identification of protein-protein interaction sites called INSPiRE. Its knowledge base utilizes structural patterns of known interaction sites in the Protein Data Bank which are then used for PPI prediction. Extensive experiments on several well-established datasets show that INSPiRE significantly surpasses existing PPI approaches.

  13. Survey of predictors of propensity for protein production and crystallization with application to predict resolution of crystal structures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gao, Jianzhao; Wu, Zhonghua; Hu, Gang

    Selection of proper targets for the X-ray crystallography will benefit biological research community immensely. Several computational models were proposed to predict propensity of successful protein production and diffraction quality crystallization from protein sequences. We reviewed a comprehensive collection of 22 such predictors that were developed in the last decade. We found that almost all of these models are easily accessible as webservers and/or standalone software and we demonstrated that some of them are widely used by the research community. We empirically evaluated and compared the predictive performance of seven representative methods. The analysis suggests that these methods produce quite accurate propensities for the diffraction-quality crystallization. We also summarized results of the first study of the relation between these predictive propensities and the resolution of the crystallizable proteins. We found that the propensities predicted by several methods are significantly higher for proteins that have high resolution structures compared to those with the low resolution structures. Moreover, we tested a new meta-predictor, MetaXXC, which averages the propensities generated by the three most accurate predictors of the diffraction-quality crystallization. MetaXXC generates putative values of resolution that have modest levels of correlation with the experimental resolutions and it offers the lowest mean absolute error when compared to the seven considered methods. We conclude that protein sequences can be used to fairly accurately predict whether their corresponding protein structures can be solved using X-ray crystallography. Moreover, we also ascertain that sequences can be used to reasonably well predict the resolution of the resulting protein crystals.

  14. ClusPro: an automated docking and discrimination method for the prediction of protein complexes.

    PubMed

    Comeau, Stephen R; Gatchell, David W; Vajda, Sandor; Camacho, Carlos J

    2004-01-01

    Predicting protein interactions is one of the most challenging problems in functional genomics. Given two proteins known to interact, current docking methods evaluate billions of docked conformations by simple scoring functions, and in addition to near-native structures yield many false positives, i.e. structures with good surface complementarity but far from the native. We have developed a fast algorithm for filtering docked conformations with good surface complementarity, and ranking them based on their clustering properties. The free energy filters select complexes with lowest desolvation and electrostatic energies. Clustering is then used to smooth the local minima and to select the ones with the broadest energy wells, a property associated with the free energy at the binding site. The robustness of the method was tested on sets of 2000 docked conformations generated for 48 pairs of interacting proteins. In 31 of these cases, the top 10 predictions include at least one near-native complex, with an average RMSD of 5 Å from the native structure. The docking and discrimination method also provides good results for a number of complexes that were used as targets in the Critical Assessment of PRedictions of Interactions experiment. The fully automated docking and discrimination server ClusPro can be found at http://structure.bu.edu
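
    The idea of ranking docked conformations by their clustering properties can be sketched with a simple greedy scheme: repeatedly take the conformation with the most neighbours within an RMSD radius as a cluster centre, then remove its cluster. This greedy procedure, the function name `rank_by_cluster_size`, and the precomputed pairwise RMSD matrix are assumptions made here for illustration; the abstract does not specify the clustering algorithm or radius.

```python
def rank_by_cluster_size(rmsd, radius):
    """Greedy clustering of conformations, largest neighbourhood first.

    rmsd: symmetric matrix (list of lists) of pairwise RMSD values.
    radius: two conformations are neighbours if their RMSD <= radius.
    Returns a list of (centre_index, member_indices) in rank order.
    """
    remaining = set(range(len(rmsd)))
    clusters = []
    while remaining:
        # Centre: the remaining conformation with the most remaining
        # neighbours; ties go to the lowest index.
        centre = max(
            sorted(remaining),
            key=lambda i: sum(1 for j in remaining if rmsd[i][j] <= radius),
        )
        members = sorted(
            j for j in remaining if j == centre or rmsd[centre][j] <= radius
        )
        clusters.append((centre, members))
        remaining -= set(members)
    return clusters


if __name__ == "__main__":
    # Toy matrix: conformations 0-2 are mutually close, 3-4 form a second group.
    toy = [
        [0, 1, 1, 10, 10],
        [1, 0, 1, 10, 10],
        [1, 1, 0, 10, 10],
        [10, 10, 10, 0, 1],
        [10, 10, 10, 1, 0],
    ]
    print(rank_by_cluster_size(toy, radius=2))
```

    A large cluster corresponds to a broad well in the scoring landscape, which is the property the abstract associates with the free energy at the binding site, so the first cluster centre returned is the top-ranked prediction in this sketch.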

  15. Predicting protein-protein interactions on a proteome scale by matching evolutionary and structural similarities at interfaces using PRISM.

    PubMed

    Tuncbag, Nurcan; Gursoy, Attila; Nussinov, Ruth; Keskin, Ozlem

    2011-08-11

    Prediction of protein-protein interactions at the structural level on the proteome scale is important because it allows prediction of protein function, helps drug discovery and takes steps toward genome-wide structural systems biology. We provide a protocol (termed PRISM, protein interactions by structural matching) for large-scale prediction of protein-protein interactions and assembly of protein complex structures. The method consists of two components: rigid-body structural comparisons of target proteins to known template protein-protein interfaces and flexible refinement using a docking energy function. The PRISM rationale follows our observation that globally different protein structures can interact via similar architectural motifs. PRISM predicts binding residues by using structural similarity and evolutionary conservation of putative binding residue 'hot spots'. Ultimately, PRISM could help to construct cellular pathways and functional, proteome-scale annotation. PRISM is implemented in Python and runs in a UNIX environment. The program accepts Protein Data Bank-formatted protein structures and is available at http://prism.ccbb.ku.edu.tr/prism_protocol/.

  16. Structural protein descriptors in 1-dimension and their sequence-based predictions.

    PubMed

    Kurgan, Lukasz; Disfani, Fatemeh Miri

    2011-09-01

    The last few decades observed an increasing interest in development and application of 1-dimensional (1D) descriptors of protein structure. These descriptors project 3D structural features onto 1D strings of residue-wise structural assignments. They cover a wide-range of structural aspects including conformation of the backbone, burying depth/solvent exposure and flexibility of residues, and inter-chain residue-residue contacts. We perform first-of-its-kind comprehensive comparative review of the existing 1D structural descriptors. We define, review and categorize ten structural descriptors and we also describe, summarize and contrast over eighty computational models that are used to predict these descriptors from the protein sequences. We show that the majority of the recent sequence-based predictors utilize machine learning models, with the most popular being neural networks, support vector machines, hidden Markov models, and support vector and linear regressions. These methods provide high-throughput predictions and most of them are accessible to a non-expert user via web servers and/or stand-alone software packages. We empirically evaluate several recent sequence-based predictors of secondary structure, disorder, and solvent accessibility descriptors using a benchmark set based on CASP8 targets. Our analysis shows that the secondary structure can be predicted with over 80% accuracy and segment overlap (SOV), disorder with over 0.9 AUC, 0.6 Matthews Correlation Coefficient (MCC), and 75% SOV, and relative solvent accessibility with PCC of 0.7 and MCC of 0.6 (0.86 when homology is used). We demonstrate that the secondary structure predicted from sequence without the use of homology modeling is as good as the structure extracted from the 3D folds predicted by top-performing template-based methods.

  17. Crystal structure of minoxidil at low temperature and polymorph prediction.

    PubMed

    Martín-Islán, Africa P; Martín-Ramos, Daniel; Sainz-Díaz, C Ignacio

    2008-02-01

    An experimental and theoretical investigation on crystal forms of the popular and ubiquitous pharmaceutical Minoxidil is presented here. A new crystallization method is presented for Minoxidil (6-(1-piperidinyl)-2,4-pyrimidinediamine 3-oxide) in ethanol-poly(ethylene glycol), yielding crystals with good quality. The crystal structure is determined at low temperature, with a final R value of 0.035, corresponding to space group P2(1) (monoclinic) with cell dimensions a = 9.357(1) Å, b = 8.231(1) Å, c = 12.931(2) Å, and β = 90.353(4)°. Theoretical calculations of the molecular structure of Minoxidil are set forward using empirical force fields and quantum-mechanical methods. A theoretical prediction for Minoxidil crystal structure shows many possible polymorphs. The predicted crystal structures are compared with X-ray experimental data obtained in our laboratory, and the experimental crystal form is found to be one of the lowest energy polymorphs.

  18. Contact Prediction for Beta and Alpha-Beta Proteins Using Integer Linear Optimization and its Impact on the First Principles 3D Structure Prediction Method ASTRO-FOLD

    PubMed Central

    Rajgaria, R.; Wei, Y.; Floudas, C. A.

    2010-01-01

    An integer linear optimization model is presented to predict residue contacts in β, α + β, and α/β proteins. The total energy of a protein is expressed as sum of a Cα – Cα distance dependent contact energy contribution and a hydrophobic contribution. The model selects contacts that assign lowest energy to the protein structure while satisfying a set of constraints that are included to enforce certain physically observed topological information. A new method based on hydrophobicity is proposed to find the β-sheet alignments. These β-sheet alignments are used as constraints for contacts between residues of β-sheets. This model was tested on three independent protein test sets and CASP8 test proteins consisting of β, α + β, α/β proteins and was found to perform very well. The average accuracy of the predictions (separated by at least six residues) was approximately 61%. The average true positive and false positive distances were also calculated for each of the test sets and they are 7.58 Å and 15.88 Å, respectively. Residue contact prediction can be directly used to facilitate the protein tertiary structure prediction. This proposed residue contact prediction model is incorporated into the first principles protein tertiary structure prediction approach, ASTRO-FOLD. The effectiveness of the contact prediction model was further demonstrated by the improvement in the quality of the protein structure ensemble generated using the predicted residue contacts for a test set of 10 proteins. PMID:20225257

  19. eF-seek: prediction of the functional sites of proteins by searching for similar electrostatic potential and molecular surface shape.

    PubMed

    Kinoshita, Kengo; Murakami, Yoichi; Nakamura, Haruki

    2007-07-01

    We have developed a method to predict ligand-binding sites in a new protein structure by searching for similar binding sites in the Protein Data Bank (PDB). The similarities are measured according to the shapes of the molecular surfaces and their electrostatic potentials. A new web server, eF-seek, provides an interface to our search method. It simply requires a coordinate file in the PDB format, and generates a prediction result as a virtual complex structure, with the putative ligands in a PDB format file as the output. In addition, the predicted interacting interface is displayed to facilitate the examination of the virtual complex structure on our own applet viewer with the web browser (URL: http://eF-site.hgc.jp/eF-seek).

  20. Frame prediction using recurrent convolutional encoder with residual learning

    NASA Astrophysics Data System (ADS)

    Yue, Boxuan; Liang, Jun

    2018-05-01

    Predicting the frames of a video is difficult but urgently needed in autonomous driving. Conventional methods can only predict some abstract trends of the region of interest. The boom of deep learning makes the prediction of frames possible. In this paper, we propose a novel recurrent convolutional encoder and deconvolutional decoder structure to predict frames. We introduce residual learning in the convolutional encoder structure to solve the gradient issues. Residual learning can transform the gradient back propagation to an identity mapping. It can preserve the whole gradient information and overcome the gradient issues in Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN). Besides, compared with the branches in CNNs and the gated structures in RNNs, residual learning can save training time significantly. In the experiments, we use the UCF101 dataset to train our networks, and the predictions are compared with some state-of-the-art methods. The results show that our networks can predict frames fast and efficiently. Furthermore, our networks are applied to driving video to verify their practicability.

  1. A Feature and Algorithm Selection Method for Improving the Prediction of Protein Structural Class.

    PubMed

    Ni, Qianwu; Chen, Lei

    2017-01-01

    Correct prediction of protein structural class is beneficial to investigation on protein functions, regulations and interactions. In recent years, several computational methods have been proposed in this regard. However, based on various features, it is still a great challenge to select a proper classification algorithm and extract essential features to participate in classification. In this study, a feature and algorithm selection method was presented for improving the accuracy of protein structural class prediction. The amino acid compositions and physicochemical features were adopted to represent features and thirty-eight machine learning algorithms collected in Weka were employed. All features were first analyzed by a feature selection method, minimum redundancy maximum relevance (mRMR), producing a feature list. Then, several feature sets were constructed by adding features in the list one by one. For each feature set, thirty-eight algorithms were executed on a dataset, in which proteins were represented by features in the set. The predicted classes yielded by these algorithms and the true class of each protein were collected to construct a dataset, which was analyzed by the mRMR method, yielding an algorithm list. From the algorithm list, the algorithms were taken one by one to build an ensemble prediction model. Finally, we selected the ensemble prediction model with the best performance as the optimal ensemble prediction model. Experimental results indicate that the constructed model is much superior to models using a single algorithm and other models that only adopt the feature selection procedure or the algorithm selection procedure. The feature selection procedure and the algorithm selection procedure are really helpful for building an ensemble prediction model that can yield a better performance. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  2. Exchange-Hole Dipole Dispersion Model for Accurate Energy Ranking in Molecular Crystal Structure Prediction II: Nonplanar Molecules.

    PubMed

    Whittleton, Sarah R; Otero-de-la-Roza, A; Johnson, Erin R

    2017-11-14

    The crystal structure prediction (CSP) of a given compound from its molecular diagram is a fundamental challenge in computational chemistry with implications in relevant technological fields. A key component of CSP is the method to calculate the lattice energy of a crystal, which allows the ranking of candidate structures. This work is the second part of our investigation to assess the potential of the exchange-hole dipole moment (XDM) dispersion model for crystal structure prediction. In this article, we study the relatively large, nonplanar, mostly flexible molecules in the first five blind tests held by the Cambridge Crystallographic Data Centre. Four of the seven experimental structures are predicted as the energy minimum, and thermal effects are demonstrated to have a large impact on the ranking of at least another compound. As in the first part of this series, delocalization error affects the results for a single crystal (compound X), in this case by detrimentally overstabilizing the π-conjugated conformation of the monomer. Overall, B86bPBE-XDM correctly predicts 16 of the 21 compounds in the five blind tests, a result similar to the one obtained using the best CSP method available to date (dispersion-corrected PW91 by Neumann et al.). Perhaps more importantly, the systems for which B86bPBE-XDM fails to predict the experimental structure as the energy minimum are mostly the same as with Neumann's method, which suggests that similar difficulties (absence of vibrational free energy corrections, delocalization error,...) are not limited to B86bPBE-XDM but affect GGA-based DFT-methods in general. Our work confirms B86bPBE-XDM as an excellent option for crystal energy ranking in CSP and offers a guide to identify crystals (organic salts, conjugated flexible systems) where difficulties may appear.

  3. Applications of Protein Thermodynamic Database for Understanding Protein Mutant Stability and Designing Stable Mutants.

    PubMed

    Gromiha, M Michael; Anoosha, P; Huang, Liang-Tsung

    2016-01-01

    Protein stability is the free energy difference between unfolded and folded states of a protein, which lies in the range of 5-25 kcal/mol. Experimentally, protein stability is measured with circular dichroism, differential scanning calorimetry, and fluorescence spectroscopy using thermal and denaturant denaturation methods. These experimental data have been accumulated in the form of a database, ProTherm, thermodynamic database for proteins and mutants. It also contains sequence and structure information of a protein, experimental methods and conditions, and literature information. Different features such as search, display, and sorting options and visualization tools have been incorporated in the database. ProTherm is a valuable resource for understanding/predicting the stability of proteins and it can be accessed at http://www.abren.net/protherm/. ProTherm has been effectively used to examine the relationship among thermodynamics, structure, and function of proteins. We describe the recent progress on the development of methods for understanding/predicting protein stability, such as (1) general trends on mutational effects on stability, (2) relationship between the stability of protein mutants and amino acid properties, (3) applications of protein three-dimensional structures for predicting their stability upon point mutations, (4) prediction of protein stability upon single mutations from amino acid sequence, and (5) prediction methods for addressing double mutants. A list of online resources for predicting protein stability has also been provided.

  4. BindML/BindML+: Detecting Protein-Protein Interaction Interface Propensity from Amino Acid Substitution Patterns.

    PubMed

    Wei, Qing; La, David; Kihara, Daisuke

    2017-01-01

    Prediction of protein-protein interaction sites in a protein structure provides important information for elucidating the mechanism of protein function and can also be useful in guiding modeling or design procedures for protein complex structures. Since prediction methods essentially assess the propensity of amino acids that are likely to be part of a protein docking interface, they can help in designing protein-protein interactions. Here, we introduce BindML and BindML+, two methods for predicting protein-protein interaction sites. BindML predicts protein-protein interaction sites by identifying mutation patterns found in known protein-protein complexes using phylogenetic substitution models. BindML+ is an extension of BindML for distinguishing permanent and transient types of protein-protein interaction sites. We developed an interactive web-server that provides a convenient interface to assist in structural visualization of protein-protein interaction site predictions. The input to the web-server is a tertiary structure of interest. BindML and BindML+ are available at http://kiharalab.org/bindml/ and http://kiharalab.org/bindml/plus/ .

  5. Capturing non-local interactions by long short-term memory bidirectional recurrent neural networks for improving prediction of protein secondary structure, backbone angles, contact numbers and solvent accessibility.

    PubMed

    Heffernan, Rhys; Yang, Yuedong; Paliwal, Kuldip; Zhou, Yaoqi

    2017-09-15

    The accuracy of predicting protein local and global structural properties such as secondary structure and solvent accessible surface area has been stagnant for many years because of the challenge of accounting for non-local interactions between amino acid residues that are close in three-dimensional structural space but far from each other in their sequence positions. All existing machine-learning techniques relied on a sliding window of 10-20 amino acid residues to capture some 'short to intermediate' non-local interactions. Here, we employed Long Short-Term Memory (LSTM) Bidirectional Recurrent Neural Networks (BRNNs) which are capable of capturing long range interactions without using a window. We showed that the application of LSTM-BRNN to the prediction of protein structural properties makes the most significant improvement for residues with the most long-range contacts (|i-j| > 19) over a previous window-based, deep-learning method SPIDER2. Capturing long-range interactions allows the accuracy of three-state secondary structure prediction to reach 84% and the correlation coefficient between predicted and actual solvent accessible surface areas to reach 0.80, plus a reduction of 5%, 10%, 5% and 10% in the mean absolute error for backbone ϕ, ψ, θ and τ angles, respectively, from SPIDER2. More significantly, 27% of 182724 40-residue models directly constructed from predicted Cα atom-based θ and τ have similar structures to their corresponding native structures (6 Å RMSD or less), which is 3% better than models built by ϕ and ψ angles. We expect the method to be useful for assisting protein structure and function prediction. The method is available as a SPIDER3 server and standalone package at http://sparks-lab.org . Contact: yaoqi.zhou@griffith.edu.au or yuedong.yang@griffith.edu.au. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press.
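
    The two headline metrics in this abstract, three-state secondary structure accuracy and mean absolute error of backbone angles, can be illustrated with a minimal sketch. The function names are hypothetical, and the periodic wrap-around used for the angle error is an assumption about how such errors are usually computed, not a detail stated in the abstract.

```python
def q3_accuracy(predicted: str, actual: str) -> float:
    """Fraction of residues whose three-state label (H, E or C) matches."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label strings must be non-empty and of equal length")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


def angle_mae(predicted_deg, actual_deg) -> float:
    """Mean absolute error between angles in degrees.

    Assumption: the error is taken on the circle, so 170 and -170 degrees
    differ by 20 degrees rather than 340.
    """
    if len(predicted_deg) != len(actual_deg) or not actual_deg:
        raise ValueError("angle lists must be non-empty and of equal length")
    total = 0.0
    for p, a in zip(predicted_deg, actual_deg):
        diff = abs(p - a) % 360.0
        total += min(diff, 360.0 - diff)
    return total / len(actual_deg)
```

    For example, q3_accuracy("HHHEC", "HHHCC") gives 0.8, since four of the five labels agree.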

  6. Predicting activation energy of thermolysis of polynitro arenes through molecular structure.

    PubMed

    Keshavarz, Mohammad Hossein; Pouretedal, Hamid Reza; Shokrolahi, Arash; Zali, Abbas; Semnani, Abolfazl

    2008-12-15

    The paper presents a new method for predicting the activation energy, or Arrhenius parameter E(a), of thermolysis in the condensed state for different polynitro arenes, an important class of energetic molecules. The methodology assumes that E(a) of a polynitro arene with general formula C(a)H(b)N(c)O(d) can be expressed as a function of optimized elemental composition as well as the contribution of specific molecular structural parameters. The new method can predict E(a) of the thermolysis under conditions of the Soviet Manometric Method (SMM), which can be related to the other convenient methods. The new correlation has root mean square (rms) and average deviations of 13.79 and 11.94 kJ/mol, respectively, for 20 polynitro arenes with different molecular structures. The proposed new method can also be used to predict E(a) of three polynitro arenes, i.e. 2,2',2'',4,4',4'',6,6',6''-nonanitro-1,1':3',1''-terphenyl (NONA), 3,3'-diamino-2,2',4,4',6,6'-hexanitro-1,1'-biphenyl-3,3'-diamine (DIPAM) and N,N-bis(2,4-dinitrophenyl)-2,4,6-trinitroaniline (NTFA), which have complex molecular structures.

  7. A method for probing the mutational landscape of amyloid structure.

    PubMed

    O'Donnell, Charles W; Waldispühl, Jérôme; Lis, Mieszko; Halfmann, Randal; Devadas, Srinivas; Lindquist, Susan; Berger, Bonnie

    2011-07-01

    Proteins of all kinds can self-assemble into highly ordered β-sheet aggregates known as amyloid fibrils, important both biologically and clinically. However, the specific molecular structure of a fibril can vary dramatically depending on sequence and environmental conditions, and mutations can drastically alter amyloid function and pathogenicity. Experimental structure determination has proven extremely difficult with only a handful of NMR-based models proposed, suggesting a need for computational methods. We present AmyloidMutants, a statistical mechanics approach for de novo prediction and analysis of wild-type and mutant amyloid structures. Based on the premise of protein mutational landscapes, AmyloidMutants energetically quantifies the effects of sequence mutation on fibril conformation and stability. Tested on non-mutant, full-length amyloid structures with known chemical shift data, AmyloidMutants offers roughly 2-fold improvement in prediction accuracy over existing tools. Moreover, AmyloidMutants is the only method to predict complete super-secondary structures, enabling accurate discrimination of topologically dissimilar amyloid conformations that correspond to the same sequence locations. Applied to mutant prediction, AmyloidMutants identifies a global conformational switch between Aβ and its highly-toxic 'Iowa' mutant in agreement with a recent experimental model based on partial chemical shift data. Predictions on mutant, yeast-toxic strains of HET-s suggest similar alternate folds. When applied to HET-s and a HET-s mutant with core asparagines replaced by glutamines (both highly amyloidogenic, chemically similar residues abundant in many amyloids), AmyloidMutants surprisingly predicts a greatly reduced capacity of the glutamine mutant to form amyloid. We confirm this finding by conducting mutagenesis experiments. Our tool is publicly available on the web at http://amyloid.csail.mit.edu/. Contact: lindquist_admin@wi.mit.edu; bab@csail.mit.edu.

  8. A Template-Based Protein Structure Reconstruction Method Using Deep Autoencoder Learning.

    PubMed

    Li, Haiou; Lyu, Qiang; Cheng, Jianlin

    2016-12-01

    Protein structure prediction is an important problem in computational biology, and is widely applied to various biomedical problems such as protein function study, protein design, and drug design. In this work, we developed a novel deep learning approach based on a deeply stacked denoising autoencoder for protein structure reconstruction. We applied our approach to a template-based protein structure prediction using only the 3D structural coordinates of homologous template proteins as input. The templates were identified for a target protein by a PSI-BLAST search. 3DRobot (a program that automatically generates diverse and well-packed protein structure decoys) was used to generate initial decoy models for the target from the templates. A stacked denoising autoencoder was trained on the decoys to obtain a deep learning model for the target protein. The trained deep model was then used to reconstruct the final structural model for the target sequence. With target proteins that have highly similar template proteins as benchmarks, the GDT-TS score of the predicted structures is greater than 0.7, suggesting that the deep autoencoder is a promising method for protein structure reconstruction.
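
    The GDT-TS score quoted above is conventionally the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the fraction of residues whose Cα atom lies within the cutoff of its native position. The sketch below assumes the model has already been superimposed on the native structure and that the per-residue deviations are supplied by the caller; a full GDT-TS calculation searches over many superpositions, so this simplified version can only underestimate the true score. The function name is hypothetical.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)) -> float:
    """Simplified GDT-TS on a 0-1 scale for one fixed superposition.

    `distances` holds one Cα-Cα deviation (in Å) per residue between the
    model and the native structure.
    """
    if not distances:
        raise ValueError("at least one residue distance is required")
    n = len(distances)
    # Fraction of residues within each cutoff, then the mean of those fractions.
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return sum(fractions) / len(cutoffs)
```

    With deviations of 0.5, 1.5, 3.0 and 10.0 Å the fractions are 0.25, 0.5, 0.75 and 0.75, giving a score of 0.5625.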

  9. Predicting the equilibrium solubility of solid polycyclic aromatic hydrocarbons and dibenzothiophene using a combination of MOSCED plus molecular simulation or electronic structure calculations

    NASA Astrophysics Data System (ADS)

    Phifer, Jeremy R.; Cox, Courtney E.; da Silva, Larissa Ferreira; Nogueira, Gabriel Gonçalves; Barbosa, Ana Karolyne Pereira; Ley, Ryan T.; Bozada, Samantha M.; O'Loughlin, Elizabeth J.; Paluch, Andrew S.

    2017-06-01

    Methods to predict the equilibrium solubility of non-electrolyte solids are important for the design of novel separation processes. Here we demonstrate how conventional molecular simulation free energy calculations or electronic structure calculations in a continuum solvent, here SMD or SM8, can be used to predict parameters for the MOdified Separation of Cohesive Energy Density (MOSCED) method. The method is applied to the solutes naphthalene, anthracene, phenanthrene, pyrene and dibenzothiophene, compounds of interest to the petroleum industry and for environmental remediation. Adopting the melting point temperature and enthalpy of fusion of these compounds from experiment, we are able to predict equilibrium solubilities. Comparing to a total of 422 non-aqueous and 193 aqueous experimental solubilities, we find that the proposed method correlates the data well. The use of MOSCED is additionally advantageous as it is a solubility parameter-based method useful for intuitive solvent selection and formulation.
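
    The role of the melting point and enthalpy of fusion can be illustrated with the standard solid-liquid equilibrium relation, ln(x·γ) = -(ΔH_fus/R)(1/T - 1/T_m), which neglects heat-capacity corrections. The sketch below is not the authors' workflow: it treats the activity coefficient γ as a constant supplied by the caller (in the abstract's approach it would come from MOSCED), and the function name and the numerical inputs in the usage note are illustrative placeholders rather than data from the paper.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)


def mole_fraction_solubility(t_kelvin, t_melt_kelvin, dh_fus_j_per_mol, gamma=1.0):
    """Mole-fraction solubility of a solid from its fusion properties.

    Assumptions: heat-capacity terms are neglected and `gamma` (the solute
    activity coefficient in the solvent) is constant; gamma=1 gives the
    ideal solubility.
    """
    if t_kelvin <= 0 or t_melt_kelvin <= 0 or gamma <= 0:
        raise ValueError("temperatures and gamma must be positive")
    ln_ideal = -(dh_fus_j_per_mol / R) * (1.0 / t_kelvin - 1.0 / t_melt_kelvin)
    return math.exp(ln_ideal) / gamma
```

    With placeholder inputs T = 298.15 K, T_m = 353.4 K and ΔH_fus = 19000 J/mol, the ideal mole-fraction solubility comes out near 0.30; a non-ideal solvent with γ = 2 would halve it.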

  10. Sphinx: merging knowledge-based and ab initio approaches to improve protein loop prediction

    PubMed Central

    Marks, Claire; Nowak, Jaroslaw; Klostermann, Stefan; Georges, Guy; Dunbar, James; Shi, Jiye; Kelm, Sebastian

    2017-01-01

    Motivation: Loops are often vital for protein function; however, their irregular structures make them difficult to model accurately. Current loop modelling algorithms can mostly be divided into two categories: knowledge-based, where databases of fragments are searched to find suitable conformations, and ab initio, where conformations are generated computationally. Existing knowledge-based methods only use fragments that are the same length as the target, even though loops of slightly different lengths may adopt similar conformations. Here, we present a novel method, Sphinx, which combines ab initio techniques with the potential extra structural information contained within loops of a different length to improve structure prediction. Results: We show that Sphinx is able to generate high-accuracy predictions and decoy sets enriched with near-native loop conformations, performing better than the ab initio algorithm on which it is based. In addition, it is able to provide predictions for every target, unlike some knowledge-based methods. Sphinx can be used successfully for the difficult problem of antibody H3 prediction, outperforming RosettaAntibody, one of the leading H3-specific ab initio methods, both in accuracy and speed. Availability and Implementation: Sphinx is available at http://opig.stats.ox.ac.uk/webapps/sphinx. Contact: deane@stats.ox.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28453681

  11. Sphinx: merging knowledge-based and ab initio approaches to improve protein loop prediction.

    PubMed

    Marks, Claire; Nowak, Jaroslaw; Klostermann, Stefan; Georges, Guy; Dunbar, James; Shi, Jiye; Kelm, Sebastian; Deane, Charlotte M

    2017-05-01

    Loops are often vital for protein function; however, their irregular structures make them difficult to model accurately. Current loop modelling algorithms can mostly be divided into two categories: knowledge-based, where databases of fragments are searched to find suitable conformations, and ab initio, where conformations are generated computationally. Existing knowledge-based methods only use fragments that are the same length as the target, even though loops of slightly different lengths may adopt similar conformations. Here, we present a novel method, Sphinx, which combines ab initio techniques with the potential extra structural information contained within loops of a different length to improve structure prediction. We show that Sphinx is able to generate high-accuracy predictions and decoy sets enriched with near-native loop conformations, performing better than the ab initio algorithm on which it is based. In addition, it is able to provide predictions for every target, unlike some knowledge-based methods. Sphinx can be used successfully for the difficult problem of antibody H3 prediction, outperforming RosettaAntibody, one of the leading H3-specific ab initio methods, both in accuracy and speed. Sphinx is available at http://opig.stats.ox.ac.uk/webapps/sphinx. Contact: deane@stats.ox.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.

  12. Analysis of energy-based algorithms for RNA secondary structure prediction

    PubMed Central

    2012-01-01

    Background: RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and synthesis of proteins. Since RNA function depends in large part on its folded structures, much effort has been invested in developing accurate methods for prediction of RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives that leverage partition function calculations find the structure with maximum expected accuracy (MEA) or pseudo-expected accuracy (pseudo-MEA) methods. Advances in prediction methods are typically benchmarked using sensitivity, positive predictive value and their harmonic mean, namely F-measure, on datasets of known reference structures. Since such benchmarks document progress in improving accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters. Results: We present three main findings. First, using the bootstrap percentile method, we show that the average F-measure accuracy of the MFE and (pseudo-)MEA-based algorithms, as measured on our largest datasets with over 2000 RNAs from diverse families, is a reliable estimate (within a 2% range with high confidence) of the accuracy of a population of RNA molecules represented by this set. However, average accuracy on smaller classes of RNAs such as a class of 89 Group I introns used previously in benchmarking algorithm accuracy is not reliable enough to draw meaningful conclusions about the relative merits of the MFE and MEA-based algorithms. Second, on our large datasets, the algorithm with best overall accuracy is a pseudo MEA-based algorithm of Hamada et al. that uses a generalized centroid estimator of base pairs. However, between MFE and other MEA-based methods, there is no clear winner in the sense that the relative accuracy of the MFE versus MEA-based algorithms changes depending on the underlying energy parameters. Third, of the four parameter sets we considered, the best accuracy for the MFE-, MEA-based, and pseudo-MEA-based methods is 0.686, 0.680, and 0.711, respectively (on a scale from 0 to 1 with 1 meaning perfect structure predictions) and is obtained with a thermodynamic parameter set obtained by Andronescu et al. called BL* (named after the Boltzmann likelihood method by which the parameters were derived). Conclusions: Large datasets should be used to obtain reliable measures of the accuracy of RNA structure prediction algorithms, and average accuracies on specific classes (such as Group I introns and Transfer RNAs) should be interpreted with caution, considering the relatively small size of currently available datasets for such classes. The accuracy of the MEA-based methods is significantly higher when using the BL* parameter set of Andronescu et al. than when using the parameters of Mathews and Turner, and there is no significant difference between the accuracy of MEA-based methods and MFE when using the BL* parameters. The pseudo-MEA-based method of Hamada et al. with the BL* parameter set significantly outperforms all other MFE and MEA-based algorithms on our large data sets. PMID:22296803
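
    The benchmarking quantities named in this abstract can be sketched in a few lines. The code below computes the F-measure of one predicted structure as the harmonic mean of sensitivity and positive predictive value over base pairs, and a bootstrap percentile confidence interval for the mean of a list of per-RNA scores. The function names, the representation of a structure as a set of (i, j) index pairs, and the default resample count are assumptions for illustration, not details taken from the paper.

```python
import random


def f_measure(predicted_pairs, reference_pairs) -> float:
    """Harmonic mean of sensitivity and positive predictive value."""
    predicted, reference = set(predicted_pairs), set(reference_pairs)
    true_pos = len(predicted & reference)
    sensitivity = true_pos / len(reference) if reference else 0.0
    ppv = true_pos / len(predicted) if predicted else 0.0
    if sensitivity + ppv == 0.0:
        return 0.0
    return 2.0 * sensitivity * ppv / (sensitivity + ppv)


def bootstrap_percentile_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile confidence interval for the mean of `values`."""
    if not values:
        raise ValueError("at least one value is required")
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_resamples))
    lo_index = max(0, int((alpha / 2.0) * n_resamples))
    hi_index = min(n_resamples - 1, int((1.0 - alpha / 2.0) * n_resamples) - 1)
    return means[lo_index], means[hi_index]
```

    A narrow interval on a large dataset and a wide one on a small class of RNAs is the pattern the abstract describes when it cautions against drawing conclusions from small classes.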

  13. Analysis of energy-based algorithms for RNA secondary structure prediction.

    PubMed

    Hajiaghayi, Monir; Condon, Anne; Hoos, Holger H

    2012-02-01

    RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and synthesis of proteins. Since RNA function depends in large part on its folded structures, much effort has been invested in developing accurate methods for prediction of RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives that leverage partition function calculations find the structure with maximum expected accuracy (MEA) or pseudo-expected accuracy (pseudo-MEA) methods. Advances in prediction methods are typically benchmarked using sensitivity, positive predictive value and their harmonic mean, namely F-measure, on datasets of known reference structures. Since such benchmarks document progress in improving accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters. We present three main findings. First, using the bootstrap percentile method, we show that the average F-measure accuracy of the MFE and (pseudo-)MEA-based algorithms, as measured on our largest datasets with over 2000 RNAs from diverse families, is a reliable estimate (within a 2% range with high confidence) of the accuracy of a population of RNA molecules represented by this set. However, average accuracy on smaller classes of RNAs such as a class of 89 Group I introns used previously in benchmarking algorithm accuracy is not reliable enough to draw meaningful conclusions about the relative merits of the MFE and MEA-based algorithms. Second, on our large datasets, the algorithm with best overall accuracy is a pseudo MEA-based algorithm of Hamada et al. that uses a generalized centroid estimator of base pairs. However, between MFE and other MEA-based methods, there is no clear winner in the sense that the relative accuracy of the MFE versus MEA-based algorithms changes depending on the underlying energy parameters. Third, of the four parameter sets we considered, the best accuracy for the MFE-, MEA-based, and pseudo-MEA-based methods is 0.686, 0.680, and 0.711, respectively (on a scale from 0 to 1 with 1 meaning perfect structure predictions) and is obtained with a thermodynamic parameter set obtained by Andronescu et al. called BL* (named after the Boltzmann likelihood method by which the parameters were derived). Large datasets should be used to obtain reliable measures of the accuracy of RNA structure prediction algorithms, and average accuracies on specific classes (such as Group I introns and Transfer RNAs) should be interpreted with caution, considering the relatively small size of currently available datasets for such classes. The accuracy of the MEA-based methods is significantly higher when using the BL* parameter set of Andronescu et al. than when using the parameters of Mathews and Turner, and there is no significant difference between the accuracy of MEA-based methods and MFE when using the BL* parameters. The pseudo-MEA-based method of Hamada et al. with the BL* parameter set significantly outperforms all other MFE and MEA-based algorithms on our large data sets.

  14. Frnakenstein: multiple target inverse RNA folding.

    PubMed

    Lyngsø, Rune B; Anderson, James W J; Sizikova, Elena; Badugu, Amarendra; Hyland, Tomas; Hein, Jotun

    2012-10-09

    RNA secondary structure prediction, or folding, is a classic problem in bioinformatics: given a sequence of nucleotides, the aim is to predict the base pairs formed in its three dimensional conformation. The inverse problem of designing a sequence folding into a particular target structure has only more recently received notable interest. With a growing appreciation and understanding of the functional and structural properties of RNA motifs, and a growing interest in utilising biomolecules in nano-scale designs, the interest in the inverse RNA folding problem is bound to increase. However, whereas the RNA folding problem from an algorithmic viewpoint has an elegant and efficient solution, the inverse RNA folding problem appears to be hard. In this paper we present a genetic algorithm approach to solve the inverse folding problem. The main aims of the development were to address the hitherto mostly ignored extension of solving the inverse folding problem, the multi-target inverse folding problem, while simultaneously designing a method with superior performance when measured on the quality of designed sequences. The genetic algorithm has been implemented as a Python program called Frnakenstein. It was benchmarked against four existing methods and several data sets totalling 769 real and predicted single structure targets, and on 292 two structure targets. It performed as well as or better at finding sequences which folded in silico into the target structure than all existing methods, without the heavy bias towards CG base pairs that was observed for all other top performing methods. On the two structure targets it also performed well, generating a perfect design for about 80% of the targets. Our method illustrates that successful designs for the inverse RNA folding problem do not necessarily have to rely on heavy biases in base pair and unpaired base distributions. The design problem seems to become more difficult on larger structures when the target structures are real structures, while no deterioration was observed for predicted structures. Design for two structure targets is considerably more difficult, but far from impossible, demonstrating the feasibility of automated design of artificial riboswitches. The Python implementation is available at http://www.stats.ox.ac.uk/research/genome/software/frnakenstein.

  15. Frnakenstein: multiple target inverse RNA folding

    PubMed Central

    2012-01-01

    Background: RNA secondary structure prediction, or folding, is a classic problem in bioinformatics: given a sequence of nucleotides, the aim is to predict the base pairs formed in its three dimensional conformation. The inverse problem of designing a sequence folding into a particular target structure has only more recently received notable interest. With a growing appreciation and understanding of the functional and structural properties of RNA motifs, and a growing interest in utilising biomolecules in nano-scale designs, the interest in the inverse RNA folding problem is bound to increase. However, whereas the RNA folding problem from an algorithmic viewpoint has an elegant and efficient solution, the inverse RNA folding problem appears to be hard. Results: In this paper we present a genetic algorithm approach to solve the inverse folding problem. The main aims of the development were to address the hitherto mostly ignored extension of solving the inverse folding problem, the multi-target inverse folding problem, while simultaneously designing a method with superior performance when measured on the quality of designed sequences. The genetic algorithm has been implemented as a Python program called Frnakenstein. It was benchmarked against four existing methods and several data sets totalling 769 real and predicted single structure targets, and on 292 two structure targets. It performed as well as or better at finding sequences which folded in silico into the target structure than all existing methods, without the heavy bias towards CG base pairs that was observed for all other top performing methods. On the two structure targets it also performed well, generating a perfect design for about 80% of the targets. Conclusions: Our method illustrates that successful designs for the inverse RNA folding problem do not necessarily have to rely on heavy biases in base pair and unpaired base distributions. The design problem seems to become more difficult on larger structures when the target structures are real structures, while no deterioration was observed for predicted structures. Design for two structure targets is considerably more difficult, but far from impossible, demonstrating the feasibility of automated design of artificial riboswitches. The Python implementation is available at http://www.stats.ox.ac.uk/research/genome/software/frnakenstein. PMID:23043260

  16. Ab initio NMR Confirmed Evolutionary Structure Prediction for Organic Molecular Crystals

    NASA Astrophysics Data System (ADS)

    Pham, Cong-Huy; Kucukbenli, Emine; de Gironcoli, Stefano

    2015-03-01

    Ab initio crystal structure prediction of even small organic compounds is extremely challenging due to polymorphism, molecular flexibility and difficulties in addressing the dispersion interaction from first principles. We recently implemented vdW-aware density functionals and demonstrated their success in energy ordering of amino acid crystals. In this work we combine this development with the evolutionary structure prediction method to study cholesterol polymorphs. Cholesterol crystals have paramount importance in various diseases, from cancer to atherosclerosis. The structures of some polymorphs (e.g. ChM, ChAl, ChAh) have already been resolved while some others, which display distinct NMR spectra and are involved in disease formation, are yet to be determined. Here we thoroughly assess the applicability of evolutionary structure prediction to address such real world problems. We validate the newly predicted structures with ab initio NMR chemical shift data using secondary referencing for an improved comparison with experiments.

  17. Prototype electrostatic ground state approach to predicting crystal structures of ionic compounds: Application to hydrogen storage materials

    NASA Astrophysics Data System (ADS)

    Majzoub, E. H.; Ozoliņš, V.

    2008-03-01

    We have developed a procedure for crystal structure generation and prediction for ionic compounds consisting of a collection of cations and rigid complex anions. Our approach is based on global optimization of an energy functional consisting of the electrostatic and soft-sphere repulsive energies using Metropolis Monte Carlo (MMC) simulated annealing in conjunction with smoothing of the potential energy landscape via the distance scaling method. The resulting structures, or prototype electrostatic ground states (PEGS), are subsequently relaxed using first-principles density-functional theory (DFT) calculations to obtain accurate structural parameters and thermodynamic properties. This method is shown to produce the ground state structures of NaAlH4 and Mg(AlH4)2, as well as the mixed cation alanate K2LiAlH6. For LiAlH4, the PEGS search produces a structure with a static DFT total energy equal to that of the experimentally observed structure; the latter is stabilized by vibrational contributions to the free energy. For mixed-valence hexa-alanates, XYAlH6, where X = (Li, Na, K) and Y = (Mg, Ca), the PEGS method predicts six unsuspected structure types, which are not found in the existing structure databases. The PEGS search yields energies that are, on the average, better than the best database structures with the same number of atoms per unit cell, demonstrating the predictive power and usefulness of the PEGS structures. In addition to the recently synthesized LiMgAlH6 compound, we predict that LiCaAlH6, NaCaAlH6, and KCaAlH6 are also thermodynamically stable with respect to phase separation into other alanates and metal hydrides. In contrast, NaMgAlH6 and KMgAlH6 are slightly unstable (by less than 3 kJ/mol) relative to the phase separation into NaAlH4, KAlH4, and MgH2. We suggest that solid-state ion-exchange reactions between X3AlH6 (X = Li, Na, K) and YCl2 (Y = Mg, Ca) could be used to synthesize the predicted mixed-valence hexa-alanates.
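
    The optimization loop described above, Metropolis Monte Carlo simulated annealing on an electrostatic plus soft-sphere energy, can be sketched on a toy system. This is a heavily simplified illustration and not the PEGS implementation: it assumes point ions in a non-periodic cubic box, a bare Coulomb sum in arbitrary units instead of a lattice summation, an r^-12 form for the soft-sphere term, and an exponential cooling schedule; it omits the distance scaling smoothing and the DFT relaxation. All function and parameter names are hypothetical.

```python
import math
import random


def energy(positions, charges, radii):
    """Pairwise Coulomb energy plus a soft-sphere overlap penalty."""
    total = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            r = max(math.dist(positions[i], positions[j]), 1e-6)  # avoid r = 0
            total += charges[i] * charges[j] / r           # electrostatic term
            total += ((radii[i] + radii[j]) / r) ** 12     # assumed repulsion form
    return total


def anneal(positions, charges, radii, box=6.0, steps=5000,
           t_start=2.0, t_end=0.01, max_move=0.5, seed=0):
    """Return the lowest-energy configuration seen and its energy."""
    rng = random.Random(seed)
    current = [tuple(p) for p in positions]
    e_current = energy(current, charges, radii)
    best, e_best = list(current), e_current
    for step in range(steps):
        # Exponential cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        i = rng.randrange(len(current))
        trial = list(current)
        trial[i] = tuple(min(box, max(0.0, x + rng.uniform(-max_move, max_move)))
                         for x in current[i])
        e_trial = energy(trial, charges, radii)
        delta = e_trial - e_current
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best
```

    Because the lowest-energy configuration is tracked separately from the current one, the returned energy never exceeds that of the starting arrangement, even though individual moves may go uphill.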

  18. Development of an Evolutionary Algorithm for the ab Initio Discovery of Two-Dimensional Materials

    NASA Astrophysics Data System (ADS)

    Revard, Benjamin Charles

    Crystal structure prediction is an important first step on the path toward computational materials design. Increasingly robust methods have become available in recent years for computing many materials properties, but because properties are largely a function of crystal structure, the structure must be known before these methods can be brought to bear. In addition, structure prediction is particularly useful for identifying low-energy structures of subperiodic materials, such as two-dimensional (2D) materials, which may adopt unexpected structures that differ from those of the corresponding bulk phases. Evolutionary algorithms, which are heuristics for global optimization inspired by biological evolution, have proven to be a fruitful approach for tackling the problem of crystal structure prediction. This thesis describes the development of an improved evolutionary algorithm for structure prediction and several applications of the algorithm to predict the structures of novel low-energy 2D materials. The first part of this thesis contains an overview of evolutionary algorithms for crystal structure prediction and presents our implementation, including details of extending the algorithm to search for clusters, wires, and 2D materials, improvements to efficiency when running in parallel, improved composition space sampling, and the ability to search for partial phase diagrams. We then present several applications of the evolutionary algorithm to 2D systems, including InP, the C-Si and Sn-S phase diagrams, and several group-IV dioxides. This thesis makes use of the Cornell graduate school's "papers" option. Chapters 1 and 3 correspond to the first-author publications of Refs. [131] and [132], respectively, and chapter 2 will soon be submitted as a first-author publication. The material in chapter 4 is taken from Ref. [144], in which I share joint first-authorship. In this case I have included only my own contributions.

  19. Application of the Collision-Imparted Velocity Method for Analyzing the Responses of Containment and Deflector Structures to Engine Rotor Fragment Impact

    NASA Technical Reports Server (NTRS)

    Collins, T. P.; Witmer, E. A.

    1973-01-01

    An approximate analysis, termed the Collision Imparted Velocity Method (CIVM), was employed for predicting the transient structural responses of containment rings or deflector rings which are subjected to impact from turbojet-engine rotor burst fragments. These 2-d structural rings may be initially circular or arbitrarily curved and may have either uniform or variable thickness; elastic, strain hardening, and strain rate material properties are accommodated. This approximate analysis utilizes kinetic energy and momentum conservation relations in order to predict the after-impact velocities of the fragment and the impacted ring segment. This information is then used in conjunction with a finite element structural response computation code to predict the transient, large deflection responses of the ring. Similarly, the equations of motion for each fragment are solved in small steps in time. Also, some comparisons of predictions with experimental data for fragment-impacted free containment rings are presented.
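
    The idea of obtaining after-impact velocities from momentum and kinetic energy conservation can be illustrated with the textbook one-dimensional case. The sketch below assumes a perfectly elastic, head-on collision between two point masses; the CIVM analysis in the report concerns fragments striking ring segments and is more general, so this is an illustration of the conservation relations only. The function name is hypothetical.

```python
def elastic_collision_1d(m1, v1, m2, v2):
    """After-impact velocities for a 1-D perfectly elastic collision.

    Derived from conservation of linear momentum and kinetic energy
    for two point masses m1 and m2 with initial velocities v1 and v2.
    """
    if m1 <= 0 or m2 <= 0:
        raise ValueError("masses must be positive")
    total = m1 + m2
    v1_after = ((m1 - m2) * v1 + 2.0 * m2 * v2) / total
    v2_after = ((m2 - m1) * v2 + 2.0 * m1 * v1) / total
    return v1_after, v2_after
```

    For a 2 kg body moving at 3 m/s striking a 1 kg body at rest, the velocities after impact are 1 m/s and 4 m/s; both momentum (6 kg m/s) and kinetic energy (9 J) are unchanged.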

  20. Small-angle X-Ray analysis of macromolecular structure: the structure of protein NS2 (NEP) in solution

    NASA Astrophysics Data System (ADS)

    Shtykova, E. V.; Bogacheva, E. N.; Dadinova, L. A.; Jeffries, C. M.; Fedorova, N. V.; Golovko, A. O.; Baratova, L. A.; Batishchev, O. V.

    2017-11-01

    A complex structural analysis of nuclear export protein NS2 (NEP) of influenza virus A has been performed using bioinformatics predictive methods and small-angle X-ray scattering data. The behavior of NEP molecules in a solution (their aggregation, oligomerization, and dissociation, depending on the buffer composition) has been investigated. It was shown that stable associates are formed even in a conventional aqueous salt solution at physiological pH value. For the first time we have managed to get NEP dimers in solution, to analyze their structure, and to compare the models obtained using the method of the molecular tectonics with the spatial protein structure predicted by us using the bioinformatics methods. The results of the study provide a new insight into the structural features of nuclear export protein NS2 (NEP) of the influenza virus A, which is very important for viral infection development.

  1. Lessons learned from participating in D3R 2016 Grand Challenge 2: compounds targeting the farnesoid X receptor

    NASA Astrophysics Data System (ADS)

    Duan, Rui; Xu, Xianjin; Zou, Xiaoqin

    2018-01-01

    D3R 2016 Grand Challenge 2 focused on predictions of binding modes and affinities for 102 compounds against the farnesoid X receptor (FXR). In this challenge, two distinct methods, a docking-based method and a template-based method, were employed by our team for the binding mode prediction. For the new template-based method, 3D ligand similarities were calculated for each query compound against the ligands in the co-crystal structures of FXR available in Protein Data Bank. The binding mode was predicted based on the co-crystal protein structure containing the ligand with the best ligand similarity score against the query compound. For the FXR dataset, the template-based method achieved a better performance than the docking-based method on the binding mode prediction. For the binding affinity prediction, an in-house knowledge-based scoring function ITScore2 and MM/PBSA approach were employed. Good performance was achieved for MM/PBSA, whereas the performance of ITScore2 was sensitive to ligand composition, e.g. the percentage of carbon atoms in the compounds. The sensitivity to ligand composition could be a clue for the further improvement of our knowledge-based scoring function.

  2. Predicting the thermal/structural performance of the atmospheric trace molecules spectroscopy /ATMOS/ Fourier transform spectrometer

    NASA Technical Reports Server (NTRS)

    Miller, J. M.

    1980-01-01

    ATMOS is a Fourier transform spectrometer to measure atmospheric trace molecules over a spectral range of 2-16 microns. Assessment of the system performance of ATMOS includes evaluations of optical system errors induced by thermal and structural effects. In order to assess the optical system errors induced from thermal and structural effects, error budgets are assembled during system engineering tasks and line of sight and wavefront deformations predictions (using operational thermal and vibration environments and computer models) are subsequently compared to the error budgets. This paper discusses the thermal/structural error budgets, modelling and analysis methods used to predict thermal/structural induced errors and the comparisons that show that predictions are within the error budgets.

  3. Bayesian model aggregation for ensemble-based estimates of protein pKa values

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gosink, Luke J.; Hogan, Emilie A.; Pulsipher, Trenton C.

    2014-03-01

    This paper investigates an ensemble-based technique called Bayesian Model Averaging (BMA) to improve the performance of protein amino acid pKa predictions. Structure-based pKa calculations play an important role in the mechanistic interpretation of protein structure and are also used to determine a wide range of protein properties. A diverse set of methods currently exists for pKa prediction, ranging from empirical statistical models to ab initio quantum mechanical approaches. However, each of these methods is based on a set of assumptions that have inherent bias and sensitivities that can affect a model's accuracy and generalizability for pKa prediction in complicated biomolecular systems. We use BMA to combine eleven diverse prediction methods that each estimate pKa values of amino acids in staphylococcal nuclease. These methods are based on work conducted for the pKa Cooperative and the pKa measurements are based on experimental work conducted by the García-Moreno lab. Our study demonstrates that the aggregated estimate obtained from BMA outperforms all individual prediction methods in our cross-validation study with improvements from 40-70% over other method classes. This work illustrates a new possible mechanism for improving the accuracy of pKa prediction and lays the foundation for future work on aggregate models that balance computational cost with prediction accuracy.
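
    The idea of weighting several predictors by how well each explains known measurements can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: it assumes a uniform prior over models, a Gaussian error model with a single fixed `sigma`, and hypothetical function names.

```python
import math

def bma_weights(training_predictions, observed, sigma=1.0):
    """Posterior model weights under a uniform prior (illustrative only).

    training_predictions: dict mapping model name -> list of predicted values
    observed: list of measured values, in the same order
    sigma: assumed common standard deviation of the Gaussian error model
    """
    log_likelihoods = {}
    for name, preds in training_predictions.items():
        sse = sum((p - o) ** 2 for p, o in zip(preds, observed))
        log_likelihoods[name] = -sse / (2.0 * sigma ** 2)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_likelihoods.values())
    unnormalized = {k: math.exp(v - top) for k, v in log_likelihoods.items()}
    total = sum(unnormalized.values())
    return {k: v / total for k, v in unnormalized.items()}

def bma_predict(weights, new_predictions):
    """Weighted average of each model's prediction for a new residue."""
    return sum(weights[name] * new_predictions[name] for name in weights)
```

    Because the weights sum to one, the aggregated estimate always lies between the lowest and highest individual predictions.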

  4. Knowledge-based computational intelligence development for predicting protein secondary structures from sequences.

    PubMed

    Shen, Hong-Bin; Yi, Dong-Liang; Yao, Li-Xiu; Yang, Jie; Chou, Kuo-Chen

    2008-10-01

    In the postgenomic age, with the avalanche of protein sequences generated and relatively slow progress in determining their structures by experiments, it is important to develop automated methods to predict the structure of a protein from its sequence. The membrane proteins are a special group in the protein family that accounts for approximately 30% of all proteins; however, solved membrane protein structures only represent less than 1% of known protein structures to date. Although a great success has been achieved for developing computational intelligence techniques to predict secondary structures in both globular and membrane proteins, there is still much challenging work in this regard. In this review article, we firstly summarize the recent progress of automation methodology development in predicting protein secondary structures, especially in membrane proteins; we will then give some future directions in this research field.

  5. Comparison of the performance of different DFT methods in the calculations of the molecular structure and vibration spectra of serotonin (5-hydroxytryptamine, 5-HT)

    NASA Astrophysics Data System (ADS)

    Yang, Yue; Gao, Hongwei

    2012-04-01

    Serotonin (5-hydroxytryptamine, 5-HT) is a monoamine neurotransmitter which plays an important role in treating acute or clinical stress. The comparative performance of different density functional theory (DFT) methods with various basis sets in predicting the molecular structure and vibration spectra of serotonin is reported. The calculation results of different methods including mPW1PW91, HCTH, SVWN, PBEPBE, B3PW91 and B3LYP with various basis sets including LANL2DZ, SDD, LANL2MB, 6-31G, 6-311++G and 6-311+G* were compared with the experimental data. It is remarkable that the SVWN/6-311++G and SVWN/6-311+G* levels afford the best quality to predict the structure of serotonin. The results also indicate that the PBEPBE/LANL2DZ level shows better performance in the vibration spectra prediction of serotonin than the other DFT methods.

  6. Guiding Conformation Space Search with an All-Atom Energy Potential

    PubMed Central

    Brunette, TJ; Brock, Oliver

    2009-01-01

    The most significant impediment for protein structure prediction is the inadequacy of conformation space search. Conformation space is too large and the energy landscape too rugged for existing search methods to consistently find near-optimal minima. To alleviate this problem, we present model-based search, a novel conformation space search method. Model-based search uses highly accurate information obtained during search to build an approximate, partial model of the energy landscape. Model-based search aggregates information in the model as it progresses, and in turn uses this information to guide exploration towards regions most likely to contain a near-optimal minimum. We validate our method by predicting the structure of 32 proteins, ranging in length from 49 to 213 amino acids. Our results demonstrate that model-based search is more effective at finding low-energy conformations in high-dimensional conformation spaces than existing search methods. The reduction in energy translates into structure predictions of increased accuracy. PMID:18536015

  7. Prediction of binding hot spot residues by using structural and evolutionary parameters.

    PubMed

    Higa, Roberto Hiroshi; Tozzi, Clésio Luis

    2009-07-01

    In this work, we present a method for predicting hot spot residues by using a set of structural and evolutionary parameters. Unlike previous studies, we use a set of parameters which do not depend on the structure of the protein in complex, so that the predictor can also be used when the interface region is unknown. Despite the fact that no information concerning proteins in complex is used for prediction, the application of the method to a compiled dataset described in the literature achieved a performance of 60.4%, as measured by F-Measure, corresponding to a recall of 78.1% and a precision of 49.5%. This result is higher than those reported by previous studies using the same data set.

  8. Modeling and dynamic environment analysis technology for spacecraft

    NASA Astrophysics Data System (ADS)

    Fang, Ren; Zhaohong, Qin; Zhong, Zhang; Zhenhao, Liu; Kai, Yuan; Long, Wei

    Spacecraft sustain complex and severe vibration and acoustic environments during flight. Predicting the resulting structures, including numerical predictions of fluctuating pressure, updating models and random vibration and acoustic analysis, plays an important role during the design, manufacture and ground testing of spacecraft. In this paper, Monotony Integrative Large Eddy Simulation (MILES) is introduced to predict the fluctuating pressure of the fairing. The exact flow structures of the fairing wall surface under different Mach numbers are obtained, then a spacecraft model is constructed using the finite element method (FEM). According to the modal test data, the model is updated by the penalty method. On this basis, the random vibration and acoustic responses of the fairing and satellite are analyzed by different methods. The simulated results agree well with the experimental ones, which shows the validity of the modeling and dynamic environment analysis technology. This information can better support test planning, defining test conditions and designing optimal structures.

  9. Structure, stability, and properties of the trans peroxo nitrate radical: the importance of nondynamic correlation.

    PubMed

    Dutta, Achintya Kumar; Dar, Manzoor; Vaval, Nayana; Pal, Sourav

    2014-02-27

    We report a comparative single-reference and multireference coupled-cluster investigation on the structure, potential energy surface, and IR spectroscopic properties of the trans peroxo nitrate radical, one of the key intermediates in stratospheric NOx chemistry. The previous single-reference ab initio studies predicted an unbound structure for the trans peroxo nitrate radical. However, our Fock space multireference coupled-cluster (FSMRCC) calculation confirms a bound structure for the trans peroxo nitrate radical, in accordance with the experimental results reported earlier. Further, the analysis of the potential energy surface in the FSMRCC method indicates a well-behaved minimum, contrary to the shallow minimum predicted by the single-reference coupled-cluster method. The harmonic force field analysis of various possible isomers of peroxo nitrate also reveals that only the trans structure leads to the experimentally observed IR peak at 1840 cm⁻¹. The present study highlights the critical importance of nondynamic correlation in predicting the structure and properties of high-energy stratospheric NOx radicals.

  10. Augmented Method to Improve Thermal Data for the Figure Drift Thermal Distortion Predictions of the JWST OTIS Cryogenic Vacuum Test

    NASA Technical Reports Server (NTRS)

    Park, Sang C.; Carnahan, Timothy M.; Cohen, Lester M.; Congedo, Cherie B.; Eisenhower, Michael J.; Ousley, Wes; Weaver, Andrew; Yang, Kan

    2017-01-01

    The JWST Optical Telescope Element (OTE) assembly is the largest optically stable infrared-optimized telescope currently being manufactured and assembled, and is scheduled for launch in 2018. The JWST OTE, including the 18 segment primary mirror, secondary mirror, and the Aft Optics Subsystem (AOS) are designed to be passively cooled and operate near 45K. These optical elements are supported by a complex composite backplane structure. As a part of the structural distortion model validation efforts, a series of tests are planned during the cryogenic vacuum test of the fully integrated flight hardware at NASA JSC Chamber A. The successful ends to the thermal-distortion phases are heavily dependent on the accurate temperature knowledge of the OTE structural members. However, the current temperature sensor allocations during the cryo-vac test may not have sufficient fidelity to provide accurate knowledge of the temperature distributions within the composite structure. A method based on an inverse distance relationship among the sensors and thermal model nodes was developed to improve the thermal data provided for the nanometer scale WaveFront Error (WFE) predictions. The Linear Distance Weighted Interpolation (LDWI) method was developed to augment the thermal model predictions based on the sparse sensor information. This paper will encompass the development of the LDWI method using the test data from the earlier pathfinder cryo-vac tests, and the results of the notional and as tested WFE predictions from the structural finite element model cases to characterize the accuracies of this LDWI method.
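
    The inverse distance relationship between sensors and thermal model nodes can be illustrated with a generic inverse-distance-weighted interpolation. The abstract does not give the LDWI formula, so this is a hedged sketch of the general technique only: the function name, the `power` parameter, and the data layout are assumptions.

```python
import math

def idw_temperature(node_xyz, sensors, power=1.0):
    """Estimate a node temperature from sparse sensor readings.

    node_xyz: (x, y, z) coordinates of the thermal model node
    sensors: list of ((x, y, z), temperature) pairs (hypothetical layout)
    power: exponent on the distance; 1.0 gives weights linear in 1/distance
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for position, temperature in sensors:
        distance = math.dist(node_xyz, position)
        if distance == 0.0:
            # The node coincides with a sensor: use its reading directly.
            return temperature
        weight = 1.0 / distance ** power
        weighted_sum += weight * temperature
        weight_total += weight
    return weighted_sum / weight_total
```

    A node midway between two sensors receives the mean of their readings, and nodes closer to one sensor are pulled toward that sensor's value.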

  11. Role of conformational sampling in computing mutation-induced changes in protein structure and stability.

    PubMed

    Kellogg, Elizabeth H; Leaver-Fay, Andrew; Baker, David

    2011-03-01

    The prediction of changes in protein stability and structure resulting from single amino acid substitutions is both a fundamental test of macromolecular modeling methodology and an important current problem as high throughput sequencing reveals sequence polymorphisms at an increasing rate. In principle, given the structure of a wild-type protein and a point mutation whose effects are to be predicted, an accurate method should recapitulate both the structural changes and the change in the folding-free energy. Here, we explore the performance of protocols which sample an increasing diversity of conformations. We find that surprisingly similar performances in predicting changes in stability are achieved using protocols that involve very different amounts of conformational sampling, provided that the resolution of the force field is matched to the resolution of the sampling method. Methods involving backbone sampling can in some cases closely recapitulate the structural changes accompanying mutations but not surprisingly tend to do more harm than good in cases where structural changes are negligible. Analysis of the outliers in the stability change calculations suggests areas needing particular improvement; these include the balance between desolvation and the formation of favorable buried polar interactions, and unfolded state modeling. Copyright © 2010 Wiley-Liss, Inc.

  12. Protein Secondary Structure Prediction Using AutoEncoder Network and Bayes Classifier

    NASA Astrophysics Data System (ADS)

    Wang, Leilei; Cheng, Jinyong

    2018-03-01

    Protein secondary structure prediction belongs to bioinformatics and is an important research area. In this paper, we propose a new method for protein secondary structure prediction using a Bayes classifier and an autoencoder network. Our experiments cover the construction of the model, the classification of parameters, and related steps. The data set is the commonly used CB513 protein data set. Accuracy is evaluated by 3-fold cross-validation, from which the Q3 accuracy is obtained. The results indicate that the autoencoder network improves the prediction accuracy of protein secondary structure.
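
    Q3 accuracy is the fraction of residues whose predicted three-state label (helix, strand, or coil) matches the reference label. A minimal sketch, assuming labels are given as equal-length strings over H, E, and C; the function name is illustrative.

```python
def q3_accuracy(predicted, observed):
    """Fraction of positions where the predicted state equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("label sequences must have the same length")
    if not observed:
        raise ValueError("label sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)
```

    In a cross-validation setting, this score would be computed on each held-out fold and then averaged.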

  13. STRUM: structure-based prediction of protein stability changes upon single-point mutation.

    PubMed

    Quan, Lijun; Lv, Qiang; Zhang, Yang

    2016-10-01

    Mutations in human genome are mainly through single nucleotide polymorphism, some of which can affect stability and function of proteins, causing human diseases. Several methods have been proposed to predict the effect of mutations on protein stability; but most require features from experimental structure. Given the fast progress in protein structure prediction, this work explores the possibility to improve the mutation-induced stability change prediction using low-resolution structure modeling. We developed a new method (STRUM) for predicting stability change caused by single-point mutations. Starting from wild-type sequences, 3D models are constructed by the iterative threading assembly refinement (I-TASSER) simulations, where physics- and knowledge-based energy functions are derived on the I-TASSER models and used to train STRUM models through gradient boosting regression. STRUM was assessed by 5-fold cross validation on 3421 experimentally determined mutations from 150 proteins. The Pearson correlation coefficient (PCC) between predicted and measured changes of Gibbs free-energy gap, ΔΔG, upon mutation reaches 0.79 with a root-mean-square error 1.2 kcal/mol in the mutation-based cross-validations. The PCC reduces if separating training and test mutations from non-homologous proteins, which reflects inherent correlations in the current mutation sample. Nevertheless, the results significantly outperform other state-of-the-art methods, including those built on experimental protein structures. Detailed analyses show that the most sensitive features in STRUM are the physics-based energy terms on I-TASSER models and the conservation scores from multiple-threading template alignments. However, the ΔΔG prediction accuracy has only a marginal dependence on the accuracy of protein structure models as long as the global fold is correct. 
These data demonstrate the feasibility to use low-resolution structure modeling for high-accuracy stability change prediction upon point mutations. http://zhanglab.ccmb.med.umich.edu/STRUM/ CONTACT: qiang@suda.edu.cn and zhng@umich.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  14. STRUM: structure-based prediction of protein stability changes upon single-point mutation

    PubMed Central

    Quan, Lijun; Lv, Qiang; Zhang, Yang

    2016-01-01

    Motivation: Mutations in human genome are mainly through single nucleotide polymorphism, some of which can affect stability and function of proteins, causing human diseases. Several methods have been proposed to predict the effect of mutations on protein stability; but most require features from experimental structure. Given the fast progress in protein structure prediction, this work explores the possibility to improve the mutation-induced stability change prediction using low-resolution structure modeling. Results: We developed a new method (STRUM) for predicting stability change caused by single-point mutations. Starting from wild-type sequences, 3D models are constructed by the iterative threading assembly refinement (I-TASSER) simulations, where physics- and knowledge-based energy functions are derived on the I-TASSER models and used to train STRUM models through gradient boosting regression. STRUM was assessed by 5-fold cross validation on 3421 experimentally determined mutations from 150 proteins. The Pearson correlation coefficient (PCC) between predicted and measured changes of Gibbs free-energy gap, ΔΔG, upon mutation reaches 0.79 with a root-mean-square error 1.2 kcal/mol in the mutation-based cross-validations. The PCC reduces if separating training and test mutations from non-homologous proteins, which reflects inherent correlations in the current mutation sample. Nevertheless, the results significantly outperform other state-of-the-art methods, including those built on experimental protein structures. Detailed analyses show that the most sensitive features in STRUM are the physics-based energy terms on I-TASSER models and the conservation scores from multiple-threading template alignments. However, the ΔΔG prediction accuracy has only a marginal dependence on the accuracy of protein structure models as long as the global fold is correct. 
These data demonstrate the feasibility to use low-resolution structure modeling for high-accuracy stability change prediction upon point mutations. Availability and Implementation: http://zhanglab.ccmb.med.umich.edu/STRUM/ Contact: qiang@suda.edu.cn and zhng@umich.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27318206
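
    The two figures of merit quoted above, the Pearson correlation coefficient and the root-mean-square error between predicted and measured ΔΔG, can be computed as in the following sketch. The function name and the example values in the usage note are illustrative and are not taken from the STRUM benchmark.

```python
import math

def pearson_and_rmse(predicted, measured):
    """Return (PCC, RMSE) for paired predicted and measured values."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    pcc = cov / math.sqrt(var_p * var_m)
    rmse = math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n)
    return pcc, rmse
```

    The two metrics capture different things: predictions offset from the measurements by a constant 1 kcal/mol still have a PCC of 1 while their RMSE is 1 kcal/mol.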

  15. PREvaIL, an integrative approach for inferring catalytic residues using sequence, structural, and network features in a machine-learning framework.

    PubMed

    Song, Jiangning; Li, Fuyi; Takemoto, Kazuhiro; Haffari, Gholamreza; Akutsu, Tatsuya; Chou, Kuo-Chen; Webb, Geoffrey I

    2018-04-14

    Determining the catalytic residues in an enzyme is critical to our understanding of the relationship between protein sequence, structure, and function, and to enhancing our ability to design novel enzymes and their inhibitors. Although many enzymes have been sequenced, and their primary and tertiary structures determined, experimental methods for enzyme functional characterization lag behind. Because experimental methods used for identifying catalytic residues are resource- and labor-intensive, computational approaches have considerable value and are highly desirable for their ability to complement experimental studies in identifying catalytic residues and helping to bridge the sequence-structure-function gap. In this study, we describe a new computational method called PREvaIL for predicting enzyme catalytic residues. This method was developed by leveraging a comprehensive set of informative features extracted from multiple levels, including sequence, structure, and residue-contact network, in a random forest machine-learning framework. Extensive benchmarking experiments on eight different datasets based on 10-fold cross-validation and independent tests, as well as side-by-side performance comparisons with seven modern sequence- and structure-based methods, showed that PREvaIL achieved competitive predictive performance, with an area under the receiver operating characteristic curve and area under the precision-recall curve ranging from 0.896 to 0.973 and from 0.294 to 0.523, respectively. We demonstrated that this method was able to capture useful signals arising from different levels, leveraging such differential but useful types of features and allowing us to significantly improve the performance of catalytic residue prediction. We believe that this new method can be utilized as a valuable tool for both understanding the complex sequence-structure-function relationships of proteins and facilitating the characterization of novel enzymes lacking functional annotations.
Copyright © 2018 Elsevier Ltd. All rights reserved.

  16. Genome-scale characterization of RNA tertiary structures and their functional impact by RNA solvent accessibility prediction.

    PubMed

    Yang, Yuedong; Li, Xiaomei; Zhao, Huiying; Zhan, Jian; Wang, Jihua; Zhou, Yaoqi

    2017-01-01

    As most RNA structures are elusive to structure determination, obtaining solvent accessible surface areas (ASAs) of nucleotides in an RNA structure is an important first step to characterize potential functional sites and core structural regions. Here, we developed RNAsnap, the first machine-learning method trained on protein-bound RNA structures for solvent accessibility prediction. Built on sequence profiles from multiple sequence alignment (RNAsnap-prof), the method provided robust prediction in fivefold cross-validation and an independent test (Pearson correlation coefficients, r, between predicted and actual ASA values are 0.66 and 0.63, respectively). Application of the method to 6178 mRNAs revealed its positive correlation to mRNA accessibility by dimethyl sulphate (DMS) experimentally measured in vivo (r = 0.37) but not in vitro (r = 0.07), despite the lack of training on mRNAs and the fact that DMS accessibility is only an approximation to solvent accessibility. We further found strong association across coding and noncoding regions between predicted solvent accessibility of the mutation site of a single nucleotide variant (SNV) and the frequency of that variant in the population for 2.2 million SNVs obtained in the 1000 Genomes Project. Moreover, mapping solvent accessibility of RNAs to the human genome indicated that introns, 5' cap of 5' and 3' cap of 3' untranslated regions, are more solvent accessible, consistent with their respective functional roles. These results support conformational selections as the mechanism for the formation of RNA-protein complexes and highlight the utility of genome-scale characterization of RNA tertiary structures by RNAsnap. The server and its stand-alone downloadable version are available at http://sparks-lab.org. © 2016 Yang et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  17. Evaluating the effect of disturbed ensemble distributions on SCFG based statistical sampling of RNA secondary structures.

    PubMed

    Scheid, Anika; Nebel, Markus E

    2012-07-09

    Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF) approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples), where neither of these two competing approaches generally outperforms the other. In this work, we will consider the SCFG based approach in order to perform an analysis on how the quality of generated sample sets and the corresponding prediction accuracy changes when different degrees of disturbances are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones), then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them. 
Thus, it might then be possible to decrease the worst-case time requirements of such an SCFG based sampling method without significant accuracy losses. If, on the other hand, the quality of sampled structures can be observed to strongly react to slight disturbances, there is little hope for improving the complexity by heuristic procedures. We hence provide a reliable test for the hypothesis that a heuristic method could be implemented to improve the time scaling of RNA secondary structure prediction in the worst-case - without sacrificing much of the accuracy of the results. Our experiments indicate that absolute errors generally lead to the generation of useless sample sets, whereas relative errors seem to have only small negative impact on both the predictive accuracy and the overall quality of resulting structure samples. Based on these observations, we present some useful ideas for developing a time-reduced sampling method guaranteeing an acceptable predictive accuracy. We also discuss some inherent drawbacks that arise in the context of approximation. The key results of this paper are crucial for the design of an efficient and competitive heuristic prediction method based on the increasingly accepted and attractive statistical sampling approach. This has indeed been indicated by the construction of prototype algorithms.
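
    The distinction between absolute and relative errors on sampling probabilities can be made concrete with a small sketch. The abstract does not specify the disturbance scheme, so the uniform noise model, the clipping at zero, the renormalization, and the function name below are all assumptions for illustration.

```python
import random

def disturb(probabilities, max_error, mode="relative", seed=0):
    """Perturb a discrete distribution and renormalize it.

    mode="relative": p -> p * (1 + u), with u uniform in [-max_error, max_error]
    mode="absolute": p -> p + u, clipped at zero
    """
    if mode not in ("relative", "absolute"):
        raise ValueError("mode must be 'relative' or 'absolute'")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    noisy = []
    for p in probabilities:
        u = rng.uniform(-max_error, max_error)
        value = p * (1.0 + u) if mode == "relative" else p + u
        noisy.append(max(value, 0.0))
    total = sum(noisy)
    if total == 0.0:
        raise ValueError("all probabilities were disturbed to zero")
    return [v / total for v in noisy]
```

    Under this noise model, a relative error scales each probability in proportion to its size, so impossible choices stay impossible, whereas an absolute error of the same magnitude can dominate small probabilities entirely.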

  18. Evaluating the effect of disturbed ensemble distributions on SCFG based statistical sampling of RNA secondary structures

    PubMed Central

    2012-01-01

    Background: Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF) approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples), where neither of these two competing approaches generally outperforms the other. Results: In this work, we will consider the SCFG based approach in order to perform an analysis on how the quality of generated sample sets and the corresponding prediction accuracy changes when different degrees of disturbances are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones), then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them.
Thus, it might then be possible to decrease the worst-case time requirements of such an SCFG based sampling method without significant accuracy losses. If, on the other hand, the quality of sampled structures can be observed to strongly react to slight disturbances, there is little hope for improving the complexity by heuristic procedures. We hence provide a reliable test for the hypothesis that a heuristic method could be implemented to improve the time scaling of RNA secondary structure prediction in the worst-case – without sacrificing much of the accuracy of the results. Conclusions: Our experiments indicate that absolute errors generally lead to the generation of useless sample sets, whereas relative errors seem to have only small negative impact on both the predictive accuracy and the overall quality of resulting structure samples. Based on these observations, we present some useful ideas for developing a time-reduced sampling method guaranteeing an acceptable predictive accuracy. We also discuss some inherent drawbacks that arise in the context of approximation. The key results of this paper are crucial for the design of an efficient and competitive heuristic prediction method based on the increasingly accepted and attractive statistical sampling approach. This has indeed been indicated by the construction of prototype algorithms. PMID:22776037

  19. NASTRAN application for the prediction of aircraft interior noise

    NASA Technical Reports Server (NTRS)

    Marulo, Francesco; Beyer, Todd B.

    1987-01-01

    The application of a structural-acoustic analogy within the NASTRAN finite element program for the prediction of aircraft interior noise is presented. Some refinements of the method, which reduce the amount of computation required for large, complex structures, are discussed. Also, further improvements are proposed and preliminary comparisons with structural and acoustic modal data obtained for a large, composite cylinder are presented.

  20. BepiPred-2.0: improving sequence-based B-cell epitope prediction using conformational epitopes

    PubMed Central

    Jespersen, Martin Closter; Peters, Bjoern

    2017-01-01

    Abstract Antibodies have become an indispensable tool for many biotechnological and clinical applications. They bind their molecular target (antigen) by recognizing a portion of its structure (epitope) in a highly specific manner. The ability to predict epitopes from antigen sequences alone is a complex task. Despite substantial effort, limited advancement has been achieved over the last decade in the accuracy of epitope prediction methods, especially for those that rely on the sequence of the antigen only. Here, we present BepiPred-2.0 (http://www.cbs.dtu.dk/services/BepiPred/), a web server for predicting B-cell epitopes from antigen sequences. BepiPred-2.0 is based on a random forest algorithm trained on epitopes annotated from antibody-antigen protein structures. This new method was found to outperform other available tools for sequence-based epitope prediction both on epitope data derived from solved 3D structures, and on a large collection of linear epitopes downloaded from the IEDB database. The method displays results in a user-friendly and informative way, both for computer-savvy and non-expert users. We believe that BepiPred-2.0 will be a valuable tool for the bioinformatics and immunology community. PMID:28472356

  1. A systematic review on popularity, application and characteristics of protein secondary structure prediction tools.

    PubMed

    Kashani-Amin, Elaheh; Tabatabaei-Malazy, Ozra; Sakhteman, Amirhossein; Larijani, Bagher; Ebrahim-Habibi, Azadeh

    2018-02-27

    Prediction of proteins' secondary structure is one of the major steps in the generation of homology models. These models provide structural information which is used to design suitable ligands for potential medicinal targets. However, selecting a proper tool among multiple secondary structure prediction (SSP) options is challenging. The current study offers insight into currently favored methods and tools within various contexts. A systematic review was performed to gain comprehensive access to recent (2013-2016) studies which used or recommended protein SSP tools. Three databases, Web of Science, PubMed and Scopus, were systematically searched, and 99 out of 209 studies were finally found eligible for data extraction. Four categories of applications for the 59 retrieved SSP tools were: (I) prediction of structural features of a given sequence, (II) evaluation of a method, (III) providing input for a new SSP method and (IV) integrating an SSP tool as a component of a program. PSIPRED was found to be the most popular tool in all four categories. JPred and tools utilizing the PHD (Profile network from HeiDelberg) method occupied second and third places of popularity in categories I and II. JPred was only found in the first two categories, while PHD was present in three fields. This study provides comprehensive insight into the recent usage of SSP tools, which could be helpful for selecting a proper tool.

  2. Multi-views Fusion CNN for Left Ventricular Volumes Estimation on Cardiac MR Images.

    PubMed

    Luo, Gongning; Dong, Suyu; Wang, Kuanquan; Zuo, Wangmeng; Cao, Shaodong; Zhang, Henggui

    2017-10-13

    Left ventricular (LV) volume estimation is a critical procedure for cardiac disease diagnosis. The objective of this paper is to address the task of direct LV volume prediction. We propose a direct volume prediction method based on end-to-end deep convolutional neural networks (CNNs). We study the end-to-end LV volume prediction method in terms of data preprocessing, network structure, and multi-view fusion strategy. The main contributions of this paper are as follows. First, we propose a new data preprocessing method for cardiac magnetic resonance (CMR) images. Second, we propose a new network structure for end-to-end LV volume estimation. Third, we explore the representational capacity of different slices and propose a fusion strategy to improve the prediction accuracy. The evaluation results show that the proposed method outperforms other state-of-the-art LV volume estimation methods on openly accessible benchmark datasets. The clinical indexes derived from the predicted volumes agree well with the ground truth (EDV: R=0.974, RMSE=9.6 ml; ESV: R=0.976, RMSE=7.1 ml; EF: R=0.828, RMSE=4.71%). Experimental results show that the proposed method has high accuracy and efficiency on the LV volume prediction task. The proposed method not only has application potential for cardiac disease screening on large-scale CMR data, but can also be extended to other medical image research fields.
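
    The clinical indexes quoted in the record above (EDV, ESV, EF, each with a correlation R and an RMSE) can be related to one another with a short sketch. The ejection fraction formula EF = (EDV - ESV) / EDV is the standard definition; the volumes below are made-up numbers, not data from the paper, and the helper names are hypothetical.

```python
import math

def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction in percent from end-diastolic and end-systolic volumes (ml)."""
    if edv_ml <= 0:
        raise ValueError("EDV must be positive")
    return 100.0 * (edv_ml - esv_ml) / edv_ml

def rmse(pred, truth):
    """Root-mean-square error between predictions and ground truth."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical ground-truth and predicted volumes in ml (illustration only).
true_edv, true_esv = [120.0, 150.0, 180.0, 200.0], [50.0, 70.0, 110.0, 90.0]
pred_edv, pred_esv = [125.0, 140.0, 185.0, 195.0], [55.0, 65.0, 100.0, 95.0]

true_ef = [ejection_fraction(d, s) for d, s in zip(true_edv, true_esv)]
pred_ef = [ejection_fraction(d, s) for d, s in zip(pred_edv, pred_esv)]

print("EDV:", pearson_r(pred_edv, true_edv), rmse(pred_edv, true_edv))
print("ESV:", pearson_r(pred_esv, true_esv), rmse(pred_esv, true_esv))
print("EF: ", pearson_r(pred_ef, true_ef), rmse(pred_ef, true_ef))
```

    Because EF is a ratio of two predicted volumes, errors in EDV and ESV combine in it, which may be one reason why EF agreement is typically reported separately from the volume agreement.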

  3. Protein Tertiary Structure Prediction Based on Main Chain Angle Using a Hybrid Bees Colony Optimization Algorithm

    NASA Astrophysics Data System (ADS)

    Mahmood, Zakaria N.; Mahmuddin, Massudi; Mahmood, Mohammed Nooraldeen

    Encoding proteins by their amino acid sequence in order to classify them into their respective families and subfamilies is an important research area. However, for a given protein, the exact action (hormonal, enzymatic, transmembrane or nuclear receptor) does not depend solely on the amino acid sequence but also on the way the amino acid chain folds. This study provides a prototype system that is able to predict a protein's tertiary structure. Several methods are used to develop and evaluate the system in order to improve accuracy in protein 3D structure prediction. The Bees Optimization algorithm, which is inspired by the food-foraging behavior of honey bees, is used in the search phase. The experiment is conducted on short-sequence proteins that have been used in previous research with well-known tools. The proposed approach shows promising results.

  4. Prediction of enzymatic pathways by integrative pathway mapping

    PubMed Central

    Wichelecki, Daniel J; San Francisco, Brian; Zhao, Suwen; Rodionov, Dmitry A; Vetting, Matthew W; Al-Obaidi, Nawar F; Lin, Henry; O'Meara, Matthew J; Scott, David A; Morris, John H; Russel, Daniel; Almo, Steven C; Osterman, Andrei L

    2018-01-01

    The functions of most proteins are yet to be determined. The function of an enzyme is often defined by its interacting partners, including its substrate and product, and its role in larger metabolic networks. Here, we describe a computational method that predicts the functions of orphan enzymes by organizing them into a linear metabolic pathway. Given candidate enzyme and metabolite pathway members, this aim is achieved by finding those pathways that satisfy structural and network restraints implied by varied input information, including that from virtual screening, chemoinformatics, genomic context analysis, and ligand-binding experiments. We demonstrate this integrative pathway mapping method by predicting the L-gulonate catabolic pathway in Haemophilus influenzae Rd KW20. The prediction was subsequently validated experimentally by enzymology, crystallography, and metabolomics. Integrative pathway mapping by satisfaction of structural and network restraints is extensible to molecular networks in general and thus formally bridges the gap between structural biology and systems biology. PMID:29377793

  5. Rigid-Docking Approaches to Explore Protein-Protein Interaction Space.

    PubMed

    Matsuzaki, Yuri; Uchikoga, Nobuyuki; Ohue, Masahito; Akiyama, Yutaka

    Protein-protein interactions play core roles in living cells, especially in the regulatory systems. As information on proteins has rapidly accumulated on publicly available databases, much effort has been made to obtain a better picture of protein-protein interaction networks using protein tertiary structure data. Predicting relevant interacting partners from their tertiary structure is a challenging task and computer science methods have the potential to assist with this. Protein-protein rigid docking has been utilized by several projects, docking-based approaches having the advantages that they can suggest binding poses of predicted binding partners which would help in understanding the interaction mechanisms and that comparing docking results of both non-binders and binders can lead to understanding the specificity of protein-protein interactions from structural viewpoints. In this review we focus on explaining current computational prediction methods to predict pairwise direct protein-protein interactions that form protein complexes.

  6. Improving binding mode and binding affinity predictions of docking by ligand-based search of protein conformations: evaluation in D3R grand challenge 2015

    NASA Astrophysics Data System (ADS)

    Xu, Xianjin; Yan, Chengfei; Zou, Xiaoqin

    2017-08-01

    The growing number of protein-ligand complex structures, particularly the structures of proteins co-bound with different ligands, in the Protein Data Bank helps us tackle two major challenges in molecular docking studies: the protein flexibility and the scoring function. Here, we introduced a systematic strategy by using the information embedded in the known protein-ligand complex structures to improve both binding mode and binding affinity predictions. Specifically, a ligand similarity calculation method was employed to search a receptor structure with a bound ligand sharing high similarity with the query ligand for the docking use. The strategy was applied to the two datasets (HSP90 and MAP4K4) in recent D3R Grand Challenge 2015. In addition, for the HSP90 dataset, a system-specific scoring function (ITScore2_hsp90) was generated by recalibrating our statistical potential-based scoring function (ITScore2) using the known protein-ligand complex structures and the statistical mechanics-based iterative method. For the HSP90 dataset, better performances were achieved for both binding mode and binding affinity predictions comparing with the original ITScore2 and with ensemble docking. For the MAP4K4 dataset, although there were only eight known protein-ligand complex structures, our docking strategy achieved a comparable performance with ensemble docking. Our method for receptor conformational selection and iterative method for the development of system-specific statistical potential-based scoring functions can be easily applied to other protein targets that have a number of protein-ligand complex structures available to improve predictions on binding.

  7. ProbFold: a probabilistic method for integration of probing data in RNA secondary structure prediction.

    PubMed

    Sahoo, Sudhakar; Świtnicki, Michał P; Pedersen, Jakob Skou

    2016-09-01

    Recently, new RNA secondary structure probing techniques have been developed, including Next Generation Sequencing based methods capable of probing transcriptome-wide. These techniques hold great promise for improving structure prediction accuracy. However, each new data type comes with its own signal properties and biases, which may even be experiment specific. There is therefore a growing need for RNA structure prediction methods that can be automatically trained on new data types and readily extended to integrate and fully exploit multiple types of data. Here, we develop and explore a modular probabilistic approach for integrating probing data in RNA structure prediction. It can be automatically trained given a set of known structures with probing data. The approach is demonstrated on SHAPE datasets, where we evaluate and selectively model specific correlations. The approach often makes superior use of the probing data signal compared to other methods. We illustrate the use of ProbFold on multiple data types using both simulations and a small set of structures with SHAPE, DMS and CMCT data. Technically, the approach combines stochastic context-free grammars (SCFGs) with probabilistic graphical models. This approach allows rapid adaptation and integration of new probing data types. ProbFold is implemented in C++. Models are specified using simple textual formats. Data reformatting is done using separate C++ programs. Source code, statically compiled binaries for x86 Linux machines, C++ programs, example datasets and a tutorial are available from http://moma.ki.au.dk/prj/probfold/. Contact: jakob.skou@clin.au.dk. Supplementary data are available at Bioinformatics online.

  8. A Review of Computational Intelligence Methods for Eukaryotic Promoter Prediction.

    PubMed

    Singh, Shailendra; Kaur, Sukhbir; Goel, Neelam

    2015-01-01

    In past decades, the prediction of genes in DNA sequences has attracted the attention of many researchers, but because of their complex structure it is extremely difficult to locate their positions correctly. DNA contains a large number of regulatory regions that help in the transcription of a gene. The promoter is one such region, and finding its location is a challenging problem. Various computational methods for promoter prediction have been developed over the past few years. This paper reviews these promoter prediction methods. Several difficulties and pitfalls encountered by these methods are also detailed, along with future research directions.

  9. Prediction of distribution coefficient from structure. 1. Estimation method.

    PubMed

    Csizmadia, F; Tsantili-Kakoulidou, A; Panderi, I; Darvas, F

    1997-07-01

    A method has been developed for the estimation of the distribution coefficient (D), which considers the microspecies of a compound. D is calculated from the microscopic dissociation constants (microconstants), the partition coefficients of the microspecies, and the counterion concentration. A general equation for the calculation of D at a given pH is presented. The microconstants are calculated from the structure using Hammett and Taft equations. The partition coefficients of the ionic microspecies are predicted by empirical equations using the dissociation constants and the partition coefficient of the uncharged species, which are estimated from the structure by a Linear Free Energy Relationship method. The algorithm is implemented in a program module called PrologD.
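
    The dependence of the distribution coefficient D on pH described in the record above can be sketched for the simplest case. The general equation of the paper, which covers all microspecies and the counterion concentration, is not reproduced in the abstract; the function below is only the textbook special case of a monoprotic acid with one neutral and one ionized species, and its name and parameters are hypothetical.

```python
import math

def log_d_monoprotic_acid(log_p_neutral, pka, ph, log_p_ion=None):
    """log D of a monoprotic acid HA at a given pH (simplified two-species model).

    D = (P_neutral + P_ion * 10**(pH - pKa)) / (1 + 10**(pH - pKa))
    If log_p_ion is None, the ionized species is assumed not to partition at all.
    """
    ratio = 10.0 ** (ph - pka)              # [A-] / [HA] in the aqueous phase
    p_neutral = 10.0 ** log_p_neutral
    p_ion = 0.0 if log_p_ion is None else 10.0 ** log_p_ion
    d = (p_neutral + p_ion * ratio) / (1.0 + ratio)
    return math.log10(d)

# Hypothetical compound: log P = 3.0, pKa = 4.5 (illustration only).
for ph in (2.0, 4.5, 7.4):
    print(ph, round(log_d_monoprotic_acid(3.0, 4.5, ph), 3))
```

    In this simplified model log D approaches log P well below the pKa and falls by about one unit per pH unit above it, unless partitioning of the ion is included through the optional log_p_ion argument.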

  10. Computational Prediction of Metabolism: Sites, Products, SAR, P450 Enzyme Dynamics, and Mechanisms

    PubMed Central

    2012-01-01

    Metabolism of xenobiotics remains a central challenge for the discovery and development of drugs, cosmetics, nutritional supplements, and agrochemicals. Metabolic transformations are frequently related to the incidence of toxic effects that may result from the emergence of reactive species, the systemic accumulation of metabolites, or by induction of metabolic pathways. Experimental investigation of the metabolism of small organic molecules is particularly resource demanding; hence, computational methods are of considerable interest to complement experimental approaches. This review provides a broad overview of structure- and ligand-based computational methods for the prediction of xenobiotic metabolism. Current computational approaches to address xenobiotic metabolism are discussed from three major perspectives: (i) prediction of sites of metabolism (SOMs), (ii) elucidation of potential metabolites and their chemical structures, and (iii) prediction of direct and indirect effects of xenobiotics on metabolizing enzymes, where the focus is on the cytochrome P450 (CYP) superfamily of enzymes, the cardinal xenobiotics metabolizing enzymes. For each of these domains, a variety of approaches and their applications are systematically reviewed, including expert systems, data mining approaches, quantitative structure–activity relationships (QSARs), and machine learning-based methods, pharmacophore-based algorithms, shape-focused techniques, molecular interaction fields (MIFs), reactivity-focused techniques, protein–ligand docking, molecular dynamics (MD) simulations, and combinations of methods. Predictive metabolism is a developing area, and there is still enormous potential for improvement. 
However, it is clear that the combination of rapidly increasing amounts of available ligand- and structure-related experimental data (in particular, quantitative data) with novel and diverse simulation and modeling approaches is accelerating the development of effective tools for prediction of in vivo metabolism, which is reflected by the diverse and comprehensive data sources and methods for metabolism prediction reviewed here. This review attempts to survey the range and scope of computational methods applied to metabolism prediction and also to compare and contrast their applicability and performance. PMID:22339582

  11. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model

    PubMed Central

    Li, Zhen; Zhang, Renyu

    2017-01-01

    Motivation Protein contacts contain key information for the understanding of protein structure and function and thus, contact prediction from sequence is an important problem. Recently exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs is still of low quality and not very useful for de novo structure prediction. Method This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformation of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformation of pairwise information including output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationship and thus, obtain higher-quality contact prediction regardless of how many sequence homologs are available for proteins in question. Results Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively. 
Our contact-assisted models also have much better quality than template-based models especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even if trained mostly by soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully-automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then. Availability http://raptorx.uchicago.edu/ContactMap/ PMID:28056090
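
    The "top L" and "top L/10" long-range accuracies quoted in the record above can be computed along the following lines. This is a minimal sketch: the abstract does not define "long-range", so the minimum sequence separation of 24 residues (a commonly used convention) is an assumption, as are the function name and the choice to divide by the number of pairs actually selected.

```python
def top_lk_accuracy(pred, truth, k=1, min_sep=24):
    """Fraction of true contacts among the top L/k predicted long-range pairs.

    pred  : L x L nested list of predicted contact scores
    truth : L x L nested list of booleans (True = residues in contact)
    k     : 1 for top L, 10 for top L/10, and so on
    """
    length = len(pred)
    # Collect every residue pair (i, j) with sequence separation >= min_sep.
    pairs = [(pred[i][j], i, j)
             for i in range(length)
             for j in range(i + min_sep, length)]
    if not pairs:
        raise ValueError("sequence too short for any long-range pair")
    pairs.sort(key=lambda t: t[0], reverse=True)   # highest score first
    top = pairs[:max(1, length // k)]
    return sum(1 for _, i, j in top if truth[i][j]) / len(top)

# Toy example with L = 30 (hypothetical scores, illustration only).
L = 30
pred = [[0.0] * L for _ in range(L)]
truth = [[False] * L for _ in range(L)]
pred[0][24], pred[1][27], pred[2][28] = 0.9, 0.8, 0.7
truth[0][24] = truth[2][28] = True
print(top_lk_accuracy(pred, truth, k=10))
```

    Conventions differ between benchmarks, for example in the separation cutoff and in the denominator used when fewer than L/k long-range pairs exist, so values from this sketch are only comparable with published numbers if the same conventions are applied.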

  12. Protein structure refinement using a quantum mechanics-based chemical shielding predictor.

    PubMed

    Bratholm, Lars A; Jensen, Jan H

    2017-03-01

    The accurate prediction of protein chemical shifts using a quantum mechanics (QM)-based method has been the subject of intense research for more than 20 years but so far empirical methods for chemical shift prediction have proven more accurate. In this paper we show that a QM-based predictor of a protein backbone and CB chemical shifts (ProCS15, PeerJ , 2016, 3, e1344) is of comparable accuracy to empirical chemical shift predictors after chemical shift-based structural refinement that removes small structural errors. We present a method by which quantum chemistry based predictions of isotropic chemical shielding values (ProCS15) can be used to refine protein structures using Markov Chain Monte Carlo (MCMC) simulations, relating the chemical shielding values to the experimental chemical shifts probabilistically. Two kinds of MCMC structural refinement simulations were performed using force field geometry optimized X-ray structures as starting points: simulated annealing of the starting structure and constant temperature MCMC simulation followed by simulated annealing of a representative ensemble structure. Annealing of the CHARMM structure changes the CA-RMSD by an average of 0.4 Å but lowers the chemical shift RMSD by 1.0 and 0.7 ppm for CA and N. Conformational averaging has a relatively small effect (0.1-0.2 ppm) on the overall agreement with carbon chemical shifts but lowers the error for nitrogen chemical shifts by 0.4 ppm. If an amino acid specific offset is included the ProCS15 predicted chemical shifts have RMSD values relative to experiments that are comparable to popular empirical chemical shift predictors. The annealed representative ensemble structures differ in CA-RMSD relative to the initial structures by an average of 2.0 Å, with >2.0 Å difference for six proteins. 
In four of the cases, the largest structural differences arise in structurally flexible regions of the protein as determined by NMR, and in the remaining two cases, the large structural change may be due to force field deficiencies. The overall accuracy of the empirical methods is slightly improved by annealing the CHARMM structure with ProCS15, which may suggest that the minor structural changes introduced by ProCS15-based annealing improve the accuracy of the protein structures. Having established that QM-based chemical shift prediction can deliver the same accuracy as empirical shift predictors, we hope this can help increase the accuracy of related approaches such as QM/MM or linear scaling approaches, or of interpreting protein structural dynamics from QM-derived chemical shifts.

  13. Functional region prediction with a set of appropriate homologous sequences-an index for sequence selection by integrating structure and sequence information with spatial statistics

    PubMed Central

    2012-01-01

    Background The detection of conserved residue clusters on a protein structure is one of the effective strategies for the prediction of functional protein regions. Various methods, such as Evolutionary Trace, have been developed based on this strategy. In such approaches, the conserved residues are identified through comparisons of homologous amino acid sequences. Therefore, the selection of homologous sequences is a critical step. It is empirically known that a certain degree of sequence divergence in the set of homologous sequences is required for the identification of conserved residues. However, the development of a method to select homologous sequences appropriate for the identification of conserved residues has not been sufficiently addressed. An objective and general method to select appropriate homologous sequences is desired for the efficient prediction of functional regions. Results We have developed a novel index to select the sequences appropriate for the identification of conserved residues, and implemented the index within our method to predict the functional regions of a protein. The implementation of the index improved the performance of the functional region prediction. The index represents the degree of conserved residue clustering on the tertiary structure of the protein. For this purpose, the structure and sequence information were integrated within the index by the application of spatial statistics. Spatial statistics is a field of statistics in which not only the attributes but also the geometrical coordinates of the data are considered simultaneously. Higher degrees of clustering generate larger index scores. We adopted the set of homologous sequences with the highest index score, under the assumption that the best prediction accuracy is obtained when the degree of clustering is the maximum. 
The set of sequences selected by the index led to higher functional region prediction performance than the sets of sequences selected by other sequence-based methods. Conclusions Appropriate homologous sequences are selected automatically and objectively by the index. Such sequence selection improved the performance of functional region prediction. As far as we know, this is the first approach in which spatial statistics have been applied to protein analyses. Such integration of structure and sequence information would be useful for other bioinformatics problems. PMID:22643026

  14. Predictive design procedures, VESYS users manual : an interim design method for flexible pavements using the VESYS structural subsystem

    DOT National Transportation Integrated Search

    1978-01-01

    This manual has been written to provide the pavement manager and design engineer with a ready reference of procedures to predict the structural responses and hence the integrity of flexible pavements. A pavement section of known geometry is chosen, a...

  15. Linear regression models for solvent accessibility prediction in proteins.

    PubMed

    Wagner, Michael; Adamczak, Rafał; Porollo, Aleksey; Meller, Jarosław

    2005-04-01

    The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these Neural Network (NN) based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders-of-magnitude fewer parameters, and are still competitive in terms of prediction quality. In particular, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial to optimize the prediction accuracy for buried residues. 
We conclude that the simple and computationally much more efficient linear SVR performs comparably to nonlinear models and thus can be used in order to facilitate further attempts to design more accurate RSA prediction methods, with applications to fold recognition and de novo protein structure prediction methods.
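
    The difference between treating RSA prediction as regression and as two-class classification, and the role of the error-insensitivity parameter in support vector regression, can be sketched as follows. The 25% threshold is an arbitrary illustrative choice (the abstract itself calls such thresholds arbitrary), the RSA values are made up, and the function names are hypothetical; this is not the authors' model.

```python
def squared_loss(y_true, y_pred):
    """Per-residue loss used by least squares (LS) regression."""
    return (y_true - y_pred) ** 2

def eps_insensitive_loss(y_true, y_pred, eps):
    """Per-residue SVR loss: errors smaller than eps are not penalized."""
    return max(0.0, abs(y_true - y_pred) - eps)

def two_class_projection(rsa_percent, threshold=25.0):
    """Project a real-valued RSA (in percent) onto 'exposed' or 'buried'."""
    return "exposed" if rsa_percent > threshold else "buried"

def two_class_accuracy(true_rsa, pred_rsa, threshold=25.0):
    """Fraction of residues whose projected class matches the true class."""
    hits = sum(two_class_projection(t, threshold) == two_class_projection(p, threshold)
               for t, p in zip(true_rsa, pred_rsa))
    return hits / len(true_rsa)

# Hypothetical true and predicted RSA values in percent (illustration only).
true_rsa = [5.0, 30.0, 60.0, 20.0]
pred_rsa = [10.0, 20.0, 70.0, 22.0]

print([squared_loss(t, p) for t, p in zip(true_rsa, pred_rsa)])
print([eps_insensitive_loss(t, p, eps=5.0) for t, p in zip(true_rsa, pred_rsa)])
print(two_class_accuracy(true_rsa, pred_rsa))
```

    A regression model trained on real-valued RSA can be evaluated at any threshold after the fact, whereas a classifier is tied to the threshold chosen at training time.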

  16. An improved method to detect correct protein folds using partial clustering.

    PubMed

    Zhou, Jianjun; Wishart, David S

    2013-01-16

    Structure-based clustering is commonly used to identify correct protein folds among candidate folds (also called decoys) generated by protein structure prediction programs. However, traditional clustering methods exhibit a poor runtime performance on large decoy sets. We hypothesized that a more efficient "partial" clustering approach in combination with an improved scoring scheme could significantly improve both the speed and performance of existing candidate selection methods. We propose a new scheme that performs rapid but incomplete clustering on protein decoys. Our method detects structurally similar decoys (measured using either C(α) RMSD or GDT-TS score) and extracts representatives from them without assigning every decoy to a cluster. We integrated our new clustering strategy with several different scoring functions to assess both the performance and speed in identifying correct or near-correct folds. Experimental results on 35 Rosetta decoy sets and 40 I-TASSER decoy sets show that our method can improve the correct fold detection rate as assessed by two different quality criteria. This improvement is significantly better than two recently published clustering methods, Durandal and Calibur-lite. Speed and efficiency testing shows that our method can handle much larger decoy sets and is up to 22 times faster than Durandal and Calibur-lite. The new method, named HS-Forest, avoids the computationally expensive task of clustering every decoy, yet still allows superior correct-fold selection. Its improved speed, efficiency and decoy-selection performance should enable structure prediction researchers to work with larger decoy sets and significantly improve their ab initio structure prediction performance.

  17. An improved method to detect correct protein folds using partial clustering

    PubMed Central

    2013-01-01

    Background Structure-based clustering is commonly used to identify correct protein folds among candidate folds (also called decoys) generated by protein structure prediction programs. However, traditional clustering methods exhibit a poor runtime performance on large decoy sets. We hypothesized that a more efficient “partial” clustering approach in combination with an improved scoring scheme could significantly improve both the speed and performance of existing candidate selection methods. Results We propose a new scheme that performs rapid but incomplete clustering on protein decoys. Our method detects structurally similar decoys (measured using either Cα RMSD or GDT-TS score) and extracts representatives from them without assigning every decoy to a cluster. We integrated our new clustering strategy with several different scoring functions to assess both the performance and speed in identifying correct or near-correct folds. Experimental results on 35 Rosetta decoy sets and 40 I-TASSER decoy sets show that our method can improve the correct fold detection rate as assessed by two different quality criteria. This improvement is significantly better than two recently published clustering methods, Durandal and Calibur-lite. Speed and efficiency testing shows that our method can handle much larger decoy sets and is up to 22 times faster than Durandal and Calibur-lite. Conclusions The new method, named HS-Forest, avoids the computationally expensive task of clustering every decoy, yet still allows superior correct-fold selection. Its improved speed, efficiency and decoy-selection performance should enable structure prediction researchers to work with larger decoy sets and significantly improve their ab initio structure prediction performance. PMID:23323835

  18. User’s Guide for T.E.S.T. (version 4.2) (Toxicity Estimation Software Tool) A Program to Estimate Toxicity from Molecular Structure

    EPA Science Inventory

    The user's guide describes the methods used by TEST to predict toxicity and physical properties (including the new mode of action based method used to predict acute aquatic toxicity). It describes all of the experimental data sets included in the tool. It gives the prediction res...

  19. Interlinking backscatter, grain size and benthic community structure

    NASA Astrophysics Data System (ADS)

    McGonigle, Chris; Collier, Jenny S.

    2014-06-01

    The relationship between acoustic backscatter, sediment grain size and benthic community structure is examined using three different quantitative methods, covering image- and angular response-based approaches. Multibeam time-series backscatter (300 kHz) data acquired in 2008 off the coast of East Anglia (UK) are compared with grain size properties, macrofaunal abundance and biomass from 130 Hamon and 16 Clamshell grab samples. Three predictive methods are used: 1) image-based (mean backscatter intensity); 2) angular response-based (predicted mean grain size), and 3) image-based (1st principal component and classification) from Quester Tangent Corporation Multiview software. Relationships between grain size and backscatter are explored using linear regression. Differences in grain size and benthic community structure between acoustically defined groups are examined using ANOVA and PERMANOVA+. Results for the Hamon grab stations indicate significant correlations between measured mean grain size and mean backscatter intensity, angular response predicted mean grain size, and 1st principal component of QTC analysis (all p < 0.001). Results for the Clamshell grab for two of the methods have stronger positive correlations; mean backscatter intensity (r² = 0.619; p < 0.001) and angular response predicted mean grain size (r² = 0.692; p < 0.001). ANOVA reveals significant differences in mean grain size (Hamon) within acoustic groups for all methods: mean backscatter (p < 0.001), angular response predicted grain size (p < 0.001), and QTC class (p = 0.009). Mean grain size (Clamshell) shows a significant difference between groups for mean backscatter (p = 0.001); other methods were not significant. PERMANOVA for the Hamon abundance shows benthic community structure was significantly different between acoustic groups for all methods (p ≤ 0.001). Overall these results show considerable promise in that more than 60% of the variance in the mean grain size of the Clamshell grab samples can be explained by mean backscatter or acoustically-predicted grain size. These results show that there is significant predictive capacity for sediment characteristics from multibeam backscatter and that these acoustic classifications can have ecological validity.
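
    The r² values reported in this abstract are coefficients of determination from linear regression, which is why r² = 0.619 corresponds to "more than 60% of the variance" explained. A minimal sketch of that calculation follows; the function name and the paired values are hypothetical and purely illustrative, not data from the study.

```python
from statistics import mean

def r_squared(x, y):
    """Coefficient of determination for the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares around the fitted line
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ss_res / syy

# Hypothetical values, for illustration only (not from the study)
backscatter_db = [-30.0, -27.5, -25.0, -22.5, -20.0]
grain_size_phi = [3.1, 2.4, 2.6, 1.5, 1.2]
print(round(r_squared(backscatter_db, grain_size_phi), 3))
```

    An r² of 0.619 from such a fit would mean that about 62% of the variance in the response is accounted for by the predictor.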

  20. Exploring the boundary between aromatic and olefinic character: Bad news for second-order perturbation theory and density functional schemes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sulzbach, H.M.; Schaefer, H.F. III; Klopper, W.

    1996-04-10

    The question whether [10]annulene prefers olefinic structures with alternate single and double bonds or aromatic structures like all other small to medium sized uncharged (4n + 2)π electron homologs (e.g. benzene, [14]annulene) has been controversial for more than 20 years. Our new results suggest that only the high-order correlated methods will be able to correctly predict the [10]annulene potential energy surface. The UNO-CAS results and the strong oscillation of the MP series show that nondynamical electron correlation is important. Consequently, reliable results can only be expected at the highest correlated levels, such as the CCSD(T) method, which predicts the olefinic twist structure to be lower in energy by 3-7 kcal/mol. This prediction that the twist structure is lower in energy is supported by (a) the MP2-R12 method, which shows that large basis sets favor the olefinic structure relative to the aromatic, and (b) the fact that both structures are about equally affected by nondynamical electron correlation. We conclude that [10]annulene is a system which cannot be described adequately by either second-order Møller-Plesset perturbation theory or density functional methods. 13 refs., 3 tabs.

  1. Fast and reliable prediction of domain-peptide binding affinity using coarse-grained structure models.

    PubMed

    Tian, Feifei; Tan, Rui; Guo, Tailin; Zhou, Peng; Yang, Li

    2013-07-01

    Domain-peptide recognition and interaction are fundamentally important for eukaryotic signaling and regulatory networks. It is thus essential to quantitatively infer the binding stability and specificity of such interactions from large-scale but low-accuracy complex structure models, which can be readily obtained from sophisticated molecular modeling procedures. In the present study, a new method is described for the fast and reliable prediction of domain-peptide binding affinity with coarse-grained structure models. This method is designed to tolerate the strong random noise involved in domain-peptide complex structures and uses a statistical modeling approach to eliminate systematic bias associated with a group of investigated samples. As a paradigm, this method was employed to model and predict the binding behavior of various peptides to four evolutionarily unrelated peptide-recognition domains (PRDs), i.e. human amph SH3, human nherf PDZ, yeast syh GYF and yeast bmh 14-3-3, and moreover, we explored the molecular mechanism and biological implication underlying the binding of cognate and noncognate peptide ligands to their domain receptors. It is expected that the newly proposed method could be further used to perform genome-wide inference of domain-peptide binding at the three-dimensional structure level. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  2. Prediction of TF target sites based on atomistic models of protein-DNA complexes

    PubMed Central

    Angarica, Vladimir Espinosa; Pérez, Abel González; Vasconcelos, Ana T; Collado-Vides, Julio; Contreras-Moreira, Bruno

    2008-01-01

    Background The specific recognition of genomic cis-regulatory elements by transcription factors (TFs) plays an essential role in the regulation of coordinated gene expression. Studying the mechanisms determining binding specificity in protein-DNA interactions is thus an important goal. Most current approaches for modeling TF specific recognition rely on the knowledge of large sets of cognate target sites and consider only the information contained in their primary sequence. Results Here we describe a structure-based methodology for predicting sequence motifs starting from the coordinates of a TF-DNA complex. Our algorithm combines information regarding the direct and indirect readout of DNA into an atomistic statistical model, which is used to estimate the interaction potential. We first measure the ability of our method to correctly estimate the binding specificities of eight prokaryotic and eukaryotic TFs that belong to different structural superfamilies. Secondly, the method is applied to two homology models, finding that sampling of interface side-chain rotamers remarkably improves the results. Thirdly, the algorithm is compared with a reference structural method based on contact counts, obtaining comparable predictions for the experimental complexes and more accurate sequence motifs for the homology models. Conclusion Our results demonstrate that atomic-detail structural information can be feasibly used to predict TF binding sites. The computational method presented here is universal and might be applied to other systems involving protein-DNA recognition. PMID:18922190

  3. Characterizing the Response of Composite Panels to a Pyroshock Induced Environment Using Design of Experiments Methodology

    NASA Technical Reports Server (NTRS)

    Parsons, David S.; Ordway, David; Johnson, Kenneth

    2013-01-01

    This experimental study seeks to quantify the impact various composite parameters have on the structural response of a composite structure in a pyroshock environment. The prediction of an aerospace structure's response to pyroshock induced loading is largely dependent on empirical databases created from collections of development and flight test data. While there is significant structural response data due to pyroshock induced loading for metallic structures, there is much less data available for composite structures. One challenge of developing a composite pyroshock response database as well as empirical prediction methods for composite structures is the large number of parameters associated with composite materials. This experimental study uses data from a test series planned using design of experiments (DOE) methods. Statistical analysis methods are then used to identify which composite material parameters most greatly influence a flat composite panel's structural response to pyroshock induced loading. The parameters considered are panel thickness, type of ply, ply orientation, and pyroshock level induced into the panel. The results of this test will aid in future large scale testing by eliminating insignificant parameters as well as aid in the development of empirical scaling methods for composite structures' response to pyroshock induced loading.

  4. Characterizing the Response of Composite Panels to a Pyroshock Induced Environment using Design of Experiments Methodology

    NASA Technical Reports Server (NTRS)

    Parsons, David S.; Ordway, David O.; Johnson, Kenneth L.

    2013-01-01

    This experimental study seeks to quantify the impact various composite parameters have on the structural response of a composite structure in a pyroshock environment. The prediction of an aerospace structure's response to pyroshock induced loading is largely dependent on empirical databases created from collections of development and flight test data. While there is significant structural response data due to pyroshock induced loading for metallic structures, there is much less data available for composite structures. One challenge of developing a composite pyroshock response database as well as empirical prediction methods for composite structures is the large number of parameters associated with composite materials. This experimental study uses data from a test series planned using design of experiments (DOE) methods. Statistical analysis methods are then used to identify which composite material parameters most greatly influence a flat composite panel's structural response to pyroshock induced loading. The parameters considered are panel thickness, type of ply, ply orientation, and pyroshock level induced into the panel. The results of this test will aid in future large scale testing by eliminating insignificant parameters as well as aid in the development of empirical scaling methods for composite structures' response to pyroshock induced loading.

  5. PredictProtein—an open resource for online prediction of protein structural and functional features

    PubMed Central

    Yachdav, Guy; Kloppmann, Edda; Kajan, Laszlo; Hecht, Maximilian; Goldberg, Tatyana; Hamp, Tobias; Hönigschmid, Peter; Schafferhans, Andrea; Roos, Manfred; Bernhofer, Michael; Richter, Lothar; Ashkenazy, Haim; Punta, Marco; Schlessinger, Avner; Bromberg, Yana; Schneider, Reinhard; Vriend, Gerrit; Sander, Chris; Ben-Tal, Nir; Rost, Burkhard

    2014-01-01

    PredictProtein is a meta-service for sequence analysis that has been predicting structural and functional features of proteins since 1992. Queried with a protein sequence it returns: multiple sequence alignments, predicted aspects of structure (secondary structure, solvent accessibility, transmembrane helices (TMSEG) and strands, coiled-coil regions, disulfide bonds and disordered regions) and function. The service incorporates analysis methods for the identification of functional regions (ConSurf), homology-based inference of Gene Ontology terms (metastudent), comprehensive subcellular localization prediction (LocTree3), protein–protein binding sites (ISIS2), protein–polynucleotide binding sites (SomeNA) and predictions of the effect of point mutations (non-synonymous SNPs) on protein function (SNAP2). Our goal has always been to develop a system optimized to meet the demands of experimentalists not highly experienced in bioinformatics. To this end, the PredictProtein results are presented as both text and a series of intuitive, interactive and visually appealing figures. The web server and sources are available at http://ppopen.rostlab.org. PMID:24799431

  6. Evaluation of Deep Learning Representations of Spatial Storm Data

    NASA Astrophysics Data System (ADS)

    Gagne, D. J., II; Haupt, S. E.; Nychka, D. W.

    2017-12-01

    The spatial structure of a severe thunderstorm and its surrounding environment provide useful information about the potential for severe weather hazards, including tornadoes, hail, and high winds. Statistics computed over the area of a storm or from the pre-storm environment can provide descriptive information but fail to capture structural information. Because the storm environment is a complex, high-dimensional space, identifying methods to encode important spatial storm information in a low-dimensional form should aid analysis and prediction of storms by statistical and machine learning models. Principal component analysis (PCA), a more traditional approach, transforms high-dimensional data into a set of linearly uncorrelated, orthogonal components ordered by the amount of variance explained by each component. The burgeoning field of deep learning offers two potential approaches to this problem. Convolutional Neural Networks are a supervised learning method for transforming spatial data into a hierarchical set of feature maps that correspond with relevant combinations of spatial structures in the data. Generative Adversarial Networks (GANs) are an unsupervised deep learning model that uses two neural networks trained against each other to produce encoded representations of spatial data. These different spatial encoding methods were evaluated on the prediction of severe hail for a large set of storm patches extracted from the NCAR convection-allowing ensemble. Each storm patch contains information about storm structure and the near-storm environment. Logistic regression and random forest models were trained using the PCA and GAN encodings of the storm data and were compared against the predictions from a convolutional neural network. All methods showed skill over climatology at predicting the probability of severe hail. However, the verification scores among the methods were very similar and the predictions were highly correlated. Further evaluations are being performed to determine how the choice of input variables affects the results.

  7. WeFold: A Coopetition for Protein Structure Prediction

    PubMed Central

    Khoury, George A.; Liwo, Adam; Khatib, Firas; Zhou, Hongyi; Chopra, Gaurav; Bacardit, Jaume; Bortot, Leandro O.; Faccioli, Rodrigo A.; Deng, Xin; He, Yi; Krupa, Pawel; Li, Jilong; Mozolewska, Magdalena A.; Sieradzan, Adam K.; Smadbeck, James; Wirecki, Tomasz; Cooper, Seth; Flatten, Jeff; Xu, Kefan; Baker, David; Cheng, Jianlin; Delbem, Alexandre C. B.; Floudas, Christodoulos A.; Keasar, Chen; Levitt, Michael; Popović, Zoran; Scheraga, Harold A.; Skolnick, Jeffrey; Crivelli, Silvia N.; Players, Foldit

    2014-01-01

    The protein structure prediction problem continues to elude scientists. Despite the introduction of many methods, only modest gains were made over the last decade for certain classes of prediction targets. To address this challenge, a social-media based worldwide collaborative effort, named WeFold, was undertaken by thirteen labs. During the collaboration, the labs were simultaneously competing with each other. Here, we present the first attempt at “coopetition” in scientific research applied to the protein structure prediction and refinement problems. The coopetition was possible by allowing the participating labs to contribute different components of their protein structure prediction pipelines and create new hybrid pipelines that they tested during CASP10. This manuscript describes both successes and areas needing improvement as identified throughout the first WeFold experiment and discusses the efforts that are underway to advance this initiative. A footprint of all contributions and structures are publicly accessible at http://www.wefold.org. PMID:24677212

  8. Extended Aging Theories for Predictions of Safe Operational Life of Critical Airborne Structural Components

    NASA Technical Reports Server (NTRS)

    Ko, William L.; Chen, Tony

    2006-01-01

    The previously developed Ko closed-form aging theory has been reformulated into a more compact mathematical form for easier application. A new equivalent loading theory and empirical loading theories have also been developed and incorporated into the revised Ko aging theory for the prediction of a safe operational life of airborne failure-critical structural components. The new set of aging and loading theories were applied to predict the safe number of flights for the B-52B aircraft to carry a launch vehicle, the structural life of critical components consumed by load excursion to proof load value, and the ground-sitting life of B-52B pylon failure-critical structural components. A special life prediction method was developed for the preflight predictions of operational life of failure-critical structural components of the B-52H pylon system, for which no flight data are available.

  9. Geostatistics for spatial genetic structures: study of wild populations of perennial ryegrass.

    PubMed

    Monestiez, P; Goulard, M; Charmet, G

    1994-04-01

    Methods based on geostatistics were applied to quantitative traits of agricultural interest measured on a collection of 547 wild populations of perennial ryegrass in France. The mathematical background of these methods, which resembles spatial autocorrelation analysis, is briefly described. When a single variable is studied, the spatial structure analysis is similar to spatial autocorrelation analysis, and a spatial prediction method, called "kriging", gives a filtered map of the spatial pattern over all the sampled area. When complex interactions of agronomic traits with different evaluation sites define a multivariate structure for the spatial analysis, geostatistical methods allow the spatial variations to be broken down into two main spatial structures with ranges of 120 km and 300 km, respectively. The predicted maps that corresponded to each range were interpreted as a result of the isolation-by-distance model and as a consequence of selection by environmental factors. Practical collecting methodology for breeders may be derived from such spatial structures.

  10. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Middleton, Sarah A.; Illuminati, Joseph; Kim, Junhyong

    Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Finally, our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families.

  11. FragBag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately.

    PubMed

    Budowski-Tal, Inbal; Nov, Yuval; Kolodny, Rachel

    2010-02-23

    Fast identification of protein structures that are similar to a specified query structure in the entire Protein Data Bank (PDB) is fundamental in structure and function prediction. We present FragBag: an ultrafast and accurate method for comparing protein structures. We describe a protein structure by the collection of its overlapping short contiguous backbone segments, and discretize this set using a library of fragments. Then, we succinctly represent the protein as a "bag-of-fragments" (a vector that counts the number of occurrences of each fragment) and measure the similarity between two structures by the similarity between their vectors. Our representation has two additional benefits: (i) it can be used to construct an inverted index, for implementing a fast structural search engine of the entire PDB, and (ii) one can specify a structure as a collection of substructures, without combining them into a single structure; this is valuable for structure prediction, when there are reliable predictions only of parts of the protein. We use receiver operating characteristic curve analysis to quantify the success of FragBag in identifying neighbor candidate sets in a dataset of over 2,900 structures. The gold standard is the set of neighbors found by six state-of-the-art structural aligners. Our best FragBag library finds more accurate candidate sets than the three other filter methods: the SGM, PRIDE, and a method by Zotenko et al. More interestingly, FragBag performs on a par with the computationally expensive, yet highly trusted structural aligners STRUCTAL and CE.
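
    The bag-of-fragments idea described in this abstract can be illustrated with a short sketch. It assumes that each overlapping backbone segment has already been assigned the index of its closest library fragment (that assignment step is not shown), and it uses cosine similarity as one possible vector comparison; the abstract does not say which similarity measure FragBag uses, so that choice, the function names and the example labels are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

def bag_of_fragments(fragment_ids, library_size):
    """Count how often each library fragment occurs in one structure."""
    counts = Counter(fragment_ids)
    return [counts.get(i, 0) for i in range(library_size)]

def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty bag is treated as dissimilar to everything
    return dot / (norm_u * norm_v)

# Hypothetical fragment labels for the overlapping segments of two proteins
protein_a = [0, 0, 1, 2, 2, 2, 3]
protein_b = [0, 1, 1, 2, 2, 3, 3]
vec_a = bag_of_fragments(protein_a, library_size=4)
vec_b = bag_of_fragments(protein_b, library_size=4)
print(vec_a, vec_b, round(cosine_similarity(vec_a, vec_b), 3))
```

    Because each structure is reduced to a fixed-length vector, comparing two structures costs time proportional to the library size rather than requiring a structural alignment.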

  12. Defining and predicting structurally conserved regions in protein superfamilies

    PubMed Central

    Huang, Ivan K.; Grishin, Nick V.

    2013-01-01

    Motivation: The structures of homologous proteins are generally better conserved than their sequences. This phenomenon is demonstrated by the prevalence of structurally conserved regions (SCRs) even in highly divergent protein families. Defining SCRs requires the comparison of two or more homologous structures and is affected by their availability and divergence, and our ability to deduce structurally equivalent positions among them. In the absence of multiple homologous structures, it is necessary to predict SCRs of a protein using information from only a set of homologous sequences and (if available) a single structure. Accurate SCR predictions can benefit homology modelling and sequence alignment. Results: Using pairwise DaliLite alignments among a set of homologous structures, we devised a simple measure of structural conservation, termed structural conservation index (SCI). SCI was used to distinguish SCRs from non-SCRs. A database of SCRs was compiled from 386 SCOP superfamilies containing 6489 protein domains. Artificial neural networks were then trained to predict SCRs with various features deduced from a single structure and homologous sequences. Assessment of the predictions via a 5-fold cross-validation method revealed that predictions based on features derived from a single structure perform similarly to ones based on homologous sequences, while combining sequence and structural features was optimal in terms of accuracy (0.755) and Matthews correlation coefficient (0.476). These results suggest that even without information from multiple structures, it is still possible to effectively predict SCRs for a protein. Finally, inspection of the structures with the worst predictions pinpoints difficulties in SCR definitions. Availability: The SCR database and the prediction server can be found at http://prodata.swmed.edu/SCR. Contact: 91huangi@gmail.com or grishin@chop.swmed.edu Supplementary information: Supplementary data are available at Bioinformatics Online. PMID:23193223
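
    The two metrics quoted in this abstract, accuracy and the Matthews correlation coefficient (MCC), are both computed from the counts of a binary confusion matrix. The sketch below uses the standard definitions; the function names and the example counts are hypothetical and are not results from the study.

```python
from math import sqrt

def accuracy(tp, tn, fp, fn):
    """Fraction of positions classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC: +1 is perfect, 0 is chance level, -1 is total disagreement."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column is empty
    return (tp * tn - fp * fn) / denom

# Hypothetical counts (SCR = positive class), for illustration only
tp, tn, fp, fn = 40, 35, 15, 10
print(accuracy(tp, tn, fp, fn), round(matthews_corrcoef(tp, tn, fp, fn), 3))
```

    Unlike accuracy, MCC stays informative when the two classes are unbalanced, which is one reason both are commonly reported together.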

  13. Predicting the accuracy of ligand overlay methods with Random Forest models.

    PubMed

    Nandigam, Ravi K; Evans, David A; Erickson, Jon A; Kim, Sangtae; Sutherland, Jeffrey J

    2008-12-01

    The accuracy of binding mode prediction using standard molecular overlay methods (ROCS, FlexS, Phase, and FieldCompare) is studied. Previous work has shown that simple decision tree modeling can be used to improve accuracy by selection of the best overlay template. This concept is extended to the use of Random Forest (RF) modeling for template and algorithm selection. An extensive data set of 815 ligand-bound X-ray structures representing 5 gene families was used for generating ca. 70,000 overlays using four programs. RF models, trained using standard measures of ligand and protein similarity and Lipinski-related descriptors, are used for automatically selecting the reference ligand and overlay method maximizing the probability of reproducing the overlay deduced from X-ray structures (i.e., using RMSD ≤ 2 Å as the criterion for success). RF model scores are highly predictive of overlay accuracy, and their use in template and method selection produces correct overlays in 57% of cases for 349 overlay ligands not used for training RF models. The inclusion in the models of protein sequence similarity enables the use of templates bound to related protein structures, yielding useful results even for proteins having no available X-ray structures.
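
    The success criterion stated in this abstract (an overlay counts as correct when its RMSD from the X-ray-derived overlay is at most 2 Å) can be sketched as below. The sketch assumes that both poses are already in the same coordinate frame with atoms listed in matching order, so no superposition is performed; the function names and coordinates are hypothetical.

```python
from math import sqrt

def rmsd(coords_a, coords_b):
    """Root mean square deviation between matched 3D atom coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return sqrt(squared / len(coords_a))

def overlay_is_correct(predicted, reference, threshold=2.0):
    """Apply the RMSD <= 2 Å success criterion (distances in ångströms)."""
    return rmsd(predicted, reference) <= threshold

# Hypothetical ligand poses: the prediction is shifted 1 Å along z
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
predicted = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]
print(rmsd(predicted, reference), overlay_is_correct(predicted, reference))
```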

  14. Co-evolutionary Analysis of Domains in Interacting Proteins Reveals Insights into Domain–Domain Interactions Mediating Protein–Protein Interactions

    PubMed Central

    Jothi, Raja; Cherukuri, Praveen F.; Tasneem, Asba; Przytycka, Teresa M.

    2006-01-01

    Recent advances in functional genomics have helped generate large-scale high-throughput protein interaction data. Such networks, though extremely valuable towards molecular level understanding of cells, do not provide any direct information about the regions (domains) in the proteins that mediate the interaction. Here, we performed co-evolutionary analysis of domains in interacting proteins in order to understand the degree of co-evolution of interacting and non-interacting domains. Using a combination of sequence and structural analysis, we analyzed protein–protein interactions in F1-ATPase, Sec23p/Sec24p, DNA-directed RNA polymerase and nuclear pore complexes, and found that interacting domain pair(s) for a given interaction exhibits higher level of co-evolution than the noninteracting domain pairs. Motivated by this finding, we developed a computational method to test the generality of the observed trend, and to predict large-scale domain–domain interactions. Given a protein–protein interaction, the proposed method predicts the domain pair(s) that is most likely to mediate the protein interaction. We applied this method on the yeast interactome to predict domain–domain interactions, and used known domain–domain interactions found in PDB crystal structures to validate our predictions. Our results show that the prediction accuracy of the proposed method is statistically significant. Comparison of our prediction results with those from two other methods reveals that only a fraction of predictions are shared by all the three methods, indicating that the proposed method can detect known interactions missed by other methods. We believe that the proposed method can be used with other methods to help identify previously unrecognized domain–domain interactions on a genome scale, and could potentially help reduce the search space for identifying interaction sites. PMID:16949097

  15. QSAR models for predicting octanol/water and organic carbon/water partition coefficients of polychlorinated biphenyls.

    PubMed

    Yu, S; Gao, S; Gan, Y; Zhang, Y; Ruan, X; Wang, Y; Yang, L; Shi, J

    2016-04-01

    Quantitative structure-property relationship modelling can be a valuable alternative method to replace or reduce experimental testing. In particular, some endpoints such as octanol-water (KOW) and organic carbon-water (KOC) partition coefficients of polychlorinated biphenyls (PCBs) are easier to predict and various models have been already developed. In this paper, two different methods, which are multiple linear regression based on the descriptors generated using Dragon software and hologram quantitative structure-activity relationships, were employed to predict suspended particulate matter (SPM) derived log KOC and generator column, shake flask and slow stirring method derived log KOW values of 209 PCBs. The predictive ability of the derived models was validated using a test set. The performances of all these models were compared with EPI Suite™ software. The results indicated that the proposed models were robust and satisfactory, and could provide feasible and promising tools for the rapid assessment of the SPM derived log KOC and generator column, shake flask and slow stirring method derived log KOW values of PCBs.

  16. Fast large-scale clustering of protein structures using Gauss integrals.

    PubMed

    Harder, Tim; Borg, Mikael; Boomsma, Wouter; Røgen, Peter; Hamelryck, Thomas

    2012-02-15

    Clustering protein structures is an important task in structural bioinformatics. De novo structure prediction, for example, often involves a clustering step for finding the best prediction. Other applications include assigning proteins to fold families and analyzing molecular dynamics trajectories. We present Pleiades, a novel approach to clustering protein structures with a rigorous mathematical underpinning. The method approximates clustering based on the root mean square deviation by first mapping structures to Gauss integral vectors (which were introduced by Røgen and co-workers) and subsequently performing K-means clustering. Compared to current methods, Pleiades dramatically improves on the time needed to perform clustering, and can cluster a significantly larger number of structures, while providing state-of-the-art results. The number of low energy structures generated in a typical folding study, which is in the order of 50,000 structures, can be clustered within seconds to minutes.

  17. A computer program for cyclic plasticity and structural fatigue analysis

    NASA Technical Reports Server (NTRS)

    Kalev, I.

    1980-01-01

    A computerized tool for the analysis of time independent cyclic plasticity structural response, life to crack initiation prediction, and crack growth rate prediction for metallic materials is described. Three analytical items are combined: the finite element method with its associated numerical techniques for idealization of the structural component, cyclic plasticity models for idealization of the material behavior, and damage accumulation criteria for the fatigue failure.

  18. Modeling the assembly order of multimeric heteroprotein complexes

    PubMed Central

    Esquivel-Rodriguez, Juan; Terashi, Genki; Christoffer, Charles; Shin, Woong-Hee

    2018-01-01

    Protein-protein interactions are the cornerstone of numerous biological processes. Although an increasing number of protein complex structures have been determined using experimental methods, relatively fewer studies have been performed to determine the assembly order of complexes. In addition to the insights into the molecular mechanisms of biological function provided by the structure of a complex, knowing the assembly order is important for understanding the process of complex formation. Assembly order is also practically useful for constructing subcomplexes as a step toward solving the entire complex experimentally, designing artificial protein complexes, and developing drugs that interrupt a critical step in the complex assembly. There are several experimental methods for determining the assembly order of complexes; however, these techniques are resource-intensive. Here, we present a computational method that predicts the assembly order of protein complexes by building the complex structure. The method, named Path-LZerD, uses a multimeric protein docking algorithm that assembles a protein complex structure from individual subunit structures and predicts assembly order by observing the simulated assembly process of the complex. Benchmarked on a dataset of complexes with experimental evidence of assembly order, Path-LZerD was successful in predicting the assembly pathway for the majority of the cases. Moreover, when compared with a simple approach that infers the assembly path from the buried surface area of subunits in the native complex, Path-LZerD has the strong advantage that it can be used for cases where the complex structure is not known. The path prediction accuracy decreased when starting from unbound monomers, particularly for larger complexes of five or more subunits, for which only a part of the assembly path was correctly identified. As the first method of its kind, Path-LZerD opens a new area of computational protein structure modeling and will be an indispensable approach for studying protein complexes. PMID:29329283

  19. Modeling the assembly order of multimeric heteroprotein complexes.

    PubMed

    Peterson, Lenna X; Togawa, Yoichiro; Esquivel-Rodriguez, Juan; Terashi, Genki; Christoffer, Charles; Roy, Amitava; Shin, Woong-Hee; Kihara, Daisuke

    2018-01-01

    Protein-protein interactions are the cornerstone of numerous biological processes. Although an increasing number of protein complex structures have been determined using experimental methods, relatively few studies have been performed to determine the assembly order of complexes. In addition to the insights into the molecular mechanisms of biological function provided by the structure of a complex, knowing the assembly order is important for understanding the process of complex formation. Assembly order is also practically useful for constructing subcomplexes as a step toward solving the entire complex experimentally, designing artificial protein complexes, and developing drugs that interrupt a critical step in the complex assembly. There are several experimental methods for determining the assembly order of complexes; however, these techniques are resource-intensive. Here, we present a computational method that predicts the assembly order of protein complexes by building the complex structure. The method, named Path-LZerD, uses a multimeric protein docking algorithm that assembles a protein complex structure from individual subunit structures and predicts assembly order by observing the simulated assembly process of the complex. Benchmarked on a dataset of complexes with experimental evidence of assembly order, Path-LZerD was successful in predicting the assembly pathway for the majority of the cases. Moreover, when compared with a simple approach that infers the assembly path from the buried surface area of subunits in the native complex, Path-LZerD has the strong advantage that it can be used for cases where the complex structure is not known. The path prediction accuracy decreased when starting from unbound monomers, particularly for larger complexes of five or more subunits, for which only a part of the assembly path was correctly identified.
As the first method of its kind, Path-LZerD opens a new area of computational protein structure modeling and will be an indispensable approach for studying protein complexes.

  20. Structural reliability analysis under evidence theory using the active learning kriging model

    NASA Astrophysics Data System (ADS)

    Yang, Xufeng; Liu, Yongshou; Ma, Panke

    2017-11-01

    Structural reliability analysis under evidence theory is investigated. It is rigorously proved that a surrogate model providing only correct sign prediction of the performance function can meet the accuracy requirement of evidence-theory-based reliability analysis. Accordingly, a method based on the active learning kriging model which only correctly predicts the sign of the performance function is proposed. Interval Monte Carlo simulation and a modified optimization method based on Karush-Kuhn-Tucker conditions are introduced to make the method more efficient in estimating the bounds of failure probability based on the kriging model. Four examples are investigated to demonstrate the efficiency and accuracy of the proposed method.

  1. Crystal Structure Prediction and its Application in Earth and Materials Sciences

    NASA Astrophysics Data System (ADS)

    Zhu, Qiang

    We first describe how to predict crystal structures with an evolutionary approach, and extend this method to study the packing of organic molecules using our specially designed constrained evolutionary algorithm. The main feature of this new approach is that each unit or molecule is treated as a whole body, which drastically reduces the search space and improves the efficiency. The improved method can potentially be applied to (1) high-pressure phases of simple molecules (H2O, NH3, CH4, etc.); (2) pharmaceutical molecules (glycine, aspirin, etc.); (3) complex inorganic crystals containing cluster or molecular units (Mg(BH4)2, Ca(BH4)2, etc.). One application of the constrained evolutionary algorithm is the study of Mg(BH4)2, which is a promising material for hydrogen storage. Our prediction not only reproduces the previous work on Mg(BH4)2 at ambient conditions, but also yields two new tetragonal structures at high pressure, with space groups P4 and I41/acd, which are predicted to be lower in enthalpy, by 15.4 kJ/mol and 21.2 kJ/mol, respectively, than the earlier proposed P42nm phase. We have simulated X-ray diffraction spectra, lattice dynamics, and equations of state of these phases. The density, volume contraction, bulk modulus, and the simulated XRD patterns of the P4 and I41/acd structures are in excellent agreement with the experimental results. Two kinds of oxides (Xe-O and Mg-O) have been studied under megabar pressures. For Xe-O, we predict the existence of thermodynamically stable Xe-O compounds at high pressures (XeO, XeO2 and XeO3 become stable at pressures of 83, 102 and 114 GPa, respectively). For Mg-O, our calculations find that two extraordinary compounds, MgO2 and Mg3O2, become thermodynamically stable at 116 GPa and 500 GPa, respectively. Our calculations indicate large charge transfer in these oxides for both systems, suggesting that large electronegativity difference and pressure are the key factors favouring their formation.
We also discuss whether these oxides might exist under Earth and planetary conditions. If the target properties are set as the global fitness functions while structure relaxations are energy/enthalpy minimizations, such a hybrid optimization technique could effectively explore the landscape of properties for the given systems. Here we illustrate this function by the case of searching for superdense carbon allotropes. We find three structures (hP3, tI12, and tP12) that have significantly greater density. Furthermore, we find a collection of other superdense structures based on different ways of packing carbon tetrahedra. Superdense carbon allotropes are predicted to have remarkably high refractive indices and strong dispersion of light. Apart from the evolutionary approach, there also exist other methods for structure prediction. One can also combine the features from different methods. We develop a novel method for crystal structure prediction, based on metadynamics and evolutionary algorithms. This technique can be used to efficiently produce both the ground state and metastable states easily reachable from a reasonable initial structure. We use the cell shape as the collective variable and evolutionary variation operators developed in the context of the USPEX method to equilibrate the system as a function of the collective variables. We illustrate how this approach helps one to find stable and metastable states for Al2SiO5, SiO2, MgSiO3. Apart from predicting crystal structures, the new method can also provide insight into mechanisms of phase transitions. This method is especially powerful in sampling the metastable structures from a given configuration. Experiments on cold compression indicated the existence of a new superhard carbon allotrope. Numerous metastable candidate structures featuring different topologies have been proposed for this allotrope. We use evolutionary metadynamics to systematically search for possible candidates which could be accessible from graphite.
(Abstract shortened by UMI.)

  2. Prediction of binding hot spot residues by using structural and evolutionary parameters

    PubMed Central

    2009-01-01

    In this work, we present a method for predicting hot spot residues by using a set of structural and evolutionary parameters. Unlike previous studies, we use a set of parameters which do not depend on the structure of the protein in complex, so that the predictor can also be used when the interface region is unknown. Despite the fact that no information concerning proteins in complex is used for prediction, the application of the method to a compiled dataset described in the literature achieved a performance of 60.4%, as measured by F-Measure, corresponding to a recall of 78.1% and a precision of 49.5%. This result is higher than those reported by previous studies using the same data set. PMID:21637529
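
    The three figures quoted in this abstract are linked by the usual definition of the F-measure as the harmonic mean of precision and recall. The sketch below is a minimal illustration of that relation using the rounded values from the abstract; the function name `f_measure` and the `beta` parameter are illustrative and not taken from the paper. The recomputed value comes out slightly above the reported 60.4%; the abstract does not say how the figure was averaged, so the gap is left unexplained here.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Rounded values quoted in the abstract (precision 49.5%, recall 78.1%).
f1 = f_measure(precision=0.495, recall=0.781)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.606"
```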

  3. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model.

    PubMed

    Wang, Sheng; Sun, Siqi; Li, Zhen; Zhang, Renyu; Xu, Jinbo

    2017-01-01

    Protein contacts contain key information for the understanding of protein structure and function and thus, contact prediction from sequence is an important problem. Recently exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs is still of low quality and not very useful for de novo structure prediction. This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformation of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformation of pairwise information including output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationship and thus, obtain higher-quality contact prediction regardless of how many sequence homologs are available for proteins in question. Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively. 
Our contact-assisted models also have much better quality than template-based models especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even if trained mostly by soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully-automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then. http://raptorx.uchicago.edu/ContactMap/.
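
    The "top L" and "top L/10" long-range accuracies quoted above are precision values over the highest-scoring predicted residue pairs, where L is the sequence length. The following is a minimal sketch of how such a metric is commonly computed, not the authors' evaluation code. It assumes the widely used convention that "long-range" means a sequence separation of at least 24 residues (the abstract does not state the threshold), and it divides by the number of pairs actually selected. The function name, the toy scores and the toy contact set are all hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # residue indices (i, j) with i < j


def top_lk_accuracy(scores: Dict[Pair, float], true_contacts: Set[Pair],
                    seq_len: int, k: int = 1, min_sep: int = 24) -> float:
    """Fraction of the top L/k scored pairs with separation >= min_sep that are true contacts."""
    # Keep only long-range pairs (min_sep = 24 is an assumed convention).
    long_range = [(s, p) for p, s in scores.items() if abs(p[0] - p[1]) >= min_sep]
    long_range.sort(reverse=True)  # highest predicted score first
    n_top = max(1, seq_len // k)
    top = [p for _, p in long_range[:n_top]]
    if not top:
        return 0.0
    hits = sum(1 for p in top if p in true_contacts)
    return hits / len(top)


# Toy example: L = 50, so top L/10 selects the 5 best long-range pairs.
toy_scores = {(0, 30): 0.9, (1, 40): 0.8, (2, 45): 0.7, (3, 35): 0.6,
              (5, 49): 0.5, (10, 12): 0.99, (4, 44): 0.1}
toy_true = {(0, 30), (2, 45), (5, 49), (10, 12)}
acc = top_lk_accuracy(toy_scores, toy_true, seq_len=50, k=10)
print(f"top L/10 long-range accuracy = {acc:.2f}")
```

    The short-range pair (10, 12) has the highest score but is filtered out before ranking, which is why the metric rewards only contacts that are informative for folding.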

  4. Exploring the Sequence-based Prediction of Folding Initiation Sites in Proteins.

    PubMed

    Raimondi, Daniele; Orlando, Gabriele; Pancsa, Rita; Khan, Taushif; Vranken, Wim F

    2017-08-18

    Protein folding is a complex process that can lead to disease when it fails. Especially poorly understood are the very early stages of protein folding, which are likely defined by intrinsic local interactions between amino acids close to each other in the protein sequence. We here present EFoldMine, a method that predicts, from the primary amino acid sequence of a protein, which amino acids are likely involved in early folding events. The method is based on early folding data from hydrogen deuterium exchange (HDX) data from NMR pulsed labelling experiments, and uses backbone and sidechain dynamics as well as secondary structure propensities as features. The EFoldMine predictions give insights into the folding process, as illustrated by a qualitative comparison with independent experimental observations. Furthermore, on a quantitative proteome scale, the predicted early folding residues tend to become the residues that interact the most in the folded structure, and they are often residues that display evolutionary covariation. The connection of the EFoldMine predictions with both folding pathway data and the folded protein structure suggests that the initial statistical behavior of the protein chain with respect to local structure formation has a lasting effect on its subsequent states.

  5. Improved method for predicting protein fold patterns with ensemble classifiers.

    PubMed

    Chen, W; Liu, X; Huang, Y; Jiang, Y; Zou, Q; Lin, C

    2012-01-27

    Protein folding is recognized as a critical problem in the field of biophysics in the 21st century. Predicting protein-folding patterns is challenging due to the complex structure of proteins. In an attempt to solve this problem, we employed ensemble classifiers to improve prediction accuracy. In our experiments, 188-dimensional features were extracted based on the composition and physical-chemical property of proteins and 20-dimensional features were selected using a coupled position-specific scoring matrix. Compared with traditional prediction methods, these methods were superior in terms of prediction accuracy. The 188-dimensional feature-based method achieved 71.2% accuracy in five cross-validations. The accuracy rose to 77% when we used a 20-dimensional feature vector. These methods were used on recent data, with 54.2% accuracy. Source codes and dataset, together with web server and software tools for prediction, are available at: http://datamining.xmu.edu.cn/main/~cwc/ProteinPredict.html.

  6. Discrete Molecular Dynamics Can Predict Helical Prestructured Motifs in Disordered Proteins

    PubMed Central

    Han, Kyou-Hoon; Dokholyan, Nikolay V.; Tompa, Péter; Kalmár, Lajos; Hegedűs, Tamás

    2014-01-01

    Intrinsically disordered proteins (IDPs) lack a stable tertiary structure, but their short binding regions termed Pre-Structured Motifs (PreSMo) can form transient secondary structure elements in solution. Although disordered proteins are crucial in many biological processes and designing strategies to modulate their function is highly important, both experimental and computational tools to describe their conformational ensembles and the initial steps of folding are sparse. Here we report that discrete molecular dynamics (DMD) simulations combined with replica exchange (RX) method efficiently samples the conformational space and detects regions populating α-helical conformational states in disordered protein regions. While the available computational methods predict secondary structural propensities in IDPs based on the observation of protein-protein interactions, our ab initio method rests on physical principles of protein folding and dynamics. We show that RX-DMD predicts α-PreSMos with high confidence confirmed by comparison to experimental NMR data. Moreover, the method also can dissect α-PreSMos in close vicinity to each other and indicate helix stability. Importantly, simulations with disordered regions forming helices in X-ray structures of complexes indicate that a preformed helix is frequently the binding element itself, while in other cases it may have a role in initiating the binding process. Our results indicate that RX-DMD provides a breakthrough in the structural and dynamical characterization of disordered proteins by generating the structural ensembles of IDPs even when experimental data are not available. PMID:24763499

  7. Soft Computing Methods for Disulfide Connectivity Prediction.

    PubMed

    Márquez-Chamorro, Alfonso E; Aguilar-Ruiz, Jesús S

    2015-01-01

    The problem of protein structure prediction (PSP) is one of the main challenges in structural bioinformatics. To tackle this problem, PSP can be divided into several subproblems. One of these subproblems is the prediction of disulfide bonds. The disulfide connectivity prediction problem consists in identifying which nonadjacent cysteines would be cross-linked from all possible candidates. Determining the disulfide bond connectivity between the cysteines of a protein is desirable as a previous step of the 3D PSP, as the protein conformational search space is highly reduced. The most representative soft computing approaches for the disulfide bonds connectivity prediction problem of the last decade are summarized in this paper. Certain aspects, such as the different methodologies based on soft computing approaches (artificial neural network or support vector machine) or features of the algorithms, are used for the classification of these methods.

  8. Protein single-model quality assessment by feature-based probability density functions.

    PubMed

    Cao, Renzhi; Cheng, Jianlin

    2016-04-04

    Protein quality assessment (QA) has played an important role in protein structure prediction. We developed a novel single-model quality assessment method-Qprob. Qprob calculates the absolute error for each protein feature value against the true quality scores (i.e. GDT-TS scores) of protein structural models, and uses them to estimate its probability density distribution for quality assessment. Qprob has been blindly tested on the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) as MULTICOM-NOVEL server. The official CASP result shows that Qprob ranks as one of the top single-model QA methods. In addition, Qprob makes contributions to our protein tertiary structure predictor MULTICOM, which is officially ranked 3rd out of 143 predictors. The good performance shows that Qprob is good at assessing the quality of models of hard targets. These results demonstrate that this new probability density distribution based method is effective for protein single-model quality assessment and is useful for protein structure prediction. The webserver of Qprob is available at: http://calla.rnet.missouri.edu/qprob/. The software is now freely available in the web server of Qprob.

  9. Towards fully automated structure-based function prediction in structural genomics: a case study.

    PubMed

    Watson, James D; Sanderson, Steve; Ezersky, Alexandra; Savchenko, Alexei; Edwards, Aled; Orengo, Christine; Joachimiak, Andrzej; Laskowski, Roman A; Thornton, Janet M

    2007-04-13

    As the global Structural Genomics projects have picked up pace, the number of structures annotated in the Protein Data Bank as hypothetical protein or unknown function has grown significantly. A major challenge now involves the development of computational methods to assign functions to these proteins accurately and automatically. As part of the Midwest Center for Structural Genomics (MCSG) we have developed a fully automated functional analysis server, ProFunc, which performs a battery of analyses on a submitted structure. The analyses combine a number of sequence-based and structure-based methods to identify functional clues. After the first stage of the Protein Structure Initiative (PSI), we review the success of the pipeline and the importance of structure-based function prediction. As a dataset, we have chosen all structures solved by the MCSG during the 5 years of the first PSI. Our analysis suggests that two of the structure-based methods are particularly successful and provide examples of local similarity that is difficult to identify using current sequence-based methods. No one method is successful in all cases, so, through the use of a number of complementary sequence and structural approaches, the ProFunc server increases the chances that at least one method will find a significant hit that can help elucidate function. Manual assessment of the results is a time-consuming process and subject to individual interpretation and human error. We present a method based on the Gene Ontology (GO) schema using GO-slims that can allow the automated assessment of hits with a success rate approaching that of expert manual assessment.

  10. Analysis of Strand-Specific RNA-Seq Data Using Machine Learning Reveals the Structures of Transcription Units in Clostridium thermocellum

    DOE PAGES

    Chou, Wen-Chi; Ma, Qin; Yang, Shihui; ...

    2015-03-12

    The identification of transcription units (TUs) encoded in a bacterial genome is essential to elucidation of transcriptional regulation of the organism. To gain a detailed understanding of the dynamically composed TU structures, we have used four strand-specific RNA-seq (ssRNA-seq) datasets collected under two experimental conditions to derive the genomic TU organization of Clostridium thermocellum using a machine-learning approach. Our method accurately predicted the genomic boundaries of individual TUs based on two sets of parameters measuring the RNA-seq expression patterns across the genome: expression-level continuity and variance. A total of 2590 distinct TUs are predicted based on the four RNA-seq datasets. Moreover, among the predicted TUs, 44% have multiple genes. We assessed our prediction method on an independent set of RNA-seq data with longer reads. The evaluation confirmed the high quality of the predicted TUs. Functional enrichment analyses on a selected subset of the predicted TUs revealed interesting biology. To demonstrate the generality of the prediction method, we have also applied the method to RNA-seq data collected on Escherichia coli and achieved high prediction accuracies. The TU prediction program named SeqTU is publicly available at https://code.google.com/p/seqtu/. We expect that the predicted TUs can serve as the baseline information for studying transcriptional and post-transcriptional regulation in C. thermocellum and other bacteria.

  11. Prediction of Water Binding to Protein Hydration Sites with a Discrete, Semiexplicit Solvent Model.

    PubMed

    Setny, Piotr

    2015-12-08

    Buried water molecules are ubiquitous in protein structures and are found at the interface of most protein-ligand complexes. Determining their distribution and thermodynamic effect is a challenging yet important task, of great practical value for the modeling of biomolecular structures and their interactions. In this study, we present a novel method aimed at the prediction of buried water molecules in protein structures and estimation of their binding free energies. It is based on a semiexplicit, discrete solvation model, which we previously introduced in the context of small molecule hydration. The method is applicable to all macromolecular structures described by a standard all-atom force field, and predicts complete solvent distribution within a single run with modest computational cost. We demonstrate that it indicates positions of buried hydration sites, including those filled by more than one water molecule, and accurately differentiates them from regions that are sterically accessible to water but void. The obtained estimates of water binding free energies are in fair agreement with reference results determined with the double decoupling method.

  12. An Evolution-Based Approach to De Novo Protein Design and Case Study on Mycobacterium tuberculosis

    PubMed Central

    Brender, Jeffrey R.; Czajka, Jeff; Marsh, David; Gray, Felicia; Cierpicki, Tomasz; Zhang, Yang

    2013-01-01

    Computational protein design is a reverse procedure of protein folding and structure prediction, where constructing structures from evolutionarily related proteins has been demonstrated to be the most reliable method for protein 3-dimensional structure prediction. Following this spirit, we developed a novel method to design new protein sequences based on evolutionarily related protein families. For a given target structure, a set of proteins having similar fold are identified from the PDB library by structural alignments. A structural profile is then constructed from the protein templates and used to guide the conformational search of amino acid sequence space, where physicochemical packing is accommodated by single-sequence based solvation, torsion angle, and secondary structure predictions. The method was tested on a computational folding experiment based on a large set of 87 protein structures covering different fold classes, which showed that the evolution-based design significantly enhances the foldability and biological functionality of the designed sequences compared to the traditional physics-based force field methods. Without using homologous proteins, the designed sequences can be folded with an average root-mean-square-deviation of 2.1 Å to the target. As a case study, the method is extended to redesign all 243 structurally resolved proteins in the pathogenic bacteria Mycobacterium tuberculosis, which is the second leading cause of death from infectious disease. On a smaller scale, five sequences were randomly selected from the design pool and subjected to experimental validation. The results showed that all the designed proteins are soluble with distinct secondary structure and three have well ordered tertiary structure, as demonstrated by circular dichroism and NMR spectroscopy. 
Together, these results demonstrate a new avenue in computational protein design that uses knowledge of evolutionary conservation from protein structural families to engineer new protein molecules of improved fold stability and biological functionality. PMID:24204234

  13. Exploring Human Diseases and Biological Mechanisms by Protein Structure Prediction and Modeling.

    PubMed

    Wang, Juexin; Luttrell, Joseph; Zhang, Ning; Khan, Saad; Shi, NianQing; Wang, Michael X; Kang, Jing-Qiong; Wang, Zheng; Xu, Dong

    2016-01-01

    Protein structure prediction and modeling provide a tool for understanding protein functions by computationally constructing protein structures from amino acid sequences and analyzing them. With help from protein prediction tools and web servers, users can obtain the three-dimensional protein structure models and gain knowledge of functions from the proteins. In this chapter, we will provide several examples of such studies. As an example, structure modeling methods were used to investigate the relation between mutation-caused misfolding of protein and human diseases including epilepsy and leukemia. Protein structure prediction and modeling were also applied in nucleotide-gated channels and their interaction interfaces to investigate their roles in brain and heart cells. In molecular mechanism studies of plants, rice salinity tolerance mechanism was studied via structure modeling on crucial proteins identified by systems biology analysis; trait-associated protein-protein interactions were modeled, which sheds some light on the roles of mutations in soybean oil/protein content. In the age of precision medicine, we believe protein structure prediction and modeling will play more and more important roles in investigating biomedical mechanism of diseases and drug design.

  14. Prediction of missing links and reconstruction of complex networks

    NASA Astrophysics Data System (ADS)

    Zhang, Cheng-Jun; Zeng, An

    2016-04-01

    Predicting missing links in complex networks is of great significance from both theoretical and practical point of view, which not only helps us understand the evolution of real systems but also relates to many applications in social, biological and online systems. In this paper, we study the features of different simple link prediction methods, revealing that they may lead to the distortion of networks’ structural and dynamical properties. Moreover, we find that high prediction accuracy is not definitely corresponding to a high performance in preserving the network properties when using link prediction methods to reconstruct networks. Our work highlights the importance of considering the feedback effect of the link prediction methods on network properties when designing the algorithms.

  15. Mass and stiffness estimation using mobile devices for structural health monitoring

    NASA Astrophysics Data System (ADS)

    Le, Viet; Yu, Tzuyang

    2015-04-01

    In the structural health monitoring (SHM) of civil infrastructure, dynamic methods using mass, damping, and stiffness for characterizing structural health have been a traditional and widely used approach. Changes in these system parameters over time indicate the progress of structural degradation or deterioration. In these methods, the capability of predicting system parameters is essential to their success. In this paper, research work on the development of a dynamic SHM method based on perturbation analysis is reported. The concept is to use externally applied mass to perturb an unknown system and measure the natural frequency of the system. Derived theoretical expressions for mass and stiffness prediction are experimentally verified by a building model. Dynamic responses of the building model perturbed by various masses in free vibration were experimentally measured by a mobile device (cell phone) to extract the natural frequency of the building model. A single-degree-of-freedom (SDOF) modeling approach was adopted for the sake of using a cell phone. From the experimental result, it is shown that the percentage error of predicted mass increases when the mass ratio increases, while the percentage error of predicted stiffness decreases when the mass ratio increases. This work also demonstrated the potential use of mobile devices in the health monitoring of civil infrastructure.
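
    The mass-perturbation idea can be illustrated with the textbook undamped SDOF relation, in which the natural frequency satisfies (2*pi*f)^2 = k/m. Measuring the frequency before (f0) and after (f1) adding a known mass dm gives two equations in the two unknowns m and k, so m = dm * f1^2 / (f0^2 - f1^2) and k = m * (2*pi*f0)^2. The sketch below applies this relation under those assumptions; it is not claimed to reproduce the expressions derived in the paper, and the function name and the synthetic numbers are hypothetical.

```python
import math
from typing import Tuple


def estimate_mass_stiffness(f0_hz: float, f1_hz: float,
                            added_mass_kg: float) -> Tuple[float, float]:
    """Estimate SDOF mass (kg) and stiffness (N/m) from two natural frequencies.

    f0_hz: natural frequency of the unperturbed system
    f1_hz: natural frequency after attaching added_mass_kg
    Assumes an undamped SDOF model, so adding mass must lower the frequency.
    """
    if not (0.0 < f1_hz < f0_hz):
        raise ValueError("expected 0 < f1_hz < f0_hz for a positive added mass")
    mass = added_mass_kg * f1_hz ** 2 / (f0_hz ** 2 - f1_hz ** 2)
    stiffness = mass * (2.0 * math.pi * f0_hz) ** 2
    return mass, stiffness


# Synthetic check: generate frequencies from a known system, then recover it.
true_mass, true_stiffness, added = 2.0, 800.0, 0.5
f0 = math.sqrt(true_stiffness / true_mass) / (2.0 * math.pi)
f1 = math.sqrt(true_stiffness / (true_mass + added)) / (2.0 * math.pi)
mass_est, stiffness_est = estimate_mass_stiffness(f0, f1, added)
print(f"mass = {mass_est:.3f} kg, stiffness = {stiffness_est:.1f} N/m")
```

    Because the estimate divides by the difference f0^2 - f1^2, measurement noise in the two frequencies matters most when the added mass is small relative to the structure.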

  16. Predicting crystal structures and properties of matter under extreme conditions via quantum mechanics: The pressure is on

    DOE PAGES

    Zurek, Eva; Grochala, Wojciech

    2014-11-27

    Experimental studies of compressed matter are now routinely conducted at pressures exceeding 1 mln atm (100 GPa) and occasionally they even surpass 10 mln atm (1 TPa). The structure and properties of solids that have been so significantly squeezed differ considerably from those known at ambient pressures (1 atm), oftentimes leading to new and unexpected physics. Chemical reactivity is also substantially altered in the extreme pressure regime. In this feature paper we describe how synergy between theory and experiment can pave the road towards new experimental discoveries. Because chemical rules-of-thumb established at 1 atm often fail to predict the structures of solids under high pressure, automated crystal structure prediction (CSP) methods have been increasingly employed. After outlining the most important CSP techniques, we showcase a few examples from the recent literature that exemplify just how useful theory can be as an aid in the interpretation of experimental data, describe exciting theoretical predictions that are guiding experiment, and discuss when the computational methods that are currently routinely employed fail. Lastly, we forecast important problems that will be targeted by theory as theoretical methods undergo rapid development, along with the simultaneous increase of computational power.

  17. Fast metabolite identification with Input Output Kernel Regression.

    PubMed

    Brouard, Céline; Shen, Huibin; Dührkop, Kai; d'Alché-Buc, Florence; Böcker, Sebastian; Rousu, Juho

    2016-06-15

    An important problem in metabolomics is to identify metabolites using tandem mass spectrometry data. Machine learning methods have been proposed recently to solve this problem by predicting molecular fingerprint vectors and matching these fingerprints against existing molecular structure databases. In this work we propose to address the metabolite identification problem using a structured output prediction approach. This type of approach is not limited to vector output space and can handle structured output space such as the molecule space. We use the Input Output Kernel Regression method to learn the mapping between tandem mass spectra and molecular structures. The principle of this method is to encode the similarities in the input (spectra) space and the similarities in the output (molecule) space using two kernel functions. This method approximates the spectra-molecule mapping in two phases. The first phase corresponds to a regression problem from the input space to the feature space associated with the output kernel. The second phase is a preimage problem, consisting of mapping the predicted output feature vectors back to the molecule space. We show that our approach achieves state-of-the-art accuracy in metabolite identification. Moreover, our method has the advantage of decreasing the running times for the training step and the test step by several orders of magnitude over the preceding methods. Contact: celine.brouard@aalto.fi. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  18. Fast metabolite identification with Input Output Kernel Regression

    PubMed Central

    Brouard, Céline; Shen, Huibin; Dührkop, Kai; d'Alché-Buc, Florence; Böcker, Sebastian; Rousu, Juho

    2016-01-01

    Motivation: An important problem in metabolomics is to identify metabolites using tandem mass spectrometry data. Machine learning methods have been proposed recently to solve this problem by predicting molecular fingerprint vectors and matching these fingerprints against existing molecular structure databases. In this work we propose to address the metabolite identification problem using a structured output prediction approach. This type of approach is not limited to vector output space and can handle structured output space such as the molecule space. Results: We use the Input Output Kernel Regression method to learn the mapping between tandem mass spectra and molecular structures. The principle of this method is to encode the similarities in the input (spectra) space and the similarities in the output (molecule) space using two kernel functions. This method approximates the spectra-molecule mapping in two phases. The first phase corresponds to a regression problem from the input space to the feature space associated with the output kernel. The second phase is a preimage problem, consisting of mapping the predicted output feature vectors back to the molecule space. We show that our approach achieves state-of-the-art accuracy in metabolite identification. Moreover, our method has the advantage of decreasing the running times for the training step and the test step by several orders of magnitude over the preceding methods. Availability and implementation: Contact: celine.brouard@aalto.fi Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307628

  19. Structural prediction and analysis of VIH-related peptides from selected crustacean species.

    PubMed

    Nagaraju, Ganji Purna Chandra; Kumari, Nunna Siva; Prasad, Ganji Lakshmi Vara; Rajitha, Balney; Meenu, Madan; Rao, Manam Sreenivasa; Naik, Bannoth Reddya

    2009-08-17

    The elucidation of the 3D structure of vitellogenesis-inhibiting hormone (VIH) peptides is hampered by difficulties in obtaining enough peptide or protein, growing diffracting crystals, and numerous other technical aspects. As a result, no structural information is available for the VIH peptide sequences registered in GenBank. In this situation, it is not surprising that predictive methods have attracted great interest. In this study, the molt-inhibiting hormone (MIH) of the kuruma prawn (Marsupenaeus japonicus) is used to predict the structure of four VIH-related peptides in crustacean species. The high similarity of the 3D structures and the calculated physicochemical characteristics of these peptides suggest a common fold for the entire family.

  20. Genome3D: a UK collaborative project to annotate genomic sequences with predicted 3D structures based on SCOP and CATH domains.

    PubMed

    Lewis, Tony E; Sillitoe, Ian; Andreeva, Antonina; Blundell, Tom L; Buchan, Daniel W A; Chothia, Cyrus; Cuff, Alison; Dana, Jose M; Filippis, Ioannis; Gough, Julian; Hunter, Sarah; Jones, David T; Kelley, Lawrence A; Kleywegt, Gerard J; Minneci, Federico; Mitchell, Alex; Murzin, Alexey G; Ochoa-Montaño, Bernardo; Rackham, Owen J L; Smith, James; Sternberg, Michael J E; Velankar, Sameer; Yeats, Corin; Orengo, Christine

    2013-01-01

    Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence-structure-function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) provide annotations for UniProt sequences to indicate the locations of structural domains (structural annotations) and their 3D structures (structural models). Structural annotations and 3D model predictions are currently available for three model genomes (Homo sapiens, E. coli and baker's yeast), and the project will extend to other genomes in the near future. As these resources exploit different strategies for predicting structures, the main aim of Genome3D is to enable comparisons between all the resources so that biologists can see where predictions agree and are therefore more trusted. Furthermore, as these methods differ in whether they build their predictions using CATH or SCOP, Genome3D also contains the first official mapping between these two databases. This has identified pairs of similar superfamilies from the two resources at various degrees of consensus (532 bronze pairs, 527 silver pairs and 370 gold pairs).

  1. Real-Time Ligand Binding Pocket Database Search Using Local Surface Descriptors

    PubMed Central

    Chikhi, Rayan; Sael, Lee; Kihara, Daisuke

    2010-01-01

    Due to the increasing number of structures of unknown function accumulated by ongoing structural genomics projects, there is an urgent need for computational methods for characterizing protein tertiary structures. As functions of many of these proteins are not easily predicted by conventional sequence database searches, a legitimate strategy is to utilize structure information in function characterization. Of particular interest is prediction of ligand binding to a protein, as ligand molecule recognition is a major part of molecular function of proteins. Predicting whether a ligand molecule binds a protein is a complex problem due to the physical nature of protein-ligand interactions and the flexibility of both binding sites and ligand molecules. However, geometric and physicochemical complementarity is observed between the ligand and its binding site in many cases. Therefore, ligand molecules which bind to a local surface site in a protein can be predicted by finding similar local pockets of known binding ligands in the structure database. Here, we present two representations of ligand binding pockets and utilize them for ligand binding prediction by pocket shape comparison. These representations are based on mapping of surface properties of binding pockets, which are compactly described either by the two-dimensional pseudo-Zernike moments or the 3D Zernike descriptors. These compact representations allow a fast real-time pocket searching against a database. Thorough benchmark studies employing two different datasets show that our representations are competitive with the other existing methods. Limitations and potentials of the shape-based methods as well as possible improvements are discussed. PMID:20455259

  2. D3R grand challenge 2015: Evaluation of protein-ligand pose and affinity predictions

    NASA Astrophysics Data System (ADS)

    Gathiaka, Symon; Liu, Shuai; Chiu, Michael; Yang, Huanwang; Stuckey, Jeanne A.; Kang, You Na; Delproposto, Jim; Kubish, Ginger; Dunbar, James B.; Carlson, Heather A.; Burley, Stephen K.; Walters, W. Patrick; Amaro, Rommie E.; Feher, Victoria A.; Gilson, Michael K.

    2016-09-01

    The Drug Design Data Resource (D3R) ran Grand Challenge 2015 between September 2015 and February 2016. Two targets served as the framework to test community docking and scoring methods: (1) HSP90, donated by AbbVie and the Community Structure Activity Resource (CSAR), and (2) MAP4K4, donated by Genentech. The challenges for both target datasets were conducted in two stages, with the first stage testing pose predictions and the capacity to rank compounds by affinity with minimal structural data; and the second stage testing methods for ranking compounds with knowledge of at least a subset of the ligand-protein poses. An additional sub-challenge provided small groups of chemically similar HSP90 compounds amenable to alchemical calculations of relative binding free energy. Unlike previous blinded Challenges, we did not provide cognate receptors or receptors prepared with hydrogens and likewise did not require a specified crystal structure to be used for pose or affinity prediction in Stage 1. Given the freedom to select from over 200 crystal structures of HSP90 in the PDB, participants employed workflows that tested not only core docking and scoring technologies, but also methods for addressing water-mediated ligand-protein interactions, binding pocket flexibility, and the optimal selection of protein structures for use in docking calculations. Nearly 40 participating groups submitted over 350 prediction sets for Grand Challenge 2015. This overview describes the datasets and the organization of the challenge components, summarizes the results across all submitted predictions, and considers broad conclusions that may be drawn from this collaborative community endeavor.

  3. D3R Grand Challenge 2015: Evaluation of Protein-Ligand Pose and Affinity Predictions

    PubMed Central

    Gathiaka, Symon; Liu, Shuai; Chiu, Michael; Yang, Huanwang; Stuckey, Jeanne A; Kang, You Na; Delproposto, Jim; Kubish, Ginger; Dunbar, James B.; Carlson, Heather A.; Burley, Stephen K.; Walters, W. Patrick; Amaro, Rommie E.; Feher, Victoria A.; Gilson, Michael K.

    2017-01-01

    The Drug Design Data Resource (D3R) ran Grand Challenge 2015 between September 2015 and February 2016. Two targets served as the framework to test community docking and scoring methods: (i) HSP90, donated by AbbVie and the Community Structure Activity Resource (CSAR), and (ii) MAP4K4, donated by Genentech. The challenges for both target datasets were conducted in two stages, with the first stage testing pose predictions and the capacity to rank compounds by affinity with minimal structural data; and the second stage testing methods for ranking compounds with knowledge of at least a subset of the ligand-protein poses. An additional sub-challenge provided small groups of chemically similar HSP90 compounds amenable to alchemical calculations of relative binding free energy. Unlike previous blinded Challenges, we did not provide cognate receptors or receptors prepared with hydrogens and likewise did not require a specified crystal structure to be used for pose or affinity prediction in Stage 1. Given the freedom to select from over 200 crystal structures of HSP90 in the PDB, participants employed workflows that tested not only core docking and scoring technologies, but also methods for addressing water-mediated ligand-protein interactions, binding pocket flexibility, and the optimal selection of protein structures for use in docking calculations. Nearly 40 participating groups submitted over 350 prediction sets for Grand Challenge 2015. This overview describes the datasets and the organization of the challenge components, summarizes the results across all submitted predictions, and considers broad conclusions that may be drawn from this collaborative community endeavor. PMID:27696240

  4. Real-time ligand binding pocket database search using local surface descriptors.

    PubMed

    Chikhi, Rayan; Sael, Lee; Kihara, Daisuke

    2010-07-01

    Because of the increasing number of structures of unknown function accumulated by ongoing structural genomics projects, there is an urgent need for computational methods for characterizing protein tertiary structures. As functions of many of these proteins are not easily predicted by conventional sequence database searches, a legitimate strategy is to utilize structure information in function characterization. Of particular interest is prediction of ligand binding to a protein, as ligand molecule recognition is a major part of molecular function of proteins. Predicting whether a ligand molecule binds a protein is a complex problem due to the physical nature of protein-ligand interactions and the flexibility of both binding sites and ligand molecules. However, geometric and physicochemical complementarity is observed between the ligand and its binding site in many cases. Therefore, ligand molecules which bind to a local surface site in a protein can be predicted by finding similar local pockets of known binding ligands in the structure database. Here, we present two representations of ligand binding pockets and utilize them for ligand binding prediction by pocket shape comparison. These representations are based on mapping of surface properties of binding pockets, which are compactly described either by the two-dimensional pseudo-Zernike moments or the three-dimensional Zernike descriptors. These compact representations allow a fast real-time pocket searching against a database. Thorough benchmark studies employing two different datasets show that our representations are competitive with the other existing methods. Limitations and potentials of the shape-based methods as well as possible improvements are discussed.

  5. Power flow as a complement to statistical energy analysis and finite element analysis

    NASA Technical Reports Server (NTRS)

    Cuschieri, J. M.

    1987-01-01

    Present methods of analysis of the structural response and the structure-borne transmission of vibrational energy use either finite element (FE) techniques or statistical energy analysis (SEA) methods. The FE methods are a very useful tool at low frequencies where the number of resonances involved in the analysis is rather small. On the other hand, SEA methods can predict with acceptable accuracy the response and energy transmission between coupled structures at relatively high frequencies where the structural modal density is high and a statistical approach is the appropriate solution. In the mid-frequency range, a relatively large number of resonances exist, which makes the finite element method too costly. On the other hand, SEA methods can only predict an average level form. In this mid-frequency range a possible alternative is to use power flow techniques, where the input and flow of vibrational energy to excited and coupled structural components can be expressed in terms of input and transfer mobilities. This power flow technique can be extended from low to high frequencies and this can be integrated with established FE models at low frequencies and SEA models at high frequencies to form a verification of the method. This method of structural analysis using power flow and mobility methods, and its integration with SEA and FE analysis, is applied to the case of two thin beams joined together at right angles.

  6. Predictions of Crystal Structure Based on Radius Ratio: How Reliable Are They?

    ERIC Educational Resources Information Center

    Nathan, Lawrence C.

    1985-01-01

    Discussion of crystalline solids in undergraduate curricula often includes the use of radius ratio rules as a method for predicting which type of crystal structure is likely to be adopted by a given ionic compound. Examines this topic, establishing more definitive guidelines for the use and reliability of the rules. (JN)
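
    For readers unfamiliar with the radius ratio rules mentioned in this abstract, the sketch below shows the textbook form of the rule: the cation-to-anion radius ratio is compared against geometric limits to suggest a coordination number. The function name and the ionic radii in the example are illustrative assumptions (approximate six-coordinate radii in angstroms), and the record itself is about how far such predictions can be trusted, so the output is only what the rule suggests.

```python
# Textbook radius-ratio limits: (upper bound of ratio, coordination number, geometry).
# These are the standard hard-sphere geometric limits, not values taken from the article.
RADIUS_RATIO_LIMITS = [
    (0.155, 2, "linear"),
    (0.225, 3, "trigonal planar"),
    (0.414, 4, "tetrahedral"),
    (0.732, 6, "octahedral"),
    (1.000, 8, "cubic"),
]


def predict_coordination(r_cation, r_anion):
    """Return (coordination number, geometry) suggested by the radius ratio rule."""
    if r_cation <= 0 or r_anion <= 0:
        raise ValueError("ionic radii must be positive")
    ratio = r_cation / r_anion
    for upper, coordination, geometry in RADIUS_RATIO_LIMITS:
        if ratio < upper:
            return coordination, geometry
    # A ratio of 1 or more corresponds to close packing of equal spheres.
    return 12, "close-packed"


if __name__ == "__main__":
    # Approximate radii in angstroms, used here only as illustrative inputs.
    examples = {"NaCl": (1.02, 1.81), "CsCl": (1.67, 1.81), "ZnS": (0.74, 1.84)}
    for name, (r_plus, r_minus) in examples.items():
        print(name, round(r_plus / r_minus, 3), predict_coordination(r_plus, r_minus))
```

    The rule treats ions as hard spheres, so a prediction that matches the observed structure for these inputs says little about compounds near a threshold or with significant covalent character.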

  7. A Primer In Advanced Fatigue Life Prediction Methods

    NASA Technical Reports Server (NTRS)

    Halford, Gary R.

    2000-01-01

    Metal fatigue has plagued structural components for centuries, and it remains a critical durability issue in today's aerospace hardware. This is true despite vastly improved and advanced materials, increased mechanistic understanding, and development of accurate structural analysis and advanced fatigue life prediction tools. Each advance is quickly taken advantage of to produce safer, more reliable, more cost-effective, and better performing products. In other words, as the envelope is expanded, components are then designed to operate just as close to the newly expanded envelope as they were to the initial one. The problem is perennial. The economic importance of addressing structural durability issues early in the design process is emphasized. Tradeoffs with performance, cost, and legislated restrictions are pointed out. Several aspects of structural durability of advanced systems, advanced materials and advanced fatigue life prediction methods are presented. Specific items include the basic elements of durability analysis, conventional designs, barriers to be overcome for advanced systems, high-temperature life prediction for both creep-fatigue and thermomechanical fatigue, mean stress effects, multiaxial stress-strain states, and cumulative fatigue damage accumulation assessment.

  8. R2C: improving ab initio residue contact map prediction using dynamic fusion strategy and Gaussian noise filter.

    PubMed

    Yang, Jing; Jin, Qi-Yu; Zhang, Biao; Shen, Hong-Bin

    2016-08-15

    Inter-residue contacts in proteins dictate the topology of protein structures. They are crucial for protein folding and structural stability. Accurate prediction of residue contacts, especially long-range contacts, is important to the quality of ab initio structure modeling since they can enforce strong restraints on structure assembly. In this paper, we present a new Residue-Residue Contact predictor called R2C that combines machine learning-based and correlated mutation analysis-based methods, together with a two-dimensional Gaussian noise filter to enhance the long-range residue contact prediction. Our results show that the outputs from the machine learning-based method are concentrated, with better performance on short-range contacts, while for the correlated mutation analysis-based approach, the predictions are widespread, with higher accuracy on long-range contacts. An effective query-driven dynamic fusion strategy proposed here takes full advantage of the two different methods, resulting in an impressive overall accuracy improvement. We also show that the contact map directly from the prediction model contains interesting Gaussian noise, which has not been discovered before. Different from recent studies that tried to further enhance the quality of the contact map by removing its transitive noise, we designed a new two-dimensional Gaussian noise filter, which was especially helpful for reinforcing the long-range residue contact prediction. Tested on recent CASP10/11 datasets, the overall top L/5 accuracy of our final R2C predictor is 17.6%/15.5% higher than the pure machine learning-based method and 7.8%/8.3% higher than the correlated mutation analysis-based approach for the long-range residue contact prediction. http://www.csbio.sjtu.edu.cn/bioinf/R2C/ Contact: hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.
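
    The "top L/5 accuracy" for long-range contacts quoted in the R2C record is a standard evaluation measure, and a minimal sketch of how it is usually computed may help. The sketch assumes the common convention that a pair is long-range when its sequence separation is at least 24 residues; the function name, the nested-list score matrix, and the toy data are all hypothetical and are not taken from the R2C paper.

```python
def top_l5_long_range_precision(scores, native_contacts, min_separation=24):
    """Precision of the L/5 highest-scoring long-range pairs.

    scores          -- L x L nested list of predicted contact scores
    native_contacts -- set of (i, j) pairs with i < j that are true contacts
    min_separation  -- minimum |i - j| for a pair to count as long-range
    """
    length = len(scores)
    # Collect every long-range pair (i < j) with its predicted score.
    candidates = [
        (scores[i][j], i, j)
        for i in range(length)
        for j in range(i + min_separation, length)
    ]
    candidates.sort(key=lambda item: item[0], reverse=True)
    top = candidates[: max(1, length // 5)]
    if not top:
        return 0.0
    hits = sum(1 for _, i, j in top if (i, j) in native_contacts)
    return hits / len(top)


def build_toy_example():
    """Toy 30-residue example: six scored long-range pairs, three of them native."""
    length = 30
    scores = [[0.0] * length for _ in range(length)]
    predicted = {(0, 24): 0.9, (1, 26): 0.8, (0, 25): 0.7,
                 (2, 29): 0.6, (3, 28): 0.5, (4, 29): 0.4}
    for (i, j), value in predicted.items():
        scores[i][j] = value
    native = {(0, 24), (1, 26), (2, 29)}
    return scores, native


if __name__ == "__main__":
    toy_scores, toy_native = build_toy_example()
    print(top_l5_long_range_precision(toy_scores, toy_native))  # 3 of the top 6 are native: 0.5
```

    Because only the L/5 best-scoring pairs are evaluated, the measure rewards a predictor for ranking a few confident long-range contacts correctly rather than for covering the whole contact map.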

  9. Sorting protein decoys by machine-learning-to-rank

    PubMed Central

    Jing, Xiaoyang; Wang, Kai; Lu, Ruqian; Dong, Qiwen

    2016-01-01

    Much progress has been made in protein structure prediction during the last few decades. As the predicted models can span a broad range of accuracy, the accuracy of quality estimation becomes one of the key elements of successful protein structure prediction. Over the past years, a number of methods have been developed to address this issue, and these methods can be roughly divided into three categories: single-model methods, clustering-based methods and quasi single-model methods. In this study, we first develop a single-model method, MQAPRank, based on the learning-to-rank algorithm, and then implement a quasi single-model method, Quasi-MQAPRank. The proposed methods are benchmarked on the 3DRobot and CASP11 datasets. Five-fold cross-validation on the 3DRobot dataset shows that the proposed single-model method outperforms the other methods whose outputs are taken as its features, and that the quasi single-model method can further enhance the performance. On the CASP11 dataset, the proposed methods also perform well compared with other leading methods in the corresponding categories. In particular, the Quasi-MQAPRank method achieves a considerable performance on the CASP11 Best150 dataset. PMID:27530967

  10. Sorting protein decoys by machine-learning-to-rank.

    PubMed

    Jing, Xiaoyang; Wang, Kai; Lu, Ruqian; Dong, Qiwen

    2016-08-17

    Much progress has been made in protein structure prediction during the last few decades. As the predicted models can span a broad range of accuracy, the accuracy of quality estimation becomes one of the key elements of successful protein structure prediction. Over the past years, a number of methods have been developed to address this issue, and these methods can be roughly divided into three categories: single-model methods, clustering-based methods and quasi single-model methods. In this study, we first develop a single-model method, MQAPRank, based on the learning-to-rank algorithm, and then implement a quasi single-model method, Quasi-MQAPRank. The proposed methods are benchmarked on the 3DRobot and CASP11 datasets. Five-fold cross-validation on the 3DRobot dataset shows that the proposed single-model method outperforms the other methods whose outputs are taken as its features, and that the quasi single-model method can further enhance the performance. On the CASP11 dataset, the proposed methods also perform well compared with other leading methods in the corresponding categories. In particular, the Quasi-MQAPRank method achieves a considerable performance on the CASP11 Best150 dataset.

  11. Characterising RNA secondary structure space using information entropy

    PubMed Central

    2013-01-01

    Comparative methods for RNA secondary structure prediction use evolutionary information from RNA alignments to increase prediction accuracy. The model is often described in terms of stochastic context-free grammars (SCFGs), which generate a probability distribution over secondary structures. It is, however, unclear how this probability distribution changes as a function of the input alignment. As prediction programs typically only return a single secondary structure, better characterisation of the underlying probability space of RNA secondary structures is of great interest. In this work, we show how to efficiently compute the information entropy of the probability distribution over RNA secondary structures produced for RNA alignments by a phylo-SCFG, and implement it for the PPfold model. We also discuss interpretations and applications of this quantity, including how it can clarify reasons for low prediction reliability scores. PPfold and its source code are available from http://birc.au.dk/software/ppfold/. PMID:23368905
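
    The quantity discussed in this record is the Shannon entropy of a probability distribution over secondary structures. The sketch below only illustrates the definition on an explicitly enumerated distribution; it is not the efficient grammar-based computation the abstract refers to, which avoids enumerating structures. The function name and the example probabilities are hypothetical.

```python
import math


def shannon_entropy_bits(probabilities, tolerance=1e-9):
    """Shannon entropy, in bits, of an explicitly listed probability distribution."""
    if any(p < 0 for p in probabilities):
        raise ValueError("probabilities must be non-negative")
    if abs(sum(probabilities) - 1.0) > tolerance:
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 contribute nothing (the limit of p * log p is 0).
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


if __name__ == "__main__":
    # One dominant structure: low entropy, so a single prediction is informative.
    print(shannon_entropy_bits([0.5, 0.25, 0.25]))  # 1.5
    # Four equally likely structures: higher entropy, so any single prediction is unreliable.
    print(shannon_entropy_bits([0.25, 0.25, 0.25, 0.25]))  # 2.0
```

    Read this way, a high entropy signals that probability mass is spread over many alternative structures, which is one way a single returned structure can end up with a low reliability score.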

  12. Evaluation of Health Equity Impact of Structural Policies: Overview of Research Methods Used in the SOPHIE Project.

    PubMed

    Kunst, Anton E

    2017-07-01

    This article briefly assesses the research methods that were applied in the SOPHIE project to evaluate the impact of structural policies on population health and health inequalities. The evaluation of structural policies is one of the key methodological challenges in today's public health. The experience in the SOPHIE project was that mixed methods are essential to identify, understand, and predict the health impact of structural policies. On the one hand, quantitative studies that included spatial comparisons or time trend analyses, preferably in a quasi-experimental design, showed that some structural policies were associated with improved population health and smaller health inequalities. On the other hand, qualitative studies, often inspired by realist approaches, were important to understand how these policies could have achieved the observed impact and why they would succeed in some settings but fail in others. This review ends with five recommendations for future studies that aim to evaluate, understand, and predict how health inequalities can be reduced through structural policies.

  13. System and methods for predicting transmembrane domains in membrane proteins and mining the genome for recognizing G-protein coupled receptors

    DOEpatents

    Trabanino, Rene J; Vaidehi, Nagarajan; Hall, Spencer E; Goddard, William A; Floriano, Wely

    2013-02-05

    The invention provides computer-implemented methods and apparatus implementing a hierarchical protocol using multiscale molecular dynamics and molecular modeling methods to predict the presence of transmembrane regions in proteins, such as G-Protein Coupled Receptors (GPCR), and protein structural models generated according to the protocol. The protocol features a coarse grain sampling method, such as hydrophobicity analysis, to provide a fast and accurate procedure for predicting transmembrane regions. Methods and apparatus of the invention are useful to screen protein or polynucleotide databases for encoded proteins with transmembrane regions, such as GPCRs.

  14. An Integrated Approach Linking Process to Structural Modeling With Microstructural Characterization for Injection-Molded Long-Fiber Thermoplastics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nguyen, Ba Nghiep; Bapanapalli, Satish K.; Smith, Mark T.

    2008-09-01

    The objective of our work is to enable the optimum design of lightweight automotive structural components using injection-molded long fiber thermoplastics (LFTs). To this end, an integrated approach that links process modeling to structural analysis with experimental microstructural characterization and validation is developed. First, process models for LFTs are developed and implemented into processing codes (e.g. ORIENT, Moldflow) to predict the microstructure of the as-formed composite (i.e. fiber length and orientation distributions). In parallel, characterization and testing methods are developed to obtain necessary microstructural data to validate process modeling predictions. Second, the predicted LFT composite microstructure is imported into a structural finite element analysis by ABAQUS to determine the response of the as-formed composite to given boundary conditions. At this stage, constitutive models accounting for the composite microstructure are developed to predict various types of behaviors (i.e. thermoelastic, viscoelastic, elastic-plastic, damage, fatigue, and impact) of LFTs. Experimental methods are also developed to determine material parameters and to validate constitutive models. Such a process-linked-structural modeling approach allows an LFT composite structure to be designed with confidence through numerical simulations. Some recent results of our collaborative research will be illustrated to show the usefulness and applications of this integrated approach.

  15. HART-II Acoustic Predictions using a Coupled CFD/CSD Method

    NASA Technical Reports Server (NTRS)

    Boyd, D. Douglas, Jr.

    2009-01-01

    This paper documents results to date from the Rotorcraft Acoustic Characterization and Mitigation activity under the NASA Subsonic Rotary Wing Project. The primary goal of this activity is to develop a NASA rotorcraft impulsive noise prediction capability which uses first principles fluid dynamics and structural dynamics. During this effort, elastic blade motion and co-processing capabilities have been included in a recent version of the computational fluid dynamics (CFD) code. The CFD code is loosely coupled to a computational structural dynamics (CSD) code using new interface codes. The CFD/CSD coupled solution is then used to compute impulsive noise on a plane under the rotor using the Ffowcs Williams-Hawkings solver. This code system is then applied to a range of cases from the Higher Harmonic Aeroacoustic Rotor Test II (HART-II) experiment. For all cases presented, the full experimental configuration (i.e., rotor and wind tunnel sting mount) is used in the coupled CFD/CSD solutions. Results show good correlation between measured and predicted loading and loading time derivative at the only measured radial station. A contributing factor for a typically seen loading mean-value offset between measured and predicted data is examined. Impulsive noise predictions on the measured microphone plane under the rotor compare favorably with measured mid-frequency noise for all cases. Flow visualization of the BL and MN cases shows that vortex structures generated in the prediction method are consistent with measurements. Future application of the prediction method is discussed.

  16. BepiPred-2.0: improving sequence-based B-cell epitope prediction using conformational epitopes.

    PubMed

    Jespersen, Martin Closter; Peters, Bjoern; Nielsen, Morten; Marcatili, Paolo

    2017-07-03

    Antibodies have become an indispensable tool for many biotechnological and clinical applications. They bind their molecular target (antigen) by recognizing a portion of its structure (epitope) in a highly specific manner. The ability to predict epitopes from antigen sequences alone is a complex task. Despite substantial effort, limited advancement has been achieved over the last decade in the accuracy of epitope prediction methods, especially for those that rely on the sequence of the antigen only. Here, we present BepiPred-2.0 (http://www.cbs.dtu.dk/services/BepiPred/), a web server for predicting B-cell epitopes from antigen sequences. BepiPred-2.0 is based on a random forest algorithm trained on epitopes annotated from antibody-antigen protein structures. This new method was found to outperform other available tools for sequence-based epitope prediction both on epitope data derived from solved 3D structures, and on a large collection of linear epitopes downloaded from the IEDB database. The method displays results in a user-friendly and informative way, both for computer-savvy and non-expert users. We believe that BepiPred-2.0 will be a valuable tool for the bioinformatics and immunology community. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  17. Assessment of quantitative structure-activity relationship of toxicity prediction models for Korean chemical substance control legislation

    PubMed Central

    Kim, Kwang-Yon; Shin, Seong Eun; No, Kyoung Tai

    2015-01-01

    Objectives: For successful adoption of legislation controlling registration and assessment of chemical substances, it is important to obtain sufficient toxicological experimental evidence and other related information. It is also essential to obtain a sufficient number of predicted risk and toxicity results. In particular, methods used to predict toxicities of chemical substances during acquisition of required data ultimately become an economical way of dealing with new substances in the future. Although the need for such methods is gradually increasing, the required information about their reliability and applicability range has not been systematically provided. Methods: There are various representative environmental and human toxicity models based on quantitative structure-activity relationships (QSAR). Here, we secured 10 representative QSAR-based prediction models, and their associated information, that can make predictions about substances that are expected to be regulated. We used models that predict and confirm usability of the information expected to be collected and submitted according to the legislation. After collecting and evaluating each predictive model and relevant data, we prepared methods quantifying the scientific validity and reliability, which are essential conditions for using predictive models. Results: We calculated predicted values for the models. Furthermore, we deduced and compared adequacies of the models using the Alternative non-testing method assessed for Registration, Evaluation, Authorization, and Restriction of Chemical Substances scoring system, and deduced the applicability domains for each model. Additionally, we calculated and compared inclusion rates of substances expected to be regulated, to confirm the applicability. Conclusions: We evaluated and compared the data, adequacy, and applicability of our selected QSAR-based toxicity prediction models, and included them in a database. Based on these data, we aimed to construct a system that can be used with predicted toxicity results. Furthermore, by presenting the suitability of individual predicted results, we aimed to provide a foundation that could be used in actual assessments and regulations. PMID:26206368

  18. Ensemble Generation and the Influence of Protein Flexibility on Geometric Tunnel Prediction in Cytochrome P450 Enzymes

    PubMed Central

    Kingsley, Laura J.; Lill, Markus A.

    2014-01-01

    Computational prediction of ligand entry and egress paths in proteins has become an emerging topic in computational biology and has proven useful in fields such as protein engineering and drug design. Geometric tunnel prediction programs, such as Caver3.0 and MolAxis, are computationally efficient methods to identify potential ligand entry and egress routes in proteins. Although many geometric tunnel programs are designed to accommodate a single input structure, the increasingly recognized importance of protein flexibility in tunnel formation and behavior has led to the more widespread use of protein ensembles in tunnel prediction. However, there has not yet been an attempt to directly investigate the influence of ensemble size and composition on geometric tunnel prediction. In this study, we compared tunnels found in a single crystal structure to ensembles of various sizes generated using different methods on both the apo and holo forms of cytochrome P450 enzymes CYP119, CYP2C9, and CYP3A4. Several protein structure clustering methods were tested in an attempt to generate smaller ensembles that were capable of reproducing the data from larger ensembles. Ultimately, we found that by including members from both the apo and holo data sets, we could produce ensembles containing less than 15 members that were comparable to apo or holo ensembles containing over 100 members. Furthermore, we found that, in the absence of either apo or holo crystal structure data, pseudo-apo or –holo ensembles (e.g. adding ligand to apo protein throughout MD simulations) could be used to resemble the structural ensembles of the corresponding apo and holo ensembles, respectively. Our findings not only further highlight the importance of including protein flexibility in geometric tunnel prediction, but also suggest that smaller ensembles can be as capable as larger ensembles at capturing many of the protein motions important for tunnel prediction at a lower computational cost. 
PMID:24956479

  19. D3R Grand Challenge 2: blind prediction of protein-ligand poses, affinity rankings, and relative binding free energies

    NASA Astrophysics Data System (ADS)

    Gaieb, Zied; Liu, Shuai; Gathiaka, Symon; Chiu, Michael; Yang, Huanwang; Shao, Chenghua; Feher, Victoria A.; Walters, W. Patrick; Kuhn, Bernd; Rudolph, Markus G.; Burley, Stephen K.; Gilson, Michael K.; Amaro, Rommie E.

    2018-01-01

    The Drug Design Data Resource (D3R) ran Grand Challenge 2 (GC2) from September 2016 through February 2017. This challenge was based on a dataset of structures and affinities for the nuclear receptor farnesoid X receptor (FXR), contributed by F. Hoffmann-La Roche. The dataset contained 102 IC50 values, spanning six orders of magnitude, and 36 high-resolution co-crystal structures with representatives of four major ligand classes. Strong global participation was evident, with 49 participants submitting 262 prediction submission packages in total. Procedurally, GC2 mimicked Grand Challenge 2015 (GC2015), with a Stage 1 subchallenge testing ligand pose prediction methods and ranking and scoring methods, and a Stage 2 subchallenge testing only ligand ranking and scoring methods after the release of all blinded co-crystal structures. Two smaller curated sets of 18 and 15 ligands were developed to test alchemical free energy methods. This overview summarizes all aspects of GC2, including the dataset details, challenge procedures, and participant results. We also consider implications for progress in the field, while highlighting methodological areas that merit continued development. Similar to GC2015, the outcome of GC2 underscores the pressing need for methods development in pose prediction, particularly for ligand scaffolds not currently represented in the Protein Data Bank (http://www.pdb.org), and in affinity ranking and scoring of bound ligands.

  20. Improving consensus contact prediction via server correlation reduction.

    PubMed

    Gao, Xin; Bu, Dongbo; Xu, Jinbo; Li, Ming

    2009-05-06

    Protein inter-residue contacts play a crucial role in the determination and prediction of protein structures. Previous studies on contact prediction indicate that although template-based consensus methods outperform sequence-based methods on targets with typical templates, such consensus methods perform poorly on new fold targets. However, we find that even for new fold targets, the models generated by threading programs can contain many true contacts. The challenge is how to identify them. In this paper, we develop an integer linear programming model for consensus contact prediction. In contrast to the simple majority voting method, which assumes that all the individual servers are equally important and independent, the newly developed method evaluates their correlation by using maximum likelihood estimation and extracts independent latent servers from them by using principal component analysis. An integer linear programming method is then applied to assign a weight to each latent server to maximize the difference between true contacts and false ones. The proposed method is tested on the CASP7 data set. If the top L/5 predicted contacts are evaluated, where L is the protein size, the average accuracy is 73%, which is much higher than that of any previously reported study. Moreover, if only the 15 new fold CASP7 targets are considered, our method achieves an average accuracy of 37%, which is much better than that of the majority voting method, SVM-LOMETS, SVM-SEQ, and SAM-T06. These methods demonstrate an average accuracy of 13.0%, 10.8%, 25.8% and 21.2%, respectively. Reducing server correlation and optimally combining independent latent servers show a significant improvement over the traditional consensus methods. This approach can hopefully provide a powerful tool for protein structure refinement and prediction.
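
    The top-L/5 evaluation mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name, the (pair, score) input layout and the toy contacts are all assumptions made for the example.

```python
def top_l_over_5_accuracy(scored_contacts, true_contacts, seq_len):
    """Fraction of the top L/5 highest-scoring predicted contacts that are true.

    scored_contacts: list of ((i, j), score) tuples (hypothetical layout).
    true_contacts:   set of (i, j) residue pairs observed in the native structure.
    seq_len:         protein length L.
    """
    n_top = max(1, seq_len // 5)
    # Rank predictions from most to least confident and keep the first L/5.
    ranked = sorted(scored_contacts, key=lambda item: item[1], reverse=True)
    top = [pair for pair, _score in ranked[:n_top]]
    if not top:
        return 0.0
    hits = sum(1 for pair in top if pair in true_contacts)
    return hits / len(top)


# Toy data, invented purely for illustration.
predicted = [((1, 8), 0.9), ((2, 9), 0.8), ((3, 7), 0.4), ((4, 10), 0.2)]
native = {(1, 8), (3, 7)}
accuracy = top_l_over_5_accuracy(predicted, native, seq_len=10)
```

    With L = 10 only the two most confident predictions are scored, so a method is rewarded for ranking true contacts first rather than for predicting many contacts.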

  1. Dispersion Corrected Structural Properties and Quasiparticle Band Gaps of Several Organic Energetic Solids.

    PubMed

    Appalakondaiah, S; Vaitheeswaran, G; Lebègue, S

    2015-06-18

    We have performed ab initio calculations for a series of energetic solids to explore their structural and electronic properties. To evaluate the ground state volume of these molecular solids, different dispersion correction methods were applied within DFT, namely the Tkatchenko-Scheffler method (with and without self-consistent screening), Grimme's methods (D2, D3(BJ)), and the vdW-DF method. Our results reveal that dispersion correction methods are essential in understanding these complex structures with van der Waals interactions and hydrogen bonding. The calculated ground state volumes and bulk moduli show that the performance of each method is not unique, and therefore a careful examination is mandatory for interpreting theoretical predictions. This work also emphasizes the importance of quasiparticle calculations in predicting the band gap, which is obtained here with the GW approximation. We find that the obtained band gaps range from 4 to 7 eV for the different compounds, indicating their insulating nature. In addition, we show the essential role of quasiparticle band structure calculations in correlating the gap with the energetic properties.

  2. Puzzle of magnetic moments of Ni clusters revisited using quantum Monte Carlo method.

    PubMed

    Lee, Hung-Wen; Chang, Chun-Ming; Hsing, Cheng-Rong

    2017-02-28

    The puzzle of the magnetic moments of small nickel clusters arises from the discrepancy between values predicted using density functional theory (DFT) and experimental measurements. Traditional DFT approaches underestimate the magnetic moments of nickel clusters. Two fundamental problems are associated with this puzzle, namely, calculating the exchange-correlation interaction accurately and determining the global minimum structures of the clusters. Theoretically, the two problems can be solved using quantum Monte Carlo (QMC) calculations and the ab initio random structure searching (AIRSS) method, respectively. Therefore, we combined the fixed-moment AIRSS and QMC methods to investigate the magnetic properties of Ni_n (n = 5-9) clusters. The spin moments of the diffusion Monte Carlo (DMC) ground states are higher than those of the Perdew-Burke-Ernzerhof ground states and, in the case of Ni_8 and Ni_9, two new ground-state structures have been discovered using the DMC calculations. The predicted results are closer to the experimental findings, unlike the results predicted in previous standard DFT studies.

  3. QSAR Methods.

    PubMed

    Gini, Giuseppina

    2016-01-01

    In this chapter, we introduce the basis of computational chemistry and discuss how computational methods have been extended to some biological properties and to toxicology in particular. For about 20 years, chemical experimentation has increasingly been replaced by modeling and virtual experimentation, using a large core of mathematics, chemistry, physics, and algorithms. We then see how animal experiments, aimed at providing a standardized result about a biological property, can be mimicked by new in silico methods. Our emphasis here is on toxicology and on predicting properties through chemical structures. Two main streams of such models are available: models that consider the whole molecular structure to predict a value, namely QSAR (Quantitative Structure Activity Relationships), and models that find relevant substructures to predict a class, namely SAR. The term in silico discovery is applied to chemical design, to computational toxicology, and to drug discovery. We discuss how the experimental practice in biological science is moving more and more toward modeling and simulation. Such virtual experiments confirm hypotheses, provide data for regulation, and help in designing new chemicals.

  4. Innovative FRF measurement technique for frequency based substructuring method

    NASA Astrophysics Data System (ADS)

    Mirza, W. I. I. Wan Iskandar; Rani, M. N. Abdul; Ayub, M. A.; Yunus, M. A.; Omar, R.; Mohd Zin, M. S.

    2018-04-01

    In this paper, frequency based substructuring (FBS) is used in an attempt to predict the dynamic behaviour of an assembled structure. The assembled structure, which consists of two beam substructures, namely substructure A (finite element model) and substructure B (experimental model), was tested. The FE model of substructure A was constructed by using 3D elements and the Frequency Response Functions (FRFs) were derived via an FRF synthesis method. A specially customised bolt was used to allow the attachment of sensors and excitation to be made at the interfaces of substructure B, and the FRFs were measured by using an impact testing method. Both substructures A and B were then coupled by using the FBS method for the prediction of FRFs. The coupled FRF obtained was validated against its measured FRF counterparts. This work revealed that using a specially customised bolt during the measurement of FRFs at the interface led to an improvement in the FBS predicted results.
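
    The coupling step in FBS can be illustrated in its simplest form, where two substructures are rigidly joined at a single degree of freedom. This is a textbook special case rather than the multi-DOF procedure used in the paper; the helper names and the free-mass test case are assumptions chosen so that the result can be checked by hand.

```python
import math


def free_mass_receptance(mass, omega):
    """Drive-point receptance (displacement per unit force) of a free rigid mass."""
    return -1.0 / (mass * omega ** 2)


def couple_single_dof(h_a, h_b):
    """Interface receptance of two substructures rigidly joined at one DOF.

    Dynamic stiffnesses (inverse receptances) add at a rigid joint, so
    H_AB = 1 / (1/H_A + 1/H_B) = H_A * H_B / (H_A + H_B).
    Works for real or complex FRF values at a single frequency.
    """
    return h_a * h_b / (h_a + h_b)


# Hypothetical check: coupling a 2 kg and a 3 kg free mass at 50 Hz
# should behave like a single 5 kg mass.
omega = 2.0 * math.pi * 50.0
h_a = free_mass_receptance(2.0, omega)
h_b = free_mass_receptance(3.0, omega)
h_ab = couple_single_dof(h_a, h_b)
```

    In practice one of the two FRFs would come from a model and the other from measurement, which is why the quality of the measured interface FRF matters so much for the coupled prediction.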

  5. Method of identifying hairpin DNA probes by partial fold analysis

    DOEpatents

    Miller, Benjamin L [Penfield, NY; Strohsahl, Christopher M [Saugerties, NY

    2009-10-06

    Method of identifying molecular beacons in which a secondary structure prediction algorithm is employed to identify oligonucleotide sequences within a target gene having the requisite hairpin structure. Isolated oligonucleotides, molecular beacons prepared from those oligonucleotides, and their use are also disclosed.
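
    As a rough illustration of searching a target sequence for hairpin-capable oligonucleotides, the sketch below scans for windows whose two ends are perfectly complementary. It is a deliberately simplified stand-in: the patent refers to a secondary structure prediction algorithm, whereas this is only a string scan, and the function names, stem length and loop range are assumptions.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G and T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_hairpin_candidates(target, stem_len=4, min_loop=3, max_loop=8):
    """Return (start, end, subsequence) for windows whose ends can form a stem.

    A window qualifies when its first stem_len bases equal the reverse
    complement of its last stem_len bases, leaving a loop in between.
    No folding energy is evaluated (simplifying assumption).
    """
    candidates = []
    for loop_len in range(min_loop, max_loop + 1):
        window = 2 * stem_len + loop_len
        for start in range(len(target) - window + 1):
            sub = target[start:start + window]
            if sub[:stem_len] == reverse_complement(sub[-stem_len:]):
                candidates.append((start, start + window, sub))
    return candidates


# Invented toy sequence: stem GGAC / GTCC around a TTTT loop.
hits = find_hairpin_candidates("AAGGACTTTTGTCCAA")
```

    A real workflow would pass such candidates, or the whole target, to an energy-based folding program to confirm that the hairpin is the favoured structure.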

  6. Method of identifying hairpin DNA probes by partial fold analysis

    DOEpatents

    Miller, Benjamin L.; Strohsahl, Christopher M.

    2008-10-28

    Methods of identifying molecular beacons in which a secondary structure prediction algorithm is employed to identify oligonucleotide sequences within a target gene having the requisite hairpin structure. Isolated oligonucleotides, molecular beacons prepared from those oligonucleotides, and their use are also disclosed.

  7. Simplified Model to Predict Deflection and Natural Frequency of Steel Pole Structures

    NASA Astrophysics Data System (ADS)

    Balagopal, R.; Prasad Rao, N.; Rokade, R. P.

    2018-04-01

    Steel pole structures are a suitable alternative to transmission line towers because of the difficulty of finding land for a new right of way on which to install new lattice towers. Steel poles have a tapered cross section and are generally used for communication, power transmission and lighting purposes. Determining the deflection of a steel pole is important for deciding whether it meets its functional requirements. Excessive deflection of a pole may cause signal attenuation and short-circuiting problems in communication/transmission poles. In this paper, a simplified method is proposed to determine both primary and secondary deflection based on the dummy unit load/moment method. The deflection predicted by the proposed method is validated against full scale experimental investigations conducted on 8 m and 30 m high lighting masts and on 132 and 400 kV transmission poles, and the two are found to be in close agreement. Determining the natural frequency of a pole is an important criterion for examining its dynamic sensitivity. A simplified semi-empirical method using the static deflection from the proposed method is formulated to determine the natural frequency. The natural frequency predicted by the proposed method is validated against FE analysis results. Further, the predicted results are validated against experimental results available in the literature.

  8. SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition

    PubMed Central

    Melvin, Iain; Ie, Eugene; Kuang, Rui; Weston, Jason; Stafford, William Noble; Leslie, Christina

    2007-01-01

    Background Predicting a protein's structural class from its amino acid sequence is a fundamental problem in computational biology. Much recent work has focused on developing new representations for protein sequences, called string kernels, for use with support vector machine (SVM) classifiers. However, while some of these approaches exhibit state-of-the-art performance at the binary protein classification problem, i.e. discriminating between a particular protein class and all other classes, few of these studies have addressed the real problem of multi-class superfamily or fold recognition. Moreover, there are only limited software tools and systems for SVM-based protein classification available to the bioinformatics community. Results We present a new multi-class SVM-based protein fold and superfamily recognition system and web server called SVM-Fold, which can be found at . Our system uses an efficient implementation of a state-of-the-art string kernel for sequence profiles, called the profile kernel, where the underlying feature representation is a histogram of inexact matching k-mer frequencies. We also employ a novel machine learning approach to solve the difficult multi-class problem of classifying a sequence of amino acids into one of many known protein structural classes. Binary one-vs-the-rest SVM classifiers that are trained to recognize individual structural classes yield prediction scores that are not comparable, so that standard "one-vs-all" classification fails to perform well. Moreover, SVMs for classes at different levels of the protein structural hierarchy may make useful predictions, but one-vs-all does not try to combine these multiple predictions. To deal with these problems, our method learns relative weights between one-vs-the-rest classifiers and encodes information about the protein structural hierarchy for multi-class prediction. 
In large-scale benchmark results based on the SCOP database, our code weighting approach significantly improves on the standard one-vs-all method for both the superfamily and fold prediction in the remote homology setting and on the fold recognition problem. Moreover, our code weight learning algorithm strongly outperforms nearest-neighbor methods based on PSI-BLAST in terms of prediction accuracy on every structure classification problem we consider. Conclusion By combining state-of-the-art SVM kernel methods with a novel multi-class algorithm, the SVM-Fold system delivers efficient and accurate protein fold and superfamily recognition. PMID:17570145
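
    The idea of representing a sequence as a histogram of k-mer frequencies can be sketched with an exact-match spectrum kernel. This is a simplification: the profile kernel described above works on sequence profiles with inexact matching, which is not reproduced here, and the function names and default k are assumptions.

```python
from collections import Counter


def kmer_histogram(sequence, k=3):
    """Count every overlapping k-mer in the sequence (exact matching only)."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def spectrum_kernel(seq_a, seq_b, k=3):
    """Inner product of the two k-mer histograms.

    A larger value means the sequences share more k-mers; an SVM can use
    this value as a similarity measure between sequences.
    """
    hist_a = kmer_histogram(seq_a, k)
    hist_b = kmer_histogram(seq_b, k)
    return sum(count * hist_b[kmer] for kmer, count in hist_a.items())


# Toy strings, chosen only to keep the counts easy to verify by hand.
similarity = spectrum_kernel("ABABA", "ABAB", k=2)
```

    Because the kernel value is an inner product of feature vectors, it can be handed to any kernel-based classifier without ever building the full feature space explicitly.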

  9. Protein structure modeling and refinement by global optimization in CASP12.

    PubMed

    Hong, Seung Hwan; Joung, InSuk; Flores-Canales, Jose C; Manavalan, Balachandran; Cheng, Qianyi; Heo, Seungryong; Kim, Jong Yun; Lee, Sun Young; Nam, Mikyung; Joo, Keehyoung; Lee, In-Ho; Lee, Sung Jong; Lee, Jooyoung

    2018-03-01

    For protein structure modeling in the CASP12 experiment, we have developed a new protocol based on our previous CASP11 approach. The global optimization method of conformational space annealing (CSA) was applied to 3 stages of modeling: multiple sequence-structure alignment, three-dimensional (3D) chain building, and side-chain re-modeling. For better template selection and model selection, we updated our model quality assessment (QA) method with the newly developed SVMQA (support vector machine for quality assessment). For 3D chain building, we updated our energy function by including restraints generated from predicted residue-residue contacts. New energy terms for the predicted secondary structure and predicted solvent accessible surface area were also introduced. For difficult targets, we proposed a new method, LEEab, where the template term played a less significant role than it did in LEE, complemented by increased contributions from other terms such as the predicted contact term. For TBM (template-based modeling) targets, LEE performed better than LEEab, but for FM targets, LEEab was better. For model refinement, we modified our CASP11 molecular dynamics (MD) based protocol by using explicit solvents and tuning down restraint weights. Refinement results from MD simulations that used a new augmented statistical energy term in the force field were quite promising. Finally, when using inaccurate information (such as the predicted contacts), it was important to use the Lorentzian function for which the maximal penalty arising from wrong information is always bounded. © 2017 Wiley Periodicals, Inc.
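
    The closing remark about the Lorentzian function can be made concrete by comparing it with a harmonic restraint. The functional form below is one common way of writing a Lorentzian-shaped penalty and is an assumption; the abstract does not give the exact expression, weights or widths used in the protocol.

```python
def harmonic_penalty(distance, target, weight=1.0):
    """Quadratic restraint: grows without bound as the deviation increases."""
    return weight * (distance - target) ** 2


def lorentzian_penalty(distance, target, weight=1.0, width=1.0):
    """Lorentzian-shaped restraint: never exceeds `weight`.

    Behaves like a quadratic near the target but saturates for large
    deviations, so a wrongly predicted contact adds at most `weight`.
    """
    deviation_sq = (distance - target) ** 2
    return weight * deviation_sq / (deviation_sq + width ** 2)


# Hypothetical case: a predicted contact at 8 A that is actually 100 A apart.
wrong_harmonic = harmonic_penalty(100.0, 8.0)
wrong_lorentzian = lorentzian_penalty(100.0, 8.0)
```

    A single false contact therefore cannot dominate the total energy under the Lorentzian form, whereas under the harmonic form it could pull the whole model toward a wrong conformation.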

  10. Protein structure refinement using a quantum mechanics-based chemical shielding predictor (Electronic supplementary information (ESI) available; see DOI: 10.1039/c6sc04344e)

    PubMed Central

    2017-01-01

    The accurate prediction of protein chemical shifts using a quantum mechanics (QM)-based method has been the subject of intense research for more than 20 years but so far empirical methods for chemical shift prediction have proven more accurate. In this paper we show that a QM-based predictor of a protein backbone and CB chemical shifts (ProCS15, PeerJ, 2016, 3, e1344) is of comparable accuracy to empirical chemical shift predictors after chemical shift-based structural refinement that removes small structural errors. We present a method by which quantum chemistry based predictions of isotropic chemical shielding values (ProCS15) can be used to refine protein structures using Markov Chain Monte Carlo (MCMC) simulations, relating the chemical shielding values to the experimental chemical shifts probabilistically. Two kinds of MCMC structural refinement simulations were performed using force field geometry optimized X-ray structures as starting points: simulated annealing of the starting structure and constant temperature MCMC simulation followed by simulated annealing of a representative ensemble structure. Annealing of the CHARMM structure changes the CA-RMSD by an average of 0.4 Å but lowers the chemical shift RMSD by 1.0 and 0.7 ppm for CA and N. Conformational averaging has a relatively small effect (0.1–0.2 ppm) on the overall agreement with carbon chemical shifts but lowers the error for nitrogen chemical shifts by 0.4 ppm. If an amino acid specific offset is included the ProCS15 predicted chemical shifts have RMSD values relative to experiments that are comparable to popular empirical chemical shift predictors. The annealed representative ensemble structures differ in CA-RMSD relative to the initial structures by an average of 2.0 Å, with >2.0 Å difference for six proteins. 
In four of the cases, the largest structural differences arise in structurally flexible regions of the protein as determined by NMR, and in the remaining two cases, the large structural change may be due to force field deficiencies. The overall accuracy of the empirical methods is slightly improved by annealing the CHARMM structure with ProCS15, which may suggest that the minor structural changes introduced by ProCS15-based annealing improve the accuracy of the protein structures. Having established that QM-based chemical shift prediction can deliver the same accuracy as empirical shift predictors, we hope this can help increase the accuracy of related approaches such as QM/MM or linear scaling approaches, or the interpretation of protein structural dynamics from QM-derived chemical shifts. PMID:28451325

  11. A novel structure-based multimode QSAR method affords predictive models for phosphodiesterase inhibitors.

    PubMed

    Dong, Xialan; Ebalunode, Jerry O; Cho, Sung Jin; Zheng, Weifan

    2010-02-22

    Quantitative structure-activity relationship (QSAR) methods aim to build quantitatively predictive models for the discovery of new molecules. They have been widely used in medicinal chemistry for drug discovery. Many QSAR techniques have been developed since Hansch's seminal work, and more are still being developed. Motivated by Hopfinger's receptor-dependent QSAR (RD-QSAR) formalism and the Lukacova-Balaz scheme to treat multimode issues, we have initiated studies that focus on a structure-based multimode QSAR (SBMM QSAR) method, where the structure of the target protein is used in characterizing the ligand, and the multimode issue of ligand binding is systematically treated with a modified Lukacova-Balaz scheme. All ligand molecules are first docked to the target binding pocket to obtain a set of aligned ligand poses. A structure-based pharmacophore concept is adopted to characterize the binding pocket. Specifically, we represent the binding pocket as a geometric grid labeled by pharmacophoric features. Each pose of the ligand is also represented as a labeled grid, where each grid point is labeled according to the atom types of nearby ligand atoms. These labeled grids or three-dimensional (3D) maps (both the receptor map (R-map) and the ligand map (L-map)) are compared to each other to derive descriptors for each pose of the ligand, resulting in a multimode structure-activity relationship (SAR) table. Iterative partial least-squares (PLS) is employed to build the QSAR models. When we applied this method to analyze PDE-4 inhibitors, we obtained predictive models with excellent training correlation (r(2) = 0.65-0.66), as well as test correlation (R(2) = 0.64-0.65). A comparative analysis with 4 other QSAR techniques demonstrates that this new method affords better models, in terms of the prediction power for the test set.

  12. Partial unfolding and refolding for structure refinement: A unified approach of geometric simulations and molecular dynamics.

    PubMed

    Kumar, Avishek; Campitelli, Paul; Thorpe, M F; Ozkan, S Banu

    2015-12-01

    The most successful protein structure prediction methods to date have been template-based modeling (TBM) or homology modeling, which predicts protein structure based on experimental structures. These high accuracy predictions sometimes retain structural errors due to incorrect templates or a lack of accurate templates in the case of low sequence similarity, making these structures inadequate in drug-design studies or molecular dynamics simulations. We have developed a new physics based approach to the protein refinement problem by mimicking the mechanism of chaperones that rehabilitate misfolded proteins. The template structure is unfolded by selectively (targeted) pulling on different portions of the protein using the geometric based technique FRODA, and then refolded using hierarchically restrained replica exchange molecular dynamics simulations (hr-REMD). FRODA unfolding is used to create a diverse set of topologies for surveying near native-like structures from a template and to provide a set of persistent contacts to be employed during re-folding. We have tested our approach on 13 previous CASP targets and observed that this method of folding an ensemble of partially unfolded structures, through the hierarchical addition of contact restraints (that is, first local and then nonlocal interactions), leads to a refolding of the structure along with refinement in most cases (12/13). Although this approach yields refined models through advancement in sampling, the task of blind selection of the best refined models still needs to be solved. Overall, the method can be useful for improved sampling for low resolution models where certain portions of the structure are incorrectly modeled. © 2015 Wiley Periodicals, Inc.

  13. Automated use of mutagenesis data in structure prediction.

    PubMed

    Nanda, Vikas; DeGrado, William F

    2005-05-15

    In the absence of experimental structural determination, numerous methods are available to indirectly predict or probe the structure of a target molecule. Genetic modification of a protein sequence is a powerful tool for identifying key residues involved in binding reactions or protein stability. Mutagenesis data is usually incorporated into the modeling process either through manual inspection of model compatibility with empirical data, or through the generation of geometric constraints linking sensitive residues to a binding interface. We present an approach derived from statistical studies of lattice models for introducing mutation information directly into the fitness score. The approach takes into account the phenotype of mutation (neutral or disruptive) and calculates the energy for a given structure over an ensemble of sequences. The structure prediction procedure searches for the optimal conformation where neutral sequences either have no impact or improve stability and disruptive sequences reduce stability relative to wild type. We examine three types of sequence ensembles: information from saturation mutagenesis, scanning mutagenesis, and homologous proteins. Incorporating multiple sequences into a statistical ensemble serves to energetically separate the native state and misfolded structures. As a result, the prediction of structure with a poor force field is sufficiently enhanced by mutational information to improve accuracy. Furthermore, by separating misfolded conformations from the target score, the ensemble energy serves to speed up conformational search algorithms such as Monte Carlo-based methods. Copyright 2005 Wiley-Liss, Inc.

  14. PHOENIX: a scoring function for affinity prediction derived using high-resolution crystal structures and calorimetry measurements.

    PubMed

    Tang, Yat T; Marshall, Garland R

    2011-02-28

    Binding affinity prediction is one of the most critical components to computer-aided structure-based drug design. Despite advances in first-principle methods for predicting binding affinity, empirical scoring functions that are fast and only relatively accurate are still widely used in structure-based drug design. With the increasing availability of X-ray crystallographic structures in the Protein Data Bank and continuing application of biophysical methods such as isothermal titration calorimetry to measure thermodynamic parameters contributing to binding free energy, sufficient experimental data exists that scoring functions can now be derived by separating enthalpic (ΔH) and entropic (TΔS) contributions to binding free energy (ΔG). PHOENIX, a scoring function to predict binding affinities of protein-ligand complexes, utilizes the increasing availability of experimental data to improve binding affinity predictions by the following: model training and testing using high-resolution crystallographic data to minimize structural noise; independent models of enthalpic and entropic contributions fitted to thermodynamic parameters assumed to be thermodynamically biased to calculate binding free energy; and use of shape and volume descriptors to better capture entropic contributions. A set of 42 descriptors and 112 protein-ligand complexes were used to derive functions using partial least-squares for change of enthalpy (ΔH) and change of entropy (TΔS) to calculate change of binding free energy (ΔG), resulting in a predictive r^2 (r^2_pred) of 0.55 and a standard error (SE) of 1.34 kcal/mol. External validation using the 2009 version of the PDBbind "refined set" (n = 1612) resulted in a Pearson correlation coefficient (R_p) of 0.575 and a mean error (ME) of 1.41 pK_d. Enthalpy and entropy predictions were of limited accuracy individually. However, their difference resulted in a relatively accurate binding free energy. 
While the development of an accurate and applicable scoring function was an objective of this study, the main focus was evaluation of the use of high-resolution X-ray crystal structures with high-quality thermodynamic parameters from isothermal titration calorimetry for scoring function development. With the increasing application of structure-based methods in molecular design, this study suggests that using high-resolution crystal structures, separating enthalpy and entropy contributions to binding free energy, and including descriptors to better capture entropic contributions may prove to be effective strategies toward rapid and accurate calculation of binding affinity.
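
    The observation that separately inaccurate enthalpy and entropy terms can still give a reasonable free energy follows from ΔG = ΔH - TΔS when the two errors are similar in sign and size. The sketch below shows only this arithmetic; the numbers are invented for illustration and are not taken from the study, and the function name is an assumption.

```python
def binding_free_energy(delta_h, t_delta_s):
    """Binding free energy from its enthalpic and entropic parts (kcal/mol)."""
    return delta_h - t_delta_s


# Invented "experimental" values (kcal/mol), for illustration only.
true_dh, true_tds = -12.0, -4.0
# Invented predictions whose errors are both positive and of similar size.
pred_dh, pred_tds = -9.5, -1.75

error_dh = pred_dh - true_dh      # error in the enthalpy term
error_tds = pred_tds - true_tds   # error in the entropy term
error_dg = (binding_free_energy(pred_dh, pred_tds)
            - binding_free_energy(true_dh, true_tds))
```

    Here each component is off by more than 2 kcal/mol while the free energy is off by only 0.25 kcal/mol, because the free energy error is the difference of the two component errors.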

  15. StruLocPred: structure-based protein subcellular localisation prediction using multi-class support vector machine.

    PubMed

    Zhou, Wengang; Dickerson, Julie A

    2012-01-01

    Knowledge of protein subcellular locations can help decipher a protein's biological function. This work proposes three new features: one sequence-based, Hybrid Amino Acid Pair (HAAP), and two structure-based, Secondary Structural Element Composition (SSEC) and solvent accessibility state frequency. A multi-class Support Vector Machine is developed to predict the locations. Testing on two established data sets yields better prediction accuracies than the best available systems. Comparisons with existing methods show comparable results to ESLPred2. When StruLocPred is applied to the entire Arabidopsis proteome, over 77% of proteins with known locations match the prediction results. An implementation of this system is at http://wgzhou.ece.iastate.edu/StruLocPred/.

  16. Prediction of physical protein-protein interactions

    NASA Astrophysics Data System (ADS)

    Szilágyi, András; Grimm, Vera; Arakaki, Adrián K.; Skolnick, Jeffrey

    2005-06-01

    Many essential cellular processes such as signal transduction, transport, cellular motion and most regulatory mechanisms are mediated by protein-protein interactions. In recent years, new experimental techniques have been developed to discover the protein-protein interaction networks of several organisms. However, the accuracy and coverage of these techniques have proven to be limited, and computational approaches remain essential both to assist in the design and validation of experimental studies and for the prediction of interaction partners and detailed structures of protein complexes. Here, we provide a critical overview of existing structure-independent and structure-based computational methods. Although these techniques have significantly advanced in the past few years, we find that most of them are still in their infancy. We also provide an overview of experimental techniques for the detection of protein-protein interactions. Although the developments are promising, false positive and false negative results are common, and reliable detection is possible only by taking a consensus of different experimental approaches. The shortcomings of experimental techniques affect both the further development and the fair evaluation of computational prediction methods. For an adequate comparative evaluation of prediction and high-throughput experimental methods, an appropriately large benchmark set of biophysically characterized protein complexes would be needed, but is sorely lacking.

  17. Predicting the activity of drugs for a group of imidazopyridine anticoccidial compounds.

    PubMed

    Si, Hongzong; Lian, Ning; Yuan, Shuping; Fu, Aiping; Duan, Yun-Bo; Zhang, Kejun; Yao, Xiaojun

    2009-10-01

    Gene expression programming (GEP) is a novel machine learning technique. Here, GEP is used to build a nonlinear quantitative structure-activity relationship model for predicting the IC50 of imidazopyridine anticoccidial compounds. The model is based on descriptors calculated from the molecular structure. Four descriptors are selected from the descriptor pool by the heuristic method (HM) to build a multivariable linear model. The GEP method produced a nonlinear quantitative model with a correlation coefficient and a mean error of 0.96 and 0.24 for the training set, and 0.91 and 0.52 for the test set, respectively. The GEP predictions are in good agreement with the experimental values.

  18. Influence of the Spatial Dimensions of Ultrasonic Transducers on the Frequency Spectrum of Guided Waves.

    PubMed

    Samaitis, Vykintas; Mažeika, Liudas

    2017-08-08

    Ultrasonic guided wave (UGW)-based condition monitoring has shown great promise in detecting, localizing, and characterizing damage in complex systems. However, the application of guided waves for damage detection is challenging due to the existence of multiple modes and dispersion. This results in distorted wave packets with limited resolution and the interference of multiple reflected modes. To develop reliable inspection systems, either the transducers have to be optimized to generate a desired single mode of guided waves with known dispersive properties, or the frequency responses of all modes present in the structure must be known to predict wave interaction. Currently, there is a lack of methods to predict the response spectrum of guided wave modes, especially in cases when multiple modes are being excited simultaneously. Such methods are of vital importance for further understanding wave propagation within the structures as well as wave-damage interaction. In this study, a novel method to predict the response spectrum of guided wave modes was proposed based on Fourier analysis of the particle velocity distribution on the excitation area. The method proposed in this study estimates an excitability function based on the spatial dimensions of the transducer, type of vibration, and dispersive properties of the medium. As a result, the response amplitude as a function of frequency for each guided wave mode present in the structure can be separately obtained. The method was validated with numerical simulations on aluminum and glass fiber composite samples. The key findings showed that it can be applied to estimate the response spectrum of a guided wave mode on any type of material (either isotropic structures or multi-layered anisotropic composites) and under any type of excitation, provided the phase velocity dispersion curve and the particle velocity distribution of the wave source are known in advance. Thus, the proposed method may be a beneficial tool to explain and predict the response spectrum of guided waves throughout the development of any structural health monitoring system.

  19. Influence of the Spatial Dimensions of Ultrasonic Transducers on the Frequency Spectrum of Guided Waves

    PubMed Central

    Samaitis, Vykintas; Mažeika, Liudas

    2017-01-01

    Ultrasonic guided wave (UGW)-based condition monitoring has shown great promise in detecting, localizing, and characterizing damage in complex systems. However, the application of guided waves for damage detection is challenging due to the existence of multiple modes and dispersion. This results in distorted wave packets with limited resolution and the interference of multiple reflected modes. To develop reliable inspection systems, either the transducers have to be optimized to generate a desired single mode of guided waves with known dispersive properties, or the frequency responses of all modes present in the structure must be known to predict wave interaction. Currently, there is a lack of methods to predict the response spectrum of guided wave modes, especially in cases when multiple modes are being excited simultaneously. Such methods are of vital importance for further understanding wave propagation within the structures as well as wave-damage interaction. In this study, a novel method to predict the response spectrum of guided wave modes was proposed based on Fourier analysis of the particle velocity distribution on the excitation area. The method proposed in this study estimates an excitability function based on the spatial dimensions of the transducer, type of vibration, and dispersive properties of the medium. As a result, the response amplitude as a function of frequency for each guided wave mode present in the structure can be separately obtained. The method was validated with numerical simulations on aluminum and glass fiber composite samples. The key findings showed that it can be applied to estimate the response spectrum of a guided wave mode on any type of material (either isotropic structures or multi-layered anisotropic composites) and under any type of excitation, provided the phase velocity dispersion curve and the particle velocity distribution of the wave source are known in advance. Thus, the proposed method may be a beneficial tool to explain and predict the response spectrum of guided waves throughout the development of any structural health monitoring system. PMID:28786924

  20. R-chie: a web server and R package for visualizing RNA secondary structures

    PubMed Central

    Lai, Daniel; Proctor, Jeff R.; Zhu, Jing Yun A.; Meyer, Irmtraud M.

    2012-01-01

    Visually examining RNA structures can greatly aid in understanding their potential functional roles and in evaluating the performance of structure prediction algorithms. As many functional roles of RNA structures can already be studied given the secondary structure of the RNA, various methods have been devised for visualizing RNA secondary structures. Most of these methods depict a given RNA secondary structure as a planar graph consisting of base-paired stems interconnected by roundish loops. In this article, we present an alternative method of depicting RNA secondary structure as arc diagrams. This is well suited for structures that are difficult or impossible to represent as planar stem-loop diagrams. Arc diagrams can intuitively display pseudo-knotted structures, as well as transient and alternative structural features. In addition, they facilitate the comparison of known and predicted RNA secondary structures. An added benefit is that structure information can be displayed in conjunction with a corresponding multiple sequence alignment, thereby highlighting structure and primary sequence conservation and variation. We have implemented the visualization algorithm as a web server, R-chie, as well as a corresponding R package called R4RNA, which allows users to run the software locally and across a range of common operating systems. PMID:22434875
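
    The data behind an arc diagram can be sketched in a few lines of Python. This is only an illustration, not R-chie or R4RNA code, and the function names `parse_dot_bracket` and `crossing_arcs` are hypothetical. It assumes the structure is supplied in dot-bracket notation, with additional bracket types used for pseudoknotted pairs, and it reports which arcs cross (the crossings that a planar stem-loop drawing cannot show).

```python
def parse_dot_bracket(structure):
    """Return the sorted list of 0-based base pairs (i, j) with i < j.

    Assumes dot-bracket notation where '.' is unpaired and the bracket
    types (), [], {}, <> each nest independently of one another.
    """
    openers = {"(": ")", "[": "]", "{": "}", "<": ">"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks = {open_: [] for open_ in openers}
    pairs = []
    for idx, ch in enumerate(structure):
        if ch in openers:
            stacks[ch].append(idx)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {idx}")
            pairs.append((stack.pop(), idx))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {idx}")
    for open_, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {open_!r} at position {stack[-1]}")
    return sorted(pairs)


def crossing_arcs(pairs):
    """Return all pairs of arcs ((i, j), (k, l)) with i < k < j < l."""
    pairs = sorted(pairs)
    crossings = []
    for a, (i, j) in enumerate(pairs):
        for k, l in pairs[a + 1:]:
            # pairs are sorted by start, so i < k already holds here
            if k < j < l:
                crossings.append(((i, j), (k, l)))
    return crossings


if __name__ == "__main__":
    pairs = parse_dot_bracket("((..[[..))..]]")
    print(pairs)
    print(crossing_arcs(pairs))
```

    Each returned pair corresponds to one arc drawn above the sequence axis; a non-empty result from `crossing_arcs` indicates a pseudoknot.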

  1. The expanding universe of thiolated gold nanoclusters and beyond.

    PubMed

    Jiang, De-en

    2013-08-21

    Thiolated gold nanoclusters form a universe of their own. Researchers in this field are constantly pushing the boundary of this universe by identifying new compositions and in a few "lucky" cases, solving their structures. Such solved structures, even if there are only few, provide important hints for predicting the many identified compositions that are yet to be crystallized or structure determined. Structure prediction is the most pressing issue for a computational chemist in this field. The success of the density functional theory method in gauging the energetic ordering of isomers for thiolated gold clusters has been truly remarkable, but to predict the most stable structure for a given composition remains a great challenge. In this feature article from a computational chemist's point of view, the author shows how one understands and predicts structures for thiolated gold nanoclusters based on his old and new results. To further entertain the reader, the author also offers several "imaginative" structures, claims, and challenges for this field.

  2. Protein Structure Prediction with Evolutionary Algorithms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hart, W.E.; Krasnogor, N.; Pelta, D.A.

    1999-02-08

    Evolutionary algorithms have been successfully applied to a variety of molecular structure prediction problems. In this paper we reconsider the design of genetic algorithms that have been applied to a simple protein structure prediction problem. Our analysis considers the impact of several algorithmic factors for this problem: the conformational representation, the energy formulation and the way in which infeasible conformations are penalized. Further, we empirically evaluated the impact of these factors on a small set of polymer sequences. Our analysis leads to specific recommendations for both GAs as well as other heuristic methods for solving PSP on the HP model.
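
    The three factors named above (representation, energy formulation, penalty for infeasible conformations) can be illustrated with a small Python sketch. The abstract does not give the authors' formulation, so everything here is an assumption: a 2D square lattice, a relative-free move string over U/D/L/R as the conformational representation, an energy of -1 per contact between non-consecutive H residues, and a fixed additive penalty per lattice collision. The names `fold` and `hp_energy` and the default `penalty=10` are hypothetical.

```python
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold(moves):
    """Turn a move string into lattice coordinates, starting at (0, 0)."""
    x, y = 0, 0
    coords = [(x, y)]
    for move in moves:
        dx, dy = MOVES[move]
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords


def hp_energy(sequence, moves, penalty=10):
    """Score a conformation: -1 per H-H contact, +penalty per collision.

    A contact is a pair of H residues that are lattice neighbours but not
    adjacent along the chain. A collision is a residue placed on a site
    that is already occupied, which makes the conformation infeasible.
    """
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move per bond")
    coords = fold(moves)
    collisions = len(coords) - len(set(coords))
    contacts = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):  # skip chain neighbours
            if sequence[i] == "H" and sequence[j] == "H":
                (x1, y1), (x2, y2) = coords[i], coords[j]
                if abs(x1 - x2) + abs(y1 - y2) == 1:
                    contacts += 1
    return -contacts + penalty * collisions


if __name__ == "__main__":
    print(hp_energy("HPPH", "RUL"))  # compact fold with one H-H contact
    print(hp_energy("HPH", "RL"))    # chain folds back onto itself
```

    A genetic algorithm would minimise this score over move strings; alternatives to the additive penalty include rejecting or repairing infeasible conformations.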

  3. A first principles prediction of the crystal structure of C6Br2ClFH2

    NASA Astrophysics Data System (ADS)

    Misquitta, Alston J.; Welch, Gareth W. A.; Stone, Anthony J.; Price, Sarah L.

    2008-04-01

    We have constructed an intermolecular potential for the 1,3-dibromo-2-chloro-5-fluorobenzene molecule from first principles using SAPT(DFT) interaction energy calculations and the Williams-Stone-Misquitta method for obtaining molecular properties in distributed form. This molecule was included in the fourth Blind Test of crystal structure prediction organised by the Cambridge Crystallographic Data Centre. Using our potential, we have predicted the crystal structure of C6Br2ClFH2 and found the lowest energy solution to be in excellent agreement with the experimentally observed crystal when it was subsequently revealed.

  4. Exploration of structural stability in deleterious nsSNPs of the XPA gene: A molecular dynamics approach.

    PubMed

    Nagasundaram, N; Priya Doss, C George

    2011-01-01

    Distinguishing the deleterious from the massive number of non-functional nsSNPs that occur within a single genome is a considerable challenge in mutation research. In this approach, we have used the existing in silico methods to explore the mutation-structure-function relationship in the XPA gene. We used the Sorting Intolerant From Tolerant (SIFT), Polymorphism Phenotyping (PolyPhen), I-Mutant 2.0, and the Protein Analysis THrough Evolutionary Relationships methods to predict the effects of deleterious nsSNPs on protein function and evaluated the impact of mutation on protein stability by Molecular Dynamics simulations. By comparing the scores of all the four in silico methods, nsSNP with an ID rs104894131 at position C108F was predicted to be highly deleterious. We extended our Molecular dynamics approach to gain insight into the impact of this non-synonymous polymorphism on structural changes that may affect the activity of the XPA gene. Based on the in silico methods score, potential energy, root-mean-square deviation, and root-mean-square fluctuation, we predict that deleterious nsSNP at position C108F would play a significant role in causing disease by the XPA gene. Our approach would present the application of in silico tools in understanding the functional variation from the perspective of structure, evolution, and phenotype.

  5. A parallel strategy for predicting the secondary structure of polycistronic microRNAs.

    PubMed

    Han, Dianwei; Tang, Guiliang; Zhang, Jun

    2013-01-01

    The biogenesis of a functional microRNA is largely dependent on the secondary structure of the microRNA precursor (pre-miRNA). Recently, it has been shown that microRNAs are present in the genome in the form of polycistronic transcriptional units in plants and animals. It will be important to design efficient computational methods to predict such structures for microRNA discovery and its applications in gene silencing. In this paper, we propose a parallel algorithm based on the master-slave architecture to predict the secondary structure from an input sequence. We conducted experiments to verify the effectiveness of our parallel algorithm. The experimental results show that our algorithm is able to produce the optimal secondary structure of polycistronic microRNAs.

  6. Structural prediction and analysis of VIH-related peptides from selected crustacean species

    PubMed Central

    Nagaraju, Ganji Purna Chandra; Kumari, Nunna Siva; Prasad, Ganji Lakshmi Vara; Rajitha, Balney; Meenu, Madan; Rao, Manam Sreenivasa; Naik, Bannoth Reddya

    2009-01-01

    The elucidation of the 3D-structure of vitellogenesis inhibiting hormone (VIH) peptides is hampered by difficulties in obtaining enough peptide or protein, growing diffracting crystals, and numerous other technical aspects. As a result, no structural information is available for VIH peptide sequences registered in GenBank. In this situation, it is not surprising that predictive methods have attracted great interest. In this study, the molt-inhibiting hormone (MIH) of the kuruma prawn (Marsupenaeus japonicus) is used to predict the structure of four VIH-related peptides in crustacean species. The high similarity of the 3D-structures and the calculated physicochemical characteristics of these peptides suggest a common fold for the entire family. PMID:20011146

  7. Proteus: a random forest classifier to predict disorder-to-order transitioning binding regions in intrinsically disordered proteins

    NASA Astrophysics Data System (ADS)

    Basu, Sankar; Söderquist, Fredrik; Wallner, Björn

    2017-05-01

    The focus of the computational structural biology community has taken a dramatic shift over the past one-and-a-half decades from the classical protein structure prediction problem to the possible understanding of intrinsically disordered proteins (IDP) or proteins containing regions of disorder (IDPR). The current interest lies in the unraveling of a disorder-to-order transitioning code embedded in the amino acid sequences of IDPs/IDPRs. Disordered proteins are characterized by an enormous amount of structural plasticity which makes them promiscuous in binding to different partners, multi-functional in cellular activity and atypical in folding energy landscapes resembling partially folded molten globules. Also, their involvement in several deadly human diseases (e.g. cancer, cardiovascular and neurodegenerative diseases) makes them attractive drug targets, and important for a biochemical understanding of the disease(s). The study of the structural ensemble of IDPs is rather difficult, in particular for transient interactions. When bound to a structured partner, an IDPR adopts an ordered conformation in the complex. The residues that undergo this disorder-to-order transition are called protean residues and are generally found in short contiguous stretches; the first step in understanding the modus operandi of an IDP/IDPR would be to predict these residues. There are a few available methods which predict these protean segments from their amino acid sequences; however, their performance reported in the literature leaves clear room for improvement. With this background, the current study presents 'Proteus', a random forest classifier that predicts the likelihood of a residue undergoing a disorder-to-order transition upon binding to a potential partner protein. The prediction is based on features that can be calculated using the amino acid sequence alone. Proteus compares favorably with existing methods, predicting twice as many true positives as the second best method (55 vs. 27%) with a much higher precision on an independent data set. The current study also sheds some light on a possible 'disorder-to-order' transitioning consensus, untangled, yet embedded in the amino acid sequence of IDPs. Some guidelines have also been suggested for proceeding with a real-life structural modeling involving an IDPR using Proteus.

  8. Structural testing for static failure, flutter and other scary things

    NASA Technical Reports Server (NTRS)

    Ricketts, R. H.

    1983-01-01

    Ground test and flight test methods are described that may be used to highlight potential structural problems that occur on aircraft. Primary interest is focused on light-weight general aviation airplanes. The structural problems described include static strength failure, aileron reversal, static divergence, and flutter. An example of each of the problems is discussed to illustrate how the data acquired during the tests may be used to predict the occurrence of the structural problem. While some rules of thumb for the prediction of structural problems are given, the report is not intended to be used explicitly as a structural analysis handbook.

  9. Towards cheminformatics-based estimation of drug therapeutic index: Predicting the protective index of anticonvulsants using a new quantitative structure-index relationship approach.

    PubMed

    Chen, Shangying; Zhang, Peng; Liu, Xin; Qin, Chu; Tao, Lin; Zhang, Cheng; Yang, Sheng Yong; Chen, Yu Zong; Chui, Wai Keung

    2016-06-01

    The overall efficacy and safety profile of a new drug is partially evaluated by the therapeutic index in clinical studies and by the protective index (PI) in preclinical studies. In-silico predictive methods may facilitate the assessment of these indicators. Although QSAR and QSTR models can be used for predicting PI, their predictive capability has not been evaluated. To test this capability, we developed QSAR and QSTR models for predicting the activity and toxicity of anticonvulsants at accuracy levels above the literature-reported threshold (LT) of good QSAR models as tested by both the internal 5-fold cross validation and external validation method. These models showed significantly compromised PI predictive capability due to the cumulative errors of the QSAR and QSTR models. Therefore, in this investigation a new quantitative structure-index relationship (QSIR) model was devised and it showed improved PI predictive capability that superseded the LT of good QSAR models. The QSAR, QSTR and QSIR models were developed using support vector regression (SVR) method with the parameters optimized by using the greedy search method. The molecular descriptors relevant to the prediction of anticonvulsant activities, toxicities and PIs were analyzed by a recursive feature elimination method. The selected molecular descriptors are primarily associated with the drug-like, pharmacological and toxicological features and those used in the published anticonvulsant QSAR and QSTR models. This study suggested that QSIR is useful for estimating the therapeutic index of drug candidates. Copyright © 2016. Published by Elsevier Inc.

  10. Exploration of structural stability in deleterious nsSNPs of the XPA gene: A molecular dynamics approach

    PubMed Central

    NagaSundaram, N; Priya Doss, C George

    2011-01-01

    Background: Distinguishing the deleterious from the massive number of non-functional nsSNPs that occur within a single genome is a considerable challenge in mutation research. In this approach, we have used the existing in silico methods to explore the mutation-structure-function relationship in the XPA gene. Materials and Methods: We used the Sorting Intolerant From Tolerant (SIFT), Polymorphism Phenotyping (PolyPhen), I-Mutant 2.0, and the Protein Analysis THrough Evolutionary Relationships methods to predict the effects of deleterious nsSNPs on protein function and evaluated the impact of mutation on protein stability by Molecular Dynamics simulations. Results: By comparing the scores of all the four in silico methods, nsSNP with an ID rs104894131 at position C108F was predicted to be highly deleterious. We extended our Molecular dynamics approach to gain insight into the impact of this non-synonymous polymorphism on structural changes that may affect the activity of the XPA gene. Conclusion: Based on the in silico methods score, potential energy, root-mean-square deviation, and root-mean-square fluctuation, we predict that deleterious nsSNP at position C108F would play a significant role in causing disease by the XPA gene. Our approach would present the application of in silico tools in understanding the functional variation from the perspective of structure, evolution, and phenotype. PMID:22190868

  11. HomPPI: a class of sequence homology based protein-protein interface prediction methods

    PubMed Central

    2011-01-01

    Background: Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. Results: We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. The partner-specific classifier, PS-HomPPI, can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the target can be reliably identified. The HomPPI web server is available at http://homppi.cs.iastate.edu/. Conclusions: Sequence homology-based methods offer a class of computationally efficient and reliable approaches for predicting the protein-protein interface residues that participate in either obligate or transient interactions. For query proteins involved in transient interactions, the reliability of interface residue prediction can be improved by exploiting knowledge of putative interaction partners. PMID:21682895
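
    For readers unfamiliar with the three figures quoted per method, the following Python sketch shows one common way to compute them from per-residue labels. It is not HomPPI code. It assumes CC means the Matthews correlation coefficient and uses the textbook definitions of sensitivity, TP/(TP+FN), and specificity, TN/(TN+FP); the abstract does not state its definitions, and some interface-prediction papers use "specificity" for TP/(TP+FP) instead. The function name `interface_metrics` is hypothetical.

```python
import math


def interface_metrics(predicted, actual):
    """Return (cc, sensitivity, specificity) for binary per-residue labels.

    predicted and actual are equal-length sequences of 0/1 values, where
    1 marks an interface residue. Ratios with a zero denominator are
    reported as 0.0.
    """
    if len(predicted) != len(actual):
        raise ValueError("label sequences must have the same length")
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)

    def ratio(num, den):
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    # Matthews correlation coefficient (assumed meaning of "CC")
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = ratio(tp * tn - fp * fn, denom)
    return cc, sensitivity, specificity


if __name__ == "__main__":
    print(interface_metrics([1, 1, 0, 0, 1], [1, 0, 0, 0, 1]))
```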

  12. Molecular Docking for Prediction and Interpretation of Adverse Drug Reactions.

    PubMed

    Luo, Heng; Fokoue-Nkoutche, Achille; Singh, Nalini; Yang, Lun; Hu, Jianying; Zhang, Ping

    2018-05-23

    Adverse drug reactions (ADRs) present a major burden for patients and the healthcare industry. Various computational methods have been developed to predict ADRs for drug molecules. However, many of these methods require experimental or surveillance data and cannot be used when only structural information is available. We collected 1,231 small molecule drugs and 600 human proteins and utilized molecular docking to generate binding features among them. We developed machine learning models that use these docking features to make predictions for 1,533 ADRs. These models obtain an overall area under the receiver operating characteristic curve (AUROC) of 0.843 and an overall area under the precision-recall curve (AUPR) of 0.395, outperforming seven structural fingerprint-based prediction models. Using the method, we predicted skin striae for fluticasone propionate, dermatitis acneiform for mometasone, and decreased libido for irinotecan, as demonstrations. Furthermore, we analyzed the top binding proteins associated with some of the ADRs, which can help to understand and/or generate hypotheses for underlying mechanisms of ADRs.

  13. Predicting the tolerated sequences for proteins and protein interfaces using RosettaBackrub flexible backbone design.

    PubMed

    Smith, Colin A; Kortemme, Tanja

    2011-01-01

    Predicting the set of sequences that are tolerated by a protein or protein interface, while maintaining a desired function, is useful for characterizing protein interaction specificity and for computationally designing sequence libraries to engineer proteins with new functions. Here we provide a general method, a detailed set of protocols, and several benchmarks and analyses for estimating tolerated sequences using flexible backbone protein design implemented in the Rosetta molecular modeling software suite. The input to the method is at least one experimentally determined three-dimensional protein structure or high-quality model. The starting structure(s) are expanded or refined into a conformational ensemble using Monte Carlo simulations consisting of backrub backbone and side chain moves in Rosetta. The method then uses a combination of simulated annealing and genetic algorithm optimization methods to enrich for low-energy sequences for the individual members of the ensemble. To emphasize certain functional requirements (e.g. forming a binding interface), interactions between and within parts of the structure (e.g. domains) can be reweighted in the scoring function. Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space. We provide an extensive description of the protocol and its parameters, all source code, example analysis scripts and three tests applying this method to finding sequences predicted to stabilize proteins or protein interfaces. The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA. Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.

  14. Performance of multiple docking and refinement methods in the pose prediction D3R prospective Grand Challenge 2016

    NASA Astrophysics Data System (ADS)

    Fradera, Xavier; Verras, Andreas; Hu, Yuan; Wang, Deping; Wang, Hongwu; Fells, James I.; Armacost, Kira A.; Crespo, Alejandro; Sherborne, Brad; Wang, Huijun; Peng, Zhengwei; Gao, Ying-Duo

    2018-01-01

    We describe the performance of multiple pose prediction methods for the D3R 2016 Grand Challenge. The pose prediction challenge includes 36 ligands, which represent 4 chemotypes and some miscellaneous structures against the FXR ligand binding domain. In this study we use a mix of fully automated methods as well as human-guided methods with considerations of both the challenge data and publicly available data. The methods include ensemble docking, colony entropy pose prediction, target selection by molecular similarity, molecular dynamics guided pose refinement, and pose selection by visual inspection. We evaluated the success of our predictions by method, chemotype, and relevance of publicly available data. For the overall data set, ensemble docking, visual inspection, and molecular dynamics guided pose prediction performed the best with overall mean RMSDs of 2.4, 2.2, and 2.2 Å respectively. For several individual challenge molecules, the best performing method is evaluated in light of that particular ligand. We also describe the protein, ligand, and public information data preparations that are typical of our binding mode prediction workflow.
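
    The RMSD values quoted above compare a predicted ligand pose with the crystallographic one. A minimal Python sketch of that calculation follows; it is not the authors' or the D3R organisers' evaluation code. It assumes both poses list the same heavy atoms in the same order and already share the receptor's coordinate frame (so no superposition is applied), and it ignores symmetry-equivalent atom orderings, which real evaluations usually correct for. The function name `pose_rmsd` is hypothetical.

```python
import math


def pose_rmsd(predicted, reference):
    """Root-mean-square deviation between two poses, in the input units.

    predicted and reference are equal-length sequences of (x, y, z)
    tuples with matching atom order, typically in angstroms.
    """
    if len(predicted) != len(reference):
        raise ValueError("poses must contain the same number of atoms")
    if not predicted:
        raise ValueError("poses must contain at least one atom")
    squared = 0.0
    for (px, py, pz), (rx, ry, rz) in zip(predicted, reference):
        squared += (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
    return math.sqrt(squared / len(predicted))


if __name__ == "__main__":
    crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
    docked = [(0.5, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 1.5, 0.0)]
    print(pose_rmsd(docked, crystal))
```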

  15. Application of Machine Learning Approaches for Protein-protein Interactions Prediction.

    PubMed

    Zhang, Mengying; Su, Qiang; Lu, Yi; Zhao, Manman; Niu, Bing

    2017-01-01

    Proteomics endeavors to study the structures, functions and interactions of proteins. Information on protein-protein interactions (PPIs) helps to improve our knowledge of the functions and the 3D structures of proteins. Thus, determining PPIs is essential for the study of proteomics. In this review, in order to study the application of machine learning in predicting PPIs, some machine learning approaches such as support vector machine (SVM), artificial neural networks (ANNs) and random forest (RF) were selected, and examples of their applications to PPIs are listed. SVM and RF are two commonly used methods. Nowadays, more researchers predict PPIs by combining more than two methods. This review presents the application of machine learning approaches in predicting PPIs. Many examples of success in identification and prediction in the area of PPI prediction are discussed, and PPI research is still in progress.

  16. Computational structural mechanics for engine structures

    NASA Technical Reports Server (NTRS)

    Chamis, Christos C.

    1988-01-01

    The computational structural mechanics (CSM) program at Lewis encompasses the formulation and solution of structural mechanics problems and the development of integrated software systems to computationally simulate the performance, durability, and life of engine structures. It is structured to supplement, complement, and, whenever possible, replace costly experimental efforts. Specific objectives are to investigate unique advantages of parallel and multiprocessing for reformulating and solving structural mechanics and formulating and solving multidisciplinary mechanics and to develop integrated structural system computational simulators for predicting structural performance, evaluating newly developed methods, and identifying and prioritizing improved or missing methods.

  17. Computational structural mechanics for engine structures

    NASA Technical Reports Server (NTRS)

    Chamis, Christos C.

    1989-01-01

    The computational structural mechanics (CSM) program at Lewis encompasses the formulation and solution of structural mechanics problems and the development of integrated software systems to computationally simulate the performance, durability, and life of engine structures. It is structured to supplement, complement, and, whenever possible, replace costly experimental efforts. Specific objectives are to investigate unique advantages of parallel and multiprocessing for reformulating and solving structural mechanics and formulating and solving multidisciplinary mechanics and to develop integrated structural system computational simulators for predicting structural performance, evaluating newly developed methods, and identifying and prioritizing improved or missing methods.

  18. Advanced Computational Methods for High-accuracy Refinement of Protein Low-quality Models

    NASA Astrophysics Data System (ADS)

    Zang, Tianwu

    Predicting the 3-dimensional structure of proteins has been a major interest in modern computational biology. While many successful methods can generate models within 3 to 5 Å root-mean-square deviation (RMSD) of the solution, progress in refining these models has been quite slow. Effective methods are therefore urgently needed to bring low-quality models to higher-accuracy ranges (e.g., less than 2 Å RMSD). In this thesis, I present several novel computational methods to address the high-accuracy refinement problem. First, an enhanced sampling method, named parallel continuous simulated tempering (PCST), is developed to accelerate the molecular dynamics (MD) simulation. Second, two energy biasing methods, Structure-Based Model (SBM) and Ensemble-Based Model (EBM), are introduced to perform targeted sampling around important conformations. Third, a three-step method is developed to blindly select high-quality models along the MD simulation. These methods work together to make significant refinement of low-quality models without any knowledge of the solution. The effectiveness of these methods is examined in different applications. Using the PCST-SBM method, models with higher global distance test scores (GDT_TS) are generated and selected in the MD simulation of 18 targets from the refinement category of the 10th Critical Assessment of Structure Prediction (CASP10). In addition, the refinement test of two CASP10 targets using the PCST-EBM method indicates that EBM may bring the initial model to even higher-quality levels. Furthermore, a multi-round refinement protocol of PCST-SBM improves the model quality of a protein to a level that is sufficiently high for molecular replacement in X-ray crystallography. Our results justify the crucial position of enhanced sampling in protein structure prediction and demonstrate that a considerable improvement of low-accuracy structures is still achievable with current force fields.

  19. Drug Repositioning by Kernel-Based Integration of Molecular Structure, Molecular Activity, and Phenotype Data

    PubMed Central

    Wang, Yongcui; Chen, Shilong; Deng, Naiyang; Wang, Yong

    2013-01-01

    Computational inference of novel therapeutic values for existing drugs, i.e., drug repositioning, offers great prospects for faster and lower-risk drug development. Previous research has indicated that chemical structures, target proteins, and side-effects could provide rich information for assessing drug similarity and, in turn, disease similarity. However, each single data source is important in its own way, and data integration holds great promise for repositioning drugs more accurately. Here, we propose a new method for drug repositioning, PreDR (Predict Drug Repositioning), to integrate molecular structure, molecular activity, and phenotype data. Specifically, we characterize each drug by its profile in chemical structure, target protein, and side-effect space, and define a kernel function to correlate drugs with diseases. Then we train a support vector machine (SVM) to computationally predict novel drug-disease interactions. PreDR is validated on a well-established drug-disease network with 1,933 interactions among 593 drugs and 313 diseases. By cross-validation, we find that chemical structure, drug target, and side-effect information are all predictive of drug-disease relationships. More experimentally observed drug-disease interactions can be revealed by integrating these three data sources. Comparison with existing methods demonstrates that PreDR is competitive in both accuracy and coverage. Follow-up database search and pathway analysis indicate that our new predictions are worthy of further experimental validation. In particular, several novel predictions are supported by clinical trial databases, which shows the significant prospects of PreDR in future drug treatment. In conclusion, our new method, PreDR, can serve as a useful tool in drug discovery to efficiently identify novel drug-disease interactions. In addition, our heterogeneous data integration framework can be applied to other problems. PMID:24244318

  20. Interactive-predictive detection of handwritten text blocks

    NASA Astrophysics Data System (ADS)

    Ramos Terrades, O.; Serrano, N.; Gordó, A.; Valveny, E.; Juan, A.

    2010-01-01

    A method for text block detection is introduced for old handwritten documents. The proposed method takes advantage of sequential book structure, taking into account layout information from pages previously transcribed. This glance at the past is used to predict the position of text blocks in the current page with the help of conventional layout analysis methods. The method is integrated into the GIDOC prototype: a first attempt to provide integrated support for interactive-predictive page layout analysis, text line detection and handwritten text transcription. Results are given in a transcription task on a 764-page Spanish manuscript from 1891.

  1. Soil-pipe interaction modeling for pipe behavior prediction with super learning based methods

    NASA Astrophysics Data System (ADS)

    Shi, Fang; Peng, Xiang; Liu, Huan; Hu, Yafei; Liu, Zheng; Li, Eric

    2018-03-01

    Underground pipelines are subject to severe distress from the surrounding expansive soil. To investigate the structural response of water mains to varying soil movements, field data, including pipe wall strains, in situ soil water content, soil pressure and temperature, were collected. Research on monitoring data analysis has been reported, but the relationship between soil properties and pipe deformation has not been well interpreted. To characterize the relationship between soil properties and pipe deformation, this paper presents a super learning based approach combined with feature selection algorithms to predict the structural behavior of water mains in different soil environments. Furthermore, an automatic variable selection method, i.e. the recursive feature elimination algorithm, was used to identify the critical predictors contributing to the pipe deformations. To investigate the adaptability of super learning to different predictive models, this research applied super learning based methods to three different datasets. The predictive performance was evaluated by R-squared, root-mean-square error and mean absolute error. Based on the prediction performance evaluation, the superiority of super learning was validated and demonstrated by accurately predicting three types of pipe deformations. In addition, a comprehensive understanding of the water mains' working environments becomes possible.
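
    The three evaluation measures named in the record above (R-squared, root-mean-square error and mean absolute error) can be sketched as follows. This is a minimal illustration using their standard definitions; the function name `regression_metrics` and the dictionary keys are assumptions of this sketch and do not come from the paper.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R-squared, RMSE and MAE for paired observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares and total sum of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all observed values are equal")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }
```

    For example, calling `regression_metrics([1, 2, 3, 4], [2, 3, 4, 5])` describes a prediction that is off by exactly one unit everywhere, so RMSE and MAE are both 1 while R-squared drops well below 1.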

  2. Recent developments of the NESSUS probabilistic structural analysis computer program

    NASA Technical Reports Server (NTRS)

    Millwater, H.; Wu, Y.-T.; Torng, T.; Thacker, B.; Riha, D.; Leung, C. P.

    1992-01-01

    The NESSUS probabilistic structural analysis computer program combines state-of-the-art probabilistic algorithms with general purpose structural analysis methods to compute the probabilistic response and the reliability of engineering structures. Uncertainty in loading, material properties, geometry, boundary conditions and initial conditions can be simulated. The structural analysis methods include nonlinear finite element and boundary element methods. Several probabilistic algorithms are available such as the advanced mean value method and the adaptive importance sampling method. The scope of the code has recently been expanded to include probabilistic life and fatigue prediction of structures in terms of component and system reliability and risk analysis of structures considering cost of failure. The code is currently being extended to structural reliability considering progressive crack propagation. Several examples are presented to demonstrate the new capabilities.

  3. SMOQ: a tool for predicting the absolute residue-specific quality of a single protein model with support vector machines

    PubMed Central

    2014-01-01

    Background: It is important to predict the quality of a protein structural model before its native structure is known. The method that can predict the absolute local quality of individual residues in a single protein model is rare, yet particularly needed for using, ranking and refining protein models. Results: We developed a machine learning tool (SMOQ) that can predict the distance deviation of each residue in a single protein model. SMOQ uses support vector machines (SVM) with protein sequence and structural features (i.e. basic feature set), including amino acid sequence, secondary structures, solvent accessibilities, and residue-residue contacts to make predictions. We also trained a SVM model with two new additional features (profiles and SOV scores) on 20 CASP8 targets and found that including them can only improve the performance when real deviations between native and model are higher than 5Å. The SMOQ tool finally released uses the basic feature set trained on 85 CASP8 targets. Moreover, SMOQ implemented a way to convert predicted local quality scores into a global quality score. SMOQ was tested on the 84 CASP9 single-domain targets. The average difference between the residue-specific distance deviation predicted by our method and the actual distance deviation on the test data is 2.637Å. The global quality prediction accuracy of the tool is comparable to other good tools on the same benchmark. Conclusion: SMOQ is a useful tool for protein single model quality assessment. Its source code and executable are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/. PMID:24776231

  4. SMOQ: a tool for predicting the absolute residue-specific quality of a single protein model with support vector machines.

    PubMed

    Cao, Renzhi; Wang, Zheng; Wang, Yiheng; Cheng, Jianlin

    2014-04-28

    It is important to predict the quality of a protein structural model before its native structure is known. The method that can predict the absolute local quality of individual residues in a single protein model is rare, yet particularly needed for using, ranking and refining protein models. We developed a machine learning tool (SMOQ) that can predict the distance deviation of each residue in a single protein model. SMOQ uses support vector machines (SVM) with protein sequence and structural features (i.e. basic feature set), including amino acid sequence, secondary structures, solvent accessibilities, and residue-residue contacts to make predictions. We also trained a SVM model with two new additional features (profiles and SOV scores) on 20 CASP8 targets and found that including them can only improve the performance when real deviations between native and model are higher than 5Å. The SMOQ tool finally released uses the basic feature set trained on 85 CASP8 targets. Moreover, SMOQ implemented a way to convert predicted local quality scores into a global quality score. SMOQ was tested on the 84 CASP9 single-domain targets. The average difference between the residue-specific distance deviation predicted by our method and the actual distance deviation on the test data is 2.637Å. The global quality prediction accuracy of the tool is comparable to other good tools on the same benchmark. SMOQ is a useful tool for protein single model quality assessment. Its source code and executable are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/.

  5. Slat Noise Predictions Using Higher-Order Finite-Difference Methods on Overset Grids

    NASA Technical Reports Server (NTRS)

    Housman, Jeffrey A.; Kiris, Cetin

    2016-01-01

    Computational aeroacoustic simulations using the structured overset grid approach and higher-order finite difference methods within the Launch Ascent and Vehicle Aerodynamics (LAVA) solver framework are presented for slat noise predictions. The simulations are part of a collaborative study comparing noise generation mechanisms between a conventional slat and a Krueger leading edge flap. Simulation results are compared with experimental data acquired during an aeroacoustic test in the NASA Langley Quiet Flow Facility. Details of the structured overset grid, numerical discretization, and turbulence model are provided.

  6. Loads and aeroelasticity division research and technology accomplishments for FY 1983 and plans for FY 1984

    NASA Technical Reports Server (NTRS)

    Gardner, J. E.; Dixon, S. C.

    1984-01-01

    Research was done in the following areas: development and validation of solution algorithms, modeling techniques, integrated finite elements for flow-thermal-structural analysis and design, optimization of aircraft and spacecraft for the best performance, reduction of loads and increase in the dynamic structural stability of flexible airframes by the use of active control, methods for predicting steady and unsteady aerodynamic loads and aeroelastic characteristics of flight vehicles with emphasis on the transonic range, and methods for predicting and reducing helicopter vibrations.

  7. Generalized self-consistent method for predicting the effective elastic properties of composites with random hybrid structures

    NASA Astrophysics Data System (ADS)

    Pan'kov, A. A.

    1997-05-01

    The feasibility of using a generalized self-consistent method for predicting the effective elastic properties of composites with random hybrid structures has been examined. Using this method, the problem is reduced to the solution of simpler special averaged problems for composites with single inclusions and corresponding transition layers in the medium examined. The dimensions of the transition layers are defined by the correlation radii of the random structure of the composite, while the heterogeneous elastic properties of the transition layers take account of the probabilities for variation of the size and configuration of the inclusions using averaged special indicator functions. Results are given for a numerical calculation of the averaged indicator functions and for an analysis, using the generalized self-consistent method, of the effect of micropores in the matrix-fiber interface region on the effective elastic properties of unidirectional fiberglass-epoxy; these results are compared with experimental data and reported solutions.

  8. Ab Initio Protein Structure Prediction Using Chunk-TASSER

    PubMed Central

    Zhou, Hongyi; Skolnick, Jeffrey

    2007-01-01

    We have developed an ab initio protein structure prediction method called chunk-TASSER that uses ab initio folded supersecondary structure chunks of a given target as well as threading templates for obtaining contact potentials and distance restraints. The predicted chunks, selected on the basis of a new fragment comparison method, are folded by a fragment insertion method. Full-length models are built and refined by the TASSER methodology, which searches conformational space via parallel hyperbolic Monte Carlo. We employ an optimized reduced force field that includes knowledge-based statistical potentials and restraints derived from the chunks as well as threading templates. The method is tested on a dataset of 425 hard target proteins ≤250 amino acids in length. The average TM-scores of the best of top five models per target are 0.266, 0.336, and 0.362 by the threading algorithm SP3, original TASSER and chunk-TASSER, respectively. For a subset of 80 proteins with predicted α-helix content ≥50%, these averages are 0.284, 0.356, and 0.403, respectively. The percentages of proteins with the best of top five models having TM-score ≥0.4 (a statistically significant threshold for structural similarity) are 3.76, 20.94, and 28.94% by SP3, TASSER, and chunk-TASSER, respectively, overall, while for the subset of 80 predominantly helical proteins, these percentages are 2.50, 23.75, and 41.25%. Thus, chunk-TASSER shows a significant improvement over TASSER for modeling hard targets where no good template can be identified. We also tested chunk-TASSER on 21 medium/hard targets <200 amino-acids-long from CASP7. Chunk-TASSER is ∼11% (10%) better than TASSER for the total TM-score of the first (best of top five) models. Chunk-TASSER is fully automated and can be used in proteome scale protein structure prediction. PMID:17496016
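
    The TM-score values reported in the record above follow the standard TM-score definition, in which each aligned residue pair at distance d contributes 1 / (1 + (d / d0)^2) and the sum is normalized by the target length, with d0 = 1.24 (L - 15)^(1/3) - 1.8. The sketch below evaluates that expression for one given superposition only. The real score is the maximum over superpositions, which this sketch does not search for. The function names, the restriction to targets longer than 15 residues, and the 0.5 Å floor on d0 are assumptions of this illustration.

```python
def tm_d0(l_target):
    """Length-dependent distance scale d0 (in Angstroms) used by the TM-score."""
    if l_target <= 15:
        # Assumption of this sketch: very short targets are simply rejected.
        raise ValueError("this sketch only handles targets longer than 15 residues")
    d0 = 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8
    # Assumption: keep d0 from becoming tiny or negative for short chains.
    return max(d0, 0.5)

def tm_score_for_superposition(distances, l_target):
    """TM-score term for ONE fixed superposition.

    distances: per-residue distances (Angstroms) between aligned residue pairs.
    l_target:  number of residues in the target protein.
    """
    if len(distances) > l_target:
        raise ValueError("cannot align more residue pairs than the target has residues")
    d0 = tm_d0(l_target)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_target
```

    A model whose aligned residues all sit exactly d0 away from their native positions scores 0.5 under this expression, which gives some intuition for the 0.4 threshold the abstract cites for structural similarity.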

  9. Assessment of CAPRI predictions in rounds 3-5 shows progress in docking procedures.

    PubMed

    Méndez, Raúl; Leplae, Raphaël; Lensink, Marc F; Wodak, Shoshana J

    2005-08-01

    The current status of docking procedures for predicting protein-protein interactions starting from their three-dimensional (3D) structure is reassessed by evaluating blind predictions, performed during 2003-2004 as part of Rounds 3-5 of the community-wide experiment on Critical Assessment of PRedicted Interactions (CAPRI). Ten newly determined structures of protein-protein complexes were used as targets for these rounds. They comprised 2 enzyme-inhibitor complexes, 2 antigen-antibody complexes, 2 complexes involved in cellular signaling, 2 homo-oligomers, and a complex between 2 components of the bacterial cellulosome. For most targets, the predictors were given the experimental structures of 1 unbound and 1 bound component, with the latter in a random orientation. For some, the structure of the free component was derived from that of a related protein, requiring the use of homology modeling. In some of the targets, significant differences in conformation were displayed between the bound and unbound components, representing a major challenge for the docking procedures. For 1 target, predictions could not go to completion. In total, 1866 predictions submitted by 30 groups were evaluated. Over one-third of these groups applied completely novel docking algorithms and scoring functions, with several of them specifically addressing the challenge of dealing with side-chain and backbone flexibility. The quality of the predicted interactions was evaluated by comparison to the experimental structures of the targets, made available for the evaluation, using the well-agreed-upon criteria used previously. Twenty-four groups, which for the first time included an automatic Web server, produced predictions ranging from acceptable to highly accurate for all targets, including those where the structures of the bound and unbound forms differed substantially. These results and a brief survey of the methods used by participants of CAPRI Rounds 3-5 suggest that genuine progress in the performance of docking methods is being achieved, with CAPRI acting as the catalyst.

  10. RFDT: A Rotation Forest-based Predictor for Predicting Drug-Target Interactions Using Drug Structure and Protein Sequence Information.

    PubMed

    Wang, Lei; You, Zhu-Hong; Chen, Xing; Yan, Xin; Liu, Gang; Zhang, Wei

    2018-01-01

    Identification of interactions between drugs and target proteins plays an important role in discovering new drug candidates. However, identifying drug-target interactions experimentally remains extremely time-consuming, expensive and challenging even nowadays. Therefore, it is urgent to develop new computational methods to predict potential drug-target interactions (DTI). In this article, a novel computational model is developed for predicting potential drug-target interactions under the theory that each drug-target interaction pair can be represented by the structural properties of drugs and evolutionary information derived from proteins. Specifically, the protein sequences are encoded as a Position-Specific Scoring Matrix (PSSM) descriptor, which contains biological evolutionary information, and the drug molecules are encoded as a fingerprint feature vector, which represents the existence of certain functional groups or fragments. Four benchmark datasets involving enzymes, ion channels, GPCRs and nuclear receptors are independently used for establishing predictive models with the Rotation Forest (RF) model. The proposed method achieved prediction accuracies of 91.3%, 89.1%, 84.1% and 71.1% for the four datasets, respectively. In order to make our method more persuasive, we compared our classifier with the state-of-the-art Support Vector Machine (SVM) classifier. We also compared the proposed method with other excellent methods. Experimental results demonstrate that the proposed method is effective in the prediction of DTI, and can provide assistance for new drug research and development.

  11. Prediction of cis/trans isomerization in proteins using PSI-BLAST profiles and secondary structure information.

    PubMed

    Song, Jiangning; Burrage, Kevin; Yuan, Zheng; Huber, Thomas

    2006-03-09

    The majority of peptide bonds in proteins are found to occur in the trans conformation. However, for proline residues, a considerable fraction of prolyl peptide bonds adopt the cis form. Proline cis/trans isomerization is known to play a critical role in protein folding, splicing, cell signaling and transmembrane active transport. Accurate prediction of proline cis/trans isomerization in proteins would have many important applications towards the understanding of protein structure and function. In this paper, we propose a new approach to predict the proline cis/trans isomerization in proteins using support vector machine (SVM). The preliminary results indicated that using Radial Basis Function (RBF) kernels could lead to better prediction performance than that of polynomial and linear kernel functions. We used single sequence information of different local window sizes, amino acid compositions of different local sequences, multiple sequence alignment obtained from PSI-BLAST and the secondary structure information predicted by PSIPRED. We explored these different sequence encoding schemes in order to investigate their effects on the prediction performance. The training and testing of this approach was performed on a newly enlarged dataset of 2424 non-homologous proteins determined by X-ray diffraction, using 5-fold cross-validation. Selecting the window size 11 provided the best performance for determining the proline cis/trans isomerization based on the single amino acid sequence. It was found that using multiple sequence alignments in the form of PSI-BLAST profiles could significantly improve the prediction performance: the prediction accuracy increased from 62.8% with single sequence to 69.8%, and the Matthews Correlation Coefficient (MCC) improved from 0.26 with single local sequence to 0.40. Furthermore, if coupled with the predicted secondary structure information by PSIPRED, our method yielded a prediction accuracy of 71.5% and MCC of 0.43, 9% and 0.17 higher than the accuracy achieved based on the single sequence information, respectively. A new method has been developed to predict the proline cis/trans isomerization in proteins based on support vector machine, which used the single amino acid sequence with different local window sizes, the amino acid compositions of local sequence flanking centered proline residues, the position-specific scoring matrices (PSSMs) extracted by PSI-BLAST and the predicted secondary structures generated by PSIPRED. The successful application of SVM approach in this study reinforced that SVM is a powerful tool in predicting proline cis/trans isomerization in proteins and biological sequence analysis.
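
    The record above reports both prediction accuracy and the Matthews Correlation Coefficient (MCC). A minimal sketch of the two measures computed from a binary confusion matrix is given below, using their standard definitions. The function names are hypothetical, and returning 0.0 when the MCC denominator vanishes is a common convention adopted here as an assumption.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews Correlation Coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: report 0.0 when any marginal count is zero.
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

    Unlike accuracy, MCC stays near zero for a classifier that only ever predicts the majority class, which is one reason it is often reported alongside accuracy when classes are unbalanced (as cis and trans prolyl bonds are).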

  12. THE PRACTICE OF STRUCTURE ACTIVITY RELATIONSHIPS (SAR) IN TOXICOLOGY

    EPA Science Inventory

    Both qualitative and quantitative modeling methods relating chemical structure to biological activity, called structure-activity relationship analyses or SAR, are applied to the prediction and characterization of chemical toxicity. This minireview will discuss some generic issue...

  13. Three-dimensional (3D) structure prediction of the American and African oil-palms β-ketoacyl-[ACP] synthase-II protein by comparative modelling

    PubMed Central

    Wang, Edina; Chinni, Suresh; Bhore, Subhash Janardhan

    2014-01-01

    Background: The fatty-acid profile of a vegetable oil determines its properties and nutritional value. Palm-oil obtained from the African oil-palm [Elaeis guineensis Jacq. (Tenera)] contains 44% palmitic acid (C16:0), but palm-oil obtained from the American oil-palm [Elaeis oleifera] contains only 25% C16:0. In part, the β-ketoacyl-[ACP] synthase II (KASII) [EC: 2.3.1.179] protein is responsible for the high level of C16:0 in palm-oil derived from the African oil-palm. To understand more about E. guineensis KASII (EgKASII) and E. oleifera KASII (EoKASII) proteins, it is essential to know their structures. Hence, this study was undertaken. Objective: The objective of this study was to predict the three-dimensional (3D) structure of EgKASII and EoKASII proteins using molecular modelling tools. Materials and Methods: The amino-acid sequences for KASII proteins were retrieved from the protein database of the National Center for Biotechnology Information (NCBI), USA. The 3D structures were predicted for both proteins using homology modelling and ab-initio approaches to protein structure prediction. Molecular dynamics (MD) simulation was performed to refine the predicted structures. The predicted structure models were evaluated, and root mean square deviation (RMSD) and root mean square fluctuation (RMSF) values were calculated. Results: The homology modelling showed that EgKASII and EoKASII proteins are 78% and 74% similar to Streptococcus pneumoniae KASII and Brucella melitensis KASII, respectively. The EgKASII and EoKASII structures predicted by the ab-initio approach show 6% and 9% deviation, respectively, from the structures predicted by homology modelling. The structure refinement and validation confirmed that the predicted structures are accurate. Conclusion: The 3D structures for EgKASII and EoKASII proteins were predicted. However, further research is essential to understand the interaction of EgKASII and EoKASII proteins with their substrates. PMID:24748752

  14. A Simplified and Reliable Damage Method for the Prediction of the Composites Pieces

    NASA Astrophysics Data System (ADS)

    Viale, R.; Coquillard, M.; Seytre, C.

    2012-07-01

    Structural engineers are often faced with test results showing composite structures to be considerably tougher than predicted. In an attempt to reduce this frequent gap, a survey was conducted of several extensive synthesis works on prediction methods and failure criteria. This inquiry deals with the plane stress state only. All classical methods have strong and weak points with respect to practicality and reliability. The main conclusion is that in the plane stress case, the best usual industrial methods give rather similar predictions. However, they generally do not explain the often large discrepancies with respect to the tests, mainly in cases of strong stress gradients or of bi-axial laminate loadings. It seems that only the methods that consider the complexity of composite damage (so-called physical methods or Continuum Damage Mechanics, “CDM”) bring a clear improvement over the usual methods. The only drawback of these methods is their relative intricacy, mainly under time-pressed industrial conditions. A method with a close but simplified representation of the CDM phenomenology is presented. It was compared to tests and other methods: it brings a fair improvement of the correlation with tests with respect to the usual industrial methods, and it gives results very similar to the painstaking CDM methods and very close to the test results. Several examples are provided. In addition, this method is economical with respect to material characterization as well as modeling and computation effort.

  15. Computational Methods for Failure Analysis and Life Prediction

    NASA Technical Reports Server (NTRS)

    Noor, Ahmed K. (Compiler); Harris, Charles E. (Compiler); Housner, Jerrold M. (Compiler); Hopkins, Dale A. (Compiler)

    1993-01-01

    This conference publication contains the presentations and discussions from the joint UVA/NASA Workshop on Computational Methods for Failure Analysis and Life Prediction held at NASA Langley Research Center 14-15 Oct. 1992. The presentations focused on damage failure and life predictions of polymer-matrix composite structures. They covered some of the research activities at NASA Langley, NASA Lewis, Southwest Research Institute, industry, and universities. Both airframes and propulsion systems were considered.

  16. Calculation of flight vibration levels of the AH-1G helicopter and correlation with existing flight vibration measurements

    NASA Technical Reports Server (NTRS)

    Sopher, R.; Twomey, W. J.

    1990-01-01

    NASA-Langley is sponsoring a rotorcraft structural dynamics program with the objective to establish in the U.S. a superior capability to utilize finite element analysis models for calculations to support industrial design of helicopter airframe structures. In the initial phase of the program, teams from the major U.S. manufacturers of helicopter airframes will apply extant finite element analysis methods to calculate loads and vibrations of helicopter airframes, and perform correlations between analysis and measurements. The aforementioned rotorcraft structural dynamics program was given the acronym DAMVIBS (Design Analysis Method for Vibrations). Sikorsky's RDYNE Rotorcraft Dynamics Analysis used for the correlation study, the specifics of the application of RDYNE to the AH-1G, and comparisons of the predictions of the method with flight data for loads and vibrations on the AH-1G are described. RDYNE was able to predict trends of variations of loads and vibrations with airspeed, but in some instances magnitudes differed from measured results by factors of two or three to one. Sensitivities were studied of predictions to rotor inflow modeling, effects of torsional modes, number of blade bending modes, fuselage structural damping, and hub modal content.

  17. Wind erosion in semiarid landscapes: Predictive models and remote sensing methods for the influence of vegetation

    NASA Technical Reports Server (NTRS)

    Musick, H. Brad

    1993-01-01

    The objectives of this research are: to develop and test predictive relations for the quantitative influence of vegetation canopy structure on wind erosion of semiarid rangeland soils, and to develop remote sensing methods for measuring the canopy structural parameters that determine sheltering against wind erosion. The influence of canopy structure on wind erosion will be investigated by means of wind-tunnel and field experiments using model roughness elements to simulate plant canopies. The canopy structural variables identified by the wind-tunnel and field experiments as important in determining vegetative sheltering against wind erosion will then be measured at a number of naturally vegetated field sites and compared with estimates of these variables derived from analysis of remotely sensed data.

  18. Mixed time integration methods for transient thermal analysis of structures, appendix 5

    NASA Technical Reports Server (NTRS)

    Liu, W. K.

    1982-01-01

    Mixed time integration methods for transient thermal analysis of structures are studied. An efficient solution procedure for predicting the thermal behavior of aerospace vehicle structures was developed. A 2D finite element computer program incorporating these methodologies is being implemented. The performance of these mixed time finite element algorithms can then be evaluated employing the proposed example problem.

  19. The Shock and Vibration Bulletin. Part 2. Invited Papers, Structural Dynamics

    DTIC Science & Technology

    1974-08-01

    VIKING LANDER DYNAMICS 41 Mr. Joseph C. Pohlen, Martin Marietta Aerospace, Denver, Colorado Structural Dynamics PERFORMANCE OF STATISTICAL ENERGY ANALYSIS 47...aerospace structures. Analytical prediction of these environments is beyond the current scope of classical modal techniques. Statistical energy analysis methods...have been developed that circumvent the difficulties of high-frequency nodal analysis. These statistical energy analysis methods are evaluated

  20. Deciphering the Preference and Predicting the Viability of Circular Permutations in Proteins

    PubMed Central

    Liu, Yen-Yi; Wang, Li-Fen; Hwang, Jenn-Kang; Lyu, Ping-Chiang

    2012-01-01

    Circular permutation (CP) refers to situations in which the termini of a protein are relocated to other positions in the structure. CP occurs naturally and has been artificially created to study protein function, stability and folding. Recently CP is increasingly applied to engineer enzyme structure and function, and to create bifunctional fusion proteins unachievable by tandem fusion. CP is a complicated and expensive technique. An intrinsic difficulty in its application lies in the fact that not every position in a protein is amenable to creating a viable permutant. To examine the preferences of CP and develop CP viability prediction methods, we carried out comprehensive analyses of the sequence, structural, and dynamical properties of known CP sites using a variety of statistics and simulation methods, such as bootstrap aggregating, permutation tests and molecular dynamics simulations. CP particularly favors Gly, Pro, Asp and Asn. Positions preferred by CP lie within coils, loops, turns, and at residues that are exposed to solvent, weakly hydrogen-bonded, environmentally unpacked, or flexible. Disfavored positions include Cys, bulky hydrophobic residues, and residues located within helices or near the protein's core. These results fostered the development of an effective viable CP site prediction system, which combined four machine learning methods, namely artificial neural networks, the support vector machine, a random forest, and a hierarchical feature integration procedure developed in this work. As assessed by using the hydrofolate reductase dataset as the independent evaluation dataset, this prediction system achieved an AUC of 0.9. Large-scale predictions have been performed for nine thousand representative protein structures; several new potential applications of CP were thus identified. Many unreported preferences of CP are revealed in this study. The developed system is the best CP viability prediction method currently available. This work will facilitate the application of CP in research and biotechnology. PMID:22359629

  1. Disentangling the co-structure of multilayer interaction networks: degree distribution and module composition in two-layer bipartite networks.

    PubMed

    Astegiano, Julia; Altermatt, Florian; Massol, François

    2017-11-13

    Species establish different interactions (e.g. antagonistic, mutualistic) with multiple species, forming multilayer ecological networks. Disentangling network co-structure in multilayer networks is crucial to predict how biodiversity loss may affect the persistence of multispecies assemblages. Existing methods to analyse multilayer networks often fail to consider network co-structure. We present a new method to evaluate the modular co-structure of multilayer networks through the assessment of species degree co-distribution and network module composition. We focus on modular structure because of its high prevalence among ecological networks. We apply our method to two Lepidoptera-plant networks, one describing caterpillar-plant herbivory interactions and one representing adult Lepidoptera nectaring on flowers, thereby possibly pollinating them. More than 50% of the species established either herbivory or visitation interactions, but not both. These species were over-represented among plants and lepidopterans, and were present in most modules in both networks. Similarity in module composition between networks was high but not different from random expectations. Our method clearly delineates the importance of interpreting multilayer module composition similarity in the light of the constraints imposed by network structure to predict the potential indirect effects of species loss through interconnected modular networks.

  2. Conditioning and Robustness of RNA Boltzmann Sampling under Thermodynamic Parameter Perturbations.

    PubMed

    Rogers, Emily; Murrugarra, David; Heitsch, Christine

    2017-07-25

    Understanding how RNA secondary structure prediction methods depend on the underlying nearest-neighbor thermodynamic model remains a fundamental challenge in the field. Minimum free energy (MFE) predictions are known to be "ill conditioned" in that small changes to the thermodynamic model can result in significantly different optimal structures. Hence, the best practice is now to sample from the Boltzmann distribution, which generates a set of suboptimal structures. Although the structural signal of this Boltzmann sample is known to be robust to stochastic noise, the conditioning and robustness under thermodynamic perturbations have yet to be addressed. We present here a mathematically rigorous model for conditioning inspired by numerical analysis, and also a biologically inspired definition for robustness under thermodynamic perturbation. We demonstrate the strong correlation between conditioning and robustness and use its tight relationship to define quantitative thresholds for well versus ill conditioning. These resulting thresholds demonstrate that the majority of the sequences are at least sample robust, which verifies the assumption of sampling's improved conditioning over the MFE prediction. Furthermore, because we find no correlation between conditioning and MFE accuracy, the presence of both well- and ill-conditioned sequences indicates the continued need for both thermodynamic model refinements and alternate RNA structure prediction methods beyond the physics-based ones. Copyright © 2017. Published by Elsevier Inc.
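    The contrast between an ill-conditioned MFE prediction and a better-conditioned Boltzmann distribution can be illustrated with a toy calculation. The sketch below is not the conditioning model from the abstract: it assumes a hypothetical two-structure ensemble with made-up free energies (in kcal/mol) and only shows that a 0.2 kcal/mol perturbation can flip the MFE structure while shifting each Boltzmann probability by less than 0.1. The function names and the energies are illustrative assumptions.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def boltzmann_probabilities(energies, temperature_k=310.15):
    """Return p_i = exp(-E_i/RT) / Z for a list of free energies (kcal/mol)."""
    rt = R_KCAL * temperature_k
    e_min = min(energies)  # subtract the minimum for numerical stability
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


def mfe_index(energies):
    """Index of the minimum free energy structure."""
    return min(range(len(energies)), key=energies.__getitem__)


# Hypothetical energies for two competing structures, before and after
# a small perturbation of the thermodynamic parameters.
original = [-10.0, -9.9]
perturbed = [-9.8, -9.9]

p_original = boltzmann_probabilities(original)
p_perturbed = boltzmann_probabilities(perturbed)

print("MFE structure:", mfe_index(original), "->", mfe_index(perturbed))
print("P(structure 0):", round(p_original[0], 3), "->", round(p_perturbed[0], 3))
```

    In this toy case the optimal structure changes identity, whereas the ensemble probabilities move only moderately, which is the intuition behind preferring sampling over a single MFE structure.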

  3. Uncertainty quantification and validation of 3D lattice scaffolds for computer-aided biomedical applications.

    PubMed

    Gorguluarslan, Recep M; Choi, Seung-Kyum; Saldana, Christopher J

    2017-07-01

    A methodology is proposed for uncertainty quantification and validation to accurately predict the mechanical response of lattice structures used in the design of scaffolds. Effective structural properties of the scaffolds are characterized using a developed multi-level stochastic upscaling process that propagates the quantified uncertainties at strut level to the lattice structure level. To obtain realistic simulation models for the stochastic upscaling process and minimize the experimental cost, high-resolution finite element models of individual struts were reconstructed from the micro-CT scan images of lattice structures which are fabricated by selective laser melting. The upscaling method facilitates the process of determining homogenized strut properties to reduce the computational cost of the detailed simulation model for the scaffold. Bayesian Information Criterion is utilized to quantify the uncertainties with parametric distributions based on the statistical data obtained from the reconstructed strut models. A systematic validation approach that can minimize the experimental cost is also developed to assess the predictive capability of the stochastic upscaling method used at the strut level and lattice structure level. In comparison with physical compression test results, the proposed methodology of linking the uncertainty quantification with the multi-level stochastic upscaling method enabled an accurate prediction of the elastic behavior of the lattice structure with minimal experimental cost by accounting for the uncertainties induced by the additive manufacturing process. Copyright © 2017 Elsevier Ltd. All rights reserved.

  4. A systematic search method for the identification of tightly packed transmembrane parallel alpha-helices.

    PubMed

    Akula, Nagaraju; Pattabiraman, Nagarajan

    2005-06-01

    Membrane proteins play a major role in a number of biological processes such as signaling pathways. The determination of the three-dimensional structure of these proteins is increasingly important for our understanding of their structure-function relationships. Due to the difficulty in isolating membrane proteins for X-ray diffraction studies, computational techniques are being developed to generate the 3D structures of TM domains. Here, we present a systematic search method for the identification of energetically favorable and tightly packed transmembrane parallel alpha-helices. The first step in our systematic search method is the generation of 3D models for pairs of parallel helix bundles with all possible orientations, followed by an energy-based filter to eliminate structures with severe non-bonded contacts. Then, an RMS-based filter was used to cluster these structures into families. Furthermore, these dimers were energy minimized using a molecular mechanics force field. Finally, we identified the tightly packed parallel alpha-helices by using an interface surface area. To validate our search method, we compared our predicted Glycophorin A (GPA) dimer structures with the reported NMR structures. With our search method, we are able to reproduce NMR structures of GPA with 0.9 Å RMSD. In addition, by considering the reported mutational data on GxxxG motif interactions, twenty percent of our predicted dimers are within 2.0 Å RMSD. The dimers obtained from our method were used to generate parallel trimeric and tetrameric TM structures of GPA, and we found that GPA might exist only in a dimer form, as reported earlier.

  5. Identification of Extracellular Segments by Mass Spectrometry Improves Topology Prediction of Transmembrane Proteins.

    PubMed

    Langó, Tamás; Róna, Gergely; Hunyadi-Gulyás, Éva; Turiák, Lilla; Varga, Julia; Dobson, László; Várady, György; Drahos, László; Vértessy, Beáta G; Medzihradszky, Katalin F; Szakács, Gergely; Tusnády, Gábor E

    2017-02-13

    Transmembrane proteins play a crucial role in signaling, ion transport, nutrient uptake, as well as in maintaining the dynamic equilibrium between the internal and external environment of cells. Despite their important biological functions and abundance, less than 2% of all determined structures are transmembrane proteins. Given the persisting technical difficulties associated with high resolution structure determination of transmembrane proteins, additional methods, including computational and experimental techniques, remain vital in promoting our understanding of their topologies, 3D structures, functions and interactions. Here we report a method for the high-throughput determination of extracellular segments of transmembrane proteins based on the identification of surface labeled and biotin captured peptide fragments by LC/MS/MS. We show that reliable identification of extracellular protein segments increases the accuracy and reliability of existing topology prediction algorithms. Using the experimental topology data as constraints, our improved prediction tool provides accurate and reliable topology models for hundreds of human transmembrane proteins.

  6. Hill-Climbing search and diversification within an evolutionary approach to protein structure prediction.

    PubMed

    Chira, Camelia; Horvath, Dragos; Dumitrescu, D

    2011-07-30

    Proteins are complex structures made of amino acids having a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a protein structure with minimum energy starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic-polar (HP) model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optima. The main features of the resulting evolutionary algorithm (the hill-climbing mechanism and the diversification strategy) are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact on the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.
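    For readers unfamiliar with the HP model, the objective that such a search optimizes can be sketched in a few lines. The example below is only the standard 2D square-lattice HP energy (minus one for every pair of H residues that are lattice neighbours but not consecutive in the chain); it is not the evolutionary algorithm or the search operators from the abstract. The move encoding (U, D, L, R) and the function name are assumptions made for illustration.

```python
def hp_energy(sequence, moves):
    """Energy of a 2D HP-lattice conformation.

    sequence: string of 'H' and 'P' residues.
    moves: string of 'U', 'D', 'L', 'R' steps, one per bond.
    Returns None if the walk is not self-avoiding.
    """
    steps = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move per bond")

    # Place the chain on the lattice, starting at the origin.
    coords = [(0, 0)]
    for move in moves:
        dx, dy = steps[move]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))

    if len(set(coords)) != len(coords):
        return None  # two residues share a lattice site

    index_at = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each neighbouring pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


print(hp_energy("HPPH", "RUL"))  # folded square: one H-H contact
print(hp_energy("HPPH", "RRR"))  # extended chain: no contacts
```

    A hill-climbing step in this setting would propose a modified move string and keep it only if the energy decreases; the diversification stage described in the abstract addresses the local optima such a greedy rule can get trapped in.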

  7. Predicting Gene Structure Changes Resulting from Genetic Variants via Exon Definition Features.

    PubMed

    Majoros, William H; Holt, Carson; Campbell, Michael S; Ware, Doreen; Yandell, Mark; Reddy, Timothy E

    2018-04-25

    Genetic variation that disrupts gene function by altering gene splicing between individuals can substantially influence traits and disease. In those cases, accurately predicting the effects of genetic variation on splicing can be highly valuable for investigating the mechanisms underlying those traits and diseases. While methods have been developed to generate high quality computational predictions of gene structures in reference genomes, the same methods perform poorly when used to predict the potentially deleterious effects of genetic changes that alter gene splicing between individuals. Underlying that discrepancy in predictive ability are the common assumptions by reference gene finding algorithms that genes are conserved, well-formed, and produce functional proteins. We describe a probabilistic approach for predicting recent changes to gene structure that may or may not conserve function. The model is applicable to both coding and noncoding genes, and can be trained on existing gene annotations without requiring curated examples of aberrant splicing. We apply this model to the problem of predicting altered splicing patterns in the genomes of individual humans, and we demonstrate that performing gene-structure prediction without relying on conserved coding features is feasible. The model predicts an unexpected abundance of variants that create de novo splice sites, an observation supported by both simulations and empirical data from RNA-seq experiments. While these de novo splice variants are commonly misinterpreted by other tools as coding or noncoding variants of little or no effect, we find that in some cases they can have large effects on splicing activity and protein products, and we propose that they may commonly act as cryptic factors in disease. The software is available from geneprediction.org/SGRF. Contact: bmajoros@duke.edu. Supplementary information is available at Bioinformatics online.

  8. Pseudoracemic amino acid complexes: blind predictions for flexible two-component crystals.

    PubMed

    Görbitz, Carl Henrik; Dalhus, Bjørn; Day, Graeme M

    2010-08-14

    Ab initio prediction of the crystal packing in complexes between two flexible molecules is a particularly challenging computational chemistry problem. In this work we present results of single crystal structure determinations as well as theoretical predictions for three 1:1 complexes between hydrophobic L- and D-amino acids (pseudoracemates), known from previous crystallographic work to form structures with one of two alternative hydrogen bonding arrangements. These are accurately reproduced in the theoretical predictions together with a series of patterns that have never been observed experimentally. In this bewildering forest of potential polymorphs, hydrogen bonding arrangements and molecular conformations, the theoretical predictions succeeded, for all three complexes, in finding the correct hydrogen bonding pattern. For two of the complexes, the calculations also reproduce the exact space group and side chain orientations in the best ranked predicted structure. This includes one complex for which the observed crystal packing clearly contradicted previous experience based on experimental data for a substantial number of related amino acid complexes. The results highlight the significant recent advances that have been made in computational methods for crystal structure prediction.

  9. Aromatic claw: A new fold with high aromatic content that evades structural prediction: Aromatic Claw

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sachleben, Joseph R.; Adhikari, Aashish N.; Gawlak, Grzegorz

    2016-11-10

    We determined the NMR structure of a highly aromatic (13%) protein of unknown function, Aq1974 from Aquifex aeolicus (PDB ID: 5SYQ). The unusual sequence of this protein has a tryptophan content five times the normal (six tryptophan residues of 114 or 5.2% while the average tryptophan content is 1.0%) with the tryptophans occurring in a WXW motif. It has no detectable sequence homology with known protein structures. Although its NMR spectrum suggested that the protein was rich in β-sheet, upon resonance assignment and solution structure determination, the protein was found to be primarily α-helical with a small two-stranded β-sheet with a novel fold that we have termed an Aromatic Claw. As this fold was previously unknown and the sequence unique, we submitted the sequence to CASP10 as a target for blind structural prediction. At the end of the competition, the sequence was classified a hard template based model; the structural relationship between the template and the experimental structure was small and the predictions all failed to predict the structure. CSRosetta was found to predict the secondary structure and its packing; however, it was found that there was little correlation between CSRosetta score and the RMSD between the CSRosetta structure and the NMR determined one. This work demonstrates that even in relatively small proteins, we do not yet have the capacity to accurately predict the fold for all primary sequences. The experimental discovery of new folds helps guide the improvement of structural prediction methods.

  10. RNA Secondary Structure Prediction by Using Discrete Mathematics: An Interdisciplinary Research Experience for Undergraduate Students

    PubMed Central

    Ellington, Roni; Wachira, James

    2010-01-01

    The focus of this Research Experience for Undergraduates (REU) project was on RNA secondary structure prediction by using a lattice walk approach. The lattice walk approach is a combinatorial and computational biology method used to enumerate possible secondary structures and predict RNA secondary structure from RNA sequences. The method uses discrete mathematical techniques and identifies specified base pairs as parameters. The goal of the REU was to introduce upper-level undergraduate students to the principles and challenges of interdisciplinary research in molecular biology and discrete mathematics. At the beginning of the project, students from the biology and mathematics departments of a mid-sized university received instruction on the role of secondary structure in the function of eukaryotic RNAs and RNA viruses, RNA related to combinatorics, and the National Center for Biotechnology Information resources. The student research projects focused on RNA secondary structure prediction on a regulatory region of the yellow fever virus RNA genome and on an untranslated region of an mRNA of a gene associated with the neurological disorder epilepsy. At the end of the project, the REU students gave poster and oral presentations, and they submitted written final project reports to the program director. The outcome of the REU was that the students gained transferable knowledge and skills in bioinformatics and an awareness of the applications of discrete mathematics to biological research problems. PMID:20810968

  11. RNA secondary structure prediction by using discrete mathematics: an interdisciplinary research experience for undergraduate students.

    PubMed

    Ellington, Roni; Wachira, James; Nkwanta, Asamoah

    2010-01-01

    The focus of this Research Experience for Undergraduates (REU) project was on RNA secondary structure prediction by using a lattice walk approach. The lattice walk approach is a combinatorial and computational biology method used to enumerate possible secondary structures and predict RNA secondary structure from RNA sequences. The method uses discrete mathematical techniques and identifies specified base pairs as parameters. The goal of the REU was to introduce upper-level undergraduate students to the principles and challenges of interdisciplinary research in molecular biology and discrete mathematics. At the beginning of the project, students from the biology and mathematics departments of a mid-sized university received instruction on the role of secondary structure in the function of eukaryotic RNAs and RNA viruses, RNA related to combinatorics, and the National Center for Biotechnology Information resources. The student research projects focused on RNA secondary structure prediction on a regulatory region of the yellow fever virus RNA genome and on an untranslated region of an mRNA of a gene associated with the neurological disorder epilepsy. At the end of the project, the REU students gave poster and oral presentations, and they submitted written final project reports to the program director. The outcome of the REU was that the students gained transferable knowledge and skills in bioinformatics and an awareness of the applications of discrete mathematics to biological research problems.

  12. Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness.

    PubMed

    Conomos, Matthew P; Miller, Michael B; Thornton, Timothy A

    2015-05-01

    Population structure inference with genetic data has been motivated by a variety of applications in population genetics and genetic association studies. Several approaches have been proposed for the identification of genetic ancestry differences in samples where study participants are assumed to be unrelated, including principal components analysis (PCA), multidimensional scaling (MDS), and model-based methods for proportional ancestry estimation. Many genetic studies, however, include individuals with some degree of relatedness, and existing methods for inferring genetic ancestry fail in related samples. We present a method, PC-AiR, for robust population structure inference in the presence of known or cryptic relatedness. PC-AiR utilizes genome-screen data and an efficient algorithm to identify a diverse subset of unrelated individuals that is representative of all ancestries in the sample. The PC-AiR method directly performs PCA on the identified ancestry representative subset and then predicts components of variation for all remaining individuals based on genetic similarities. In simulation studies and in applications to real data from Phase III of the HapMap Project, we demonstrate that PC-AiR provides a substantial improvement over existing approaches for population structure inference in related samples. We also demonstrate significant efficiency gains, where a single axis of variation from PC-AiR provides better prediction of ancestry in a variety of structure settings than using 10 (or more) components of variation from widely used PCA and MDS approaches. Finally, we illustrate that PC-AiR can provide improved population stratification correction over existing methods in genetic association studies with population structure and relatedness. © 2015 WILEY PERIODICALS, INC.

  13. The pKa Cooperative: A Collaborative Effort to Advance Structure-Based Calculations of pKa values and Electrostatic Effects in Proteins

    PubMed Central

    Nielsen, Jens E.; Gunner, M. R.; Bertrand García-Moreno, E.

    2012-01-01

    The pKa Cooperative http://www.pkacoop.org was organized to advance development of accurate and useful computational methods for structure-based calculation of pKa values and electrostatic energy in proteins. The Cooperative brings together laboratories with expertise and interest in theoretical, computational and experimental studies of protein electrostatics. To improve structure-based energy calculations it is necessary to better understand the physical character and molecular determinants of electrostatic effects. The Cooperative thus intends to foment experimental research into fundamental aspects of proteins that depend on electrostatic interactions. It will maintain a depository for experimental data useful for critical assessment of methods for structure-based electrostatics calculations. To help guide the development of computational methods the Cooperative will organize blind prediction exercises. As a first step, computational laboratories were invited to reproduce an unpublished set of experimental pKa values of acidic and basic residues introduced in the interior of staphylococcal nuclease by site-directed mutagenesis. The pKa values of these groups are unique and challenging to simulate owing to the large magnitude of their shifts relative to normal pKa values in water. Many computational methods were tested in this 1st Blind Prediction Challenge and critical assessment exercise. A workshop was organized in the Telluride Science Research Center to assess objectively the performance of many computational methods tested on this one extensive dataset. This volume of PROTEINS: Structure, Function, and Bioinformatics introduces the pKa Cooperative, presents reports submitted by participants in the blind prediction challenge, and highlights some of the problems in structure-based calculations identified during this exercise. PMID:22002877

  14. Predicting turns in proteins with a unified model.

    PubMed

    Song, Qi; Li, Tonghua; Cong, Peisheng; Sun, Jiangming; Li, Dapeng; Tang, Shengnan

    2012-01-01

    Turns are a critical element of the structure of a protein; turns play a crucial role in loops, folds, and interactions. Current prediction methods are well developed for the prediction of individual turn types, including α-turn, β-turn, and γ-turn, etc. However, for further protein structure and function prediction it is necessary to develop a uniform model that can accurately predict all types of turns simultaneously. In this study, we present a novel approach, TurnP, which offers the ability to investigate all the turns in a protein based on a unified model. The main characteristics of TurnP are: (i) using newly exploited features of structural evolution information (secondary structure and shape string of protein) based on structure homologies, (ii) considering all types of turns in a unified model, and (iii) practical capability of accurate prediction of all turns simultaneously for a query. TurnP utilizes predicted secondary structures and predicted shape strings, both of which have greater accuracy, based on innovative technologies which were both developed by our group. Then, sequence and structural evolution features, which are profile of sequence, profile of secondary structures and profile of shape strings are generated by sequence and structure alignment. When TurnP was validated on a non-redundant dataset (4,107 entries) by five-fold cross-validation, we achieved an accuracy of 88.8% and a sensitivity of 71.8%, which exceeded the most state-of-the-art predictors of certain type of turn. Newly determined sequences, the EVA and CASP9 datasets were used as independent tests and the results we achieved were outstanding for turn predictions and confirmed the good performance of TurnP for practical applications.

  15. Predicting Turns in Proteins with a Unified Model

    PubMed Central

    Song, Qi; Li, Tonghua; Cong, Peisheng; Sun, Jiangming; Li, Dapeng; Tang, Shengnan

    2012-01-01

    Motivation: Turns are a critical element of the structure of a protein; turns play a crucial role in loops, folds, and interactions. Current prediction methods are well developed for the prediction of individual turn types, including α-turn, β-turn, and γ-turn, etc. However, for further protein structure and function prediction it is necessary to develop a uniform model that can accurately predict all types of turns simultaneously. Results: In this study, we present a novel approach, TurnP, which offers the ability to investigate all the turns in a protein based on a unified model. The main characteristics of TurnP are: (i) using newly exploited features of structural evolution information (secondary structure and shape string of protein) based on structure homologies, (ii) considering all types of turns in a unified model, and (iii) practical capability of accurate prediction of all turns simultaneously for a query. TurnP utilizes predicted secondary structures and predicted shape strings, both of which have greater accuracy, based on innovative technologies which were both developed by our group. Then, sequence and structural evolution features, which are profile of sequence, profile of secondary structures and profile of shape strings are generated by sequence and structure alignment. When TurnP was validated on a non-redundant dataset (4,107 entries) by five-fold cross-validation, we achieved an accuracy of 88.8% and a sensitivity of 71.8%, which exceeded the most state-of-the-art predictors of certain type of turn. Newly determined sequences, the EVA and CASP9 datasets were used as independent tests and the results we achieved were outstanding for turn predictions and confirmed the good performance of TurnP for practical applications. PMID:23144872

  16. Ab Initio structure prediction for Escherichia coli: towards genome-wide protein structure modeling and fold assignment

    PubMed Central

    Xu, Dong; Zhang, Yang

    2013-01-01

    Genome-wide protein structure prediction and structure-based function annotation have been a long-term goal in molecular biology but have not yet become possible due to difficulties in modeling distant-homology targets. We developed a hybrid pipeline combining ab initio folding and template-based modeling for genome-wide structure prediction applied to the Escherichia coli genome. The pipeline was tested on 43 known sequences, where QUARK-based ab initio folding simulation generated models with TM-score 17% higher than that by traditional comparative modeling methods. For 495 unknown hard sequences, 72 are predicted to have a correct fold (TM-score > 0.5) and 321 have a substantial portion of structure correctly modeled (TM-score > 0.35). 317 sequences can be reliably assigned to a SCOP fold family based on structural analogy to existing proteins in PDB. The presented results, as a case study of E. coli, represent promising progress towards genome-wide structure modeling and fold family assignment using state-of-the-art ab initio folding algorithms. PMID:23719418
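    The TM-score thresholds quoted above (0.5 and 0.35) are easier to interpret with the scoring formula in view. The sketch below evaluates the commonly used TM-score expression for one fixed superposition, given per-residue distances between aligned pairs; it is an illustration only, since the full TM-score also searches over superpositions for the maximum, and it is not part of the pipeline from the abstract. The function names and the refusal to handle targets shorter than 19 residues are simplifying assumptions.

```python
def tm_score_d0(target_length):
    """Length-dependent distance scale d0 (in angstroms)."""
    if target_length < 19:
        # Simplification for this sketch: d0 is not positive for very short chains.
        raise ValueError("this sketch assumes a target of at least 19 residues")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score_from_distances(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distances (angstroms) between aligned residue pairs.
    target_length: number of residues in the target structure; residues
    without an aligned partner contribute nothing to the sum.
    """
    d0 = tm_score_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


length = 100
print(round(tm_score_d0(length), 2))
print(tm_score_from_distances([0.0] * length, length))         # perfect match
print(tm_score_from_distances([0.0] * (length // 2), length))  # half the target aligned perfectly
```

    Because the sum is normalized by the target length, a model that places only half of the residues perfectly scores 0.5, which gives some feel for what "a substantial portion of structure correctly modeled" means at a threshold of 0.35.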

  17. The prediction of crystal structure by merging knowledge methods with first principles quantum mechanics

    NASA Astrophysics Data System (ADS)

    Ceder, Gerbrand

    2007-03-01

    The prediction of structure is a key problem in computational materials science that forms the platform on which rational materials design can be performed. Finding structure by traditional optimization methods on quantum mechanical energy models is not possible due to the complexity and high dimensionality of the coordinate space. An unusual but efficient solution to this problem can be obtained by merging ideas from heuristic and ab initio methods: in the same way that scientists build empirical rules by observation of experimental trends, we have developed machine learning approaches that extract knowledge from a large set of experimental information and a database of over 15,000 first principles computations, and used these to rapidly direct accurate quantum mechanical techniques to the lowest energy crystal structure of a material. Knowledge is captured in a Bayesian probability network that relates the probability to find a particular crystal structure at a given composition to structure and energy information at other compositions. We show that this approach is highly efficient in finding the ground states of binary metallic alloys and can be easily generalized to more complex systems.

  18. A Hierarchical Clustering Methodology for the Estimation of Toxicity

    EPA Science Inventory

    A Quantitative Structure Activity Relationship (QSAR) methodology based on hierarchical clustering was developed to predict toxicological endpoints. This methodology utilizes Ward's method to divide a training set into a series of structurally similar clusters. The structural sim...

  19. A general structure-property relationship to predict the enthalpy of vaporisation at ambient temperatures.

    PubMed

    Oberg, T

    2007-01-01

    The vapour pressure is the most important property of an anthropogenic organic compound in determining its partitioning between the atmosphere and the other environmental media. The enthalpy of vaporisation quantifies the temperature dependence of the vapour pressure and its value around 298 K is needed for environmental modelling. The enthalpy of vaporisation can be determined by different experimental methods, but estimation methods are needed to extend the current database and several approaches are available from the literature. However, these methods have limitations, such as a need for other experimental results as input data, a limited applicability domain, a lack of domain definition, and a lack of predictive validation. Here we have attempted to develop a quantitative structure-property relationship (QSPR) that has general applicability and is thoroughly validated. Enthalpies of vaporisation at 298 K were collected from the literature for 1835 pure compounds. The three-dimensional (3D) structures were optimised and each compound was described by a set of computationally derived descriptors. The compounds were randomly assigned into a calibration set and a prediction set. Partial least squares regression (PLSR) was used to estimate a low-dimensional QSPR model with 12 latent variables. The predictive performance of this model, within the domain of application, was estimated at n = 560, q²(ext) = 0.968 and s = 0.028 (log transformed values). The QSPR model was subsequently applied to a database of 100,000+ structures, after a similar 3D optimisation and descriptor generation. Reliable predictions can be reported for compounds within the previously defined applicability domain.

  20. Computer-aided prediction of xenobiotic metabolism in the human body

    NASA Astrophysics Data System (ADS)

    Bezhentsev, V. M.; Tarasova, O. A.; Dmitriev, A. V.; Rudik, A. V.; Lagunin, A. A.; Filimonov, D. A.; Poroikov, V. V.

    2016-08-01

    The review describes the major databases containing information about the metabolism of xenobiotics, including data on drug metabolism, metabolic enzymes, schemes of biotransformation and the structures of some substrates and metabolites. Computational approaches used to predict the interaction of xenobiotics with metabolic enzymes, prediction of metabolic sites in the molecule, generation of structures of potential metabolites for subsequent evaluation of their properties are considered. The advantages and limitations of various computational methods for metabolism prediction and the prospects for their applications to improve the safety and efficacy of new drugs are discussed. Bibliography — 165 references.

  1. Prediction of hot regions in protein-protein interaction by combining density-based incremental clustering with feature-based classification.

    PubMed

    Hu, Jing; Zhang, Xiaolong; Liu, Xiaoming; Tang, Jinshan

    2015-06-01

    Discovering hot regions in protein-protein interaction is important for drug and protein design, while experimental identification of hot regions is a time-consuming and labor-intensive effort; thus, the development of predictive models can be very helpful. In hot region prediction research, some models are based on structure information, and others are based on a protein interaction network. However, the prediction accuracy of these methods can still be improved. In this paper, a new method is proposed for hot region prediction, which combines density-based incremental clustering with feature-based classification. The method uses density-based incremental clustering to obtain rough hot regions, and uses feature-based classification to remove the non-hot spot residues from the rough hot regions. Experimental results show that the proposed method significantly improves the prediction performance of hot regions. Copyright © 2015 Elsevier Ltd. All rights reserved.

  2. Pile Driving

    NASA Technical Reports Server (NTRS)

    1987-01-01

    Machine-oriented structural engineering firm TERA, Inc. is engaged in a project to evaluate the reliability of offshore pile driving prediction methods to eventually predict the best pile driving technique for each new offshore oil platform. In Phase I, pile driving records of 48 offshore platforms, including such information as blow counts, soil composition and pertinent construction details, were digitized. In Phase II, pile driving records were statistically compared with current methods of prediction. The result was the development of modular software, the CRIPS80 Software Design Analyzer System, that companies can use to evaluate other prediction procedures or other databases.

  3. A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more.

    PubMed

    Rivas, Elena; Lang, Raymond; Eddy, Sean R

    2012-02-01

    The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases.

  4. A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more

    PubMed Central

    Rivas, Elena; Lang, Raymond; Eddy, Sean R.

    2012-01-01

    The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases. PMID:22194308

  5. Computational methods for prediction of RNA interactions with metal ions and small organic ligands.

    PubMed

    Philips, Anna; Łach, Grzegorz; Bujnicki, Janusz M

    2015-01-01

    In recent years, it has become clear that a wide range of regulatory functions in bacteria are performed by riboswitches, regions of mRNA that change their structure upon external stimuli. Riboswitches are therefore attractive targets for drug design, molecular engineering, and fundamental research on regulatory circuitry of living cells. Several mechanisms are known for riboswitches controlling gene expression, but most of them perform their roles by ligand binding. As with other macromolecules, knowledge of the 3D structure of riboswitches is crucial for the understanding of their function. The development of experimental methods allowed for investigation of RNA structure and its complexes with ligands (which are either riboswitches' substrates or inhibitors) and metal cations (which stabilize the structure and are also known to be riboswitches' inhibitors). The experimental probing of different states of riboswitches is, however, time-consuming, costly, and difficult to resolve without theoretical support. The natural consequence is the use of computational methods, at least for initial research, such as the prediction of putative binding sites of ligands or metal ions. Here, we present a review of such methods, with a special focus on knowledge-based methods developed in our laboratory: LigandRNA, a scoring function for the prediction of RNA-small molecule interactions, and MetalionRNA, a predictor of metal ion-binding sites in RNA structures. Both programs are available free of charge as Web servers, LigandRNA at http://ligandrna.genesilico.pl and MetalionRNA at http://metalionrna.genesilico.pl/. © 2015 Elsevier Inc. All rights reserved.

  6. Advanced composites structural concepts and materials technologies for primary aircraft structures: Structural response and failure analysis

    NASA Technical Reports Server (NTRS)

    Dorris, William J.; Hairr, John W.; Huang, Jui-Tien; Ingram, J. Edward; Shah, Bharat M.

    1992-01-01

    Non-linear analysis methods were adapted and incorporated in a finite element based DIAL code. These methods are necessary to evaluate the global response of a stiffened structure under combined in-plane and out-of-plane loading. These methods include the Arc Length method and target point analysis procedure. A new interface material model was implemented that can model elastic-plastic behavior of the bond adhesive. Direct application of this method is in skin/stiffener interface failure assessment. Addition of the AML (angle minus longitudinal or load) failure procedure and Hashin's failure criteria provides added capability in the failure predictions. Interactive Stiffened Panel Analysis modules were developed as interactive pre- and post-processors. Each module provides the means of performing self-initiated finite element based analysis of primary structures such as a flat or curved stiffened panel; a corrugated flat sandwich panel; and a curved geodesic fuselage panel. These modules bring finite element analysis into the design of composite structures without the requirement for the user to know much about the techniques and procedures needed to actually perform a finite element analysis from scratch. An interactive finite element code was developed to predict bolted joint strength considering material and geometrical non-linearity. The developed method conducts an ultimate strength failure analysis using a set of material degradation models.

  7. Geometrically Nonlinear Static Analysis of 3D Trusses Using the Arc-Length Method

    NASA Technical Reports Server (NTRS)

    Hrinda, Glenn A.

    2006-01-01

    Rigorous analysis of geometrically nonlinear structures demands creating mathematical models that accurately include loading and support conditions and, more importantly, model the stiffness and response of the structure. Nonlinear geometric structures often contain critical points with snap-through behavior during the response to large loads. Studying the post buckling behavior during a portion of a structure's unstable load history may be necessary. Primary structures made from ductile materials will stretch enough prior to failure for loads to redistribute producing sudden and often catastrophic collapses that are difficult to predict. The responses and redistribution of the internal loads during collapses and possible sharp snap-back of structures have frequently caused numerical difficulties in analysis procedures. The presence of critical stability points and unstable equilibrium paths are major difficulties that numerical solutions must pass to fully capture the nonlinear response. Some hurdles still exist in finding nonlinear responses of structures under large geometric changes. Predicting snap-through and snap-back of certain structures has been difficult and time consuming. Also difficult is finding how much load a structure may still carry safely. Highly geometrically nonlinear responses of structures exhibiting complex snap-back behavior are presented and analyzed with a finite element approach. The arc-length method will be reviewed and shown to predict the proper response and follow the nonlinear equilibrium path through limit points.

  8. Environmental Capability of Liquid Lubricants

    NASA Technical Reports Server (NTRS)

    Beerbower, A.

    1973-01-01

    The methods available for predicting the properties of liquid lubricants from their structural formulas are discussed. The methods make it possible to design lubricants by forecasting the results of changing the structure and to determine the limits to which liquid lubricants can cope with environmental extremes. The methods are arranged in order of their thermodynamic properties through empirical physical properties to chemical properties.

  9. Integrating linear optimization with structural modeling to increase HIV neutralization breadth.

    PubMed

    Sevy, Alexander M; Panda, Swetasudha; Crowe, James E; Meiler, Jens; Vorobeychik, Yevgeniy

    2018-02-01

    Computational protein design has been successful in modeling fixed backbone proteins in a single conformation. However, when modeling large ensembles of flexible proteins, current methods in protein design have been insufficient. Large barriers in the energy landscape are difficult to traverse while redesigning a protein sequence, and as a result current design methods only sample a fraction of available sequence space. We propose a new computational approach that combines traditional structure-based modeling using the Rosetta software suite with machine learning and integer linear programming to overcome limitations in the Rosetta sampling methods. We demonstrate the effectiveness of this method, which we call BROAD, by benchmarking the performance on increasing predicted breadth of anti-HIV antibodies. We use this novel method to increase predicted breadth of naturally-occurring antibody VRC23 against a panel of 180 divergent HIV viral strains and achieve 100% predicted binding against the panel. In addition, we compare the performance of this method to state-of-the-art multistate design in Rosetta and show that we can outperform the existing method significantly. We further demonstrate that sequences recovered by this method recover known binding motifs of broadly neutralizing anti-HIV antibodies. Finally, our approach is general and can be extended easily to other protein systems. Although our modeled antibodies were not tested in vitro, we predict that these variants would have greatly increased breadth compared to the wild-type antibody.

  10. LCS-TA to identify similar fragments in RNA 3D structures.

    PubMed

    Wiedemann, Jakub; Zok, Tomasz; Milostan, Maciej; Szachniuk, Marta

    2017-10-23

    In modern structural bioinformatics, comparison of molecular structures, aimed at identifying and assessing similarities and differences between them, is one of the most commonly performed procedures. It gives the basis for evaluation of in silico predicted models. It constitutes the preliminary step in searching for structural motifs. In particular, it supports tracing the molecular evolution. Faced with an ever-increasing amount of available structural data, researchers need a range of methods enabling comparative analysis of the structures from either a global or a local perspective. Herein, we present a new, superposition-independent method which processes pairs of RNA 3D structures to identify their local similarities. The similarity is considered in the context of structure bending and bonds' rotation, which are described by torsion angles. In the analyzed RNA structures, the method finds the longest continuous segments that show similar torsion within a user-defined threshold. The length of the segment is provided as a local similarity measure. The method has been implemented as the LCS-TA algorithm (Longest Continuous Segments in Torsion Angle space) and is incorporated into our MCQ4Structures application, freely available for download from http://www.cs.put.poznan.pl/tzok/mcq/ . The presented approach ties a torsion-angle-based method of structure analysis with the idea of local similarity identification by handling continuous 3D structure segments. The first method, implemented in MCQ4Structures, has been successfully utilized in the RNA-Puzzles initiative. The second one, originally applied in Euclidean space, is a component of the LGA (Local-Global Alignment) algorithm commonly used in assessing protein models submitted to CASP. This unique combination of concepts implemented in LCS-TA provides a new perspective on structure quality assessment in local and quantitative aspects. A series of computational experiments shows the first results of applying our method to comparison of RNA 3D models. LCS-TA can be used for identifying strengths and weaknesses in the prediction of RNA tertiary structures.
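
    The core idea of finding the longest continuous segment with similar torsion can be sketched as follows. This is only an illustrative sketch, not the LCS-TA implementation from MCQ4Structures: the function names, the input layout (one tuple of torsion angles in degrees per residue, with the two structures already aligned residue by residue) and the per-angle comparison rule are all assumptions made here.

```python
def angular_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def longest_similar_segment(torsions_a, torsions_b, threshold):
    """Return (start, length) of the longest run of consecutive residues whose
    corresponding torsion angles all differ by at most `threshold` degrees.

    Assumption: both inputs list residues in the same order, and each residue
    is a tuple of torsion angles in degrees.
    """
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, (res_a, res_b) in enumerate(zip(torsions_a, torsions_b)):
        similar = all(angular_diff(x, y) <= threshold for x, y in zip(res_a, res_b))
        if similar:
            if run_len == 0:
                run_start = i  # a new run begins at this residue
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # the run is broken by a dissimilar residue
    return best_start, best_len


# Toy angles invented for illustration (two torsions per residue).
model = [(10, 350), (20, 100), (30, 200), (40, 300), (50, 60)]
target = [(12, 2), (90, 100), (33, 198), (41, 305), (52, 58)]
start, length = longest_similar_segment(model, target, threshold=15.0)
```

    The modular difference in `angular_diff` treats 350 and 2 degrees as 12 degrees apart, which matters because torsion angles are circular quantities.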

  11. Modeling ready biodegradability of fragrance materials.

    PubMed

    Ceriani, Lidia; Papa, Ester; Kovarich, Simona; Boethling, Robert; Gramatica, Paola

    2015-06-01

    In the present study, quantitative structure activity relationships were developed for predicting ready biodegradability of approximately 200 heterogeneous fragrance materials. Two classification methods, classification and regression tree (CART) and k-nearest neighbors (kNN), were applied to perform the modeling. The models were validated with multiple external prediction sets, and the structural applicability domain was verified by the leverage approach. The best models had good sensitivity (internal ≥80%; external ≥68%), specificity (internal ≥80%; external 73%), and overall accuracy (≥75%). Results from the comparison with BIOWIN global models, based on group contribution method, show that specific models developed in the present study perform better in prediction than BIOWIN6, in particular for the correct classification of not readily biodegradable fragrance materials. © 2015 SETAC.
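
    For readers unfamiliar with the reported metrics, the sketch below shows how sensitivity, specificity and overall accuracy are conventionally computed from paired class labels. The function name, the label strings and the choice of which class counts as "positive" are assumptions for illustration; the labels are toy values, not data from the study.

```python
def classification_metrics(actual, predicted, positive):
    """Compute sensitivity, specificity and accuracy for a two-class problem.

    `positive` names the class treated as positive (an assumption of this sketch).
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,  # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else nan,  # true negative rate
        "accuracy": (tp + tn) / len(pairs) if pairs else nan,
    }


# Toy labels: "RB" = readily biodegradable, "NRB" = not readily biodegradable.
actual = ["RB", "RB", "RB", "RB", "NRB", "NRB", "NRB", "NRB"]
predicted = ["RB", "RB", "RB", "NRB", "NRB", "NRB", "NRB", "RB"]
metrics = classification_metrics(actual, predicted, positive="RB")
```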

  12. Assessing deep and shallow learning methods for quantitative prediction of acute chemical toxicity.

    PubMed

    Liu, Ruifeng; Madore, Michael; Glover, Kyle P; Feasel, Michael G; Wallqvist, Anders

    2018-05-02

    Animal-based methods for assessing chemical toxicity are struggling to meet testing demands. In silico approaches, including machine-learning methods, are promising alternatives. Recently, deep neural networks (DNNs) were evaluated and reported to outperform other machine-learning methods for quantitative structure-activity relationship modeling of molecular properties. However, most of the reported performance evaluations relied on global performance metrics, such as the root mean squared error (RMSE) between the predicted and experimental values of all samples, without considering the impact of sample distribution across the activity spectrum. Here, we carried out an in-depth analysis of DNN performance for quantitative prediction of acute chemical toxicity using several datasets. We found that the overall performance of DNN models on datasets of up to 30,000 compounds was similar to that of random forest (RF) models, as measured by the RMSE and correlation coefficients between the predicted and experimental results. However, our detailed analyses demonstrated that global performance metrics are inappropriate for datasets with a highly uneven sample distribution, because they show a strong bias for the most populous compounds along the toxicity spectrum. For highly toxic compounds, DNN and RF models trained on all samples performed much worse than the global performance metrics indicated. Surprisingly, our variable nearest neighbor method, which utilizes only structurally similar compounds to make predictions, performed reasonably well, suggesting that information of close near neighbors in the training sets is a key determinant of acute toxicity predictions.
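
    The point about global metrics hiding poor performance in sparsely populated regions can be illustrated with a small sketch that computes the RMSE separately for each range of experimental values. The function names, the bin edges and the numbers are hypothetical and chosen only to make the effect visible; they are not taken from the study.

```python
import math


def rmse(pairs):
    """Root mean squared error over (predicted, experimental) pairs."""
    return math.sqrt(sum((p - e) ** 2 for p, e in pairs) / len(pairs))


def rmse_by_bin(experimental, predicted, edges):
    """RMSE per bin, where bin k covers experimental values in [edges[k], edges[k+1])."""
    bins = {}
    for e, p in zip(experimental, predicted):
        for k in range(len(edges) - 1):
            if edges[k] <= e < edges[k + 1]:
                bins.setdefault(k, []).append((p, e))
                break
    return {k: rmse(v) for k, v in sorted(bins.items())}


# Toy data: four well-predicted samples in the populous low range and one
# badly predicted sample in the sparse high range.
experimental = [1.0, 1.0, 1.0, 1.0, 5.0]
predicted = [1.0, 1.0, 1.0, 1.0, 2.0]

global_rmse = rmse(list(zip(predicted, experimental)))
per_bin = rmse_by_bin(experimental, predicted, edges=[0.0, 3.0, 6.0])
```

    In this toy case the global RMSE is dominated by the populous bin and is well below the error in the sparse bin, which is the kind of bias the abstract describes.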

  13. Similarity indices based on link weight assignment for link prediction of unweighted complex networks

    NASA Astrophysics Data System (ADS)

    Liu, Shuxin; Ji, Xinsheng; Liu, Caixia; Bai, Yi

    2017-01-01

    Many link prediction methods have been proposed for predicting the likelihood that a link exists between two nodes in complex networks. Among these methods, similarity indices are receiving close attention. Most similarity-based methods assume that the contribution of links with different topological structures is the same in the similarity calculations. This paper proposes a local weighted method, which weights the strength of connection between each pair of nodes. Based on the local weighted method, six local weighted similarity indices extended from unweighted similarity indices (including Common Neighbor (CN), Adamic-Adar (AA), Resource Allocation (RA), Salton, Jaccard and Local Path (LP) index) are proposed. Empirical study has shown that the local weighted method can significantly improve the prediction accuracy of these unweighted similarity indices and that in sparse and weakly clustered networks, the indices perform even better.
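
    As background, the unweighted local indices named above have standard textbook definitions, which the sketch below computes for a pair of nodes. It shows only the unweighted baselines (the Local Path index is omitted); the paper's link weight assignment is not reproduced here, and the function names and the toy graph are assumptions for illustration.

```python
import math


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def similarity_indices(adj, x, y):
    """Standard unweighted local similarity indices for the node pair (x, y)."""
    nx, ny = adj.get(x, set()), adj.get(y, set())
    common = nx & ny
    union = nx | ny
    return {
        "CN": len(common),
        "Jaccard": len(common) / len(union) if union else 0.0,
        "Salton": len(common) / math.sqrt(len(nx) * len(ny)) if nx and ny else 0.0,
        # Adamic-Adar: common neighbors weighted by 1 / log(degree);
        # degree-1 nodes are skipped to avoid dividing by log(1) = 0.
        "AA": sum(1.0 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1),
        # Resource Allocation: common neighbors weighted by 1 / degree.
        "RA": sum(1.0 / len(adj[z]) for z in common),
    }


# Toy graph: a and b share the neighbors c (degree 2) and d (degree 3).
adj = build_adjacency([("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("d", "e")])
scores = similarity_indices(adj, "a", "b")
```

    Higher scores are read as a higher likelihood that the missing link between the two nodes exists.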

  14. A New Hybrid-Multiscale SSA Prediction of Non-Stationary Time Series

    NASA Astrophysics Data System (ADS)

    Ghanbarzadeh, Mitra; Aminghafari, Mina

    2016-02-01

    Singular spectral analysis (SSA) is a non-parametric method used in the prediction of non-stationary time series. It has two parameters that are difficult to determine, and its results are very sensitive to their values. Since SSA is a deterministic method, it does not give good results when the time series is contaminated with a high noise level and correlated noise. Therefore, we introduce a novel method to handle these problems. It is based on the prediction of non-decimated wavelet (NDW) signals by SSA, followed by the prediction of residuals by wavelet regression. The advantages of our method are the automatic determination of parameters and taking account of the stochastic structure of time series. As shown through simulated and real data, we obtain better results than SSA, a non-parametric wavelet regression method and the Holt-Winters method.

  15. SHM-Based Probabilistic Fatigue Life Prediction for Bridges Based on FE Model Updating

    PubMed Central

    Lee, Young-Joo; Cho, Soojin

    2016-01-01

    Fatigue life prediction for a bridge should be based on the current condition of the bridge, and various sources of uncertainty, such as material properties, anticipated vehicle loads and environmental conditions, make the prediction very challenging. This paper presents a new approach for probabilistic fatigue life prediction for bridges using finite element (FE) model updating based on structural health monitoring (SHM) data. Recently, various types of SHM systems have been used to monitor and evaluate the long-term structural performance of bridges. For example, SHM data can be used to estimate the degradation of an in-service bridge, which makes it possible to update the initial FE model. The proposed method consists of three steps: (1) identifying the modal properties of a bridge, such as mode shapes and natural frequencies, based on the ambient vibration under passing vehicles; (2) updating the structural parameters of an initial FE model using the identified modal properties; and (3) predicting the probabilistic fatigue life using the updated FE model. The proposed method is demonstrated by application to a numerical model of a bridge, and the impact of FE model updating on the bridge fatigue life is discussed. PMID:26950125

  16. Computational predictions of the new Gallium nitride nanoporous structures

    NASA Astrophysics Data System (ADS)

    Lien, Le Thi Hong; Tuoc, Vu Ngoc; Duong, Do Thi; Thu Huyen, Nguyen

    2018-05-01

    Nanoporous structural prediction is an emerging area of research because of the advantages of nanoporous structures for a wide range of materials science and technology applications in opto-electronics, environment, sensors, shape-selective and bio-catalysis, to name just a few. We propose a computationally and technically feasible approach for predicting Gallium nitride nanoporous structures with hollows at the nano scale. The designed porous structures are studied with computations using the density functional tight binding (DFTB) and conventional density functional theory methods, revealing a variety of promising mechanical and electronic properties, which can potentially find future realistic applications. Their stability is discussed by means of the free energy computed within the lattice-dynamics approach. Our calculations also indicate that all the reported hollow structures are wide band gap semiconductors, in the same fashion as their parent’s bulk stable phase. The electronic band structures of these nanoporous structures are finally examined in detail.

  17. Tailor-made force fields for crystal-structure prediction.

    PubMed

    Neumann, Marcus A

    2008-08-14

    A general procedure is presented to derive a complete set of force-field parameters for flexible molecules in the crystalline state on a case-by-case basis. The force-field parameters are fitted to the electrostatic potential as well as to accurate energies and forces generated by means of a hybrid method that combines solid-state density functional theory (DFT) calculations with an empirical van der Waals correction. All DFT calculations are carried out with the VASP program. The mathematical structure of the force field, the generation of reference data, the choice of the figure of merit, the optimization algorithm, and the parameter-refinement strategy are discussed in detail. The approach is applied to cyclohexane-1,4-dione, a small flexible ring. The tailor-made force field obtained for cyclohexane-1,4-dione is used to search for low-energy crystal packings in all 230 space groups with one molecule per asymmetric unit, and the most stable crystal structures are reoptimized in a second step with the hybrid method. The experimental crystal structure is found as the most stable predicted crystal structure both with the tailor-made force field and the hybrid method. The same methodology has also been applied successfully to the four compounds of the fourth CCDC blind test on crystal-structure prediction. For the five aforementioned compounds, the root-mean-square deviations between lattice energies calculated with the tailor-made force fields and the hybrid method range from 0.024 to 0.053 kcal/mol per atom around an average value of 0.034 kcal/mol per atom.

  18. Strong ground motion prediction using virtual earthquakes.

    PubMed

    Denolle, M A; Dunham, E M; Prieto, G A; Beroza, G C

    2014-01-24

    Sedimentary basins increase the damaging effects of earthquakes by trapping and amplifying seismic waves. Simulations of seismic wave propagation in sedimentary basins capture this effect; however, there exists no method to validate these results for earthquakes that have not yet occurred. We present a new approach for ground motion prediction that uses the ambient seismic field. We apply our method to a suite of magnitude 7 scenario earthquakes on the southern San Andreas fault and compare our ground motion predictions with simulations. Both methods find strong amplification and coupling of source and structure effects, but they predict substantially different shaking patterns across the Los Angeles Basin. The virtual earthquake approach provides a new approach for predicting long-period strong ground motion.

  19. A protein-dependent side-chain rotamer library.

    PubMed

    Bhuyan, Md Shariful Islam; Gao, Xin

    2011-12-14

    Protein side-chain packing problem has remained one of the key open problems in bioinformatics. The three main components of protein side-chain prediction methods are a rotamer library, an energy function and a search algorithm. Rotamer libraries summarize the existing knowledge of the experimentally determined structures quantitatively. Depending on how much contextual information is encoded, there are backbone-independent rotamer libraries and backbone-dependent rotamer libraries. Backbone-independent libraries only encode sequential information, whereas backbone-dependent libraries encode both sequential and locally structural information. However, side-chain conformations are determined by spatially local information, rather than sequentially local information. Since in the side-chain prediction problem, the backbone structure is given, spatially local information should ideally be encoded into the rotamer libraries. In this paper, we propose a new type of backbone-dependent rotamer library, which encodes structural information of all the spatially neighboring residues. We call it protein-dependent rotamer libraries. Given any rotamer library and a protein backbone structure, we first model the protein structure as a Markov random field. Then the marginal distributions are estimated by the inference algorithms, without doing global optimization or search. The rotamers from the given library are then re-ranked and associated with the updated probabilities. Experimental results demonstrate that the proposed protein-dependent libraries significantly outperform the widely used backbone-dependent libraries in terms of the side-chain prediction accuracy and the rotamer ranking ability. Furthermore, without global optimization/search, the side-chain prediction power of the protein-dependent library is still comparable to the global-search-based side-chain prediction methods.

  20. Molecules for materials: germanium hydride neutrals and anions. Molecular structures, electron affinities, and thermochemistry of GeHn/GeHn- (n = 0-4) and Ge2Hn/Ge2Hn(-) (n = 0-6).

    PubMed

    Li, Qian-Shu; Lü, Rui-Hua; Xie, Yaoming; Schaefer, Henry F

    2002-12-01

    The GeH(n) (n = 0-4) and Ge(2)H(n) (n = 0-6) systems have been studied systematically by five different density functional methods. The basis sets employed are of double-zeta plus polarization quality with additional s- and p-type diffuse functions, labeled DZP++. For each compound plausible energetically low-lying structures were optimized. The methods used have been calibrated against a comprehensive tabulation of experimental electron affinities (Chemical Reviews 102, 231, 2002). The geometries predicted in this work include yet unknown anionic species, such as Ge(2)H(-), Ge(2)H(2)(-), Ge(2)H(3)(-), Ge(2)H(4)(-), and Ge(2)H(5)(-). In general, the BHLYP method predicts the geometries closest to the few available experimental structures. A number of structures rather different from the analogous well-characterized hydrocarbon radicals and anions are predicted. For example, a vinylidene-like GeGeH(2)(-) structure is the global minimum of Ge(2)H(2)(-). For neutral Ge(2)H(4), a methylcarbene-like HGë-GeH(3) is nearly degenerate with the trans-bent H(2)Ge=GeH(2) structure. For the Ge(2)H(4)(-) anion, the methylcarbene-like system is the global minimum. The three different neutral-anion energy differences reported in this research are: the adiabatic electron affinity (EA(ad)), the vertical electron affinity (EA(vert)), and the vertical detachment energy (VDE). For this family of molecules the B3LYP method appears to predict the most reliable electron affinities. The adiabatic electron affinities after the ZPVE correction are predicted to be 2.02 (Ge(2)), 2.05 (Ge(2)H), 1.25 (Ge(2)H(2)), 2.09 (Ge(2)H(3)), 1.71 (Ge(2)H(4)), 2.17 (Ge(2)H(5)), and -0.02 (Ge(2)H(6)) eV. We also report the dissociation energies for the GeH(n) (n = 1-4) and Ge(2)H(n) (n = 1-6) systems, as well as those for their anionic counterparts. Our theoretical predictions provide strong motivation for the further experimental study of these important germanium hydrides. Copyright 2002 Wiley Periodicals, Inc.

  1. Structural predictions for Correlated Electron Materials Using the Functional Dynamical Mean Field Theory Approach

    NASA Astrophysics Data System (ADS)

    Haule, Kristjan

    2018-04-01

    The Dynamical Mean Field Theory (DMFT), in combination with band structure methods, has been able to address the rich physics of correlated materials, such as fluctuating local moments, spin and orbital fluctuations, atomic multiplet physics and band formation, on an equal footing. It is increasingly recognized that a more predictive ab-initio theory of correlated systems also needs to address the feedback effect of the correlated electronic structure on the ionic positions, as the metal-insulator transition is almost always accompanied by considerable structural distortions. We will review a recently developed extension of the merger between Density Functional Theory (DFT) and the DMFT method, dubbed DFT+ embedded DMFT (DFT+eDMFT), which successfully addresses this challenge. It is based on the stationary Luttinger-Ward functional to minimize the numerical error, it subtracts the exact double-counting of DFT and DMFT, and it implements self-consistent forces on all atoms in the unit cell. In a few examples, we will also show how the method elucidated the important feedback effect of correlations on crystal structure in rare earth nickelates to explain the mechanism of the metal-insulator transition. The method showed that such a feedback effect is also essential to understand the dynamic stability of the high-temperature body-centered cubic phase of elemental iron, and in particular it predicted a strong enhancement of the electron-phonon coupling over DFT values in FeSe, which was very recently verified by a pioneering time-domain experiment.

  2. Periodic Forced Response of Structures Having Three-Dimensional Frictional Constraints

    NASA Astrophysics Data System (ADS)

    CHEN, J. J.; YANG, B. D.; MENQ, C. H.

    2000-01-01

    Many mechanical systems have moving components that are mutually constrained through frictional contacts. When subjected to cyclic excitations, a contact interface may undergo constant changes among sticks, slips and separations, which leads to very complex contact kinematics. In this paper, a 3-D friction contact model is employed to predict the periodic forced response of structures having 3-D frictional constraints. Analytical criteria based on this friction contact model are used to determine the transitions among sticks, slips and separations of the friction contact, and subsequently the constrained force which consists of the induced stick-slip friction force on the contact plane and the contact normal load. The resulting constrained force is often a periodic function and can be considered as a feedback force that influences the response of the constrained structures. By using the Multi-Harmonic Balance Method along with Fast Fourier Transform, the constrained force can be integrated with the receptance of the structures so as to calculate the forced response of the constrained structures. It results in a set of non-linear algebraic equations that can be solved iteratively to yield the relative motion as well as the constrained force at the friction contact. This method is used to predict the periodic response of a frictionally constrained 3-d.o.f. oscillator. The predicted results are compared with those of the direct time integration method so as to validate the proposed method. In addition, the effect of super-harmonic components on the resonant response and jump phenomenon is examined.

  3. AGARD Manual on Aeroelasticity in Axial-Flow Turbomachines. Volume 2. Structural Dynamics and Aeroelasticity,

    DTIC Science & Technology

    1988-06-01

    Figure 12. Prediction of Response due to Second Stage Vane. ...assessment methods, written by Armstrong. The problem of life time prediction is reviewed by Labourdette, who also summarizes ONERA’s research in...applicable to single blades and bladed assemblies. The blade fatigue problem and its assessment methods, and life-time prediction are considered. Aeroelastic

  4. Comparison of prediction methods for octanol-air partition coefficients of diverse organic compounds.

    PubMed

    Fu, Zhiqiang; Chen, Jingwen; Li, Xuehua; Wang, Ya'nan; Yu, Haiying

    2016-04-01

    The octanol-air partition coefficient (KOA) is needed for assessing multimedia transport and bioaccumulability of organic chemicals in the environment. As experimental determination of KOA for various chemicals is costly and laborious, development of KOA estimation methods is necessary. We investigated three methods for KOA prediction: conventional quantitative structure-activity relationship (QSAR) models based on molecular structural descriptors, group contribution models based on atom-centered fragments, and a novel model that predicts KOA via solvation free energy from air to octanol phase (ΔGO(0)), with a collection of 939 experimental KOA values for 379 compounds at different temperatures (263.15-323.15 K) as validation or training sets. The developed models were evaluated with the OECD guidelines on QSAR models validation and applicability domain (AD) description. Results showed that although the ΔGO(0) model is theoretically sound and has a broad AD, the prediction accuracy of the model is the poorest. The QSAR models perform better than the group contribution models, and have similar predictability and accuracy to the conventional method that estimates KOA from the octanol-water partition coefficient and Henry's law constant. One QSAR model, which can predict KOA at different temperatures, was recommended for assessing the long-range transport potential of chemicals. Copyright © 2016 Elsevier Ltd. All rights reserved.
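
    The two thermodynamic routes to KOA mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' models: it assumes the textbook relations KOA = KOW / KAW with KAW = H / (RT), and ΔG = -RT ln KOA for air-to-octanol transfer. It ignores the difference between water-saturated and dry octanol. The function names and input values are hypothetical.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)


def log_koa_from_kow_and_henry(log_kow, henry_pa_m3_per_mol, temp_k):
    """Estimate log10 KOA from log10 KOW and Henry's law constant H.

    Assumes the dimensionless air-water partition coefficient is
    KAW = H / (R * T), and that KOA = KOW / KAW.
    """
    log_kaw = math.log10(henry_pa_m3_per_mol / (R * temp_k))
    return log_kow - log_kaw


def log_koa_from_solvation_free_energy(delta_g_j_per_mol, temp_k):
    """Estimate log10 KOA from the air-to-octanol transfer free energy.

    Assumes delta_G = -R * T * ln(KOA), so a more negative free energy
    gives a larger KOA.
    """
    return -delta_g_j_per_mol / (math.log(10) * R * temp_k)


if __name__ == "__main__":
    # Hypothetical compound: log KOW = 4.0, H = 1.0 Pa*m^3/mol, T = 298.15 K
    print(log_koa_from_kow_and_henry(4.0, 1.0, 298.15))
    # Hypothetical transfer free energy of -40 kJ/mol at the same temperature
    print(log_koa_from_solvation_free_energy(-40000.0, 298.15))
```

    Both functions take the temperature explicitly because the abstract's data set spans 263.15-323.15 K.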

  5. A fourth order Euler/Navier-Stokes prediction method for the aerodynamics and aeroelasticity of hovering rotor blades

    NASA Astrophysics Data System (ADS)

    Smith, Marilyn Jones

    Some of the computational issues relating to the development of a three-dimensional fourth-order compact Euler/Navier-Stokes methodology for rotary wing flows and its coupling with an elastic rotor blade beam structural model have been explored. The compact Euler/Navier-Stokes method is used to predict the aerodynamic loads on an isolated rotor blade. Because the scheme is fourth-order, fewer grid nodes are necessary to predict loads with the same accuracy as traditional second order methodologies on finer grids. Grid and numerical parameter optimizations were performed to examine the changes in the predictive capabilities of the higher-order scheme. Comparisons were made with experimental data for a rotor using NACA 0012 airfoil sections and a rectangular planform with no twist. Simulations for both lifting and non-lifting configurations at various tip Mach numbers were performed. This Euler/Navier-Stokes methodology can be applied to rotor blades with either rigid-blade or elastic-beam-structural models to determine the steady-state response in hovering flight. The blade is represented by a geometrically nonlinear beam model which accounts for coupled flap bending, lead-lag bending and torsion. Moderately large displacements and rotations due to structural deformations can be simulated. The analysis has been performed for blade configurations having uniform mass and stiffness, no twist, and no chordwise offsets of the elastic and tension axes, as well as the center of mass. The results are compared with a panel method coupled with the same structural dynamics model. Computations have been made to predict the aerodynamic deflections for the rotor in hover. A starting solution using initial deflections predicted by aeroelastic analyses with a two-dimensional aerodynamic model was investigated.
The present Euler/Navier-Stokes method using a momentum wake and a contracting vortex wake shows the impact on the aeroelastic deflections of a three-dimensional aerodynamic module which includes rotational and viscous effects, particularly at higher collective pitch angles. The differences in the aeroelastic predictions using fully coupled and loosely coupled aerodynamic analyses are examined. The induced wake plays a critical role in determining the final equilibrium tip deflections.
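
    The claim that a fourth-order scheme needs fewer grid nodes than a second-order one for the same accuracy can be illustrated on a toy problem. The sketch below uses explicit central-difference stencils on sin(x), not the compact (implicit) Euler/Navier-Stokes scheme of the abstract; the function names and grid sizes are chosen here for illustration only.

```python
import math


def d1_second_order(f, x, h):
    """Second-order central difference for the first derivative."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def d1_fourth_order(f, x, h):
    """Fourth-order central difference for the first derivative."""
    return (-f(x + 2 * h) + 8 * f(x + h) - 8 * f(x - h) + f(x - 2 * h)) / (12.0 * h)


def max_error(stencil, n):
    """Largest error in d/dx sin(x) over n uniform nodes on [0, 2*pi)."""
    h = 2.0 * math.pi / n
    return max(abs(stencil(math.sin, i * h, h) - math.cos(i * h)) for i in range(n))


if __name__ == "__main__":
    for n in (16, 32, 64):
        print(n, max_error(d1_second_order, n), max_error(d1_fourth_order, n))
```

    In this toy case the fourth-order stencil on 16 nodes is already more accurate than the second-order stencil on 64 nodes, and halving the spacing reduces its error by roughly a factor of 16.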

  6. QSAR study of anthranilic acid sulfonamides as inhibitors of methionine aminopeptidase-2 using LS-SVM and GRNN based on principal components.

    PubMed

    Shahlaei, Mohsen; Sabet, Razieh; Ziari, Maryam Bahman; Moeinifard, Behzad; Fassihi, Afshin; Karbakhsh, Reza

    2010-10-01

    Quantitative relationships between molecular structure and methionine aminopeptidase-2 inhibitory activity of a series of cytotoxic anthranilic acid sulfonamide derivatives were discovered. We have demonstrated the detailed application of two efficient nonlinear methods for evaluation of quantitative structure-activity relationships of the studied compounds. Components produced by principal component analysis were used as input to the developed nonlinear models. The performance of the developed models, namely PC-GRNN and PC-LS-SVM, was tested by several validation methods. The resulting PC-LS-SVM model had a high statistical quality (R(2)=0.91 and R(CV)(2)=0.81) for predicting the cytotoxic activity of the compounds. Comparison between the predictability of PC-GRNN and PC-LS-SVM indicates that the latter method has a higher ability to predict the activity of the studied molecules. Copyright (c) 2010 Elsevier Masson SAS. All rights reserved.

  7. Computational prediction of muon stopping sites using ab initio random structure searching (AIRSS)

    NASA Astrophysics Data System (ADS)

    Liborio, Leandro; Sturniolo, Simone; Jochym, Dominik

    2018-04-01

    The stopping site of the muon in a muon-spin relaxation experiment is in general unknown. There are some techniques that can be used to guess the muon stopping site, but they often rely on approximations and are not generally applicable to all cases. In this work, we propose a purely theoretical method to predict muon stopping sites in crystalline materials from first principles. The method is based on a combination of ab initio calculations, random structure searching, and machine learning, and it has successfully predicted the MuT and MuBC stopping sites of muonium in Si, diamond, and Ge, as well as the muonium stopping site in LiF, without any recourse to experimental results. The method makes use of Soprano, a Python library developed to aid ab initio computational crystallography, that was publicly released and contains all the software tools necessary to reproduce our analysis.

  8. Predicting drug side-effect profiles: a chemical fragment-based approach

    PubMed Central

    2011-01-01

    Background: Drug side-effects, or adverse drug reactions, have become a major public health concern. They are one of the main causes of failure in the process of drug development, and of drug withdrawal once drugs have reached the market. Therefore, in silico prediction of potential side-effects early in the drug discovery process, before reaching the clinical stages, is of great interest to improve this long and expensive process and to provide new efficient and safe therapies for patients. Results: In the present work, we propose a new method to predict potential side-effects of drug candidate molecules based on their chemical structures, applicable on large molecular databanks. A unique feature of the proposed method is its ability to extract correlated sets of chemical substructures (or chemical fragments) and side-effects. This is made possible using sparse canonical correlation analysis (SCCA). In the results, we show the usefulness of the proposed method by predicting 1385 side-effects in the SIDER database from the chemical structures of 888 approved drugs. These predictions are performed with simultaneous extraction of correlated ensembles formed by a set of chemical substructures shared by drugs that are likely to have a set of side-effects. We also conduct a comprehensive side-effect prediction for many uncharacterized drug molecules stored in DrugBank, and were able to confirm interesting predictions using independent sources of information. Conclusions: The proposed method is expected to be useful in various stages of the drug development process. PMID:21586169

  9. Predicting community structure in snakes on Eastern Nearctic islands using ecological neutral theory and phylogenetic methods

    PubMed Central

    Burbrink, Frank T.; McKelvy, Alexander D.; Pyron, R. Alexander; Myers, Edward A.

    2015-01-01

    Predicting species presence and richness on islands is important for understanding the origins of communities and how likely it is that species will disperse and resist extinction. The equilibrium theory of island biogeography (ETIB) and, as a simple model of sampling abundances, the unified neutral theory of biodiversity (UNTB), predict that in situations where mainland to island migration is high, species-abundance relationships explain the presence of taxa on islands. Thus, more abundant mainland species should have a higher probability of occurring on adjacent islands. In contrast to UNTB, if certain groups have traits that permit them to disperse to islands better than other taxa, then phylogeny may be more predictive of which taxa will occur on islands. Taking surveys of 54 island snake communities in the Eastern Nearctic along with mainland communities that have abundance data for each species, we use phylogenetic assembly methods and UNTB estimates to predict island communities. Species richness is predicted by island area, whereas turnover from the mainland to island communities is random with respect to phylogeny. Community structure appears to be ecologically neutral and abundance on the mainland is the best predictor of presence on islands. With regard to young and proximate islands, where allopatric or cladogenetic speciation is not a factor, we find that simple neutral models following UNTB and ETIB predict the structure of island communities. PMID:26609083
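
    The link between mainland abundance and island presence can be made concrete with a simple sampling model. The sketch below is not the authors' UNTB estimation procedure; it assumes that an island community is formed by drawing a fixed number of colonists independently from the mainland pool in proportion to relative abundance. The species names and counts are hypothetical.

```python
def presence_probability(rel_abundance, n_colonists):
    """Probability that a species with the given mainland relative abundance
    appears at least once among n_colonists independent random draws."""
    return 1.0 - (1.0 - rel_abundance) ** n_colonists


def expected_richness(mainland_counts, n_colonists):
    """Expected number of species present on the island under random sampling."""
    total = sum(mainland_counts.values())
    return sum(
        presence_probability(count / total, n_colonists)
        for count in mainland_counts.values()
    )


if __name__ == "__main__":
    # Hypothetical mainland survey counts per species
    mainland = {"species_a": 120, "species_b": 45, "species_c": 5}
    for name, count in mainland.items():
        p = presence_probability(count / sum(mainland.values()), 20)
        print(name, round(p, 3))
    print("expected richness:", round(expected_richness(mainland, 20), 3))
```

    Under this model, more abundant mainland species are always more likely to occur on the island, which is the qualitative pattern the abstract reports.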

  10. Predicting community structure in snakes on Eastern Nearctic islands using ecological neutral theory and phylogenetic methods.

    PubMed

    Burbrink, Frank T; McKelvy, Alexander D; Pyron, R Alexander; Myers, Edward A

    2015-11-22

    Predicting species presence and richness on islands is important for understanding the origins of communities and how likely it is that species will disperse and resist extinction. The equilibrium theory of island biogeography (ETIB) and, as a simple model of sampling abundances, the unified neutral theory of biodiversity (UNTB), predict that in situations where mainland to island migration is high, species-abundance relationships explain the presence of taxa on islands. Thus, more abundant mainland species should have a higher probability of occurring on adjacent islands. In contrast to UNTB, if certain groups have traits that permit them to disperse to islands better than other taxa, then phylogeny may be more predictive of which taxa will occur on islands. Taking surveys of 54 island snake communities in the Eastern Nearctic along with mainland communities that have abundance data for each species, we use phylogenetic assembly methods and UNTB estimates to predict island communities. Species richness is predicted by island area, whereas turnover from the mainland to island communities is random with respect to phylogeny. Community structure appears to be ecologically neutral and abundance on the mainland is the best predictor of presence on islands. With regard to young and proximate islands, where allopatric or cladogenetic speciation is not a factor, we find that simple neutral models following UNTB and ETIB predict the structure of island communities. © 2015 The Author(s).

  11. A Method for Predicting Protein Complexes from Dynamic Weighted Protein-Protein Interaction Networks.

    PubMed

    Liu, Lizhen; Sun, Xiaowu; Song, Wei; Du, Chao

    2018-06-01

    Predicting protein complexes from protein-protein interaction (PPI) networks is of great significance for understanding the structure and function of cells. A protein may interact with different proteins at different times or under different conditions. Existing approaches only utilize static PPI network data and may therefore lose much temporal biological information. First, this article proposes a novel method that combines gene expression data at different time points with the traditional static PPI network to construct different dynamic subnetworks. Second, to further filter out the data noise, the semantic similarity based on gene ontology is regarded as the network weight together with the principal component analysis, which is introduced to deal with the weight computing by three traditional methods. Third, after building a dynamic PPI network, a protein complex prediction algorithm based on the "core-attachment" structural feature is applied to detect complexes from each dynamic subnetwork. Finally, the experimental results show that the method proposed in this article performs well on detecting protein complexes from dynamic weighted PPI networks.
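
    The first step of this method, building time-specific subnetworks from a static PPI network and expression data, might look like the sketch below. The abstract does not state how expression values are turned into active proteins, so the fixed-threshold rule is an assumption made for illustration. The function name, protein labels and values are all hypothetical.

```python
def dynamic_subnetworks(edges, expression, threshold):
    """Split a static PPI edge list into one subnetwork per time point.

    edges:      list of (protein_a, protein_b) pairs from the static network
    expression: dict mapping protein -> list of expression values over time
    threshold:  assumed cutoff above which a protein counts as active

    An edge is kept at time t only if both of its proteins are active at t.
    """
    n_times = len(next(iter(expression.values())))
    subnetworks = []
    for t in range(n_times):
        active = {p for p, series in expression.items() if series[t] > threshold}
        subnetworks.append([(a, b) for a, b in edges if a in active and b in active])
    return subnetworks


if __name__ == "__main__":
    static_edges = [("P1", "P2"), ("P2", "P3")]
    expr = {"P1": [1.0, 0.1], "P2": [1.0, 1.0], "P3": [0.1, 1.0]}
    for t, sub in enumerate(dynamic_subnetworks(static_edges, expr, 0.5)):
        print("time", t, sub)
```

    Each resulting subnetwork could then be weighted and passed to a complex detection step, as the abstract describes.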

  12. Metabolite identification through multiple kernel learning on fragmentation trees.

    PubMed

    Shen, Huibin; Dührkop, Kai; Böcker, Sebastian; Rousu, Juho

    2014-06-15

    Metabolite identification from tandem mass spectrometric data is a key task in metabolomics. Various computational methods have been proposed for the identification of metabolites from tandem mass spectra. Fragmentation tree methods explore the space of possible ways in which the metabolite can fragment, and base the metabolite identification on scoring of these fragmentation trees. Machine learning methods have been used to map mass spectra to molecular fingerprints; predicted fingerprints, in turn, can be used to score candidate molecular structures. Here, we combine fragmentation tree computations with kernel-based machine learning to predict molecular fingerprints and identify molecular structures. We introduce a family of kernels capturing the similarity of fragmentation trees, and combine these kernels using recently proposed multiple kernel learning approaches. Experiments on two large reference datasets show that the new methods significantly improve molecular fingerprint prediction accuracy. These improvements result in better metabolite identification, doubling the number of metabolites ranked at the top position of the candidates list. © The Author 2014. Published by Oxford University Press.
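
    Combining several kernels into one is the core operation in multiple kernel learning. The sketch below shows one simple weighting heuristic, uncentered kernel-target alignment, applied to small kernel matrices stored as nested lists. It is an illustration under stated assumptions and not the specific multiple kernel learning approaches used in the paper. All names and matrices are hypothetical.

```python
import math


def frobenius(a, b):
    """Frobenius inner product of two equally sized square matrices."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def alignment(kernel, target):
    """Uncentered alignment between a kernel matrix and a target kernel."""
    return frobenius(kernel, target) / math.sqrt(
        frobenius(kernel, kernel) * frobenius(target, target)
    )


def alignment_weights(kernels, target):
    """Weights proportional to each kernel's alignment, normalized to sum to 1."""
    scores = [alignment(k, target) for k in kernels]
    total = sum(scores)
    return [s / total for s in scores]


def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices; with non-negative weights the result
    is again a valid (positive semi-definite) kernel."""
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    k1 = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical kernel 1
    k2 = [[1.0, 1.0], [1.0, 1.0]]  # hypothetical kernel 2
    target = [[1.0, 1.0], [1.0, 1.0]]  # e.g. built from a fingerprint bit
    w = alignment_weights([k1, k2], target)
    print(w, combine_kernels([k1, k2], w))
```

    In a fingerprint prediction setting, one such weighting could be computed per fingerprint bit, with the target kernel built from the training labels for that bit.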

  13. Mutations that Cause Human Disease: A Computational/Experimental Approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Beernink, P; Barsky, D; Pesavento, B

    International genome sequencing projects have produced billions of nucleotides (letters) of DNA sequence data, including the complete genome sequences of 74 organisms. These genome sequences have created many new scientific opportunities, including the ability to identify sequence variations among individuals within a species. These genetic differences, which are known as single nucleotide polymorphisms (SNPs), are particularly important in understanding the genetic basis for disease susceptibility. Since the report of the complete human genome sequence, over two million human SNPs have been identified, including a large-scale comparison of an entire chromosome from twenty individuals. Of the protein coding SNPs (cSNPs), approximately half leads to a single amino acid change in the encoded protein (non-synonymous coding SNPs). Most of these changes are functionally silent, while the remainder negatively impact the protein and sometimes cause human disease. To date, over 550 SNPs have been found to cause single locus (monogenic) diseases and many others have been associated with polygenic diseases. SNPs have been linked to specific human diseases, including late-onset Parkinson disease, autism, rheumatoid arthritis and cancer. The ability to predict accurately the effects of these SNPs on protein function would represent a major advance toward understanding these diseases. To date several attempts have been made toward predicting the effects of such mutations. The most successful of these is a computational approach called "Sorting Intolerant From Tolerant" (SIFT). This method uses sequence conservation among many similar proteins to predict which residues in a protein are functionally important. However, this method suffers from several limitations. First, a query sequence must have a sufficient number of relatives to infer sequence conservation.
Second, this method does not make use of or provide any information on protein structure, which can be used to understand how an amino acid change affects the protein. The experimental methods that provide the most detailed structural information on proteins are X-ray crystallography and NMR spectroscopy. However, these methods are labor intensive and currently cannot be carried out on a genomic scale. Nonetheless, Structural Genomics projects are being pursued by more than a dozen groups and consortia worldwide and as a result the number of experimentally determined structures is rising exponentially. Based on the expectation that protein structures will continue to be determined at an ever-increasing rate, reliable structure prediction schemes will become increasingly valuable, leading to information on protein function and disease for many different proteins. Given known genetic variability and experimentally determined protein structures, can we accurately predict the effects of single amino acid substitutions? An objective assessment of this question would involve comparing predicted and experimentally determined structures, which thus far has not been rigorously performed. The completed research leveraged existing expertise at LLNL in computational and structural biology, as well as significant computing resources, to address this question.

  14. Impact of active controls technology on structural integrity

    NASA Technical Reports Server (NTRS)

    Noll, Thomas; Austin, Edward; Donley, Shawn; Graham, George; Harris, Terry

    1991-01-01

    This paper summarizes the findings of The Technical Cooperation Program to assess the impact of active controls technology on the structural integrity of aeronautical vehicles and to evaluate the present state-of-the-art for predicting the loads caused by a flight-control system modification and the resulting change in the fatigue life of the flight vehicle. The potential for active controls to adversely affect structural integrity is described, and load predictions obtained using two state-of-the-art analytical methods are given.

  15. SMARTIV: combined sequence and structure de-novo motif discovery for in-vivo RNA binding data.

    PubMed

    Polishchuk, Maya; Paz, Inbal; Yakhini, Zohar; Mandel-Gutfreund, Yael

    2018-05-25

    Gene expression regulation is highly dependent on binding of RNA-binding proteins (RBPs) to their RNA targets. Growing evidence supports the notion that both RNA primary sequence and its local secondary structure play a role in specific Protein-RNA recognition and binding. Despite the great advance in high-throughput experimental methods for identifying sequence targets of RBPs, predicting the specific sequence and structure binding preferences of RBPs remains a major challenge. We present a novel webserver, SMARTIV, designed for discovering and visualizing combined RNA sequence and structure motifs from high-throughput RNA-binding data, generated from in-vivo experiments. The uniqueness of SMARTIV is that it predicts motifs from enriched k-mers that combine information from ranked RNA sequences and their predicted secondary structure, obtained using various folding methods. Consequently, SMARTIV generates Position Weight Matrices (PWMs) in a combined sequence and structure alphabet with assigned P-values. SMARTIV concisely represents the sequence and structure motif content as a single graphical logo, which is informative and easy for visual perception. SMARTIV was examined extensively on a variety of high-throughput binding experiments for RBPs from different families, generated from different technologies, showing consistent and accurate results. Finally, SMARTIV is a user-friendly webserver, highly efficient in run-time and freely accessible via http://smartiv.technion.ac.il/.
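
    The idea of k-mers over a combined sequence and structure alphabet can be sketched as follows. The encoding used here (uppercase for unpaired bases, lowercase for paired bases, taken from a dot-bracket string) is an assumption chosen for illustration and is not claimed to be SMARTIV's actual alphabet. The function names and the example RNA are hypothetical.

```python
from collections import Counter


def combine(sequence, structure):
    """Merge an RNA sequence with its dot-bracket structure into one string.

    Unpaired positions ('.') become uppercase letters and paired positions
    ('(' or ')') become lowercase letters.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    return "".join(
        base.upper() if mark == "." else base.lower()
        for base, mark in zip(sequence, structure)
    )


def kmer_counts(combined, k):
    """Count all overlapping k-mers in a combined sequence-structure string."""
    return Counter(combined[i:i + k] for i in range(len(combined) - k + 1))


if __name__ == "__main__":
    combined = combine("GGAUCC", "((..))")  # hypothetical hairpin
    print(combined)
    print(kmer_counts(combined, 2).most_common())
```

    Counting such k-mers in top-ranked versus low-ranked bound sequences would be one way to look for enriched combined motifs before building a position weight matrix.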

  16. GalaxyHomomer: a web server for protein homo-oligomer structure prediction from a monomer sequence or structure.

    PubMed

    Baek, Minkyung; Park, Taeyong; Heo, Lim; Park, Chiwook; Seok, Chaok

    2017-07-03

    Homo-oligomerization of proteins is abundant in nature, and is often intimately related with the physiological functions of proteins, such as in metabolism, signal transduction or immunity. Information on the homo-oligomer structure is therefore important to obtain a molecular-level understanding of protein functions and their regulation. Currently available web servers predict protein homo-oligomer structures either by template-based modeling using homo-oligomer templates selected from the protein structure database or by ab initio docking of monomer structures resolved by experiment or predicted by computation. The GalaxyHomomer server, freely accessible at http://galaxy.seoklab.org/homomer, carries out template-based modeling, ab initio docking or both depending on the availability of proper oligomer templates. It also incorporates recently developed model refinement methods that can consistently improve model quality. Moreover, the server provides additional options that can be chosen by the user depending on the availability of information on the monomer structure, oligomeric state and locations of unreliable/flexible loops or termini. The performance of the server was better than or comparable to that of other available methods when tested on benchmark sets and in a recent CASP performed in a blind fashion. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  17. Highly sensitive detection of individual HEAT and ARM repeats with HHpred and COACH.

    PubMed

    Kippert, Fred; Gerloff, Dietlind L

    2009-09-24

    HEAT and ARM repeats occur in a large number of eukaryotic proteins. As these repeats are often highly diverged, the prediction of HEAT or ARM domains can be challenging. Except for the most clear-cut cases, identification at the individual repeat level is indispensable, in particular for determining domain boundaries. However, methods using single sequence queries do not have the sensitivity required to deal with more divergent repeats and, when applied to proteins with known structures, in some cases failed to detect a single repeat. Testing algorithms which use multiple sequence alignments as queries, we found two of them, HHpred and COACH, to detect HEAT and ARM repeats with greatly enhanced sensitivity. Calibration against experimentally determined structures suggests the use of three score classes with increasing confidence in the prediction, and prediction thresholds for each method. When we applied a new protocol using both HHpred and COACH to these structures, it detected 82% of HEAT repeats and 90% of ARM repeats, with the minimum for a given protein of 57% for HEAT repeats and 60% for ARM repeats. Application to bona fide HEAT and ARM proteins or domains indicated that similar numbers can be expected for the full complement of HEAT/ARM proteins. A systematic screen of the Protein Data Bank for false positive hits revealed their number to be low, in particular for ARM repeats. Double false positive hits for a given protein were rare for HEAT and not at all observed for ARM repeats. In combination with fold prediction and consistency checking (multiple sequence alignments, secondary structure prediction, and position analysis), repeat prediction with the new HHpred/COACH protocol dramatically improves prediction in the twilight zone of fold prediction methods, as well as the delineation of HEAT/ARM domain boundaries. A protocol is presented for the identification of individual HEAT or ARM repeats which is straightforward to implement. 
It provides high sensitivity at a low false positive rate and will therefore greatly enhance the accuracy of predictions of HEAT and ARM domains.

  18. Highly Sensitive Detection of Individual HEAT and ARM Repeats with HHpred and COACH

    PubMed Central

    Kippert, Fred; Gerloff, Dietlind L.

    2009-01-01

    Background HEAT and ARM repeats occur in a large number of eukaryotic proteins. As these repeats are often highly diverged, the prediction of HEAT or ARM domains can be challenging. Except for the most clear-cut cases, identification at the individual repeat level is indispensable, in particular for determining domain boundaries. However, methods using single sequence queries do not have the sensitivity required to deal with more divergent repeats and, when applied to proteins with known structures, in some cases failed to detect a single repeat. Methodology and Principal Findings Testing algorithms which use multiple sequence alignments as queries, we found two of them, HHpred and COACH, to detect HEAT and ARM repeats with greatly enhanced sensitivity. Calibration against experimentally determined structures suggests the use of three score classes with increasing confidence in the prediction, and prediction thresholds for each method. When we applied a new protocol using both HHpred and COACH to these structures, it detected 82% of HEAT repeats and 90% of ARM repeats, with the minimum for a given protein of 57% for HEAT repeats and 60% for ARM repeats. Application to bona fide HEAT and ARM proteins or domains indicated that similar numbers can be expected for the full complement of HEAT/ARM proteins. A systematic screen of the Protein Data Bank for false positive hits revealed their number to be low, in particular for ARM repeats. Double false positive hits for a given protein were rare for HEAT and not at all observed for ARM repeats. In combination with fold prediction and consistency checking (multiple sequence alignments, secondary structure prediction, and position analysis), repeat prediction with the new HHpred/COACH protocol dramatically improves prediction in the twilight zone of fold prediction methods, as well as the delineation of HEAT/ARM domain boundaries. 
Significance A protocol is presented for the identification of individual HEAT or ARM repeats which is straightforward to implement. It provides high sensitivity at a low false positive rate and will therefore greatly enhance the accuracy of predictions of HEAT and ARM domains. PMID:19777061

  19. Structured Kernel Subspace Learning for Autonomous Robot Navigation.

    PubMed

    Kim, Eunwoo; Choi, Sungjoon; Oh, Songhwai

    2018-02-14

    This paper considers two important problems for autonomous robot navigation in a dynamic environment, where the goal is to predict pedestrian motion and control a robot with the prediction for safe navigation. While there are several methods for predicting the motion of a pedestrian and controlling a robot to avoid incoming pedestrians, it is still difficult to safely navigate in a dynamic environment due to challenges such as the varying quality and complexity of training data with unwanted noise. This paper addresses these challenges simultaneously by proposing a robust kernel subspace learning algorithm based on the recent advances in nuclear-norm and l1-norm minimization. We model the motion of a pedestrian and the robot controller using Gaussian processes. The proposed method efficiently approximates a kernel matrix used in Gaussian process regression by learning a low-rank structured matrix (with symmetric positive semi-definiteness) to find an orthogonal basis, which eliminates the effects of erroneous and inconsistent data. Based on structured kernel subspace learning, we propose a robust motion model and motion controller for safe navigation in dynamic environments. We evaluate the proposed robust kernel learning in various tasks, including regression, motion prediction, and motion control problems, and demonstrate that the proposed learning-based systems are robust against outliers and outperform existing regression and navigation methods.

  20. Predictive mapping of forest composition and structure with direct gradient analysis and nearest neighbor imputation in coastal Oregon, U.S.A.

    Treesearch

    Janet L. Ohmann; Matthew J. Gregory

    2002-01-01

    Spatially explicit information on the species composition and structure of forest vegetation is needed at broad spatial scales for natural resource policy analysis and ecological research. We present a method for predictive vegetation mapping that applies direct gradient analysis and nearest-neighbor imputation to ascribe detailed ground attributes of vegetation to...
